Benchmark: calculating the number π with a spigot algorithm

The table below lists the best results for each computer and is open to further expansion. If you have a faster implementation of the spigot algorithm for one of the listed systems, please inform the author of this page (vol.litwr at gmail dot com) and the table will be updated. The same goes for results from systems missing from the table.

The speed of this algorithm depends heavily on the speed of integer division, so systems with hardware division have a big advantage. The algorithm's computations use 16- and 32-bit unsigned integers, which favors 16/32-bit systems.
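For orientation, the kind of computation being timed can be sketched in Python. This is a minimal sketch modeled on the widely circulated compact C spigot program that emits four digits per outer pass. It is an assumption that the benchmarked programs follow the same scheme, and the name `pi_spigot` and the sizing of 14 array terms per 4-digit group are illustrative, not taken from the sources. The assertions mark where a 16-bit array entry and a 32-bit accumulator would have to fit.

```python
def pi_spigot(digits):
    """Return the first `digits` decimal digits of pi as a string.

    `digits` is rounded down to a multiple of 4, because each outer pass
    yields one 4-digit group.
    """
    groups = digits // 4
    n = groups * 14                 # assumed sizing: 14 array terms per group
    f = [2000] * n + [0]            # remainders; top entry starts at 0 as in the C version
    out = []
    carry = 0                       # low 4 digits held back from the previous pass
    c = n
    while c > 0:
        d = 0                       # accumulator (32-bit unsigned on real hardware)
        b = c
        while True:
            d += f[b] * 10000
            g = 2 * b - 1           # odd divisor for this term
            assert d < 2**32
            f[b] = d % g            # remainder (16-bit unsigned on real hardware)
            d //= g                 # the integer division that dominates run time
            assert f[b] < 2**16
            b -= 1
            if b == 0:
                break
            d *= b
        out.append("%04d" % (carry + d // 10000))
        carry = d % 10000
        c -= 14
    return "".join(out)


if __name__ == "__main__":
    print(pi_spigot(100))
```

Every inner step performs one division with remainder, which is why hardware division matters so much for the timings.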

The results are times in seconds for calculating 100, 1000, and 3000 digits. They also include the upper limit on the number of digits for each program. The data are sorted by the time for 3000 digits.

Every program satisfies four restrictions: 1) it measures time; 2) it uses an OS function to print digits, printing 4 digits at a time in step with their calculation; 3) it uses less than 64 KB of RAM for code and data; 4) it uses all available RAM below the 64 KB limit to maximize the number of calculated digits, so artificially restricting the maximum number of digits is forbidden.
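Restriction 4 ties the digit limit to free memory. As a rough illustration, assume the common spigot layout of 14 array terms per 4-digit group and one 16-bit word per term. Both figures are assumptions, the listed programs may size their arrays differently, and the helper names are hypothetical.

```python
TERMS_PER_GROUP = 14   # assumed: array terms needed for each 4-digit group
BYTES_PER_TERM = 2     # assumed: one 16-bit unsigned remainder per term


def array_bytes(digits):
    """Bytes of remainder array needed for `digits` digits (rounded up to 4)."""
    groups = -(-digits // 4)            # ceiling division
    return groups * TERMS_PER_GROUP * BYTES_PER_TERM


def max_digits(free_bytes):
    """Largest multiple of 4 digits whose array fits into `free_bytes`."""
    return free_bytes // (TERMS_PER_GROUP * BYTES_PER_TERM) * 4


if __name__ == "__main__":
    print(array_bytes(3000))        # array size for the 3000-digit run
    print(max_digits(20 * 1024))    # digits that fit if all of 20 KB were free
```

Under these assumptions the 3000-digit run needs 21000 bytes for the array alone, which already exceeds 20 KB before any code is counted.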

All drivers in use are guaranteed to be the fastest. Different versions of a driver have the same speed, but they may differ in size and minor features.

The CPU frequencies given are the maximum available (known to me) for the computers listed below. They are the actual frequencies used during the measurements.

FORMAT: The CPU and IO timing parts are calculated, so they are only approximations. They are less accurate for display output because the time for vertical scrolling is not taken into account. They also show small deviations (about 5%) for TCP/IP connections because the timing of character output is nonlinear in this case. For some systems, the data was acquired from hardware. Approximated results are shown in blue. Red is used where the approximation is based on heuristics.


1) Automatic DRAM refresh does not work at this clock (1.78 MHz), so some hardware and software may not work at this speed. For normal system operation, the maximum clock frequency is about 1.34 MHz, or 75% of the frequency used for the results given in the table.

2) This computer has only 20 KB of RAM, so the result for 3000 digits is an approximation. Theoretically the MC-10 can use up to 36 KB of RAM, but it was manufactured for too short a period and no proper memory expansion was made for it.

3) This system has no ROM routine for printing characters on its screen, so the driver contains such a routine.

4) The standard PC-98 system timer has a resolution of only 1 s, so tenths and hundredths of a second are approximations.

The results above make it possible to calculate CPU efficiency per 1 MHz. The next table contains approximate values of efficiency reciprocals (ER). Each value is the time for calculating 3000 digits (CPU time only, without IO) multiplied by the CPU frequency. The ER values are obtained for pi-spigot, which is based on 16-bit integer arithmetic. The best ER value for each CPU is taken. The ER value reflects the efficiency of the CPU's electronics.
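The ER definition is a single multiplication, and a minimal sketch of it follows. The function name and the sample figures are made up for illustration and are not measurements from the tables.

```python
def efficiency_reciprocal(cpu_seconds, cpu_mhz):
    """ER = CPU-only time for 3000 digits (seconds) times the clock (MHz).

    The product is proportional to the number of clock cycles spent,
    so a lower value means more work done per cycle.
    """
    return cpu_seconds * cpu_mhz


if __name__ == "__main__":
    # Hypothetical machines, not taken from the results table.
    machine_a = efficiency_reciprocal(cpu_seconds=120.0, cpu_mhz=1.0)
    machine_b = efficiency_reciprocal(cpu_seconds=40.0, cpu_mhz=4.0)
    print(machine_a, machine_b)
```

In this made-up case the second machine finishes sooner but spends more clock cycles, so its ER is the worse of the two.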

The next table contains details about the tested systems in chronological order.


The next table contains the sizes of the main loops and the whole programs.
Program             | CPU        | Main loop         | Total
vax-bsd-1           | VAX        | 34                | 816*4
vax-vms-2           | VAX        | 34                | 915*2
ibm370cms-1         | IBM/370    | 36                | 750
bbc-pandora-2       | 32016      | 37                | 992
bbc-panos-2         | 32016      | 37                | 1044
mac601-1            | PPC601     | 52                | 1840+192=2032*4
msdos-5             | 8086       | 54                | 635-640
ibmpc-8             | 8086       | 54                | 678
atarist30-10        | 68030      | 54                | 722
amiga1200-13        | 68020      | 54                | 812
mac20-6             | 68020      | 54                | 1315
atarist-10          | 68000      | 56                | 728
amiga-13            | 68000      | 56                | 816
ql-3                | 68008      | 56                | 366+192=558*1
mac-6               | 68000      | 56                | 1311
macppc-1            | PowerPC    | 56                | 1840+196=2036*4
pc386-9             | 80386      | 57                | 665
geneve-mdos-1       | TMS9995    | 62                | 1001*2
ti99-4a-2ea         | TMS9900    | 62                | 520+626=1146*1
dragon-6309-4       | 6309       | 97                | 281+496=777*1
pdp11-unix7-eis-1   | PDP-11/EIS | 170               | 636*4
rt11-eis-12of       | PDP-11/EIS | 170               | 869*2
dragon-6            | 6809       | 303               | 810+484=1294*1
mc10-1              | 6803       | 338               | 2516*8
arm2-2              | ARM2       | 516               | 1475
arm1-4              | ARM1       | 540               | 1680
bk11-7              | PDP-11     | 646               | 1446
bk10-7              | PDP-11     | 646               | 1446
pdp11-unix7-noeis-1 | PDP-11     | 734               | 1258*4
rt11-noeis-7        | PDP-11     | 734               | 1476*2
scpu64-7            | 65816      | 250+768=1018*5    | [1764+228+2]=1994*7
apple2gs-1          | 65816      | 250+768=1018*5    | 1906+416=2322*1
bbc-z80-3           | Z80        | 748+768=1516*5    | 1792+292=2084*1
msx-3               | Z80        | 748+768=1516*5    | 1799+460=2265*1
cpm22bios-7         | Z80        | 748+768=1516*5    | 2270-2283
trsdos-m1-1         | Z80        | 748+768=1516*5    | 2284
trsdos-m3-1         | Z80        | 748+768=1516*5    | 2286
cpc-cpm3-10         | Z80        | 748+768=1516*5    | 2287
trsdos-m4-1         | Z80        | 748+768=1516*5    | 2299
cpc-5               | Z80        | 748+768=1516*5    | 1804+579=2383*1
cpm22bdos-7         | Z80        | 748+768=1516*5    | 2266-2548
zx-1                | Z80        | 748+768=1516*5    | 4666*3
abc800-1            | Z80        | 748+768=1516*5    | 1804+579=4684*8
tandy100-1          | 8085       | 992+768=1760*5    | 2054+378=2432*1
cpm22bios-8080-2    | 8080       | 1285+768=2053*5   | 2804
vic20-3             | 6502       | 2353+768=3121*5*6 | [3497+264+2]=3773*7
pet-1               | 6502       | 2353+768=3121*5*6 | [3498+263+2]=3773*7
plus4-13            | 6502       | 2353+768=3121*5*6 | [3501+526+2]=4029*7
c64-13              | 6502       | 2353+768=3121*5*6 | [3668+359+2]=4029*7
c128-12             | 6502       | 2353+768=3121*5*6 | [3668+359+2]=4029*7
cbm2-1              | 6509       | 2353+768=3121*5*6 | [3611+421+2]=4034*7
bbc-8               | 6502       | 2353+768=3121*5*6 | [3484]+461+[177]=4122*7
apple2e-2           | 6502       | 2353+768=3121*5*6 | 3731+437=4168*1
atari800xl-6        | 6502       | 2353+768=3121*5*6 | 3515+[1291+177]=4983*7
apple2c-65c02-2     | 65C02      | 2866+768=3634*5*6 | 4234+432=4666*1
Square brackets group the sizes of parts that belong to one file.
*1 the sum of the Assembler + Basic code sizes
*2 the size of the pure code without a header
*3 the size of the TAP file
*4 the size of the assembler + C code without a loader and C libraries
*5 code size + the size of the multiplication lookup table
*6 the 6502 code may be shorter (1871+768=2639), which makes it 1-2% slower
*7 the sum of the Assembler + Basic + header/loader code sizes
*8 the size of the Basic text that contains embedded ML code
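The size cells written as a sum of parts can be checked mechanically. The sketch below is only an illustration of the notation described in the notes above. The function name `check_size_cell` is hypothetical, and it assumes that footnote markers are always a `*` followed by a single digit.

```python
import re


def check_size_cell(cell):
    """Check a cell such as '250+768=1018*5' or '[3484]+461+[177]=4122*7'.

    Returns True or False when the cell has the form parts=total, and None
    when it holds a plain number or a range, so there is nothing to verify.
    """
    cell = re.sub(r"\*\d", "", cell)                 # drop footnote markers (*5, *5*6)
    cell = cell.replace("[", "").replace("]", "")    # brackets only group parts
    if "=" not in cell:
        return None
    parts, total = cell.split("=")
    return sum(int(p) for p in parts.split("+")) == int(total)


if __name__ == "__main__":
    for cell in ("250+768=1018*5", "[3484]+461+[177]=4122*7", "2516*8"):
        print(cell, check_size_cell(cell))
```

A False result points to a cell whose parts and stated total disagree and that is worth rechecking against the sources.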

Thanks a lot to the people who helped: bqt, ivagor, perestoronin, BigEd, tricky, MMS, Thorham, meynaf, saimo, Don_Adan, mizapf, modrobert, a/b, ...


Download the latest pack of sources and executables (v75, 183 KB).

Sources Archive