Benchmark: calculating the number π with the spigot algorithm

The table below lists the best results for the listed computers. It is open to further expansion: if anybody has a faster implementation of the spigot algorithm for one of the systems listed below, please inform the author of this page (vol.litwr at gmail dot com) and the table will be updated. The same applies to results for systems missing from the table.

The speed of this algorithm depends heavily on the speed of integer division, so systems with hardware division have a big advantage. The algorithm's computations use 16-bit and 32-bit unsigned integers, which gives an advantage to 16/32-bit systems.
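To illustrate why division dominates, here is a minimal Python sketch of a widely circulated spigot scheme that produces four digits per pass. It is only an assumption that the benchmarked assembly programs follow this exact scheme; the name pi_spigot and the sizing of 14 array terms per 4 digits come from the commonly published compact C version of the algorithm, not from the sources of this benchmark.

```python
def pi_spigot(digits):
    """Return the first `digits` decimal digits of pi as a string ("3141...").

    Hypothetical helper for illustration; not the benchmark's own code.
    """
    groups = (digits + 3) // 4          # digits are produced 4 at a time
    n = groups * 14                     # assumed sizing: 14 terms per group
    # Remainder array; index 0 is unused, the top term starts at 0
    # as in the classic compact C version.
    r = [2000] * n + [0]
    carry = 0                           # low 4 digits held back from the last pass
    out = []
    for k in range(n, 0, -14):          # each pass drops the 14 highest terms
        d = 0
        for i in range(k, 0, -1):
            d += r[i] * 10000           # would need a 32-bit accumulator on real hardware
            denom = 2 * i - 1
            r[i] = d % denom            # one modulo ...
            d //= denom                 # ... and one division per term
            if i > 1:
                d *= i - 1
        out.append("%04d" % (carry + d // 10000))
        carry = d % 10000
    return "".join(out)[:digits]


if __name__ == "__main__":
    print(pi_spigot(32))
```

The inner loop performs one division and one remainder operation per array term on every pass, so a CPU without a hardware divide instruction spends most of its time there.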

The results are the times, in seconds, taken to calculate 100, 1000, and 3000 digits. They also include the upper limit on the number of digits for each program used. The data are sorted by the time for 3000 digits.

Every program satisfies four restrictions: 1) it measures time; 2) it uses an OS function to print digits, printing 4 digits at a time synchronously with their calculation; 3) it uses less than 64 KB of RAM for code and data; 4) it uses all available RAM below the 64 KB limit to reach the maximum number of calculated digits, so artificially restricting the maximum number of digits is forbidden.
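The link between free RAM and the digit limit in restriction 4 can be sketched as follows. This assumes the common spigot layout of 14 array terms per 4 digits with each term stored as a 16-bit word (7 bytes per digit); the function max_digits and these constants are illustrative assumptions, and a real program also needs room for its code and system variables.

```python
BYTES_PER_TERM = 2      # assumed: one 16-bit word per array term
TERMS_PER_GROUP = 14    # assumed: 14 terms for every 4 digits
DIGITS_PER_GROUP = 4


def max_digits(free_bytes):
    """Largest digit count (a multiple of 4) whose array fits in free_bytes."""
    terms = free_bytes // BYTES_PER_TERM
    return (terms // TERMS_PER_GROUP) * DIGITS_PER_GROUP


if __name__ == "__main__":
    print(max_digits(21000))      # array space needed for 3000 digits
    print(max_digits(20 * 1024))  # a machine with 20 KB for the array alone
```

Under these assumptions 3000 digits need 21000 bytes for the array, which is slightly more than 20 KB.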

It is guaranteed that all drivers in use are the fastest available. Different versions of a driver have the same speed but may differ in size and minor features.

The CPU frequencies given are the maximum available (known to me) for the computers listed below. They are the actual frequencies used during the measurements.

FORMAT: The data for the CPU and IO timing parts are calculated, so they are only approximations. They are less accurate for display output because the timings for vertical scrolling are not taken into account. They also have small deviations (about 5%) for TCP/IP connections because the character output timing is nonlinear in this case. For some systems, the data were acquired from hardware. Approximated results are shown in blue. Red is used where the approximation is based on heuristics.

¹The automatic DRAM refresh doesn't work at this clock (1.78 MHz), so some hardware and software may not work at this speed. For normal system operation, the maximum clock frequency is therefore about 1.34 MHz, or 75% of the frequency used for the results given in the table.

²This computer has only 20 KB of RAM, so the result for 3000 digits is an approximation. Theoretically the MC-10 can use up to 36 KB of RAM, but this computer was manufactured for too short a period and a proper memory expansion was never made for it.

³This system doesn't have a ROM routine to print characters on its screen, so this driver contains such a routine.

⁴The standard PC-98 system timer has an accuracy of only 1 s, so the tenths and hundredths of a second are approximations.

The results above allow the CPU efficiency per 1 MHz to be calculated. The next table contains approximate values of efficiency reciprocals (ER). These values are calculated by multiplying the time of the calculation of 3000 digits (CPU time only, without IO) by the CPU frequency. The ER values are obtained for pi-spigot, which is based on 16-bit integer arithmetic. The best ER value for each CPU is taken. The ER value reflects the efficiency of the CPU's electronics.
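The ER calculation is a single multiplication, sketched below. The function name efficiency_reciprocal is hypothetical, the frequency is assumed to be expressed in MHz, and the sample timings are made-up numbers, not results from the table.

```python
def efficiency_reciprocal(cpu_seconds, mhz):
    """ER = CPU-only time for 3000 digits (s) multiplied by the clock (MHz)."""
    return cpu_seconds * mhz


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    fast_clock = efficiency_reciprocal(10.0, 8.0)   # 10 s at 8 MHz
    slow_clock = efficiency_reciprocal(30.0, 2.0)   # 30 s at 2 MHz
    print(fast_clock, slow_clock)
```

Because ER is a reciprocal, a smaller value means the CPU does more work per clock cycle: in this made-up pair the 2 MHz machine is slower in absolute terms but has the better (lower) ER.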

The next table contains details about the tested systems in chronological order.

The next table contains the sizes of the main loops and the whole programs.

Program CPU Main loop Total
vax-bsd-1 VAX 34 816^*4
vax-vms-2 VAX 34 915^*2
ibm370cms-1 IBM/370 36 750
bbc-pandora-2 32016 37 992
bbc-panos-2 32016 37 1044
mac601-1 PPC601 52 1840+192=2032^*4
msdos-5 8086 54 635–640
ibmpc-8 8086 54 678
atarist30-10 68030 54 722
amiga1200-13 68020 54 812
mac20-6 68020 54 1315
atarist-10 68000 56 728
amiga-13 68000 56 816
ql-3 68008 56 366+192=558^*1
mac-6 68000 56 1311
macppc-1 PowerPC 56 1840+196=2036^*4
pc386-9 80386 57 665
geneve-mdos-1 TMS9995 62 1001^*2
ti99-4a-2ea TMS9900 62 520+626=1146^*1
dragon-6309-5 6309 97 281+581=862^*1
pdp11-unix7-eis-12of PDP-11/EIS 170 628+400=1028^*4
rt11-eis-13of PDP-11/EIS 170 869^*2
rsx11-eis-2of PDP-11/EIS 170 1109^*2
dragon-7 6809 397 1051+484=1535^*1
bk11-bos-9 PDP-11 370 1586
bk10-9 PDP-11 370 1616
bk11-9 PDP-11 370 1652
pdp11-unix7-noeis-8 PDP-11 378 1704+400=2104^*4
rt11-noeis-11 PDP-11 378 2212^*2
rsx11-noeis-2 PDP-11 378 2416^*2
mc10-2 6803 448 3135^*8
arm2-2 ARM2 516 1475
arm1-4 ARM1 540 1680
scpu64-7 65816 250+768=1018^*5 [1764+228+2]=1994^*7
apple2gs-1 65816 250+768=1018^*5 1906+416=2322^*1
msx-9 Z80 725+768=1493^*5 1799+452=2251^*1
abc800-4 Z80 725+768=1493^*5 4431^*8
zx-5 Z80 725+768=1493^*5 4666^*3
bbc-z80-6 Z80 1078+768=1846^*5 2048+289=2337^*1
cpm22bios-18 Z80 1078+768=1846^*5 2534–2570
trsdos-m1-4 Z80 1078+768=1848^*5 2544
trsdos-m3-4 Z80 1078+768=1848^*5 2546
cpc-cpm3-14 Z80 1078+768=1846^*5 2544
trsdos-m4-4 Z80 1078+768=1848^*5 2559
cpc-11 Z80 1078+768=1848^*5 2060+575=2635^*1
c128-z80-3 Z80 1078+768=1848^*5 [2304+511+2]=2817^*7
cpm22bdos-18 Z80 1078+768=1848^*5 2541–2807
tandy100-4 8085 1425+768=2193^*5 2469+415=2884^*1
cpm22bios-8080-7 8080 1832+768=2600^*5 3320–3325
vic20-4 6502 1847+768=2615^*5*6 [3015+264+2]=3281^*7
pet-3 6502 1847+768=2615^*5*6 [3016+263+2]=3281^*7
bbc-9 6502 1847+768=2615^*5*6 [2978]+462+[177]=3617^*7
atari800xl-7 6502 1847+768=2615^*5*6 3189+[1326+177]=4692^*7
plus4-15 6502 2353+768=3121^*5*6 [3505+526+2]=4033^*7
c64-15 6502 2353+768=3121^*5*6 [3717+570+2]=4289^*7
c128-15 6502 2353+768=3121^*5*6 [3731+556+2]=4289^*7
cbm2-2 6509 2353+768=3121^*5*6 [3622+423+2]=4047^*7
apple2e-4 6502 2353+768=3121^*5*6 3735+437=4272^*1
bbc-65c02-9 6502 2866+768=3634^*5*6 [3603]+463+[177]=4243^*7
apple2c-65c02-4 65C02 2866+768=3634^*5*6 4243+432=4675^*1
Square brackets group the sizes of parts that belong to one file.
^*1 the sum of the Assembler + Basic code sizes
^*2 the size of the pure code without a header
^*3 the size of the TAP file
^*4 the size of the assembler + C code without a loader and C libraries
^*5 code size + the size of the multiplication lookup table
^*6 the 6502 code can be made shorter (1871+768=2639), which makes it 1-2% slower
^*7 the sum of the Assembler + Basic + header/loader code sizes
^*8 the size of the Basic text that contains embedded ML code

Thanks a lot to the people who helped: bqt, ivagor, perestoronin, blackmirror, BigEd, tricky, MMS, Thorham, meynaf, saimo, Don_Adan, mizapf, modrobert, a/b, ...

Download the pack of sources and executables (v77, 196 KB).

Sources Archive