The table below lists the best results for each computer and is open to further expansion. If anybody has a faster implementation of the spigot algorithm for one of the systems listed below, please inform the author of this page (vol.litwr at gmail dot com) and the table will be updated. The same applies to results for systems missing from the table.
The speed of this algorithm depends heavily on the speed of integer division, so systems with hardware division have a big advantage. The computations use 16- and 32-bit unsigned integers, which gives an advantage to 16- and 32-bit systems.
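To make the role of integer division concrete, here is a minimal sketch of a widely circulated array-based spigot for pi (the Rabinowitz and Wagon style of algorithm, producing 4 digits per step). It is not the author's assembly code. The function name `pi_spigot` and the variable names are hypothetical, chosen for illustration. Each pass of the inner loop performs one division of a wide accumulator by a small odd number, which is why division speed dominates.

```python
def pi_spigot(digits: int) -> str:
    """Return the first `digits` decimal digits of pi (starting with '3').

    Illustrative sketch only: digits are produced in groups of 4, and
    `digits` must be a positive multiple of 4.
    """
    if digits <= 0 or digits % 4:
        raise ValueError("digits must be a positive multiple of 4")

    base = 10000                  # one output group holds 4 decimal digits
    n = digits // 4 * 14          # 14 array cells are consumed per group
    r = [base // 5] * n + [0]     # remainders r[0..n]; the last cell starts at 0
    held = 0                      # low 4 digits held back from the previous group
    groups = []

    for c in range(n, 0, -14):
        d = 0                     # wide accumulator (needs more than 16 bits)
        for b in range(c, 0, -1):
            g = 2 * b - 1         # small odd divisor
            d += r[b] * base
            d, r[b] = divmod(d, g)    # the division that dominates the run time
            if b > 1:
                d *= b - 1
        groups.append("%04d" % (held + d // base))
        held = d % base

    return "".join(groups)


if __name__ == "__main__":
    print(pi_spigot(100))
```

In this sketch the array cells hold remainders smaller than the divisor, so they stay small, while the accumulator `d` is the value that needs a wider type. That split is one plausible reading of the mixed 16/32-bit arithmetic mentioned above.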
The results are the times in seconds taken to calculate 100, 1000, and 3000 digits. The table also gives the upper limit on the number of digits for each program used. The data are sorted by the time for 3000 digits.
Every program satisfies four restrictions: 1) it measures time; 2) it uses an OS function to print digits, printing 4 digits at a time in step with their calculation; 3) it uses less than 64 KB of RAM for code and data; 4) it uses all available RAM below the 64 KB limit to maximize the number of digits it can calculate, so artificially restricting the maximum number of digits is forbidden.
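The link between free RAM and the maximum number of digits can be estimated with a short sketch. It assumes a common array-based spigot layout that the text above does not confirm: 14 array cells per group of 4 digits, 2 bytes per cell, plus one extra cell. The function `max_digits` and its parameters are hypothetical, and real programs also need room for code and buffers.

```python
def max_digits(free_bytes: int, cell_bytes: int = 2,
               cells_per_group: int = 14, digits_per_group: int = 4) -> int:
    """Estimate how many digits fit in `free_bytes` of array space.

    Assumed layout (not taken from the benchmark programs): each group of
    4 digits needs 14 cells, and one additional cell is reserved.
    """
    cells = free_bytes // cell_bytes - 1      # reserve the extra cell
    if cells < 0:
        return 0
    return cells // cells_per_group * digits_per_group


if __name__ == "__main__":
    # Hypothetical figure: 60000 bytes left for the array.
    print(max_digits(60000))
```

Under these assumptions the cost is about 7 bytes per digit, so the digit limit grows roughly linearly with the RAM left over after the code is loaded.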
The CPU frequencies given are the maximum available (known to me) for the computers listed below. They are the actual frequencies used during the measurements.
FORMAT: The data for the CPU and IO timing parts are calculated, so they are only approximations. They are less accurate for display output because the time taken by vertical scrolling is not taken into account. They also have small deviations (about 5%) for TCP/IP connections because character output timing is not linear in this case. For some systems, the data was acquired from hardware. Approximated results are shown in blue. Red is used where the approximation is based on heuristics.
^{1}Automatic DRAM refresh doesn't work at this clock (1.78 MHz), so some hardware and software may not work at this speed. For normal system operation, the maximum clock frequency is therefore about 1.34 MHz, or 75% of the frequency used for the results given in the table.
^{2}This computer has only 20 KB of RAM, so the result for 3000 digits is an approximation. Theoretically the MC10 can use up to 36 KB of RAM, but this computer was manufactured for too short a period and a proper memory expansion was never made for it.
^{3}This system doesn't have a ROM routine to print characters on its screen, so this driver contains such a routine.
The results above make it possible to calculate CPU efficiency per 1 MHz. The next table contains approximate values of efficiency reciprocals (ER). Each value is calculated by multiplying the time taken to calculate 3000 digits (CPU time only, without IO) by the CPU frequency. The ER values are obtained for pispigot, which is based on 16-bit integer arithmetic. The best ER value for each CPU is taken. The ER value reflects the efficiency of a CPU's electronics.
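The ER calculation described above is a single multiplication, sketched here. The function name `efficiency_reciprocal` is hypothetical, the frequency is assumed to be in MHz, and the sample numbers are invented for illustration and are not taken from the table.

```python
def efficiency_reciprocal(cpu_seconds: float, freq_mhz: float) -> float:
    """ER = CPU-only time for 3000 digits (seconds) x CPU frequency (MHz)."""
    return cpu_seconds * freq_mhz


if __name__ == "__main__":
    # Hypothetical machines, not real measurements.
    samples = {"machine A": (100.0, 2.0), "machine B": (60.0, 4.0)}
    for name, (seconds, mhz) in sorted(
            samples.items(), key=lambda kv: efficiency_reciprocal(*kv[1])):
        print(name, efficiency_reciprocal(seconds, mhz))
```

Read this way, ER approximates the time the same run would take at 1 MHz if speed scaled linearly with the clock, so a lower ER indicates a CPU that does more work per clock cycle.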
The next table contains details about the tested systems in chronological order.
The next table contains the sizes of the main loops and the whole programs.

Square brackets group the sizes of parts that belong to one file.
^{*1} the sum of the assembler and Basic code sizes.
^{*2} the size of the pure code without a header.
^{*3} the size of the TAP file.
^{*4} the size of the assembler and C code without a loader and C libraries.
^{*5} the code size plus the size of the multiplication lookup table.
^{*6} the 6502 code may be shorter (1871+768=2639), but this makes it 12% slower.
^{*7} the sum of the assembler, Basic, and header/loader code sizes.
^{*8} the size of the Basic text that contains embedded ML code.
Thanks a lot to the people who helped: bqt, ivagor, perestoronin, BigEd, tricky, MMS, Thorham, meynaf, ...
Download the latest sources and executables (v68 [177 KB]) pack.