Speed
I have many vices. One of them is collecting old computers. To me, those are the 8 bit systems that were popular in the late 70’s and early 80’s. This last week, I was able to get a nice collection of Atari 800 and 400 machines from a local craigslist entry. Those are fun machines and bring me back to learning programming for the first time. My first computers was a TRS–80 Model 1, Level 1. However, the first machine I had access to that had “real graphics” was an Atari 800 in the computer lab at my middle school. I loved playing with the graphics and remember learning all sorts of tricks to make it faster.
I got to playing with the Atari’s and typed in some Basic programs just to see the thing do its thing. I remember them being fast back in the day. Well here I am 30 some odd years later with a computer that would seem as something out of a far future world to my little self then (4 processors! 4GB of RAM! 256GB of Solid State Disk! Wireless networking to the world at 50Mbit! Megapixel display with 32bits of color depth/pixel! On my lap! At it weighs less than 2 1/2 LBS! Seriously? That’s can’t possibly be! Oh – and that’s just my laptop. Don’t forget I have a “real” computer too.) Those ATARI’s were not fast.
Being the geek that I am, I had to see how much faster we have it today. I poked around the net for a bit and found an implementation of the Sieve of Eratosthenes on this site. I entered it into my Atari and it did come in at just around 5 1/2 minutes. I had to test it against my current computers, so I downloaded a copy of Chipmunk Basic. It seemed like a fair test to compare an interpreted basic to an interpreted basic. Here’s the basic version I wrote up:
10 dim flag(8191)
15 for a = 1 to 1000
20 count = 0
30 for i = 1 to 8191
40 flag(i)=1
50 next i
60 for i = 0 to 8190
70 if flag(i+1)=0 then goto 150
80 prime = i+i+3
90 k=i+prime
100 if k > 8190 then goto 140
110 flag(k+1)=0
120 k=k+prime
130 goto 100
140 count = count + 1
150 next i
160 rem print count
170 next a
180 print a;"iterations"
The big difference between my version and the ATARI version is that I had to run my version for 1000 iterations for me to get meaningful timings. The results?
[juan:~]$ time basic t.b
1001 iterations
basic t.b 7.63s user 0.00s system 99% cpu 7.639 total
That works out to be that my laptop is about 43,000 times faster than that ATARI. On one core. Let’s see what it’s like on all cores:
[juan:~]$ for i in {1..4}
for> do
for> time basic t.b &
for> done
[2] 30012
[3] 30013
[4] 30015
[5] 30017
[juan:~]$ 1001 iterations
basic t.b 18.80s user 0.03s system 98% cpu 19.029 total
[2] done time basic t.b
[juan:~]$ 1001 iterations
basic t.b 18.76s user 0.03s system 98% cpu 19.069 total
[3] done time basic t.b
[juan:~]$ 1001 iterations
basic t.b 18.80s user 0.02s system 98% cpu 19.069 total
[5] + done time basic t.b
[juan:~]$ 1001 iterations
basic t.b 18.79s user 0.03s system 98% cpu 19.097 total
[4] + done time basic t.b
Or roughly 70,000 times faster.
But wait. That site that had the listing for the Basic version also had one for one in Action! (which was a compiled language for ATARI’s). That version ran in about 1.5 seconds according to the Analog article ( I don’t have the Action! package to verify). Well I couldn’t not measure that too. So I wrote a C version of the Sieve. It’s a very dumb version intended to match the basic one as closely as possible:
#include <stdio.h>
int sieve()
{
int flag[8192];
int i,count,k,prime;
for(i=0;i<8192;i++) {
flag[i]=1;
}
count=0;
for(i=0;i<8190;i++) {
if (flag[i]) {
prime=i+i+3;
k=i+prime;
while(k<=8190) {
flag[k]=0;
k+=prime;
}
count++;
}
}
return count;
}
int main()
{
int c,i;
for (i=0;i<=100000;i++)
c=sieve();
printf("found %d, %d times\n",c,i-1);
}
[/c]
It turns out that this compiled version is so fast that I had to run is 100,000 times to get measurable results:
[juan:~]$ gcc -O4 t.c -o t
[juan:~]$ time ./t
found 1899, 100000 times
./t 4.17s user 0.00s system 99% cpu 4.177 total
And to do it on all the cores:
[juan:~]$ for i in {1..4}
do
time ./t &
done
[2] 30449
[3] 30451
[4] 30452
[5] 30454
[juan:~]$ found 1899, 100000 times
./t 7.44s user 0.01s system 95% cpu 7.832 total
[5] + exit 25 time ./t
[juan:~]$ found 1899, 100000 times
./t 7.51s user 0.01s system 95% cpu 7.850 total
[3] - exit 25 time ./t
[juan:~]$ found 1899, 100000 times
./t 7.46s user 0.01s system 94% cpu 7.891 total
[2] - exit 25 time ./t
[juan:~]$ found 1899, 100000 times
./t 7.49s user 0.01s system 94% cpu 7.897 total
[4] + exit 25 time ./t
Or roughly about 80,000 times faster.
All that on my laptop while I’m sitting in bed. Running on batteries.
The future is cool.