Numbers-to-words benchmarks on various systems

Contents

Earlier I wrote about stress testing GNU dbm and Berkeley DB databases, and of using a program to generate the required millions of records. For keys the program outputs numbers and for data writes out the numbers as English words:

Number (key) In English (value)
o zero
1 one
2 two
3 three
(over 123 million rows not shown)
123456788 one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-eight
123456789 one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-nine
123456790 one hundred twenty-three million, four hundred fifty-six thousand, seven hundred ninety
(and so on)

I wrote the original numbers-to-words program in awk. When running it to feed records to Berkeley DB I thought it could be a limiting factor to the test; that is, Berkeley DB could possibly insert more records per second than the awk program could generate.

So I re-wrote it in C. It was, unsurprisingly, a lot faster, by 8½ times. As it turned out the awk program wasn’t the limiting factor: by itself the program could generate 180,000 records per second, but my bdb-tool program could insert only 144,000 per second.

Being curious, I followed up the C program with a Perl program, just to see how it would compare. Perl was about 27% faster than awk.

Then I got carried away. Over the course of the past week I’ve implemented the alogorithm in a bunch of other languages to see how well they performed.

Results for various implementations and computers

The results and my comments are summarized below.

Raw numbers: thousands of lines generated per second

In this table, the “baseline” number is based on timing the Perl program when generating 3 million numbers, then using that time to estimate how many numbers the program can produce in about 30 seconds, rounded to the nearset 100,000.

Most of the numbers below are k/s (thousands of numbers per second), with the exception of a couple suffixed with “/s” (numbers per second.)

Computer Baseline (n/30s) C Awk Java Lua Perl PHP Python Ruby Tcl Bash
raven 13,300,000 5981 377 345 272 434 768 293 279 97 10.2
penguin 9,300,000 3757 201 242 208 303 411 145 4.9
sparrow (F23) 7,100,000 2863 349 144 169 243 403 112 134 67 5.1
sparrow (F31) 7,100,000 2840 311 137 164 287 362 116 158 68 6.4
duck 5,400,000 2385 128 122 116 174 290 83 3.8
heron 4,200,000 1785 97 106 101 151 72 3.4
dkb 4,200,000 1264 135 59 137 70 89 2.9
auk 3,700,000 1521 95 91 131 56 2.7
netvista 1,900,000 653 44 35 40 73 88 27 33 19 1.1
oldmary 1,200,000 537 21 20 31 49 33 21 12 13 612/s
Raspberry Pi 3 600,000 214 39 12 20 26 11 14 6.7 519/s
Raspberry Pi 1 200,000 166 23 7.7 11 4.0 3.9 170/s

Relative performance (Perl=100)

This table shows how fast each implementation ran relative to the Perl program. If the score is greater than 100 it ran faster; less than 100 it ran more slowly.

Computer C Awk Java Lua Perl PHP Python Ruby Tcl Bash
raven 1378 87 79 63 100 177 68 64 22 2
penguin 1240 66 80 69 100 136 48 2
sparrow (F23) 1178 144 59 70 100 166 46 55 28 2
sparrow (F31) 990 108 48 57 100 126 40 55 24 2
duck 1371 74 70 67 100 167 48 2
heron 1182 64 70 67 100 48 2
dkb 923 99 43 100 51 65 2
auk 1161 73 69 100 43 2
netvista 895 60 48 55 100 121 37 45 26 2
oldmary 1096 43 41 63 100 67 43 24 27 2
Raspberry Pi 3 823 150 46 77 100 42 54 26 2
Raspberry Pi 1 1509 209 70 100 36 35 2

Running the tests

A typical test is run like this:

RECORDS=5000000; time ./numbers-to-words.pl $RECORDS >/dev/null

Each implementation can be passed the number of records to generate on the command line. The output is sent to /dev/null because we’re interested only in how many numbers can be generated every second and not the actual output (which is actually really boring to read!)

Of course, I wasn’t content just to run the programs myself on the command line, primarily because I wanted to run them on almost a dozen different computers. So I put together a fancy shell script.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
#!/bin/bash
LANGUAGE_LIST='C Awk Java Lua Perl PHP Python Ruby Tcl Bash'
TRIAL_COUNT=5
WITH_PIPEVIEW=false     # Don't use 'true': it imparts a penalty that varies
                        # with the language being tested
ESC=$'\e'
LOG_FN="benchmark.log"

B_START_TIME=$SECONDS
echo "$(date +'%T  %F') Benchmmark started (WITH_PIPEVIEW=$WITH_PIPEVIEW)" >>$LOG_FN

# First determine a value that Perl can use to run for about 30 seconds.
# Benchmark 3 million to start
if [ -f .benchmark-baseline ]
then
    BASELINE="$(< .benchmark-baseline)"
    echo "$(date +%T)  Baseline set to $BASELINE/30s from .benchmark-baseline" | tee -a $LOG_FN
else
    echo "Determining baseline (3 million numbers):"
    START_TIME=$(date +%s.%N)
    RECORDS=3000000; ./numbers-to-words.pl $RECORDS | pv -ls$RECORDS >/dev/null
    END_TIME=$(date +%s.%N)
    ELAPSED=$(echo "scale=2; $END_TIME/1-$START_TIME/1" | bc)
    BASELINE=$(echo "90000000/$ELAPSED/100000*100000" | bc)
    echo -n "$ESC[2A$ESC[42C"
    echo "${ELAPSED} seconds"
    echo "$ESC[KPerl program can generate approximately $BASELINE numbers in 30 seconds"
    #BASELINE=$((30000000/ELAPSED/100000*100000))
    #echo "Perl program can generate approximately $RECORDS numbers in 10 seconds"
    echo -n "$ESC[K"
    echo "$(date +%T)  Baseline 3,000,000 records in $ELAPSED seconds: $BASELINE/30s" >>$LOG_FN
    echo $BASELINE >.benchmark-baseline
fi
echo

# Test the languages
for P_LANG in $LANGUAGE_LIST
do
    # Determine how to run the program
    INTERPRETER="${P_LANG,,}"
    [ "${P_LANG}" == 'Tcl' ] && INTERPRETER='tclsh'
    [ "${P_LANG}" == 'C' ]   && INTERPRETER='gcc'
    if ! which $INTERPRETER &>/dev/null
    then
        echo "$P_LANG: cannot locate interpreter or compiler '$INTERPRETER'" | tee -a $LOG_FN
        continue
    fi

    # Determine the file extension for the language's program file
    PROGRAM="numbers-to-words"
    RECORDS=$BASELINE
    EXT=".$INTERPRETER"
    [ "$P_LANG" == 'Bash' ] && EXT='.sh' RECORDS=$((BASELINE/50))
    [ "$P_LANG" == 'C' ] && EXT='.c' RECORDS=$((BASELINE*10))
    [ "$P_LANG" == 'Java' ] && EXT='.class'
    [ "$P_LANG" == 'Perl' ] && EXT='.pl'
    [ "$P_LANG" == 'Python' ] && EXT='.py'
    [ "$P_LANG" == 'Ruby' ] && EXT='.rb'
    [ "$P_LANG" == 'Tcl' ] && EXT='.tcl' RECORDS=$((BASELINE/5))
    if [ ! "$EXT" ]; then
        echo "Missing file extension for $P_LANG"
        continue
    fi

    # Some languages need a bit of help
    AWK_PARAM=''
    if [ $P_LANG == 'Awk' ]; then
        INTERPRETER='awk -f'
        which mawk &>/dev/null && INTERPRETER='mawk -f'
        AWK_PARAM='-v max='
    fi
    if [ "${P_LANG}" == 'C' ]; then
        gcc -O3 -o numbers-to-words numbers-to-words.c
        INTERPRETER='./'
        EXT=''
    fi
    if [ "${P_LANG}" == 'Java' ]; then
        which javac &>/dev/null && javac numbers-to-words.java
        PROGRAM='numbersToWords'
        EXT=''  # java assumes .class extension
    fi
    [ "${P_LANG}" == 'PHP' -a -x /opt/php/bin/php ] && INTERPRETER='/opt/php/bin/php'
    $[ "$INTERPRETER" == './' ] || INTERPRETER="$INTERPRETER "

    # Run a quick test run to ensure the program works
    RUN="${INTERPRETER}${PROGRAM}${EXT} ${AWK_PARAM}$RECORDS"
    echo -e "\n$(date +%T)  $P_LANG ($RUN):" >>$LOG_FN
    eval "${RUN/$RECORDS/1} >/tmp/numbers-to-words.tmp"
    if ! grep -q '1 "one"' /tmp/numbers-to-words.tmp
    then
        echo "Error running \"$RUN\" -- skipping" | tee -a $LOG_FN
        rm -f /tmp/numbers-to-words.tmp
        continue
    fi
    rm -f /tmp/numbers-to-words.tmp

    # Run five trials
    TOTAL_SEC=0
    for TRIAL in $(seq $TRIAL_COUNT)
    do
        echo "$P_LANG trial $TRIAL ($RUN)"
        TIMEFORMAT="$ESC[$((16*(TRIAL-1)))C%2R seconds$ESC[80D$ESC[A$ESC[K$ESC[2A"
        START_TIME=$(date +%s.%N)
        if $WITH_PIPEVIEW
        then
            time $RUN | pv -ls$RECORDS >/dev/null
        else
            echo " ($(date +%T) Running; please wait)"
            time $RUN >/dev/null
        fi
        END_TIME=$(date +%s.%N)
        ELAPSED=$(echo "scale=2; $END_TIME/1-$START_TIME/1" | bc)
        R_S="$(echo "scale=2; $RECORDS/$ELAPSED" | bc)"
        echo "$(date +%T)    Trial $TRIAL: $START_TIME to $END_TIME, elapsed=$ELAPSED, $R_S records/second" >>$LOG_FN
        TOTAL_SEC=$(echo "scale=2; $TOTAL_SEC+$ELAPSED" | bc)
    done

    # Analyse the results and display
    [ "$TOTAL_SEC" == 0 ] && TOTAL_SEC=-1
    TOTAL_NUM=$((RECORDS * TRIAL))
    echo -n "$ESC[K$P_LANG:$ESC[10D$ESC[9C"
    if [ $TOTAL_NUM -lt 1000000 ]
    then
        L=${#TOTAL_NUM} P=$((L-3))
        [ $L -lt 4 ] && T=$TOTAL_NUM || T="${TOTAL_NUM:0:$P},${TOTAL_NUM:$P}"
        echo -n "$T numbers in $TOTAL_SEC seconds, "
    else
        T="$(echo "scale=1; $TOTAL_NUM/1000000" | bc) million"
        echo -n "${T/.0/} numbers in $TOTAL_SEC seconds, "
    fi
    echo "average $(echo "$TOTAL_NUM/$TOTAL_SEC/1000" | bc)k/sec"
    echo -n "$ESC[B$ESC[K$ESC[A"
    R_S="$(echo "$TOTAL_NUM/$TOTAL_SEC" | bc)"
    echo -e "$(date +%T)  ${T/.0/} numbers in $TOTAL_SEC seconds, average $R_S records/second" >>$LOG_FN
done

echo -e "\n$(date +%T)  Benchmark ended; running time $((SECONDS-B_START_TIME)) seconds" >>$LOG_FN
echo -e "________________________________________________________________________________\n" >>$LOG_FN

Perl: the reference implementation

Although I wrote the original program in awk, Perl is my “go to” language for pretty much everything beyond what I consider to be “trivial.” Therefore the reference implementation for my numbers to words program is Perl.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#!/usr/bin/perl -w
# Translate numbers to English words. For example:
#  1234567: one million, two hundred thirty-four thousand, five hundred sixty-seven
# Default is to generate numbers from 0 to 1,000,000. The last number can be
# passed as a parameter to the program.
use strict;
use integer;

my @number = qw(
    zero      one      two      three   four  five
    six       seven    eight    nine    ten   eleven
    twelve    thirteen fourteen fifteen sixteen
    seventeen eighteen nineteen
);
my @tens = qw(x x twenty thirty forty fifty sixty seventy eighty ninety hundred);
my @multiplier = qw(x thousand million billion);

# Handle zero as a special case
print "store 0 \"zero\"\n";

# Figuring out the thousands/million/billions on every number is expensive,
# so we use the approach of looping 1 .. 999; 1,000 .. 999,999; 1,000,000 ..
# 999,999,999 etc. $N tracks the number; $exponent tracks the thousands
my ($N, $exponent) = (0, 0);

# Process the full list of numbers from 1 to last_number
my $last_number = $ARGV[0] ? $ARGV[0] : 1000000;
while ($N < $last_number) {
    $exponent++;

    # Process the current "thousands" (0 .. 999, 1000 .. 9999, etc)
    my $max = $last_number < 1000**$exponent-1 ? $last_number : 1000**$exponent-1;
    while ($N++ < $max) {
        my $text;

        # Process a group (e.g. for 1,234,567, the groups are '1', '234', and '567')
        for (my $e = $exponent-1; $e >=0; $e--) {
            my $group = $N/1000**$e % 1000;
            $text .= ',' if $group;

            # Determine the hundreds
            $text .= ' ' . $number[$group/100] . ' hundred' if $group > 99;

            # Determine the words for the last two digits of the group
            my $i = $group % 100;
            $text .= ' ' . ($i >= 20 ? $tens[$i/10] . ($i%10 ? '-' . $number[$i%10] : '') : $number[$i]) if $i;

            # Add "thousand" or "million" if needed
            $text .= " $multiplier[$e]" if $e && $group;
        }
        # For speed, the generated text ends up with a preceding ", " (preventing
        # it requires a lot of checks that slow down the program.) Here we remove
        # that unwanted text before sending it to output.
        print "store $N \"", substr($text,2), "\"\n";
    }
    $N--;
}

C: the cumbersome, brittle speed demon

When it comes to this task, there’s no doubt C is the speed winner. Across all computers it generally outperformed the runner-up by a factor of 8 and then some. But that blazing speed comes at a cost: C programs are often prone to nasty bugs due to its lack of memory protection. And in my experience, writing and debugging C programs takes longer due to a more primitive standard library and crashes caused by the aforementioned memory issues.

The code here is noticeably different. Most of the programs take a left-to-right approach when building the text string. For example, the number 12,345,678 is typically handled in three groups: 12, then 345, and finally 678. But the original C program I wrote did it in reverse, staring at 678, then working with 345, and finally 12. It also builds the text string from right to left.

For most languages the left-to-right approach is a few percent faster than the reverse. However, with C the opposite is true: the fastest version I could write for the left-to-right approach was 15% slower than the right-to-left.

/* Print numbers as words */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void main(int argc, char **argv)
{
    long max = 1000000;
    char *number[] = {
        NULL,        "one",      "two",      "three",   "four", "five",
        "six",       "seven",    "eight",    "nine",    "ten", "eleven",
        "twelve",    "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    char *tens[] = {
        NULL, NULL, "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety", "hundred"
    };
    char *multiplier[] = { NULL, "thousand,", "million,", "billion," };
    char text[128];
    long N;
    int Q, exponent, i, group, l, t, need_comma;

    if (argc > 1)
        max = atol(argv[1]);

    printf("0 \"zero\"\n");     /* Handle zero as a special case */

    for (N=1; N <= max; N++) {
        /* The text field is built starting at the end and going backwards */
        memset(text, ' ', 127);
        text[127] = '"';
        t = 127;

        /* Process the number in groups of three, right-to-left */
        exponent = 0;
        need_comma = 0;
        Q = N;
        while (Q) {
            group = Q % 1000;

            /* Add "thousand" or "million" if needed */
            if (exponent && group) {
                l = strlen(multiplier[exponent]);
                if (need_comma) {
                    t -= l+1;
                    memcpy(text+t, multiplier[exponent], l);
                } else {
                    t -= l-1;
                    memcpy(text+t, multiplier[exponent], l-1);
                }
                t--;
            }

            /* Determine the words for the last two digits of the group */
            i = group % 100;
            if (i) {
                if (i < 20) {
                    l = strlen(number[i]); t -= l; memcpy(text+t, number[i], l);
                } else {
                    if (i%10) {     /* i doesn't end with zero */
                        l = strlen(number[i%10]); t -= l; memcpy(text+t, number[i%10], l);
                        text[--t] = '-';
                    }
                    l = strlen(tens[i/10]); t -= l; memcpy(text+t, tens[i/10], l);
                }
                need_comma = 1;
            }

            /* Determine the hundreds */
            if (group > 99) {
                if (i) t--;     /* Add a space after "hundred" if last two digits > 00 */
                t -= 8; memcpy(text+t, " hundred", 8);
                l = strlen(number[group/100]); t -= l; memcpy(text+t, number[group/100], l);
                need_comma = 1;
            }

            /* Move to the next group of thousands */
            Q /= 1000;
            exponent++;
        }

        /* "printf("%s", string) is actually pretty expensive. You don't notice
         * it when you're printing only a few lines, but do it a hundred million
         * times and you'll see it's about 15% slower than the code presented
         * here. */
        /* printf("%li \"%s\n", N, text+t); */
        printf("%li \"", N);
        puts(text+t);
    }
}

awk: unexpectedly zippy; mawk even more so

The very first implementation of numbers-to-words was written in awk. After working on the Perl and C versions, I re-wrote the awk program to make use of what I had learned.

For such a simple language, awk is actually unexpectedly fast, performing at a respectable 55% to 70% the speed of the Perl program.

Then the Raspberry Pi gave me a surprise. The Pi 3 is usually three to four times faster than the Pi 1, but on the Pi 1 the awk program was a touch faster than the 3! Of course I had to find out why, and discovered the Pi 1 was using an implementation called mawk instead of Gnu’s gawk.

mawk is 80% faster than Gnu’s. When I installed it on sparrow to test, it outran the Perl program by nearly 50% (331k records/second vs Perl’s 225.)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
#!/usr/bin/awk -f
# You can pass "-v max=<number>" to override the default maximum of 1,000,000
BEGIN {
    number[0] = "zero";       number[1] = "one";       number[2] = "two"; number[3] = "three"
    number[4] = "four";       number[5] = "five";      number[6] = "six"; number[7] = "seven"
    number[8] = "eight";      number[9] = "nine";      number[10] = "ten"
    number[11] = "eleven";    number[12] = "twelve";   number[13] = "thirteen"
    number[14] = "fourteen";  number[15] = "fifteen";  number[16] = "sixteen"
    number[17] = "seventeen"; number[18] = "eighteen"; number[19] = "nineteen"

    tens[2] = "twenty"; tens[3] = "thirty";   tens[4] = "forty";  tens[5] = "fifty"
    tens[6] = "sixty";  tens[7] = "seventy";  tens[8] = "eighty";   tens[9] = "ninety"

    multiplier[1] = "thousand"; multiplier[2] = "million"; multiplier[3] = "billion"

    print "0 \"zero\""  # Handle zero as a special case

    N = 0; exponent = 0
    last_number = max ? max : 1000000
    while (N < last_number) {
        exponent++;
        max = last_number < 1000^exponent-1 ? last_number : 1000^exponent-1
        while (N++ < max) {
            text = ""
            for (e=exponent-1; e>=0; e--) {
                group = int(N/1000^e) % 1000
                if (group) { text = text "," }

                # Determine the hundreds
                if (group > 99) { text = text " " number[int(group/100)] " hundred" }

                # Determine the words for the last two digits of the group
                i = group % 100;
                if (i) {
                    text = text " " (i >= 20 ? tens[int(i/10)] (i%10 ? "-" number[i%10] : "") : number[i])
                }

                # Add "thousand" or "million" if needed
                if (e && group) { text = text " " multiplier[e] }
            }
            print N " \"" substr(text, 3) "\""
        }
        N--
    }
}

Java: the amazing slowpoke

To be honest, I was stunned to see how badly Java performed at this task. Despite having had years to tweak the language and the VM, Java was in second last place among the major languages I tried–only Python performed worse. (I’m not including Tcl or bash because to my knowledge companies don’t write multi-million line projects in those languages.)

For the Java implementation, I chose to use StringBuilder because the memory is reclaimed when the object gos out of scope after each number, reducing the burden on the garbage collector.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
/* To run:
 * javac numbers-to-words.java      # Compile into a class file
 * java numbersToWords              # Run it
 */

class numbersToWords
{
    public static void main(String argv[])
    {
        String[] number = {
            "zero",      "one",      "two",      "three",   "four",  "five",
            "six",       "seven",    "eight",    "nine",    "ten",   "eleven",
            "twelve",    "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen"
        };
        String[] tens = {
            null, null, "twenty", "thirty", "forty",  "fifty",
            "sixty",   "seventy", "eighty", "ninety"
        };
        String[] multiplier = { null, "thousand", "million", "billion" };

        int lastNumber;
        try {
            lastNumber = Integer.parseInt(argv[0]);
        } catch(ArrayIndexOutOfBoundsException ex) {
            lastNumber = 1000000;
        }

        // Handle zero as a special case
        System.out.println("0 \"zero\"");

        int exponent = 0;
        int N = 0;
        while (N < lastNumber) {
            exponent++;
            int max = lastNumber < (int)Math.pow(1000, exponent)-1
                ? lastNumber
                : (int)Math.pow(1000, exponent)-1;
            while (N++ < max) {
                StringBuilder text = new StringBuilder();
                for (int e = exponent-1; e >=0; e--) {
                    int group = (int)(N/Math.pow(1000, e)) % 1000;
                    if (group > 0) text.append(",");

                    // Determine the hundreds
                    if (group > 99) text.append(" " + (number[(int)(group/100)] + " hundred"));

                    // Determine the words for the last two digits of the group
                    int i = group % 100;
                    if (i > 0)
                        text.append(" "+(i < 20 ? number[i]
                            : tens[(int)(i/10)] + (i%10 > 0 ? "-"+number[i%10] : "")));

                    // Add "thousand" or "million" if needed
                    if (e>0 && group>0) text.append(" " + multiplier[e]);
                }
                System.out.printf("%s \"%s\"\n", N, text.toString().substring(2));
            }
            N--;
        }
    }
}

Lua: a respectable performer

Lua is a nice little scripting language, but is probably not robust enough for serious project work (for example, it doesn’t have an error handling mechanism.) In my testing it performed just a shade behind gawk.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#!/usr/bin/lua
number = {
    'one',     'two',       'three',    'four',     'five',
    'six',     'seven',     'eight',    'nine',     'ten',
    'eleven',  'twelve',    'thirteen', 'fourteen', 'fifteen',
    'sixteen', 'seventeen', 'eighteen', 'nineteen'
}
tens = {
    nil, 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
    'seventy', 'eighty', 'ninety', 'hundred'
}
multiplier = { 'thousand', 'million', 'billion' }
if arg[1] then last_number = arg[1]+0 else last_number = 1000000 end

print ("0 \"zero\"")
N = 0
exponent = 0

while N < last_number do
    exponent = exponent + 1
    max = last_number < 1000^exponent-1 and last_number or 1000^exponent-1
    while N < max do
        N = N + 1
        text = ''
        for e = exponent-1, -1, -1 do
            group = math.floor(N/1000^e) % 1000
            if group > 0 then text = text..',' end

            -- Determine the hundreds
            if group > 99 then text = text..' '..number[math.floor(group/100)]..' hundred' end

            -- Determine the words for the last two digits of the group
            i = group % 100
            if i > 0 then
                if i < 20 then
                    text = text..' '..number[i]
                else
                    text = text..' '..tens[math.floor(i/10)]
                    if i%10 > 0 then text = text..'-'..number[i%10] end
                end
            end

            -- Add "thousand" or "million" if needed (j tells us if we need a trailing comma)
            if e > 0 and group > 0 then text = text..' '..multiplier[e] end
        end
        io.write(N..' "'..string.sub(text,3).."\"\n")
    end
end

PHP: Pretty-Hyper Performance

For all the disdain shown PHP on tech sites like Slashdot, it’s a powerful if rather quirky language. And for an interpreted language it’s fast, with PHP 7 running 20% to 30% faster than Perl. Perhaps other languages could learn a thing or two from the Zend engine.

PHP 5, not so much. Because it’s running an old version of Fedora, sparrow had only PHP 5 on it. It ran the code below at only half the speed of Perl. As a test, I downloaded PHP 7.3 from php.net and built it. That one performed much better, doing 343k records/second on the same source code, an 84% improvement over PHP 5 and a full 50% faster than Perl.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#!/usr/bin/php -f
<?php
$number = array(
    'zero',      'one',      'two',      'three',   'four',  'five',
    'six',       'seven',    'eight',    'nine',    'ten',   'eleven',
    'twelve',    'thirteen', 'fourteen', 'fifteen', 'sixteen',
    'seventeen', 'eighteen', 'nineteen'
);
$tens = array(
    NULL, NULL, 'twenty', 'thirty', 'forty',  'fifty',
    'sixty',   'seventy', 'eighty', 'ninety'
);
$multiplier = array(NULL, 'thousand', 'million', 'billion');

# Handle zero as a special case
echo "0 \"zero\"\n";

$N = 0; $exponent = 0;

$last_number = $argv[1] ? $argv[1] : 1000000;
while ($N < $last_number) {
    $exponent++;
    $max = $last_number < pow(1000, $exponent)-1 ? $last_number : pow(1000, $exponent)-1;
    while ($N++ < $max) {
        $text = '';
        for ($e = $exponent-1; $e >=0; $e--) {
            $group = intval($N/pow(1000, $e)) % 1000;
            if ($group > 0) { $text .= ','; }

            # Determine the hundreds
            if ($group > 99) { $text .= ' '.($number[intval($group/100)].' hundred'); }

            # Determine the words for the last two digits of the group
            $j = $group % 100;
            if ($j) {
                $text .= ' '.($j>=20 ? $tens[intval($j/10)].($j%10 ? '-'.$number[$j%10] : '') : $number[$j]);
            }

            # Add "thousand" or "million" if needed
            if ($e && $group) { $text .= ' '.$multiplier[$e]; }
        }
        printf("%s \"%s\"\n", $N, substr($text,2));
    }
    $N--;
}
?>

Python: maybe they should have named it Snail

Python was a big surprise. It’s the “in” thing these days, but for this task it’s at the bottom of the heap (outside of Tcl and bash.) Typically it runs at only half the speed of the Perl program.

This source code is the unuusal right-to-left version. The Python implementation of the more common left-to-right version is even slower.

At least it’s readable. 😊

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#!/usr/bin/python3
import sys

number = [ None, 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
    'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
    'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen' ]
tens = [ None, None, 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
    'eighty', 'ninety', 'hundred' ]
multiplier = [ None, 'thousand,', 'million,', 'billion,' ]

try:
    last_num = int(sys.argv[1])
except IndexError:
    last_num = 1000000

# Handle zero as a special case
print("0 \"zero\"")

for N in range(1, last_num+1):
    text = ""

    # Process the number in groups of three, right-to-left
    exponent = 0
    Q = N
    need_comma = False
    while Q:
        group = Q % 1000

        # Add "thousand" or "million" if needed (j tells us if we need a trailing comma)
        if exponent and group:
            x = multiplier[exponent]
            text = (x if need_comma else x[0:-1]) + ' ' + text

        # Determine the words for the last two digits of the group
        j = group % 100
        if j:
            text = (number[j] if j < 20 \
            else (tens[int(j/10)] + ('-'+number[j%10] if j%10 else '')))+' '+text
            need_comma = True

        # Determine the hundreds
        if group > 99:
            text = number[int(group/100)] + ' hundred ' + text
            need_comma = True

        # Move to the next group of thousands
        Q = int(Q/1000)
        exponent += 1

    print('%i "%s"' % (N, text[0:-1]))

Ruby: not that much of a gem

Ruby was another disappointing language, typically coming in third last just before Java and Python. On netvista it was actually slower than Python, probably because CentOS 6 (the last 32 bit version of CentOS) packages Ruby version 1.8.7 instead of 2.2.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#!/usr/bin/ruby
number = [
  nil,          "one",      "two",      "three",   "four", "five",
  "six",       "seven",    "eight",    "nine",    "ten", "eleven",
  "twelve",    "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen"
]
tens = [
  nil, nil, "twenty", "thirty", "forty", "fifty", "sixty",
   "seventy", "eighty", "ninety", "hundred"
]
multiplier = [ nil, "thousand", "million", "billion" ]

# Handle zero as a special casse
print("0 \"zero\"\n")

last_number = ARGV[0].nil? ? 1000000 : ARGV[0].to_i

n = 0
exponent = 0
while (n < last_number)
  exponent += 1
  max = last_number < 1000**exponent-1 ? last_number : 1000**exponent-1
  while (n < max)
    n += 1
    text = ''
    e = exponent - 1
    while e >= 0 do
      group = (n/1000**e).to_i % 1000
      if (group > 0) then text << ',' end

      # Determine the hundreds
      if group > 99 then text << ' ' << number[(group/100).to_i] <<  ' hundred' end

      # Determine the words for the last two digits of the group
      i = group % 100
      if i > 0
        j = i % 10
        text << ' ' << (i < 20 ? number[i] : (tens[(i/10).to_i] + (j > 0 ? "-#{number[j]}" : '')))
      end

      # Add "thousand" or "million"
      if e > 0 and group > 0 then text << ' ' << multiplier[e] end

      # Move to the next group of thousands
      e -= 1
    end
    print "#{n} \"" << text[2..-1] << "\"\n"
  end
end

Tcl: not the right tool for this job

Despite its feature set and the fact it apparently is compiled down to bytecode before execution, Tcl performed very poorly on this task, running at only a quarter of the speed of the Perl program. Its poorest performance was actually on my fastest computer (raven), where its speed was only 19% that of Perl’s.

Of all the scripting languages here, Tcl has the most unusual syntax. Many operations and functions are implmented as comands, and getting the result from a command requires enclosing it and its options in [ and ], much like bash’s $( ... ) syntax.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#!/usr/bin/tclsh
set number [list \
    nil       one      two      three   four  five \
    six       seven    eight    nine    ten   eleven \
    twelve    thirteen fourteen fifteen sixteen \
    seventeen eighteen nineteen ]
set tens [list \
    undef undef twenty thirty forty fifty sixty \
    seventy eighty ninety hundred ]
set multiplier [list undef thousand million billion]

# Handle zero as a special case
puts "0 \"zero\""

set last_number [expr { $argc > 0 ? $argv : 1000000 }]

incr last_number
set N 1
set exponent 0
while { $N < $last_number } {
    incr exponent
    set max [expr {$last_number < 1000**$exponent-1 ? $last_number : 1000**$exponent-1}]

    while { $N < $max } {
        set text ""

        set e [expr {$exponent - 1}]
        while { $e >= 0 } {
            set group [expr {$N / 1000**$e % 1000}]
            if { $group > 0 } { append text "," }

            # Determine the hundreds
            if { $group > 99 } {
                append text " " [lindex $number [expr {$group/100}]] " hundred"
            }

            # Determine the words for the last two digits of the group
            set i [expr {$group % 100}]
            if { $i < 20 } {
                if { $i > 0 } { append text " " [lindex $number $i] }
            } else {
                append text " " [lindex $tens [expr {$i/10}]]
                if { [expr {$i%10}] > 0 } {
                    append text "-" [lindex $number [expr {$i%10}]]
                }
            }

            # Add "thousand" or "million"
            if { $e > 0 && $group > 0 } { append text " " [lindex $multiplier $e] }
            incr e -1
        }
        set text [string trimleft $text ", "]
        puts "$N \"$text\""
        incr N
    }
}

bash: redefining “slow”

The man page (not the info pages) for bash has this to say:

Bugs

It’s too big and too slow.

“Slow” it is, at least for this task. The bash script I wrote to implement the words to text algorithm sits very solidly in last place on the speed tests, running on averge at a mere 2% of the speed of the Perl program. On raven the compiled C program is 555 times faster than bash.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#!/bin/bash

NUMBER=(
    undef     one      two      three   four  five
    six       seven    eight    nine    ten   eleven
    twelve    thirteen fourteen fifteen sixteen
    seventeen eighteen nineteen
)
TENS=(
    undef undef twenty thirty forty fifty sixty
    seventy eighty ninety hundred
)
MULTIPLIER=(undef thousand million billion)
[ "$1" ] && LAST_NUMBER=$1 || LAST_NUMBER=1000000

# Handle zero as a special case
echo '0 "zero"'

N=0
EXPONENT=0

while [ $N -lt $LAST_NUMBER ]
do
    EXPONENT=$((EXPONENT+1))
    I=$((1000**$EXPONENT-1))
    [ $LAST_NUMBER -lt $I ] && MAX=$LAST_NUMBER || MAX=$I
    while [ $((N++)) -lt $MAX ]
    do
        TEXT=""
        E=$((EXPONENT-1))
        while [ $E -ge 0 ]
        do
            GROUP=$((N/1000**E % 1000))
            [ "$GROUP" -gt 0 ] && TEXT="$TEXT,"

            # Determine the hundreds
            [ $GROUP -gt 99 ] && TEXT="$TEXT ${NUMBER[$((GROUP/100))]} hundred"

            # Determine the words for the last two digits of the group
            I=$((GROUP % 100))
            if [ $I -gt 0 ]
            then
                if [ $I -lt 20 ]
                then
                    TEXT="$TEXT ${NUMBER[$I]}"
                else
                    TEXT="$TEXT ${TENS[$((I/10))]}"
                    I_MOD_10=$((I % 10))
                    [ $I_MOD_10 -gt 0 ] && TEXT="$TEXT-${NUMBER[$I_MOD_10]}"
                fi
            fi

            # Add "thousand" or "million" if needed ($j tells us if we need a trailing comma)
            [ $E -gt 0 -a $GROUP != 0 ] && TEXT="$TEXT ${MULTIPLIER[$E]}"
            E=$((E-1))
        done
        echo "$N \"${TEXT:2}\""
    done
    N=$((N-1))
done