Numbers-to-words benchmarks on various systems

Brian

2019-11-08 21:10

Contents

Results for various implementations and computers
- Raw numbers: thousands of lines generated per second
- Relative performance (Perl=100)
Running the tests
Perl: the reference implementation
C: the cumbersome, brittle speed demon
awk: unexpectedly zippy; mawk even more so
Java: the amazing slowpoke
Lua: a respectable performer
PHP: Pretty-Hyper Performance
Python: maybe they should have named it Snail
Ruby: not that much of a gem
Tcl: not the right tool for this job
bash: redefining “slow”

Earlier I wrote about stress testing GNU dbm and Berkeley DB databases, and of using a program to generate the required millions of records. For keys the program outputs numbers and for data writes out the numbers as English words:

Number (key)	In English (value)
o	zero
1	one
2	two
3	three
(over 123 million rows not shown)
123456788	one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-eight
123456789	one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-nine
123456790	one hundred twenty-three million, four hundred fifty-six thousand, seven hundred ninety
(and so on)

I wrote the original numbers-to-words program in awk. When running it to feed records to Berkeley DB I thought it could be a limiting factor to the test; that is, Berkeley DB could possibly insert more records per second than the awk program could generate.

So I re-wrote it in C. It was, unsurprisingly, a lot faster, by 8½ times. As it turned out the awk program wasn’t the limiting factor: by itself the program could generate 180,000 records per second, but my bdb-tool program could insert only 144,000 per second.

Being curious, I followed up the C program with a Perl program, just to see how it would compare. Perl was about 27% faster than awk.

Then I got carried away. Over the course of the past week I’ve implemented the alogorithm in a bunch of other languages to see how well they performed.

Results for various implementations and computers

The results and my comments are summarized below.

Raw numbers: thousands of lines generated per second

In this table, the “baseline” number is based on timing the Perl program when generating 3 million numbers, then using that time to estimate how many numbers the program can produce in about 30 seconds, rounded to the nearset 100,000.

Most of the numbers below are k/s (thousands of numbers per second), with the exception of a couple suffixed with “/s” (numbers per second.)

Computer	Baseline (n/30s)	C	Awk	Java	Lua	Perl	PHP	Python	Ruby	Tcl	Bash
raven	13,300,000	5981	377	345	272	434	768	293	279	97	10.2
penguin	9,300,000	3757	201	242	208	303	411	145			4.9
sparrow (F23)	7,100,000	2863	349	144	169	243	403	112	134	67	5.1
sparrow (F31)	7,100,000	2840	311	137	164	287	362	116	158	68	6.4
duck	5,400,000	2385	128	122	116	174	290	83			3.8
heron	4,200,000	1785	97	106	101	151		72			3.4
dkb	4,200,000	1264	135	59		137		70	89		2.9
auk	3,700,000	1521	95		91	131		56			2.7
netvista	1,900,000	653	44	35	40	73	88	27	33	19	1.1
oldmary	1,200,000	537	21	20	31	49	33	21	12	13	612/s
Raspberry Pi 3	600,000	214	39	12	20	26		11	14	6.7	519/s
Raspberry Pi 1	200,000	166	23		7.7	11		4.0	3.9		170/s

Relative performance (Perl=100)

This table shows how fast each implementation ran relative to the Perl program. If the score is greater than 100 it ran faster; less than 100 it ran more slowly.

Computer	C	Awk	Java	Lua	Perl	PHP	Python	Ruby	Tcl	Bash
raven	1378	87	79	63	100	177	68	64	22	2
penguin	1240	66	80	69	100	136	48			2
sparrow (F23)	1178	144	59	70	100	166	46	55	28	2
sparrow (F31)	990	108	48	57	100	126	40	55	24	2
duck	1371	74	70	67	100	167	48			2
heron	1182	64	70	67	100		48			2
dkb	923	99	43		100		51	65		2
auk	1161	73		69	100		43			2
netvista	895	60	48	55	100	121	37	45	26	2
oldmary	1096	43	41	63	100	67	43	24	27	2
Raspberry Pi 3	823	150	46	77	100		42	54	26	2
Raspberry Pi 1	1509	209		70	100		36	35		2

Running the tests

A typical test is run like this:

RECORDS=5000000; time ./numbers-to-words.pl $RECORDS >/dev/null

Each implementation can be passed the number of records to generate on the command line. The output is sent to /dev/null because we’re interested only in how many numbers can be generated every second and not the actual output (which is actually really boring to read!)

Of course, I wasn’t content just to run the programs myself on the command line, primarily because I wanted to run them on almost a dozen different computers. So I put together a fancy shell script.

#!/bin/bash
LANGUAGE_LIST='C Awk Java Lua Perl PHP Python Ruby Tcl Bash'
TRIAL_COUNT=5
WITH_PIPEVIEW=false     # Don't use 'true': it imparts a penalty that varies
                        # with the language being tested
ESC=$'\e'
LOG_FN="benchmark.log"

B_START_TIME=$SECONDS
echo "$(date +'%T  %F') Benchmmark started (WITH_PIPEVIEW=$WITH_PIPEVIEW)" >>$LOG_FN

# First determine a value that Perl can use to run for about 30 seconds.
# Benchmark 3 million to start
if [ -f .benchmark-baseline ]
then
    BASELINE="$(< .benchmark-baseline)"
    echo "$(date +%T)  Baseline set to $BASELINE/30s from .benchmark-baseline" | tee -a $LOG_FN
else
    echo "Determining baseline (3 million numbers):"
    START_TIME=$(date +%s.%N)
    RECORDS=3000000; ./numbers-to-words.pl $RECORDS | pv -ls$RECORDS >/dev/null
    END_TIME=$(date +%s.%N)
    ELAPSED=$(echo "scale=2; $END_TIME/1-$START_TIME/1" | bc)
    BASELINE=$(echo "90000000/$ELAPSED/100000*100000" | bc)
    echo -n "$ESC[2A$ESC[42C"
    echo "${ELAPSED} seconds"
    echo "$ESC[KPerl program can generate approximately $BASELINE numbers in 30 seconds"
    #BASELINE=$((30000000/ELAPSED/100000*100000))
    #echo "Perl program can generate approximately $RECORDS numbers in 10 seconds"
    echo -n "$ESC[K"
    echo "$(date +%T)  Baseline 3,000,000 records in $ELAPSED seconds: $BASELINE/30s" >>$LOG_FN
    echo $BASELINE >.benchmark-baseline
fi
echo

# Test the languages
for P_LANG in $LANGUAGE_LIST
do
    # Determine how to run the program
    INTERPRETER="${P_LANG,,}"
    [ "${P_LANG}" == 'Tcl' ] && INTERPRETER='tclsh'
    [ "${P_LANG}" == 'C' ]   && INTERPRETER='gcc'
    if ! which $INTERPRETER &>/dev/null
    then
        echo "$P_LANG: cannot locate interpreter or compiler '$INTERPRETER'" | tee -a $LOG_FN
        continue
    fi

    # Determine the file extension for the language's program file
    PROGRAM="numbers-to-words"
    RECORDS=$BASELINE
    EXT=".$INTERPRETER"
    [ "$P_LANG" == 'Bash' ] && EXT='.sh' RECORDS=$((BASELINE/50))
    [ "$P_LANG" == 'C' ] && EXT='.c' RECORDS=$((BASELINE*10))
    [ "$P_LANG" == 'Java' ] && EXT='.class'
    [ "$P_LANG" == 'Perl' ] && EXT='.pl'
    [ "$P_LANG" == 'Python' ] && EXT='.py'
    [ "$P_LANG" == 'Ruby' ] && EXT='.rb'
    [ "$P_LANG" == 'Tcl' ] && EXT='.tcl' RECORDS=$((BASELINE/5))
    if [ ! "$EXT" ]; then
        echo "Missing file extension for $P_LANG"
        continue
    fi

    # Some languages need a bit of help
    AWK_PARAM=''
    if [ $P_LANG == 'Awk' ]; then
        INTERPRETER='awk -f'
        which mawk &>/dev/null && INTERPRETER='mawk -f'
        AWK_PARAM='-v max='
    fi
    if [ "${P_LANG}" == 'C' ]; then
        gcc -O3 -o numbers-to-words numbers-to-words.c
        INTERPRETER='./'
        EXT=''
    fi
    if [ "${P_LANG}" == 'Java' ]; then
        which javac &>/dev/null && javac numbers-to-words.java
        PROGRAM='numbersToWords'
        EXT=''  # java assumes .class extension
    fi
    [ "${P_LANG}" == 'PHP' -a -x /opt/php/bin/php ] && INTERPRETER='/opt/php/bin/php'
    $[ "$INTERPRETER" == './' ] || INTERPRETER="$INTERPRETER "

    # Run a quick test run to ensure the program works
    RUN="${INTERPRETER}${PROGRAM}${EXT} ${AWK_PARAM}$RECORDS"
    echo -e "\n$(date +%T)  $P_LANG ($RUN):" >>$LOG_FN
    eval "${RUN/$RECORDS/1} >/tmp/numbers-to-words.tmp"
    if ! grep -q '1 "one"' /tmp/numbers-to-words.tmp
    then
        echo "Error running \"$RUN\" -- skipping" | tee -a $LOG_FN
        rm -f /tmp/numbers-to-words.tmp
        continue
    fi
    rm -f /tmp/numbers-to-words.tmp

    # Run five trials
    TOTAL_SEC=0
    for TRIAL in $(seq $TRIAL_COUNT)
    do
        echo "$P_LANG trial $TRIAL ($RUN)"
        TIMEFORMAT="$ESC[$((16*(TRIAL-1)))C%2R seconds$ESC[80D$ESC[A$ESC[K$ESC[2A"
        START_TIME=$(date +%s.%N)
        if $WITH_PIPEVIEW
        then
            time $RUN | pv -ls$RECORDS >/dev/null
        else
            echo " ($(date +%T) Running; please wait)"
            time $RUN >/dev/null
        fi
        END_TIME=$(date +%s.%N)
        ELAPSED=$(echo "scale=2; $END_TIME/1-$START_TIME/1" | bc)
        R_S="$(echo "scale=2; $RECORDS/$ELAPSED" | bc)"
        echo "$(date +%T)    Trial $TRIAL: $START_TIME to $END_TIME, elapsed=$ELAPSED, $R_S records/second" >>$LOG_FN
        TOTAL_SEC=$(echo "scale=2; $TOTAL_SEC+$ELAPSED" | bc)
    done

    # Analyse the results and display
    [ "$TOTAL_SEC" == 0 ] && TOTAL_SEC=-1
    TOTAL_NUM=$((RECORDS * TRIAL))
    echo -n "$ESC[K$P_LANG:$ESC[10D$ESC[9C"
    if [ $TOTAL_NUM -lt 1000000 ]
    then
        L=${#TOTAL_NUM} P=$((L-3))
        [ $L -lt 4 ] && T=$TOTAL_NUM || T="${TOTAL_NUM:0:$P},${TOTAL_NUM:$P}"
        echo -n "$T numbers in $TOTAL_SEC seconds, "
    else
        T="$(echo "scale=1; $TOTAL_NUM/1000000" | bc) million"
        echo -n "${T/.0/} numbers in $TOTAL_SEC seconds, "
    fi
    echo "average $(echo "$TOTAL_NUM/$TOTAL_SEC/1000" | bc)k/sec"
    echo -n "$ESC[B$ESC[K$ESC[A"
    R_S="$(echo "$TOTAL_NUM/$TOTAL_SEC" | bc)"
    echo -e "$(date +%T)  ${T/.0/} numbers in $TOTAL_SEC seconds, average $R_S records/second" >>$LOG_FN
done

echo -e "\n$(date +%T)  Benchmark ended; running time $((SECONDS-B_START_TIME)) seconds" >>$LOG_FN
echo -e "________________________________________________________________________________\n" >>$LOG_FN

Perl: the reference implementation

Although I wrote the original program in awk, Perl is my “go to” language for pretty much everything beyond what I consider to be “trivial.” Therefore the reference implementation for my numbers to words program is Perl.

#!/usr/bin/perl -w
# Translate numbers to English words. For example:
#  1234567: one million, two hundred thirty-four thousand, five hundred sixty-seven
# Default is to generate numbers from 0 to 1,000,000. The last number can be
# passed as a parameter to the program.
use strict;
use integer;

my @number = qw(
    zero      one      two      three   four  five
    six       seven    eight    nine    ten   eleven
    twelve    thirteen fourteen fifteen sixteen
    seventeen eighteen nineteen
);
my @tens = qw(x x twenty thirty forty fifty sixty seventy eighty ninety hundred);
my @multiplier = qw(x thousand million billion);

# Handle zero as a special case
print "store 0 \"zero\"\n";

# Figuring out the thousands/million/billions on every number is expensive,
# so we use the approach of looping 1 .. 999; 1,000 .. 999,999; 1,000,000 ..
# 999,999,999 etc. $N tracks the number; $exponent tracks the thousands
my ($N, $exponent) = (0, 0);

# Process the full list of numbers from 1 to last_number
my $last_number = $ARGV[0] ? $ARGV[0] : 1000000;
while ($N < $last_number) {
    $exponent++;

    # Process the current "thousands" (0 .. 999, 1000 .. 9999, etc)
    my $max = $last_number < 1000**$exponent-1 ? $last_number : 1000**$exponent-1;
    while ($N++ < $max) {
        my $text;

        # Process a group (e.g. for 1,234,567, the groups are '1', '234', and '567')
        for (my $e = $exponent-1; $e >=0; $e--) {
            my $group = $N/1000**$e % 1000;
            $text .= ',' if $group;

            # Determine the hundreds
            $text .= ' ' . $number[$group/100] . ' hundred' if $group > 99;

            # Determine the words for the last two digits of the group
            my $i = $group % 100;
            $text .= ' ' . ($i >= 20 ? $tens[$i/10] . ($i%10 ? '-' . $number[$i%10] : '') : $number[$i]) if $i;

            # Add "thousand" or "million" if needed
            $text .= " $multiplier[$e]" if $e && $group;
        }
        # For speed, the generated text ends up with a preceding ", " (preventing
        # it requires a lot of checks that slow down the program.) Here we remove
        # that unwanted text before sending it to output.
        print "store $N \"", substr($text,2), "\"\n";
    }
    $N--;
}

C: the cumbersome, brittle speed demon

When it comes to this task, there’s no doubt C is the speed winner. Across all computers it generally outperformed the runner-up by a factor of 8 and then some. But that blazing speed comes at a cost: C programs are often prone to nasty bugs due to its lack of memory protection. And in my experience, writing and debugging C programs takes longer due to a more primitive standard library and crashes caused by the aforementioned memory issues.

The code here is noticeably different. Most of the programs take a left-to-right approach when building the text string. For example, the number 12,345,678 is typically handled in three groups: 12, then 345, and finally 678. But the original C program I wrote did it in reverse, staring at 678, then working with 345, and finally 12. It also builds the text string from right to left.

For most languages the left-to-right approach is a few percent faster than the reverse. However, with C the opposite is true: the fastest version I could write for the left-to-right approach was 15% slower than the right-to-left.

/* Print numbers as words */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void main(int argc, char **argv)
{
    long max = 1000000;
    char *number[] = {
        NULL,        "one",      "two",      "three",   "four", "five",
        "six",       "seven",    "eight",    "nine",    "ten", "eleven",
        "twelve",    "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    char *tens[] = {
        NULL, NULL, "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety", "hundred"
    };
    char *multiplier[] = { NULL, "thousand,", "million,", "billion," };
    char text[128];
    long N;
    int Q, exponent, i, group, l, t, need_comma;

    if (argc > 1)
        max = atol(argv[1]);

    printf("0 \"zero\"\n");     /* Handle zero as a special case */

    for (N=1; N <= max; N++) {
        /* The text field is built starting at the end and going backwards */
        memset(text, ' ', 127);
        text[127] = '"';
        t = 127;

        /* Process the number in groups of three, right-to-left */
        exponent = 0;
        need_comma = 0;
        Q = N;
        while (Q) {
            group = Q % 1000;

            /* Add "thousand" or "million" if needed */
            if (exponent && group) {
                l = strlen(multiplier[exponent]);
                if (need_comma) {
                    t -= l+1;
                    memcpy(text+t, multiplier[exponent], l);
                } else {
                    t -= l-1;
                    memcpy(text+t, multiplier[exponent], l-1);
                }
                t--;
            }

            /* Determine the words for the last two digits of the group */
            i = group % 100;
            if (i) {
                if (i < 20) {
                    l = strlen(number[i]); t -= l; memcpy(text+t, number[i], l);
                } else {
                    if (i%10) {     /* i doesn't end with zero */
                        l = strlen(number[i%10]); t -= l; memcpy(text+t, number[i%10], l);
                        text[--t] = '-';
                    }
                    l = strlen(tens[i/10]); t -= l; memcpy(text+t, tens[i/10], l);
                }
                need_comma = 1;
            }

            /* Determine the hundreds */
            if (group > 99) {
                if (i) t--;     /* Add a space after "hundred" if last two digits > 00 */
                t -= 8; memcpy(text+t, " hundred", 8);
                l = strlen(number[group/100]); t -= l; memcpy(text+t, number[group/100], l);
                need_comma = 1;
            }

            /* Move to the next group of thousands */
            Q /= 1000;
            exponent++;
        }

        /* "printf("%s", string) is actually pretty expensive. You don't notice
         * it when you're printing only a few lines, but do it a hundred million
         * times and you'll see it's about 15% slower than the code presented
         * here. */
        /* printf("%li \"%s\n", N, text+t); */
        printf("%li \"", N);
        puts(text+t);
    }
}

awk: unexpectedly zippy; mawk even more so

The very first implementation of numbers-to-words was written in awk. After working on the Perl and C versions, I re-wrote the awk program to make use of what I had learned.

For such a simple language, awk is actually unexpectedly fast, performing at a respectable 55% to 70% the speed of the Perl program.

Then the Raspberry Pi gave me a surprise. The Pi 3 is usually three to four times faster than the Pi 1, but on the Pi 1 the awk program was a touch faster than the 3! Of course I had to find out why, and discovered the Pi 1 was using an implementation called mawk instead of Gnu’s gawk.

mawk is 80% faster than Gnu’s. When I installed it on sparrow to test, it outran the Perl program by nearly 50% (331k records/second vs Perl’s 225.)

#!/usr/bin/awk -f
# You can pass "-v max=<number>" to override the default maximum of 1,000,000
BEGIN {
    number[0] = "zero";       number[1] = "one";       number[2] = "two"; number[3] = "three"
    number[4] = "four";       number[5] = "five";      number[6] = "six"; number[7] = "seven"
    number[8] = "eight";      number[9] = "nine";      number[10] = "ten"
    number[11] = "eleven";    number[12] = "twelve";   number[13] = "thirteen"
    number[14] = "fourteen";  number[15] = "fifteen";  number[16] = "sixteen"
    number[17] = "seventeen"; number[18] = "eighteen"; number[19] = "nineteen"

    tens[2] = "twenty"; tens[3] = "thirty";   tens[4] = "forty";  tens[5] = "fifty"
    tens[6] = "sixty";  tens[7] = "seventy";  tens[8] = "eighty";   tens[9] = "ninety"

    multiplier[1] = "thousand"; multiplier[2] = "million"; multiplier[3] = "billion"

    print "0 \"zero\""  # Handle zero as a special case

    N = 0; exponent = 0
    last_number = max ? max : 1000000
    while (N < last_number) {
        exponent++;
        max = last_number < 1000^exponent-1 ? last_number : 1000^exponent-1
        while (N++ < max) {
            text = ""
            for (e=exponent-1; e>=0; e--) {
                group = int(N/1000^e) % 1000
                if (group) { text = text "," }

                # Determine the hundreds
                if (group > 99) { text = text " " number[int(group/100)] " hundred" }

                # Determine the words for the last two digits of the group
                i = group % 100;
                if (i) {
                    text = text " " (i >= 20 ? tens[int(i/10)] (i%10 ? "-" number[i%10] : "") : number[i])
                }

                # Add "thousand" or "million" if needed
                if (e && group) { text = text " " multiplier[e] }
            }
            print N " \"" substr(text, 3) "\""
        }
        N--
    }
}

Java: the amazing slowpoke

To be honest, I was stunned to see how badly Java performed at this task. Despite having had years to tweak the language and the VM, Java was in second last place among the major languages I tried–only Python performed worse. (I’m not including Tcl or bash because to my knowledge companies don’t write multi-million line projects in those languages.)

For the Java implementation, I chose to use StringBuilder because the memory is reclaimed when the object gos out of scope after each number, reducing the burden on the garbage collector.

/* To run:
 * javac numbers-to-words.java      # Compile into a class file
 * java numbersToWords              # Run it
 */

class numbersToWords
{
    public static void main(String argv[])
    {
        String[] number = {
            "zero",      "one",      "two",      "three",   "four",  "five",
            "six",       "seven",    "eight",    "nine",    "ten",   "eleven",
            "twelve",    "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen"
        };
        String[] tens = {
            null, null, "twenty", "thirty", "forty",  "fifty",
            "sixty",   "seventy", "eighty", "ninety"
        };
        String[] multiplier = { null, "thousand", "million", "billion" };

        int lastNumber;
        try {
            lastNumber = Integer.parseInt(argv[0]);
        } catch(ArrayIndexOutOfBoundsException ex) {
            lastNumber = 1000000;
        }

        // Handle zero as a special case
        System.out.println("0 \"zero\"");

        int exponent = 0;
        int N = 0;
        while (N < lastNumber) {
            exponent++;
            int max = lastNumber < (int)Math.pow(1000, exponent)-1
                ? lastNumber
                : (int)Math.pow(1000, exponent)-1;
            while (N++ < max) {
                StringBuilder text = new StringBuilder();
                for (int e = exponent-1; e >=0; e--) {
                    int group = (int)(N/Math.pow(1000, e)) % 1000;
                    if (group > 0) text.append(",");

                    // Determine the hundreds
                    if (group > 99) text.append(" " + (number[(int)(group/100)] + " hundred"));

                    // Determine the words for the last two digits of the group
                    int i = group % 100;
                    if (i > 0)
                        text.append(" "+(i < 20 ? number[i]
                            : tens[(int)(i/10)] + (i%10 > 0 ? "-"+number[i%10] : "")));

                    // Add "thousand" or "million" if needed
                    if (e>0 && group>0) text.append(" " + multiplier[e]);
                }
                System.out.printf("%s \"%s\"\n", N, text.toString().substring(2));
            }
            N--;
        }
    }
}

Lua: a respectable performer

Lua is a nice little scripting language, but is probably not robust enough for serious project work (for example, it doesn’t have an error handling mechanism.) In my testing it performed just a shade behind gawk.

#!/usr/bin/lua
number = {
    'one',     'two',       'three',    'four',     'five',
    'six',     'seven',     'eight',    'nine',     'ten',
    'eleven',  'twelve',    'thirteen', 'fourteen', 'fifteen',
    'sixteen', 'seventeen', 'eighteen', 'nineteen'
}
tens = {
    nil, 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
    'seventy', 'eighty', 'ninety', 'hundred'
}
multiplier = { 'thousand', 'million', 'billion' }
if arg[1] then last_number = arg[1]+0 else last_number = 1000000 end

print ("0 \"zero\"")
N = 0
exponent = 0

while N < last_number do
    exponent = exponent + 1
    max = last_number < 1000^exponent-1 and last_number or 1000^exponent-1
    while N < max do
        N = N + 1
        text = ''
        for e = exponent-1, -1, -1 do
            group = math.floor(N/1000^e) % 1000
            if group > 0 then text = text..',' end

            -- Determine the hundreds
            if group > 99 then text = text..' '..number[math.floor(group/100)]..' hundred' end

            -- Determine the words for the last two digits of the group
            i = group % 100
            if i > 0 then
                if i < 20 then
                    text = text..' '..number[i]
                else
                    text = text..' '..tens[math.floor(i/10)]
                    if i%10 > 0 then text = text..'-'..number[i%10] end
                end
            end

            -- Add "thousand" or "million" if needed (j tells us if we need a trailing comma)
            if e > 0 and group > 0 then text = text..' '..multiplier[e] end
        end
        io.write(N..' "'..string.sub(text,3).."\"\n")
    end
end

PHP: Pretty-Hyper Performance

For all the disdain shown PHP on tech sites like Slashdot, it’s a powerful if rather quirky language. And for an interpreted language it’s fast, with PHP 7 running 20% to 30% faster than Perl. Perhaps other languages could learn a thing or two from the Zend engine.

PHP 5, not so much. Because it’s running an old version of Fedora, sparrow had only PHP 5 on it. It ran the code below at only half the speed of Perl. As a test, I downloaded PHP 7.3 from php.net and built it. That one performed much better, doing 343k records/second on the same source code, an 84% improvement over PHP 5 and a full 50% faster than Perl.

#!/usr/bin/php -f
<?php
$number = array(
    'zero',      'one',      'two',      'three',   'four',  'five',
    'six',       'seven',    'eight',    'nine',    'ten',   'eleven',
    'twelve',    'thirteen', 'fourteen', 'fifteen', 'sixteen',
    'seventeen', 'eighteen', 'nineteen'
);
$tens = array(
    NULL, NULL, 'twenty', 'thirty', 'forty',  'fifty',
    'sixty',   'seventy', 'eighty', 'ninety'
);
$multiplier = array(NULL, 'thousand', 'million', 'billion');

# Handle zero as a special case
echo "0 \"zero\"\n";

$N = 0; $exponent = 0;

$last_number = $argv[1] ? $argv[1] : 1000000;
while ($N < $last_number) {
    $exponent++;
    $max = $last_number < pow(1000, $exponent)-1 ? $last_number : pow(1000, $exponent)-1;
    while ($N++ < $max) {
        $text = '';
        for ($e = $exponent-1; $e >=0; $e--) {
            $group = intval($N/pow(1000, $e)) % 1000;
            if ($group > 0) { $text .= ','; }

            # Determine the hundreds
            if ($group > 99) { $text .= ' '.($number[intval($group/100)].' hundred'); }

            # Determine the words for the last two digits of the group
            $j = $group % 100;
            if ($j) {
                $text .= ' '.($j>=20 ? $tens[intval($j/10)].($j%10 ? '-'.$number[$j%10] : '') : $number[$j]);
            }

            # Add "thousand" or "million" if needed
            if ($e && $group) { $text .= ' '.$multiplier[$e]; }
        }
        printf("%s \"%s\"\n", $N, substr($text,2));
    }
    $N--;
}
?>

Python: maybe they should have named it Snail

Python was a big surprise. It’s the “in” thing these days, but for this task it’s at the bottom of the heap (outside of Tcl and bash.) Typically it runs at only half the speed of the Perl program.

This source code is the unuusal right-to-left version. The Python implementation of the more common left-to-right version is even slower.

At least it’s readable. 😊

#!/usr/bin/python3
import sys

number = [ None, 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
    'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
    'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen' ]
tens = [ None, None, 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
    'eighty', 'ninety', 'hundred' ]
multiplier = [ None, 'thousand,', 'million,', 'billion,' ]

try:
    last_num = int(sys.argv[1])
except IndexError:
    last_num = 1000000

# Handle zero as a special case
print("0 \"zero\"")

for N in range(1, last_num+1):
    text = ""

    # Process the number in groups of three, right-to-left
    exponent = 0
    Q = N
    need_comma = False
    while Q:
        group = Q % 1000

        # Add "thousand" or "million" if needed (j tells us if we need a trailing comma)
        if exponent and group:
            x = multiplier[exponent]
            text = (x if need_comma else x[0:-1]) + ' ' + text

        # Determine the words for the last two digits of the group
        j = group % 100
        if j:
            text = (number[j] if j < 20 \
            else (tens[int(j/10)] + ('-'+number[j%10] if j%10 else '')))+' '+text
            need_comma = True

        # Determine the hundreds
        if group > 99:
            text = number[int(group/100)] + ' hundred ' + text
            need_comma = True

        # Move to the next group of thousands
        Q = int(Q/1000)
        exponent += 1

    print('%i "%s"' % (N, text[0:-1]))

Ruby: not that much of a gem

Ruby was another disappointing language, typically coming in third last just before Java and Python. On netvista it was actually slower than Python, probably because CentOS 6 (the last 32 bit version of CentOS) packages Ruby version 1.8.7 instead of 2.2.

#!/usr/bin/ruby
number = [
  nil,          "one",      "two",      "three",   "four", "five",
  "six",       "seven",    "eight",    "nine",    "ten", "eleven",
  "twelve",    "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen"
]
tens = [
  nil, nil, "twenty", "thirty", "forty", "fifty", "sixty",
   "seventy", "eighty", "ninety", "hundred"
]
multiplier = [ nil, "thousand", "million", "billion" ]

# Handle zero as a special casse
print("0 \"zero\"\n")

last_number = ARGV[0].nil? ? 1000000 : ARGV[0].to_i

n = 0
exponent = 0
while (n < last_number)
  exponent += 1
  max = last_number < 1000**exponent-1 ? last_number : 1000**exponent-1
  while (n < max)
    n += 1
    text = ''
    e = exponent - 1
    while e >= 0 do
      group = (n/1000**e).to_i % 1000
      if (group > 0) then text << ',' end

      # Determine the hundreds
      if group > 99 then text << ' ' << number[(group/100).to_i] <<  ' hundred' end

      # Determine the words for the last two digits of the group
      i = group % 100
      if i > 0
        j = i % 10
        text << ' ' << (i < 20 ? number[i] : (tens[(i/10).to_i] + (j > 0 ? "-#{number[j]}" : '')))
      end

      # Add "thousand" or "million"
      if e > 0 and group > 0 then text << ' ' << multiplier[e] end

      # Move to the next group of thousands
      e -= 1
    end
    print "#{n} \"" << text[2..-1] << "\"\n"
  end
end

Tcl: not the right tool for this job

Despite its feature set and the fact it apparently is compiled down to bytecode before execution, Tcl performed very poorly on this task, running at only a quarter of the speed of the Perl program. Its poorest performance was actually on my fastest computer (raven), where its speed was only 19% that of Perl’s.

Of all the scripting languages here, Tcl has the most unusual syntax. Many operations and functions are implmented as comands, and getting the result from a command requires enclosing it and its options in [ and ], much like bash’s $( ... ) syntax.

#!/usr/bin/tclsh
set number [list \
    nil       one      two      three   four  five \
    six       seven    eight    nine    ten   eleven \
    twelve    thirteen fourteen fifteen sixteen \
    seventeen eighteen nineteen ]
set tens [list \
    undef undef twenty thirty forty fifty sixty \
    seventy eighty ninety hundred ]
set multiplier [list undef thousand million billion]

# Handle zero as a special case
puts "0 \"zero\""

set last_number [expr { $argc > 0 ? $argv : 1000000 }]

incr last_number
set N 1
set exponent 0
while { $N < $last_number } {
    incr exponent
    set max [expr {$last_number < 1000**$exponent-1 ? $last_number : 1000**$exponent-1}]

    while { $N < $max } {
        set text ""

        set e [expr {$exponent - 1}]
        while { $e >= 0 } {
            set group [expr {$N / 1000**$e % 1000}]
            if { $group > 0 } { append text "," }

            # Determine the hundreds
            if { $group > 99 } {
                append text " " [lindex $number [expr {$group/100}]] " hundred"
            }

            # Determine the words for the last two digits of the group
            set i [expr {$group % 100}]
            if { $i < 20 } {
                if { $i > 0 } { append text " " [lindex $number $i] }
            } else {
                append text " " [lindex $tens [expr {$i/10}]]
                if { [expr {$i%10}] > 0 } {
                    append text "-" [lindex $number [expr {$i%10}]]
                }
            }

            # Add "thousand" or "million"
            if { $e > 0 && $group > 0 } { append text " " [lindex $multiplier $e] }
            incr e -1
        }
        set text [string trimleft $text ", "]
        puts "$N \"$text\""
        incr N
    }
}

bash: redefining “slow”

The man page (not the info pages) for bash has this to say:

Bugs

It’s too big and too slow.

“Slow” it is, at least for this task. The bash script I wrote to implement the words to text algorithm sits very solidly in last place on the speed tests, running on averge at a mere 2% of the speed of the Perl program. On raven the compiled C program is 555 times faster than bash.

#!/bin/bash

NUMBER=(
    undef     one      two      three   four  five
    six       seven    eight    nine    ten   eleven
    twelve    thirteen fourteen fifteen sixteen
    seventeen eighteen nineteen
)
TENS=(
    undef undef twenty thirty forty fifty sixty
    seventy eighty ninety hundred
)
MULTIPLIER=(undef thousand million billion)
[ "$1" ] && LAST_NUMBER=$1 || LAST_NUMBER=1000000

# Handle zero as a special case
echo '0 "zero"'

N=0
EXPONENT=0

while [ $N -lt $LAST_NUMBER ]
do
    EXPONENT=$((EXPONENT+1))
    I=$((1000**$EXPONENT-1))
    [ $LAST_NUMBER -lt $I ] && MAX=$LAST_NUMBER || MAX=$I
    while [ $((N++)) -lt $MAX ]
    do
        TEXT=""
        E=$((EXPONENT-1))
        while [ $E -ge 0 ]
        do
            GROUP=$((N/1000**E % 1000))
            [ "$GROUP" -gt 0 ] && TEXT="$TEXT,"

            # Determine the hundreds
            [ $GROUP -gt 99 ] && TEXT="$TEXT ${NUMBER[$((GROUP/100))]} hundred"

            # Determine the words for the last two digits of the group
            I=$((GROUP % 100))
            if [ $I -gt 0 ]
            then
                if [ $I -lt 20 ]
                then
                    TEXT="$TEXT ${NUMBER[$I]}"
                else
                    TEXT="$TEXT ${TENS[$((I/10))]}"
                    I_MOD_10=$((I % 10))
                    [ $I_MOD_10 -gt 0 ] && TEXT="$TEXT-${NUMBER[$I_MOD_10]}"
                fi
            fi

            # Add "thousand" or "million" if needed ($j tells us if we need a trailing comma)
            [ $E -gt 0 -a $GROUP != 0 ] && TEXT="$TEXT ${MULTIPLIER[$E]}"
            E=$((E-1))
        done
        echo "$N \"${TEXT:2}\""
    done
    N=$((N-1))
done

Numbers-to-words benchmarks on various systems

Results for various implementations and computers

Raw numbers: thousands of lines generated per second

Relative performance (Perl=100)

Running the tests

Perl: the reference implementation

C: the cumbersome, brittle speed demon

awk: unexpectedly zippy; mawk even more so

Java: the amazing slowpoke

Lua: a respectable performer

PHP: Pretty-Hyper Performance

Python: maybe they should have named it Snail

Ruby: not that much of a gem

Tcl: not the right tool for this job

bash: redefining “slow”

About

Recent Posts

Archives

Results for various implementations and computers

Raw numbers: thousands of lines generated per second

Relative performance (Perl=100)

Running the tests

Perl: the reference implementation

C: the cumbersome, brittle speed demon

awk: unexpectedly zippy; mawk even more so

Java: the amazing slowpoke

Lua: a respectable performer

PHP: Pretty-Hyper Performance

Python: maybe they should have named it Snail

Ruby: not that much of a gem

Tcl: not the right tool for this job

bash: redefining “slow”

About

Recent Posts

Archives

Tags