How well does a GNU dbm database scale?

While working with linked lists to mimic Btrieve keys for QRetail, I idly thought, “how well does a Berkeley or GNU database scale?” I’m not talking about a little database with a few thousand keys in it; I’m thinking up in the hundreds of millions. For example, a database of citizens of a large country or group of countries; or a year’s worth of call detail records for a large telephone company with eight million subscribers, each of whom could make any number of calls each day; or a year’s worth of credit card transactions for a major credit supplier.

The first problem I had was getting a dataset with the required quantity of records. I hit on the idea of using a database of numbers converted to English words:

Number (key)   In English (value)
0              zero
1              one
2              two
3              three
(over 123 million rows not shown)
123456788      one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-eight
123456789      one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-nine
123456790      one hundred twenty-three million, four hundred fifty-six thousand, seven hundred ninety
(and so on)

I wrote a Perl program to generate the input file:

#!/usr/bin/perl -w
use strict;
use integer;

my @number = qw(
    undef     one      two      three   four  five
    six       seven    eight    nine    ten   eleven
    twelve    thirteen fourteen fifteen sixteen
    seventeen eighteen nineteen
);
my @tens = qw(x x twenty thirty forty fifty sixty seventy eighty ninety);
my @multiplier = (undef, 'thousand,', 'million,', 'billion,');

# Accept either a plain count or an awk-style -vmax=N argument (as used in the runs below)
my $max = (@ARGV && $ARGV[0] =~ /(\d+)\s*$/) ? $1 : 1000000;

for (my $N = 0; $N <= $max; $N++) {
    my $text = '';

    # Handle zero as a special case
    print("store 0 \"zero\"\n"), next if $N == 0;

    # Process the number in groups of three, right-to-left
    my ($Q, $exponent, $j) = ($N, 0, 0);
    while ($Q) {
        my $group = $Q % 1000;

        # Add "thousand" or "million" if needed ($j tells us if we need a trailing comma)
        $text = ($j ? $multiplier[$exponent] : substr($multiplier[$exponent], 0, -1))." $text"
            if $exponent && $group;

        # Determine the words for the last two digits of the group
        $j = $group % 100;
        $text = ($j >= 20 ? $tens[$j/10] . ($j % 10 ? '-' . $number[$j % 10] : '')
            : $number[$j]) . " $text" if $j;

        # Determine the hundreds
        $text = ($number[$group/100] . ' hundred')." $text" if $group > 99;

        # Move to the next group of thousands
        $Q /= 1000;
        $exponent++;
        $j = $group;    # (use j to pass this group's value to the next iteration)
    }
    print "store $N \"", substr($text,0,-1), "\"\n";
}
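
Each line the program emits is a gdbmtool command. Run with a small limit (for example, ./numbers.pl 3), the first few lines of output look like this:

store 0 "zero"
store 1 "one"
store 2 "two"
store 3 "three"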

That done, it was a simple (but initially futile) task to generate the database:

[brian@raven numbers]$ RECORDS=125000000; time ./numbers.pl -vmax=$RECORDS | pv -ls$RECORDS |
    gdbmtool --newdb numbers.db
  74.5M 12:14:23 [1.69k/s] [======================================>                        ] 59%
gdbmtool: interrupt
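
For reference, here is the same pipeline with each stage annotated (the flag meanings are as documented in the pv and gdbmtool manuals):

# numbers.pl           - writes one "store N "words"" command per line
# pv -l -s $RECORDS    - -l counts lines rather than bytes; -s gives the expected
#                        total so pv can show a percentage bar and an ETA
# gdbmtool --newdb ... - creates a fresh database and executes the store
#                        commands arriving on its standard input
./numbers.pl -vmax=$RECORDS | pv -l -s "$RECORDS" | gdbmtool --newdb numbers.db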

As can be seen from the above, I cancelled the run after 12 hours and only 74 million records. It turned out the old disc on which I was running the program was nearing its end of life: smartctl showed numerous errors when I ran it to check the drive.

I replaced the drive with a 240 GB SSD (which really should be running in sparrow, but so far I haven’t made the time to set up the latest Fedora on it) and ran the program again:

[brian@raven numbers]$ RECORDS=125000000; time ./numbers.pl -vmax=$RECORDS | pv -ls$RECORDS |
    gdbmtool --newdb numbers.db
 112.1M 10:13:36 [ 537/s] [===========================================================>      ] 89% ETA 2:33:08
gdbmtool: interrupt

I let the program run for only ten hours before I cancelled it. (The program was running in an ssh session from my laptop. I needed to take the laptop somewhere, and because I wasn’t running screen on raven, I lost both the ssh session and raven’s shell when I disconnected the laptop from my network.)

At the time I cancelled the program, it had inserted 112,063,312 records in 10.25 hours (36,816 seconds), for an overall average of 3,044 records/second. The file size was 18.5 GiB (19,864,502,272 bytes).

Two things are notable from the above:

  • At 112 million records, the GNU database was accepting fewer than 1,000 new records a second. That’s way down from the average of roughly 175,000/second at the outset.
  • The ETA is very optimistic. With 12.9 million records still to go at only 537 new records/second, the remaining work is (125,000,000 - 112,063,312) / 537, or about 24,100 seconds: closer to 6 hours and 40 minutes than to 2½ hours, and that doesn’t account for the distinct possibility that later inserts will be even slower.

Another test I ran started with 131,072 (2^17) records and doubled the count from there. Here are the results:

Records              Elapsed Time   Average records/second
131,072     (2^17)        0:00:01   131,072
262,144     (2^18)        0:00:02   131,072
524,288     (2^19)        0:00:03   174,763
1,048,576   (2^20)        0:00:06   174,763
2,097,152   (2^21)        0:00:13   161,319
4,194,304   (2^22)        0:00:31   135,300
8,388,608   (2^23)        0:01:19   106,183
16,777,216  (2^24)        0:07:31    37,200
33,554,432  (2^25)        0:12:55    43,296
67,108,864  (2^26)        4:33:02     4,096
112,063,312 (2^26+)      10:13:36     3,044

Aside from a curious anomaly at 33.5 million records, where the average number of records/second actually went up (possibly because less new file space had to be allocated and fewer hash buckets had to be split; GNU dbm uses extensible hashing rather than a B-tree), the more records already in the file, the more slowly new ones were added. This is especially true for the jump from 33.5 to 67.1 million records: although the number of records only doubled, the elapsed time grew by more than an order of magnitude (it took 21 times longer for just twice as many records).
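
The timing harness for that test isn’t shown above, but it can be scripted along these lines (a sketch, not the original harness; each run starts from an empty database and uses the same generator and gdbmtool invocation as before):

#!/bin/bash
# Build a fresh database at each size from 2^17 to 2^26 records, timing each run.
for exp in 17 18 19 20 21 22 23 24 25 26; do
    RECORDS=$((1 << exp))
    rm -f numbers.db
    echo "=== $RECORDS records (2^$exp) ==="
    time ./numbers.pl -vmax=$RECORDS | gdbmtool --newdb numbers.db
done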

Finally, I ran the program again on raven, this time using screen so I could disconnect as needed. Here is the result:

[brian@raven numbers]$ RECORDS=125000000; time ./numbers.pl -vmax=$RECORDS | pv -ls$RECORDS |
    gdbmtool --newdb numbers.db
 125M 33:12:02 [1.05k/s] [============================================================>] 100%

real    1992m4.133s
user    227m6.004s
sys     608m43.781s

The final file size was 20.6 GiB (22,154,166,272 bytes), or roughly 177 bytes per key/value pair.
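
With the database built, it can be spot-checked through the same command interface; fetch and count are standard gdbmtool commands (shown here purely as illustration, not as a timing test):

echo 'fetch 123456789' | gdbmtool numbers.db
echo 'count' | gdbmtool numbers.db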

Under screen I noticed the program ran more slowly than expected, possibly because the system lowers a program’s priority while the screen session is detached. I later ran the program on the console and saw a 53% improvement in the number of adds per second:

[brian@raven numbers]$ RECORDS=125000000; time ./numbers.pl -vmax=$RECORDS | pv -ls$RECORDS |
    gdbmtool --newdb numbers.db
 125M 21:43:12 [1.60k/s] [============================================================>] 100%

real    1303m15.037s
user    217m51.004s
sys     579m21.597s