How well does a Berkeley database scale?

Once I got the basics of bdb-tool working, I decided to see how well it (and, of course, the Berkeley database itself) performed.

The initial runs were very encouraging, averaging over 80,000 record inserts per second even up to 32 million records. So I decided to go for broke and try all 125 million. There were two twists with this test:

  • Because raven was busy re-running the 125 million record test with GNU dbm, I ran the test on sparrow.
  • Because I don’t have a lot of free disk space on sparrow, I attached a USB-3 drive (hardware-wise, an unusual combination of SSD and spinning platter) and stored the database there.
[brian@sparrow bdb-tool]$ DB_FN=/mnt/HDD_1TB_USB3/numbers.db; rm -f $DB_FN
[brian@sparrow bdb-tool]$ RECORDS=125000000; time nice ionice -c3 /var/tmp/numbers.awk -vmax=$RECORDS |
    pv -ls$RECORDS | ./bdb-tool --newdb $DB_FN
 125M 0:25:23 [82.1k/s] [=======================================================================>] 100% 

real    25m23.889s
user    39m17.914s
sys     2m13.861s

25 minutes! That’s on a slower computer with a slower hard drive.

I later ran the script on my home server penguin, and it took 19 minutes 20 seconds to complete. The final file size was 21 GiB.
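Incidentally, the finished file can be sanity-checked with the stock Berkeley DB command-line utilities rather than with bdb-tool, assuming they're built against a compatible libdb version. (On some distributions the binaries are versioned, e.g. db5.3_stat and db5.3_dump.) On sparrow, for example, something like:

db_stat -d /mnt/HDD_1TB_USB3/numbers.db         # page and key counts for the database
db_dump -p /mnt/HDD_1TB_USB3/numbers.db | head  # header plus the first few key/data pairs, printable form

db_stat reports how many keys are actually stored, and db_dump -p is a quick way to eyeball the stored pairs without writing any code.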

And finally on raven:

RECORDS=$((125*1000000)); time ./numbers-to-words $RECORDS|pv -ls$RECORDS|./bdb-tool numbers.bdb
 125M 0:07:01 [ 296k/s] [=======================================================================>] 100%

real    7m2.233s
user    7m10.390s
sys     0m40.810s
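For a rough comparison, converting the three wall-clock times above (25m24s, 19m20s, 7m02s) into insert rates with plain shell arithmetic:

echo "sparrow: $(( 125000000 / (25*60 + 24) ))/s"   # ≈ 82,000 inserts/s
echo "penguin: $(( 125000000 / (19*60 + 20) ))/s"   # ≈ 108,000 inserts/s
echo "raven:   $(( 125000000 / (7*60 + 2) ))/s"     # ≈ 296,000 inserts/s

So raven manages roughly three and a half times sparrow's rate on this workload, which lines up with the pv figures (82.1k/s vs 296k/s).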