Elapsed time for MultiMarkdown to format a large file
I’ve known for a while now that MultiMarkdown is an inefficient program that scales poorly, so I decided to measure it. As test input, I used the notebook I wrote at Olinia for the year 2016, which has 35,629 lines and weighs in at just over 1.6 MB. After removing all but the first line of the Table of Contents from the file, I used a script to successively cut the file in half until the number of lines was under 50. I then regenerated the Table of Contents for each piece and used MultiMarkdown to format each file. The growth is far worse than linear: on raven, going from 17,914 to 35,629 lines raised the elapsed time from 42.2s to 7m08.3s, roughly a tenfold increase for twice the input.
Elapsed time for MultiMarkdown to format files approximately doubling in size
File | Lines | raven | Dell Laptop | penguin | HP Pavilion | Raspberry Pi 3 | Acer T180 | Raspberry Pi 1 |
---|---|---|---|---|---|---|---|---|
N | 35 | 0.0s | 0.0s | 0.3s | 0.0s | 0.5s | 0.0s | 1.5s |
O | 69 | 0.0s | 0.1s | 0.0s | 0.1s | 0.8s | 0.0s | 1.9s |
P | 140 | 0.1s | 0.2s | 0.1s | 0.1s | 1.4s | 0.1s | 3.0s |
Q | 284 | 0.1s | 0.2s | 0.1s | 0.3s | 1.3s | 0.2s | 4.6s |
R | 566 | 0.2s | 0.4s | 0.2s | 0.5s | 2.0s | 0.5s | 7.0s |
S | 1130 | 0.5s | 0.9s | 0.4s | 0.8s | 6.3s | 1.7s | 11.3s |
T | 2268 | 1.6s | 2.6s | 1.9s | 2.8s | 18.7s | 0.8s | 34.2s |
U | 4517 | 3.1s | 5.2s | 4.6s | 6.8s | 31.1s | 8.0s | 1m19.5s |
V | 9017 | 12.7s | 23.1s | 42.2s | 54.3s | 2m30.0s | 1m38.3s | 9m47.1s |
W | 17914 | 42.2s | 1m08.5s | 1m45.7s | 1m59.9s | 7m04.9s | 13m46.6s | 21m13.0s |
Y | 35629 | 7m08.3s | 10m50.8s | 20m33.7s | 21m17.0s | 60m41.8s | 1h54m43.4s | 3h22m25.5s |
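The table invites a rough estimate of how elapsed time grows with input size. The sketch below is not part of the original script, and the helper names `to_seconds` and `scaling_exponent` are made up for illustration. It converts a table entry such as 7m08.3s to seconds, then computes the exponent k in "time grows like lines^k" from two rows, assuming a simple power law holds between them.

```shell
#!/bin/bash

# Convert an entry like "0.5s", "7m08.3s" or "1h54m43.4s" to seconds.
# (Hypothetical helper, not from the original script.)
to_seconds() {
  awk -v t="$1" 'BEGIN {
    secs = 0
    if (match(t, /[0-9]+h/))  { secs += substr(t, RSTART, RLENGTH - 1) * 3600 }
    if (match(t, /[0-9]+m/))  { secs += substr(t, RSTART, RLENGTH - 1) * 60 }
    if (match(t, /[0-9.]+s/)) { secs += substr(t, RSTART, RLENGTH - 1) }
    printf "%.1f\n", secs
  }'
}

# Estimate k in time ~ lines^k from two measurements:
#   k = log(t2 / t1) / log(n2 / n1)
# Arguments: lines1 seconds1 lines2 seconds2 (seconds must be non-zero).
scaling_exponent() {
  awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
    'BEGIN { printf "%.2f\n", log(t2 / t1) / log(n2 / n1) }'
}

# raven, rows W and Y of the table
scaling_exponent 17914 "$(to_seconds 42.2s)" 35629 "$(to_seconds 7m08.3s)"
```

For raven's last two rows the exponent comes out at about 3.4, which is worse than quadratic over that doubling. A single pair of measurements cannot separate algorithmic cost from other effects such as memory pressure, so treat the number as a rough indication only.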
Here’s the script I used to generate the files and run MultiMarkdown:
    #!/bin/bash

    LINE_COUNT=$(wc -l Notebook.b.2016.text | cut -f1 -d' ')
    # One letter per output file, largest file first
    ALPHABET="YWVUTSRQPONMLKHIJGFEDCBA"
    TEMP_FN="/r/Notebook.2016.tmp"

    # REMINDER: Remove all but the first line of the Table of Contents from
    #           Notebook.b.2016.text

    # Start at double the count because the loop halves it before each use
    LINE_COUNT=$((LINE_COUNT * 2))
    N=-1
    while [ $LINE_COUNT -gt 50 ]
    do
      N=$((N+1)); L=${ALPHABET:$N:1}
      TO_FN="/r/Notebook.2016.$L.text"
      LINE_COUNT=$((LINE_COUNT/2))
      echo "$N $TO_FN ($LINE_COUNT lines)"
      # Take the first LINE_COUNT lines and regenerate the Table of Contents
      head -n $LINE_COUNT Notebook.b.2016.text >$TEMP_FN
      genTOC.pl $TEMP_FN &>/dev/null
      # Find the line range of the Table of Contents entries and indent them
      TOC_FIRST_LINE=$(grep -n '^Table of Contents' $TEMP_FN | tail -n 1 | sed 's/\([0-9]\+\).*/\1/')
      TOC_FIRST_LINE=$((TOC_FIRST_LINE + 2))
      TOC_LAST_LINE=$(grep -n '^ \+[0-9]\+ \+[A-Z]' $TEMP_FN | tail -n 1 | sed 's/\([0-9]\+\).*/\1/')
      sed "${TOC_FIRST_LINE},${TOC_LAST_LINE}s/^/ /" $TEMP_FN >$TO_FN
    done
    rm -f $TEMP_FN

    # Time MultiMarkdown on each generated file
    export TIMEFORMAT='real %1lR (%1R seconds)'
    cd /r
    for FILE in Notebook.2016.[A-Z].text
    do
      echo -e "_____\n"
      echo $FILE
      OUT_FN=${FILE/.2016/}; OUT_FN=${OUT_FN/.text/.html}
      time MultiMarkdown.pl $FILE >$OUT_FN
    done