Elapsed time for MultiMarkdown to format a large file

Brian

2018-02-06 19:54

I’ve known for a while that MultiMarkdown is inefficient and scales poorly with input size. To measure this, I used the notebook I wrote at Olinia for the year 2016, which has 35,629 lines and weighs in at just over 1.6 MB. After removing all but the first line of the Table of Contents from the file, I used a script to repeatedly halve the file, keeping the first half of its lines each time, until fewer than 50 lines remained. I then regenerated the Table of Contents in each resulting file and timed how long MultiMarkdown took to format it.

Elapsed time for MultiMarkdown to format files approximately doubling in size

File	Lines	raven	Dell Laptop	penguin	HP Pavilion	Raspberry Pi 3	Acer T180	Raspberry Pi 1
N	35	0.0s	0.0s	0.3s	0.0s	0.5s	0.0s	1.5s
O	69	0.0s	0.1s	0.0s	0.1s	0.8s	0.0s	1.9s
P	140	0.1s	0.2s	0.1s	0.1s	1.4s	0.1s	3.0s
Q	284	0.1s	0.2s	0.1s	0.3s	1.3s	0.2s	4.6s
R	566	0.2s	0.4s	0.2s	0.5s	2.0s	0.5s	7.0s
S	1130	0.5s	0.9s	0.4s	0.8s	6.3s	1.7s	11.3s
T	2268	1.6s	2.6s	1.9s	2.8s	18.7s	0.8s	34.2s
U	4517	3.1s	5.2s	4.6s	6.8s	31.1s	8.0s	1m19.5s
V	9017	12.7s	23.1s	42.2s	54.3s	2m30.0s	1m38.3s	9m47.1s
W	17914	42.2s	1m08.5s	1m45.7s	1m59.9s	7m04.9s	13m46.6s	21m13.0s
Y	35629	7m08.3s	10m50.8s	20m33.7s	21m17.0s	60m41.8s	1h54m43.4s	3h22m25.5s
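
To put a number on the scaling, the durations in the table can be converted to seconds and compared row to row. The following is only a rough sketch: the helper names to_seconds and growth are hypothetical and not part of the script I used, and the exponent rests on the assumption that the line count exactly doubles between rows (it only approximately does). If time grows like lines^k, then doubling the input multiplies the time by 2^k, so k is the base-2 logarithm of the ratio.

```shell
#!/bin/bash
# Hypothetical helper: convert a duration such as "1h54m43.4s",
# "7m08.3s" or "42.2s" (the formats used in the table) to seconds.
to_seconds() {
    echo "$1" | awk '{
        t = $0; total = 0
        # Peel off an optional hours part, then an optional minutes part.
        if (match(t, /^[0-9]+h/)) {
            total += substr(t, 1, RLENGTH - 1) * 3600
            t = substr(t, RLENGTH + 1)
        }
        if (match(t, /^[0-9]+m/)) {
            total += substr(t, 1, RLENGTH - 1) * 60
            t = substr(t, RLENGTH + 1)
        }
        sub(/s$/, "", t)        # what is left is the seconds part
        total += t
        printf "%.1f\n", total
    }'
}

# Hypothetical helper: print how many times longer the second run took
# than the first, and the exponent k implied if the input doubled.
growth() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "x%.1f (k = %.1f)\n", b / a, log(b / a) / log(2) }'
}

# raven, file W (17914 lines) versus file Y (35629 lines)
growth 42.2s 7m08.3s
```

For raven, the last doubling makes the run roughly ten times longer, which is well beyond what linear or even quadratic growth would give. The smaller files are less informative because their times are close to the 0.1 s resolution of the measurements.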

Here’s the script I used to generate the files and run MultiMarkdown:

#!/bin/bash
LINE_COUNT=$(wc -l <Notebook.b.2016.text)
ALPHABET="YWVUTSRQPONMLKJIHGFEDCBA"
TEMP_FN="/r/Notebook.2016.tmp"

# REMINDER: Remove all but the first line of the Table of Contents from
# Notebook.b.2016.text

LINE_COUNT=$((LINE_COUNT * 2))
N=-1
while [ $LINE_COUNT -gt 50 ]
do
    N=$((N+1)); L=${ALPHABET:$N:1}
    TO_FN="/r/Notebook.2016.$L.text"

    LINE_COUNT=$((LINE_COUNT/2))
    echo "$N $TO_FN ($LINE_COUNT lines)"
    head -n "$LINE_COUNT" Notebook.b.2016.text >"$TEMP_FN"

    # Regenerate the Table of Contents, then locate the lines it spans.
    genTOC.pl "$TEMP_FN" &>/dev/null
    TOC_FIRST_LINE=$(grep -n '^Table of Contents' "$TEMP_FN" |
        tail -n 1 | sed 's/\([0-9]\+\).*/\1/')
    TOC_FIRST_LINE=$((TOC_FIRST_LINE + 2))
    TOC_LAST_LINE=$(grep -n '^ \+[0-9]\+ \+[A-Z]' "$TEMP_FN" |
        tail -n 1 | sed 's/\([0-9]\+\).*/\1/')
    # Indent the Table of Contents lines by two more spaces.
    sed "${TOC_FIRST_LINE},${TOC_LAST_LINE}s/^/  /" "$TEMP_FN" >"$TO_FN"
done
rm -f "$TEMP_FN"

export TIMEFORMAT='real %1lR (%1R seconds)'
cd /r
for FILE in Notebook.2016.[A-Z].text
do
    echo -e "_____\n"
    echo "$FILE"
    OUT_FN=${FILE/.2016/}; OUT_FN=${OUT_FN/.text/.html}
    time MultiMarkdown.pl "$FILE" >"$OUT_FN"
done
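
The script above prints each elapsed time to the terminal. If the times are wanted in a form that can be pasted into a table, bash's time keyword can be captured instead. This is a minimal sketch rather than what I actually ran: elapsed_seconds is a hypothetical helper, sleep stands in for MultiMarkdown.pl, and the command's own output is thrown away here instead of being written to an HTML file.

```shell
#!/bin/bash
# Hypothetical helper: run a command and print its elapsed wall-clock
# time in seconds, with one decimal place, on stdout.
elapsed_seconds() {
    local TIMEFORMAT='%1R'
    # The time keyword reports on stderr, so the command's own output is
    # discarded first and then the report is redirected to stdout.
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Stand-in command; in the loop above it would be MultiMarkdown.pl "$FILE".
printf '%s\t%s\n' "sleep 0.2" "$(elapsed_seconds sleep 0.2)"
```

Each call yields one tab-separated row, so wrapping it in the same for loop over Notebook.2016.[A-Z].text would give one column of the table per machine.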
