Introduction to Python Markdown

Brian

2018-10-16 21:46

Python Markdown is a very nice implementation. It’s not blazingly fast, first because Python itself isn’t very fast (compared to Perl), and second because it takes a multiphase approach. With no extensions enabled, there are 17 steps (called processors) in four phases, and each processor examines the entire document. Extensions may add processors to one or more phases; for example, the footnotes extension has processors that run in all four phases.
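
To make the multiphase idea concrete, here is a toy model of the four phases using only the standard library. It is not Python Markdown's code: the function names (normalize_whitespace, parse_blocks, add_header_ids, strip_wrapper, convert) are hypothetical, and each phase is cut down to one trivial processor. It does show the shape of the pipeline: pre-processors see a list of lines, the block parser builds an ElementTree, tree processors walk that tree, and post-processors see the serialized string, with every processor handed the whole document.

```python
import re
import xml.etree.ElementTree as etree

def normalize_whitespace(lines):
    # Phase 1 (pre-processor): list of lines in, list of lines out.
    return [line.expandtabs(4).rstrip() for line in lines]

def parse_blocks(lines):
    # Phase 2 (block parser): split on blank lines and build an ElementTree.
    root = etree.Element("div")
    for block in re.split(r"\n\s*\n", "\n".join(lines).strip()):
        m = re.match(r"(#{1,6})\s+(.*)", block)
        if m:
            el = etree.SubElement(root, "h%d" % len(m.group(1)))
            el.text = m.group(2)
        else:
            el = etree.SubElement(root, "p")
            el.text = block
    return root

def add_header_ids(root):
    # Phase 3 (tree processor): modify the tree in place, a bit like toc.
    for el in root.iter():
        if re.fullmatch(r"h[1-6]", el.tag):
            el.set("id", re.sub(r"\W+", "-", el.text.lower()).strip("-"))
    return root

def strip_wrapper(html):
    # Phase 4 (post-processor): plain string in, plain string out.
    return re.sub(r"^<div>|</div>$", "", html)

def convert(source):
    lines = source.split("\n")
    for pre in [normalize_whitespace]:
        lines = pre(lines)
    root = parse_blocks(lines)
    for tree_proc in [add_header_ids]:
        root = tree_proc(root)
    html = etree.tostring(root, encoding="unicode")
    for post in [strip_wrapper]:
        html = post(html)
    return html

print(convert("# Hello World\n\nSome text.   "))
# <h1 id="hello-world">Hello World</h1><p>Some text.</p>
```

Even in this stripped-down form, each stage rescans everything the previous stage produced, which is where the cost of the multiphase design comes from.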

In the list below, standard processors are identified by their tag in bold and extensions are italicized. Python Markdown currently ships with 16 standard extensions, plus an additional one named extra, a special-purpose extension that imports seven of the most useful ones; these are marked (Extra) in the list below.

Phase 1: Pre-processors munge the input text:

normalize_whitespace: normalize whitespace for consistent parsing
gentoc_remove: remove a Table of Contents section generated by genTOC (mine)
meta: process a metadata block
auc_headers: Automatic Underline Headers preprocessor (mine)
fenced_code: process and pygmentize fenced code blocks (Extra)
html_block: remove raw HTML blocks and stash them for later restoration (if safeMode is not ‘escape’)
footnotes: process footnotes (Extra)
abbr: process abbreviations (Extra)
reference: remove reference definitions from text and store for later use
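
Pre-processors work on the whole document as a list of lines. As a sketch of what the reference step does, assuming a simplified definition syntax of [id]: url "optional title" (real Markdown also allows angle-bracketed URLs and other title quote styles), the hypothetical extract_references below drops definition lines and records them in a dictionary so the inline reference patterns can look them up later:

```python
import re

# Simplified, hypothetical pattern for a definition such as:
#   [python]: https://www.python.org/ "Python"
REF_RE = re.compile(r'^ {0,3}\[([^\]]+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')

def extract_references(lines, references):
    """Return the lines without reference definitions, storing those in `references`."""
    kept = []
    for line in lines:
        m = REF_RE.match(line)
        if m:
            ref_id, url, title = m.groups()
            # Reference IDs are case-insensitive, so normalize the key.
            references[ref_id.lower()] = (url, title)
        else:
            kept.append(line)
    return kept

refs = {}
body = extract_references(
    ["See [the docs][python].", "", '[Python]: https://www.python.org/ "Python"'],
    refs,
)
print(body)  # ['See [the docs][python].', '']
print(refs)  # {'python': ('https://www.python.org/', 'Python')}
```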

Phase 2: Block parser processors parse the high-level structural elements of the pre-processed text into an ElementTree:

admonition: add rST-style admonitions
markdown_in_html: process Markdown inside raw HTML blocks (Extra)
empty: process blocks that are empty or start with an empty line
indent: process children of list items
def_list (indent): process indentations for definition lists (Extra)
auc_headers: process Automatic Underline Headers (mine)
code: process code blocks
tables: process Markdown tables (Extra)
hashheader: process hash headers (ATX headers)
setextheader: process Setext-style headers
hr: process horizontal rules (HTML <hr>)
sane_lists: sanely process ordered and unordered lists
def_list: process definition lists (Extra)
olist: process ordered list blocks
ulist: process unordered list blocks
quote: process block-quote blocks
paragraph: process paragraph blocks
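
The order of this list matters because the block parser, roughly speaking, takes one block at a time and hands it to the first processor that claims it. The sketch below models that with three hypothetical classes that have test/run methods, loosely patterned on Python Markdown's BlockProcessor interface but not its actual code. The paragraph processor accepts everything, which is why it sits at the end.

```python
import re
import xml.etree.ElementTree as etree

class HashHeader:
    # ATX-style headers: "# Title" through "###### Title".
    RE = re.compile(r"^(#{1,6})\s+(.*)$")
    def test(self, parent, block):
        return bool(self.RE.match(block))
    def run(self, parent, blocks):
        m = self.RE.match(blocks.pop(0))
        etree.SubElement(parent, "h%d" % len(m.group(1))).text = m.group(2)

class HorizontalRule:
    # Three or more -, * or _ characters, optionally separated by spaces.
    RE = re.compile(r"^ {0,3}([-*_])(?: *\1){2,} *$")
    def test(self, parent, block):
        return bool(self.RE.match(block))
    def run(self, parent, blocks):
        blocks.pop(0)
        etree.SubElement(parent, "hr")

class Paragraph:
    # Catch-all: must come last because its test always succeeds.
    def test(self, parent, block):
        return True
    def run(self, parent, blocks):
        etree.SubElement(parent, "p").text = blocks.pop(0).strip()

PROCESSORS = [HashHeader(), HorizontalRule(), Paragraph()]

def parse(text):
    root = etree.Element("div")
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    while blocks:
        for proc in PROCESSORS:  # priority order: the first match wins
            if proc.test(root, blocks[0]):
                proc.run(root, blocks)
                break
    return root

tree = parse("## Setup\n\n* * *\n\nJust text.")
print(etree.tostring(tree, encoding="unicode"))
# <div><h2>Setup</h2><hr /><p>Just text.</p></div>
```

If Paragraph were moved to the front of PROCESSORS, every block would become a <p>, which is the same reason the real paragraph processor comes last.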

Phase 3: Tree processors are run against the ElementTree:

footnotes: process footnotes (Extra)
autoxref: Link [inline references] or [inline references] to headers (mine)
codehilite: pygmentize code blocks
inline: apply inline patterns such as italics, bold, code, etc.:
- backtick
- escape
- reference
- link
- image_link
- image_reference
- short_reference
- autolink
- automail
- linebreak
- html
- entity
- wikilinks (extension)
- not_strong
- em_strong
- strong_em
- strong
- emphasis
- emphasis2
- smart_strong (extension)
- smarty (extension)
- nl2br (extension)
attrlist: add attributes from Markdown to HTML objects (Extra)
inline: called a second time?
toc: add IDs to headers and create a Table of Contents
prettify: add line breaks to the HTML document
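
Order matters for the inline patterns too: strong has to be tried before emphasis, or the single-asterisk pattern will mangle **bold**. This string-based toy (hypothetical names STRONG, EMPHASIS, apply_inlines) shows the effect. The real inline processor works on element text and uses placeholders to keep already-matched content from being matched again, which this sketch skips.

```python
import re

# (name, compiled pattern, replacement) triples; hypothetical simplifications.
STRONG = ("strong", re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>")
EMPHASIS = ("emphasis", re.compile(r"\*(.+?)\*"), r"<em>\1</em>")

def apply_inlines(text, patterns):
    # Each pattern scans the whole string, in list order.
    for _name, regex, repl in patterns:
        text = regex.sub(repl, text)
    return text

print(apply_inlines("**bold** and *italic*", [STRONG, EMPHASIS]))
# <strong>bold</strong> and <em>italic</em>
print(apply_inlines("**bold**", [EMPHASIS, STRONG]))
# <em>*bold</em>*
```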

Phase 4: Post-processors are run against the text after the ElementTree has been serialized into text:

raw_html: restore raw HTML to the document
h1h2_uplinks: add “TOC” and “Top” links to H1 and H2 headers
amp_substitute: restore valid entities
footnotes: footnotes post-processor (Extra)
unescape: restore escaped characters
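
The raw_html step works because raw HTML was set aside earlier (compare html_block in phase 1): it is replaced by a placeholder so the middle phases cannot escape or re-parse it, and the post-processor swaps it back in. Python Markdown keeps these fragments in an HTML stash; the HtmlStash class and placeholder format below are hypothetical stand-ins that only illustrate the round trip, with html.escape standing in for the escaping that serialization performs.

```python
import html
import re

class HtmlStash:
    """Toy stash: hides raw HTML behind placeholders until the end."""

    def __init__(self):
        self.raw = []

    def store(self, fragment):
        self.raw.append(fragment)
        # Hypothetical placeholder: control characters rarely appear in real text.
        return "\x02%d\x03" % (len(self.raw) - 1)

    def restore(self, text):
        # Post-processor step: swap each placeholder back for its raw HTML.
        return re.sub(r"\x02(\d+)\x03", lambda m: self.raw[int(m.group(1))], text)

stash = HtmlStash()
# Early phase: pull raw HTML tags out before anything can escape them.
text = re.sub(r"<[^>]+>", lambda m: stash.store(m.group(0)), "Fish & <b>chips</b>")
# Middle phases: serialization escapes special characters in ordinary text.
escaped = html.escape(text, quote=False)
# Final phase: put the original HTML back verbatim.
print(stash.restore(escaped))  # Fish &amp; <b>chips</b>
```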
