New Python Markdown extension cell_row_span

Contents

Background

I wrote this extension when I discovered the existing spantables extension was broken on Python Markdown version 3, and also didn’t work under Python 2. I submitted a bug to the author, and this was his reply:

This extension is a modified version of the original tables extension. I see that the tables extension was heavily modified. Does the latest tables extension work for you? If so, then I might solve this problem by taking that version and re-adding the span functionality.

The phrase “I might solve this problem” is ambiguous: does it mean he will do the work, or is it a short hand for “If I were you, I would solve this problem …”?

In addition, as the author noted, the spantables extension is a modified version of the original tables source. While it worked, it’s not really the best way to go about it. I solved the problem by subclassing the tables extension. I let the TableProcessor do its job–creating the ElementTree structure for the table–then my code runs to modify that structure to span cells and rows according to the user’s markdown text.

Here’s the documentation for the extension.

Summary

Adds spanning for rows and cells in tables.

Syntax

Example:

  | Column 1                | Col 2 | Big row span   |
  |:-----------------------:|-------| -------------- |
  | r1_c1 spans two cols           || One large cell |
  | r2_c1 spans two rows    | r2_c2 |                |
  |_^                      _| r3_c2 |                |
  |    ______          | r4_c2 |_              _|

The example renders as:

      .--------------------------------------------------.
      |        Column 1         | Col 2 |  Big row span  |
      |---------------------------------+----------------|
      |      r1_c1 spans two cols       |                |
      |---------------------------------|                |
      |  r2_c1 spans two rows   | r2_c2 |                |
      |                         |-------| One large cell |
      |                         | r3_c2 |                |
      |-------------------------+-------|                |
      |          ____           | r4_c2 |                |
      `--------------------------------------------------'

~~~

To span cells across multiple columns, end them with two or more consecutive vertical bars. Cells to the left will be merged together, as many cells are there are bars. In the example above, there are two bars at the end of cell 2 on row 1, so the two cells to the left of it (numbers 1 and 2) are merged.

To span cells across rows, fill the cell on the last row with at least two underscores, one at the start and the other at the end of its content, and no other characters than spaces, underscores, ^ or =. This is referred to as the marker. The cell with the marker and all the empty cells above it to the first non-empty cell will be made into a single cell, with the content of the non-empty cell. See column 3 (“Big row span”) in the example.

By default the contents are vertically aligned in the middle of the cell. To align to the top, include at least one ^ character in the marker between the two underscores; for example, |_^^^_| or simply |_^ _|. See row 2 in column 1 of the example, which is merged with row 3 and aligned at the top. To align to the bottom, use at least one = character between the underscores; for example, |_ = _|. Including both ^ and = in a marker raises a ValueError exception.

Note: If this extension finds a cell with at least two underscores and no other characters other than spaces, ^ or =, it assumes it’s a row span marker and attempts to process it. If you need a cell that looks like a marker (generally one with only underscores in it), add the text  as well—this extension won’t process it as a row span marker and Markdown will change the  to a space.

Bug in Markdown 2.6

Python Markdown 2.6 does not process the following table correctly:

  | Column 1 | Column 2 | Column 3 | Column 4 |
  | -------- | -------- | -------- | -------- |
  | r1,c1    | r1,c2    | r1,c3    | r1,c4    |
  | r2,c1              || r2,c3    | r2,c4    |

The table should be rendered as follows:

      .-------------------------------------------.
      | Column 1 | Column 2 | Column 3 | Column 4 |
      |----------+----------+----------+----------|
      | r1,c1    | r1,c2    | r1,c3    | r1,c4    |
      |----------+----------+----------+----------|
      | r2,c1               | r2,c3    | r2,c4    |
      `-------------------------------------------'

~~~

Instead it comes out as:

      .-------------------------------------------.
      | Column 1 | Column 2 | Column 3 | Column 4 |
      |----------+----------+----------+----------|
      | r1,c1    | r1,c2    | r1,c3    | r1,c4    |
      |----------+----------+----------+----------|
      | r2,c1               | r2,c4    |          |
      `-------------------------------------------'

~~~

The bug is in the tables extension, not this one. If you’re having problems getting a table with column spans to work correctly in Markdown 2.6, try replacing || with |~~|, as follows:

  | Column 1 | Column 2 | Column 3 | Column 4 |
  | -------- | -------- | -------- | -------- |
  | r1,c1    | r1,c2    | r1,c3    | r1,c4    |
  | r2,c1            |~~| r2,c3    | r2,c4    |

The table extension processes the above correctly, and this extension recognizes a cell containing only ~~ as an empty cell. (I chose ~~ because I can’t think of a reason anyone would use that in a cell.) If you want to use something different, change ~~ to another value in the following line in the code:

RE_empty_cell = re.compile(r'\s*(~~)?\s*$')

Keep in mind that many characters have special meaning in regular expressions. If you use any of the following characters in the expression, preceed them with a backslash (“\“) to avoid problems:

. + * | ? $ ( ) [ ] { }

Usage

See Extensions for general extension usage. Use cell_row_span as the name of the extension. You must include the tables extension before this one, or this extension will not be run.

This extension does not accept any special configuration options.

See https://python-markdown.github.io/extensions/tables/ for documentation on the tables extension.

License: BSD

Python code

from markdown.extensions import Extension
from markdown.extensions.tables import TableProcessor
from markdown.util import etree
import re


class CellRowSpanExtension(Extension):
    """ Table Cell and Row Span extension """

    def extendMarkdown(self, md, md_globals):
        """ Replace the TableProcessor with an instance of CellRowSpanBlockProcessor """
        if 'table' in md.parser.blockprocessors:
            md.parser.blockprocessors['table'] = CellRowSpanBlockProcessor(md.parser)


class CellRowSpanBlockProcessor(TableProcessor):
    """ Subclass the TableProcessor """

    table_count = 0
    RE_adjacent_bars = re.compile(r'\|(~~)?\|')
    RE_remove_lead_pipe = re.compile(r'^ *\|')  # ... Colonel Mustard in the Library? ;)
    RE_bar_to_bar = re.compile(r'(.*?)\|')
    RE_row_span_marker = re.compile(r'^_[_^= ]*_$')
    RE_valign_top = re.compile(r'\^')
    RE_valign_bottom = re.compile(r'=')
    RE_empty_cell = re.compile(r'\s*(~~)?\s*$')

    """ No test() method needed; the superclass provides it """

    def _update_colspan_attrib(self, text, tr_index, tr, td_remove):
        """ Update 'colspan' attributes in 'td' entries """
        text = self.RE_remove_lead_pipe.sub('', text)   # Remove leading '|' from text
        td_index = 0
        td_last_active_index = 0
        for m in self.RE_bar_to_bar.finditer(text):
            if len(m.group(1)) == 0 or m.group(1) == '~~':  # We found an adjacent cell
                try:
                    td = tr[td_last_active_index]           #  Update 'colspan' on previous cell
                except IndexError:
                    row_content = ''
                    for i in range(len(tr)):
                        c = tr[i].text
                        row_content += "  Cell %i: %s\n" % (i+1, c if c else 'Empty')
                    raise IndexError(
                        'Cannot merge cell beyond end of row '
                        "(one too many '|' characters in row?)\n"
                        'Check row %i of table %i in your document. Row contents:\n%s' % (
                            tr_index+1, self.table_count, row_content
                        )
                    )
                span = 1
                if 'colspan' in td.keys():
                    span = int(td.get('colspan'))
                td.set('colspan', str(span+1))
                td_remove.append( (tr, tr[td_index]) )
            else:
                td_last_active_index = td_index
            td_index += 1

    def _update_rowspan_attrib(self, tbody, tr_index, td_index, td_remove):
        """ Update 'rowspan' attributes in 'td' entries """
        # Look for '^' (vertical align top) or '=' (bottom) in the marker
        marker = tbody[tr_index][td_index].text
        v_align = 'middle'
        if self.RE_valign_top.search(marker):
            v_align = 'top'
        if self.RE_valign_bottom.search(marker):
            if v_align == 'top':
                raise ValueError(
                    'Cannot use both ^ (top) and = (bottom) codes in a row span '
                    'marker\nCheck row %i, column %i in table %i in your '
                    'document' % (tr_index+1, td_index+1, self.table_count)
                )
            v_align='bottom'

        # Starting from the current row, go up the rows and delete columns
        # until we hit a non-empty column (or the start of the table)
        row_num = tr_index
        while (row_num >= 0):
            td = tbody[row_num][td_index]
            if row_num == tr_index or self.RE_empty_cell.match(td.text):
                td_remove.append( (tbody[row_num], td) )
            else:
                break
            row_num -= 1

        # Update the colspan and valign attributes on the row
        td = tbody[row_num][td_index] if row_num >=0 else tbody[0][td_index]
        td.set('rowspan', str(tr_index-row_num+1))
        td.set('valign', v_align)


    def run(self, parent, blocks):
        self.table_count += 1

        # Save the block for later inspection
        rows = blocks[0].split('\n')

        # Let the TableProcessor do its work
        super(CellRowSpanBlockProcessor, self).run(parent, blocks)

        # Scan the original table text for adjacent columns and spanned rows
        td_remove = []  # List of td objects to be removed
        tr_index = 0
        tbody = parent[-1].find('tbody')
        for tr in tbody:
            # Check for adjacent columns
            if self.RE_adjacent_bars.search(rows[tr_index+2]):
                self._update_colspan_attrib(rows[tr_index+2], tr_index, tr, td_remove)
            # Check for spanned rows
            td_index = 0
            for td in tr:
                if self.RE_row_span_marker.match(td.text):
                    self._update_rowspan_attrib(tbody, tr_index, td_index, td_remove)
                td_index += 1
            tr_index += 1

        # Remove unneeded td elements
        for tr, td in td_remove:
            if td in tr:
                tr.remove(td)


def makeExtension(*args, **kwargs):
    return CellRowSpanExtension(*args, **kwargs)