We use pandoc, a command-line document converter, to convert the HTML files in the repository to reStructuredText. Pandoc can be installed from the command line, for example with a package manager.

After much trial and error, we found a relatively easy way to convert files within the repository.

The procedure involves several steps, each with a specific purpose, so we describe them in turn.

Rename the HTML file and commit the change

Git's rename detection has difficulty tracking a file that has been renamed and had its internal markup drastically altered in the same commit. Thus, before editing the file, we rename it and commit only the rename, with a commit message describing the file's history. As an example, we'll work on the cm1 model_mod.html.

$ cd ${DARTROOT}/models/cm1/
$ git mv model_mod.html README.rst
$ git commit

In the editor, we entered the following commit message:

    Rename cm1 model_mod.html to README.rst
    git mv rename of /models/cm1/model_mod.html to /models/cm1/README.rst
    This is a preliminary step to include a README.rst in each model directory.
    Use git log --follow to get complete history
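The rename step can also be run without opening an editor. The sketch below is a minimal, hypothetical helper (the name rename_to_readme is ours, not part of the repository); it assumes it is run from the repository root and that nothing else is staged, since git commit records everything in the index. It reuses the commit message above, with the conventional blank line added after the subject line.

```shell
#!/bin/sh
# Hypothetical helper: a non-interactive version of the rename step.
# Usage (from the repository root): rename_to_readme models/cm1
rename_to_readme() {
  local dir="$1"
  local name
  name=$(basename "$dir")   # e.g. "cm1"
  # Stage only the rename; the file's content is not touched here.
  git mv "$dir/model_mod.html" "$dir/README.rst" || return 1
  # -F - reads the commit message from standard input (the here-document).
  git commit -q -F - <<EOF
Rename $name model_mod.html to README.rst

git mv rename of /$dir/model_mod.html to /$dir/README.rst
This is a preliminary step to include a README.rst in each model directory.
Use git log --follow to get complete history
EOF
}
```

Keeping the rename in a commit of its own means the commit contains an exact (100% similar) rename, which git detects reliably.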

Use pandoc to convert two different copies of the file

With its default settings, pandoc typically renders tables in a nearly unusable format, because its default line width (72 columns) is too narrow to lay out a wide table.

Here are a few lines of the default table created by pandoc:

As one can see, the nested table in the lower right is too narrow to print model_variables properly.

We can instruct pandoc to use more columns, but the reStructuredText specification says that lines should be at most 80 characters long, with the exception of tables. It doesn't seem possible to instruct pandoc to wrap text at one width and tables at another, so the fastest procedure seems to be to convert the file twice, at two different widths, and splice the results together manually. Run the wide conversion first: the second command overwrites README.rst, which at this point still contains the original HTML.

$ pandoc --columns=120 -f html -t rst README.rst -o README_120.rst
$ pandoc --columns=80 -f html -t rst README.rst -o README.rst

The above commands create two files, README_120.rst, which has properly formatted tables:

...and README.rst, which has the proper line lengths for text blocks according to the reStructuredText specification.
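Because the table width has to be adjusted per file, the two pandoc commands can be wrapped in a small function. This is a sketch only; convert_both is a hypothetical name, and the wide output is named after the width (README_120.rst, README_180.rst, and so on). It keeps the order of the commands above, since the second one overwrites its own input.

```shell
#!/bin/sh
# Hypothetical helper wrapping the two pandoc commands.
# Usage: convert_both FILE [TABLE_COLUMNS]   (TABLE_COLUMNS defaults to 120)
convert_both() {
  local src="$1"
  local cols="${2:-120}"
  local wide="${src%.rst}_${cols}.rst"   # README.rst -> README_120.rst
  # The wide copy must be made first: the second command overwrites $src,
  # which at this point still contains the original HTML.
  pandoc --columns="$cols" -f html -t rst "$src" -o "$wide" || return 1
  pandoc --columns=80 -f html -t rst "$src" -o "$src"
}
```

For most files `convert_both README.rst` is enough; for the cm1 file, `convert_both README.rst 180` produces README_180.rst instead.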

Note well:

For most HTML files, 120 columns is enough to render tables properly. For the cm1 model_mod.html, however, I had to increase the width to 180 columns to make the right-most table cell wide enough. To produce the figure above, I deleted several columns from the left-most and center table cells.

Combining these files manually is non-trivial but is much easier than starting from a single file and fixing the tables by hand (trust me!).
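After splicing, it can help to check that the text in README.rst still stays within 80 characters. The function below is a hypothetical, heuristic sketch: it skips lines that begin (after optional indentation) with + or |, which is how grid-table borders and rows start, so long lines in simple tables, long URLs, and similar cases will still be reported and need a human judgment.

```shell
#!/bin/sh
# Hypothetical helper: report non-table lines longer than MAX characters.
# Usage: check_line_lengths FILE [MAX]   (MAX defaults to 80)
# Exits 0 if no such line is found, 1 otherwise.
check_line_lengths() {
  local file="$1"
  local max="${2:-80}"
  awk -v max="$max" -v f="$file" '
    # Skip grid-table borders and rows (lines starting with + or |).
    /^[ \t]*[+|]/ { next }
    length($0) > max + 0 { printf "%s:%d: %d characters\n", f, NR, length($0); bad = 1 }
    END { exit bad }
  ' "$file"
}
```

Running `check_line_lengths README.rst` prints one line per offending line (file, line number, length), which makes it easy to jump back into an editor and rewrap.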

Commit the reformatted README.rst

After splicing the files together, we remove README_120.rst and commit README.rst.

$ rm README_120.rst
$ git add README.rst

$ git commit -m "Update cm1 README.rst"
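With both commits in place, one way to confirm that the separate rename commit did its job is to count the commits touching README.rst with and without --follow. This is a sketch; compare_history is a hypothetical helper, and --follow only accepts a single file path.

```shell
#!/bin/sh
# Hypothetical helper: count the commits that touch one file, first without
# and then with rename following.
# Usage: compare_history models/cm1/README.rst
compare_history() {
  local plain followed
  # -c log.follow=false keeps the first count literal even if log.follow
  # is enabled in the user's git configuration.
  plain=$(( $(git -c log.follow=false log --format=%h -- "$1" | wc -l) ))
  followed=$(( $(git log --follow --format=%h -- "$1" | wc -l) ))
  echo "without --follow: $plain commits; with --follow: $followed commits"
}
```

If the rename was recorded correctly, the second count also includes the commits made to model_mod.html before it was renamed.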

