Background

The production of documents (web pages, man pages, PDF, LaTeX, etc.) can be broken into two major steps: creating and publishing. In addition to selecting tools for document creation and publishing, we need to consider what source language to use for writing our documents and the purpose of each document. Figure 1 shows a flow through the two steps of creating and publishing documentation.


Figure 1. Flow diagram for creating and publishing documentation. 

There are two primary styles of source code for the document creation process: programming source code such as C++ and Fortran, and documentation source code such as Markdown and reStructuredText (Fig. 1). The middle oval in Figure 1 lists a few types of documents, and the lower ovals show examples of the formats in which documents can be presented to the reader.

As I progressed through the investigation, it became clear to me that the tool that we use to create documentation files can be independent from the tool/site that we use to publish documentation. For example, Sphinx (creation) is typically used with ReadTheDocs (publishing) since ReadTheDocs can automatically build from Sphinx input files. However, ReadTheDocs can also handle HTML as input, meaning that the HTML could be created by any process. This is helpful because a preference for one particular tool doesn't tie us to another tool (e.g., if we like Sphinx a lot, we aren't necessarily tied to ReadTheDocs).

It seems that we need to make the following choices:

  • What documentation source language should we use?
  • What documentation creation tool should we use?
  • What documentation publishing tool/site should we use?

Investigation Results

Source Language

Since we are already committed to C, C++ and Fortran, these are the programming languages that we need to support. I looked at two popular documentation source code languages: Markdown and reStructuredText. To see a comparison of many documentation source languages, click here. Markdown and reStructuredText fall into a category called "Lightweight Markup Languages". HTML is a markup language, but it is so bogged down with directives that you can hardly see the text you are attempting to display. Lightweight markup languages came about to fix that issue by being much less verbose with directives. Common features in lightweight markup languages are directives to indicate headings of different levels, lists (bulleted, numbered, etc.), links to web pages, image insertion and character emphasis (bold, italic, underline, etc.).

Markdown

Markdown's strength appears to be its simplicity. It is very easy to learn and use, and it reduces documentation creation almost to just writing text.


To see the whole sample, click here.

The primary drawback of Markdown is that there is no standard, and as a result there are many variants. For example, some variants support embedded LaTeX commands and others don't.

Here is a sample of the Jupyter Notebooks variant of Markdown:



which produces (in Jupyter Notebooks):


reStructuredText

reStructuredText is a bit more complicated than Markdown, but it does have a standard and reStructuredText includes an extension capability for when there is a need to add functionality. It was hard to find any strong disadvantages to reStructuredText, but it is more difficult to learn than Markdown. To see a nice cheat sheet (summary) of the reStructuredText syntax, click here.

Here is a sample of reStructuredText for a LaTeX equation:



and the resulting HTML rendering:


Pandoc

Pandoc is a useful tool that converts between many documentation languages (Markdown, reStructuredText, HTML, LaTeX, man pages, ...). For details, click here. This could prove to be a useful tool for gaining access to many documentation creation and publishing tools. Pandoc does not, however, provide a common theme (look and feel), indexing, cross-referencing, or a table of contents for a web site or a PDF manual.
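
To make the conversion step concrete, here is a minimal Python sketch that wraps the pandoc command line. It assumes pandoc is installed and on the PATH; the function names and the file names in the usage note are made up for illustration, and the default formats (Markdown in, reStructuredText out) are just one possible pairing.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(source: Path, target: Path,
                   from_format: str = "markdown",
                   to_format: str = "rst") -> List[str]:
    """Build the pandoc argument list for converting source into target."""
    return ["pandoc", "--from", from_format, "--to", to_format,
            "--output", str(target), str(source)]


def convert(source: Path, target: Path, **formats: str) -> None:
    """Run pandoc, failing early if the executable cannot be found."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(source, target, **formats), check=True)
```

For example, `convert(Path("README.md"), Path("README.rst"))` would write a reStructuredText version of a Markdown file.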

Summary

In my experience, Markdown was noticeably easier to learn than reStructuredText, but reStructuredText is still relatively easy to learn. In my opinion, reStructuredText strikes the right balance between language simplicity and features (the indexing, table of contents and LaTeX support are especially nice).

Pros and cons by language:

Markdown
  • Pros
    • Simple
    • Easy to learn
  • Cons
    • Limited features
    • No standard (many variants)
reStructuredText
  • Pros
    • A standard exists
    • Many built-in features
    • Has extension capability for adding features
  • Cons
    • More difficult to learn (compared to Markdown)

I recommend going with reStructuredText as our documentation source language. In the big picture, it's still an easy language to learn, and with tools like pandoc and Sphinx we can get to many formats (e.g., Markdown, HTML) that are compatible with other tools.

Documentation Creation Tool

There are many choices here, and it seems that the key is to stick to tools that use standard input languages. That way, if we decide to switch tools, minimal effort is required to make our documentation source compatible with the new tool. For a comparison of different documentation creation tools, click here.

Programming Source Code

Doxygen

Doxygen can read C, C++, Fortran, Python and others, and can create HTML, LaTeX, man pages and others. The user embeds directives and text inside the programming language's comments to specify what goes into the output documentation. Doxygen has its own variety of Markdown. Doxygen can create numerous documentation types including class inheritance diagrams, code call graphs, directory structure diagrams, man pages and manuals.
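
As a minimal sketch of what "directives embedded in comments" looks like, here is a Python function documented with a Doxygen comment block (a comment that starts with `##` and uses Doxygen commands such as `@brief`, `@param` and `@return`). The function itself is made up for illustration and is not part of our code.

```python
## @brief Convert a temperature from degrees Celsius to kelvin.
#
#  @param celsius Temperature in degrees Celsius.
#  @return The same temperature expressed in kelvin.
def celsius_to_kelvin(celsius: float) -> float:
    # Doxygen reads the comment block above; this body is ordinary code.
    return celsius + 273.15
```

The same idea applies to C, C++ and Fortran, where the directives go inside that language's own comment syntax.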

Following is a screenshot of the top level page for Eigen documentation.


Natural Docs

Natural Docs is much more specialized than Doxygen. It can read a number of languages including C, C++, Fortran and Python, but of those it can only produce inheritance diagrams for C++, and only with help from the user via comments in the classes. Natural Docs can automatically glean inheritance diagrams only from C#, although its developers claim that more languages will soon be added to the auto-generated inheritance diagram capability. HTML is the only output documentation type.

The following is a screenshot of the Natural Docs documentation website.



Documentation Source Code

Sphinx

Sphinx (Python based) reads reStructuredText and can create HTML, LaTeX, PDF, and man pages. Sphinx supports themes, which are templates that format the output with a consistent look and feel.
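
A Sphinx project is configured through a Python file named conf.py that sits next to the reStructuredText sources. The following is a minimal sketch of such a file, not the configuration used for this investigation: the project name is a placeholder, and the choice of the `alabaster` theme and the MathJax extension are assumptions made for illustration.

```python
# conf.py: minimal Sphinx configuration (illustrative values only)

# Placeholder project name; shown in page titles and the sidebar.
project = "Documentation Test"

# Top-level reStructuredText file, without the .rst suffix.
master_doc = "index"

# Built-in extension that renders LaTeX math in the HTML output via MathJax.
extensions = ["sphinx.ext.mathjax"]

# Theme that controls the look and feel of the HTML output.
html_theme = "alabaster"
```

Changing `html_theme` is how the look and feel is switched without touching the document sources.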

The following is the default HTML theme when running Sphinx standalone on my iMac.

The theme includes the coloring, background shading for code blocks, the sidebar with the links to associated pages, the search box, etc.


The exact same input creates the following on ReadTheDocs (ReadTheDocs runs Sphinx, but has its own "default" theme).


Jekyll

Jekyll (Ruby based) is actually a tool for creating and maintaining a blog. It's a bit of overkill if we are just interested in creating a documentation web site, but perhaps the blog could be handy. Jekyll can read Markdown (its preferred flavor is Kramdown), Textile, HTML and Liquid. Jekyll can output HTML and Markdown (perhaps more, but I haven't seen evidence of other formats yet). Jekyll supports themes, but I ran into difficulties when I tried to use something besides the default theme. Jekyll is a popular tool and it's hard to imagine that it doesn't work well, but it does seem to have a steeper learning curve compared to Sphinx.

The default theme in Jekyll looks like a man page:

Summary

Doxygen, Natural Docs, Sphinx and Jekyll are all free.

I did not actually install and run Doxygen or Natural Docs. Doxygen has been around for a long time and Natural Docs has a severely limited feature set. For these two tools, I had to go to the web and read others' evaluations (i.e., this information might not be as reliable as trying out the tools for myself).

I was able to try Sphinx and Jekyll on my iMac. Of these two, Sphinx was much easier to install and run; it pretty much worked right out of the box. Jekyll was easy to install but difficult to use. I think this was partly because I was trying to use a small subset of its capabilities (a set of web pages versus an entire blog site) in a standalone fashion. However, its configuration and workflow are also a bit cryptic.

Pros and cons by tool:

Doxygen
  • Pros
    • Extensive features
    • Longstanding tool
    • Can create many documentation types
      • E.g., HTML, man page, manuals
  • Cons
    • Usage can be complicated
      • Probably due to high flexibility in the tool
Natural Docs
  • Pros
    • Simpler to use (compared to Doxygen)
  • Cons
    • Can only create HTML output
    • Limited capability
      • E.g., auto generation of inheritance diagrams only from C#
Sphinx
  • Pros
    • Easy to install and use
    • Can output HTML, man page, manual
      • HTML is indexed and includes a table of contents
    • Popular tool with widespread usage
      • ReadTheDocs uses Sphinx
  • Cons
    • If a feature is missing, you need to add it using the extension capability
      • Not sure how difficult this is (might be easy to do, or easy to find the extension on the web)
Jekyll
  • Pros
    • Can do more than create a web site
      • Manage blog
    • Popular tool with widespread usage
      • GitHub Pages uses Jekyll
  • Cons
    • Difficult to configure and use (although easy to install)
    • Can only (to my knowledge) output HTML and Markdown

I recommend going with Sphinx. It's easy to install and use, it creates numerous useful outputs, and it is paired with reStructuredText (which I also recommend). Sphinx does a great job creating HTML with handy features like a table of contents.


Documentation Publishing Tool

I looked at two popular tools for publishing documentation: ReadTheDocs and GitHub Pages. Both of these offer "continuous documentation", where the web sites are automatically rebuilt when you push a commit to your repository.

ReadTheDocs

ReadTheDocs is run with Sphinx and its preferred input is reStructuredText. ReadTheDocs can handle any input format that Sphinx can, which at this point I believe is reStructuredText and Markdown. ReadTheDocs works with multiple repository hosts (GitHub, BitBucket and GitLab). ReadTheDocs has a free service (readthedocs.org), which only publishes publicly, and a business service (readthedocs.com), which is not free but can publish documentation privately. Pricing for the business service:

  • Basic service
    • $50/month ($600/year)
    • Uses ReadTheDocs themes
    • Uses ReadTheDocs domain for publishing
  • Advanced service
    • $150/month ($1800/year)
    • Can customize themes to fit company's brand
    • Uses company's domain for publishing
  • Both services include
    • 50 users
    • 50,000 monthly pageviews
    • Unlimited projects (sites)
    • Team management
      • Logins with permission to edit ("Admins")
      • Logins with read-only permission (e.g., to allow users to view private docs but not modify them)

Free version, readthedocs.org

I tried readthedocs.org with my own public GitHub repository (srherbener/ufo_doc_test). This worked well and getting the document published was an easy task. You get the document to build locally using Sphinx, open a free account on readthedocs.org, connect ReadTheDocs to your GitHub account, and the document automatically builds and publishes. The link to your document is http://github_project_name.readthedocs.io/en/latest/, which is public.

Upon the creation of a new account, the following page appears after you log in.


Click on the "Import a Project" button, and a list of your public GitHub repositories shows up:

Click on the "+" sign to the right of the repository you want to build and hit the "Next" button on the page that comes up. This fires off the upload, build (runs Sphinx on the uploaded copy) and publish steps. Your build waits in a queue, so it can take anywhere from almost no time to a couple of minutes to complete. Here's what it looks like while the build is taking place:

The above note "Webhook activated" means that the mechanism that automatically fires off builds when a commit is pushed into GitHub is working. When the build finishes, hit the "View Docs" button and you get:

Note the URL with the GitHub project name embedded within: https://ufo_doc_test.readthedocs.io/en/latest/
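
The URL pattern can be captured in a small helper. This is only a sketch of the pattern as observed on this page: the function name and parameters are made up for illustration, and the `readthedocs-hosted.com` domain is the one I saw for the business service.

```python
def docs_url(project: str, hosted: bool = False,
             language: str = "en", version: str = "latest") -> str:
    """Return the documentation URL for a project, following the observed pattern."""
    # The business service publishes under a different domain than the free one.
    domain = "readthedocs-hosted.com" if hosted else "readthedocs.io"
    return f"https://{project}.{domain}/{language}/{version}/"
```

For instance, `docs_url("ufo_doc_test")` gives the address shown above.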


Business version, free trial account, readthedocs.com

When I tried readthedocs.com, I ran into an issue where ReadTheDocs would not connect to a private GitHub repository. I made a fork of the UCAR/ufo repository so I could fiddle around with settings. I got help from ReadTheDocs support, but the instructions I was given to form the connection did not work. I sent what I did and the resulting error message back to ReadTheDocs support, but I have not heard from them yet (as of 1/30/18). The only way I could get this version to connect to GitHub was to use my own public repository. The link to the document is: https://github_project_name.readthedocs-hosted.com/en/latest/, which is private.

The process of connecting and building on the Business version is the same. The issue I had was that only my public GitHub repositories showed up in the list (after hitting the "Import a Project" button). ReadTheDocs support confirmed that the private repositories should also show up in this list. The look of the readthedocs.com web site is a little different, which is a handy reminder that you are logged into the Business version.


GitHub Pages

GitHub Pages is run with Jekyll and its preferred input is Markdown, but it will also handle HTML. GitHub Pages only works with GitHub. GitHub Pages will always publish documentation publicly, even when the GitHub repository is private. GitHub Pages is free and it has the following restrictions:

  • Per site limits:
    • 1GB limit on source
    • 1GB limit on the published site
    • bandwidth for the site <= 100 GB per month
    • <= 10 builds per hour
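
A quick way to see how close a site is to the size limits is to total the file sizes in the source or output directory. The sketch below is an assumption-laden illustration: the function names are made up, and "1GB" is read conservatively as 10^9 bytes.

```python
from pathlib import Path

# Assumption: the "1GB" limit is interpreted as 10**9 bytes.
SITE_LIMIT_BYTES = 10**9


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def within_site_limit(root: Path, limit: int = SITE_LIMIT_BYTES) -> bool:
    """True if the directory tree fits within the given size limit."""
    return directory_size(root) <= limit
```

Running `within_site_limit` on the built HTML directory before pushing gives an early warning, although it says nothing about the bandwidth or build-rate limits.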

GitHub Pages is enabled through the settings on your GitHub repository. When you are importing HTML, you get three choices for where to place the HTML files in the repository:

  • In the master branch, top level directory
  • In the master branch, /docs directory 
  • In the "gh-pages" branch, top level directory

In all three cases, an "index.html" file (top page of the web site) needs to exist for GitHub Pages to find your web pages. There exists a path to go from reStructuredText input, build with Sphinx, and import the HTML into GitHub Pages. It's a bit complicated to set up, but once it is set up it is easy to use. The main idea of this path is to use the gh-pages branch option, and to utilize a script that copies the Sphinx output HTML into the top level of the gh-pages branch and pushes those commits to GitHub.
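
The script described above could look roughly like the following Python sketch. It is not the script I used: the function names and directory arguments are made up, it assumes `sphinx-build` and `git` are on the PATH, and it assumes `pages_dir` is an existing checkout of the gh-pages branch. The empty `.nojekyll` file asks GitHub Pages to skip Jekyll processing, which would otherwise ignore Sphinx's underscore-prefixed directories such as `_static`.

```python
import shutil
import subprocess
from pathlib import Path


def stage_html(html_dir: Path, pages_dir: Path) -> None:
    """Copy built HTML into the top level of a gh-pages checkout."""
    if not (html_dir / "index.html").is_file():
        raise FileNotFoundError("index.html is required at the top of the site")
    # Clear old content, but keep git's own bookkeeping entry.
    for entry in pages_dir.iterdir():
        if entry.name == ".git":
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    shutil.copytree(html_dir, pages_dir, dirs_exist_ok=True)  # Python 3.8+
    # Disable Jekyll so directories such as _static are served.
    (pages_dir / ".nojekyll").touch()


def publish(docs_dir: Path, html_dir: Path, pages_dir: Path) -> None:
    """Build with Sphinx, stage the HTML, then commit and push gh-pages."""
    subprocess.run(["sphinx-build", "-b", "html", str(docs_dir), str(html_dir)],
                   check=True)
    stage_html(html_dir, pages_dir)
    for cmd in (["git", "add", "-A"],
                ["git", "commit", "-m", "Update documentation"],
                ["git", "push", "origin", "gh-pages"]):
        # Note: the commit step raises an error if nothing has changed.
        subprocess.run(cmd, cwd=pages_dir, check=True)
```

Keeping the copy step separate from the git commands makes it possible to inspect the staged site before anything is pushed.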

In your repository, click on the "Settings" button (the one with the gear for the repository settings, not the account settings).

Scroll down the settings page until you get to the GitHub Pages section. Then select one of the options for where to pick up the HTML:

Summary

Both GitHub Pages and ReadTheDocs are free. However, the free version of ReadTheDocs can only process public GitHub repositories. There is a Business version of ReadTheDocs that can process private GitHub repositories.

Pros and cons by tool:

GitHub Pages
  • Pros
    • Built into GitHub repository
    • Free
  • Cons
    • A bit difficult to set up for reading externally created HTML
      • E.g., Doxygen and Sphinx output HTML
    • A bit restrictive where you can store your document source (or imported HTML)
      • Causes some cluttering if you want to keep everything in the master branch
      • The gh-pages branch solution is a bit awkward since you have to keep gh-pages and master branches in sync
ReadTheDocs
  • Pros
    • Very easy to set up and use
    • Cleaner build flow
      • Designed to use the reStructuredText/Sphinx path
  • Cons
    • The Business free trial version would not connect to private GitHub repositories
      • This is supposed to work, so we should be able to get this resolved without too much trouble
We decided to go with the ReadTheDocs Basic Business version. We are assuming that either the actual Business version will work with private repositories right off the bat or, if not, that we will be able to get the connection issue resolved quickly (others would surely have complained by now if this version could not connect to private repositories). ReadTheDocs is designed to use the reStructuredText/Sphinx path, so this will be a much cleaner path for creating and publishing documentation.

