(Reverse) Literate Programming on Jekyll/GitHub
In this post I show two ways to document a piece of code on Jekyll, the static site generator powering GitHub Pages (including this blog).
The Story Behind the Story
I tried to explain how this piece of code found that (1+2)!! + 3!^4 - 5 = 2011, but C++ is not famous for being concise and transparent. So I tried to give the general idea, and gave the full source in one block, but it was not very satisfying. Full of remorse, I’m now writing this post about how I should have written the previous post, so I can sleep at night again.
I’ll get some inspiration from literate programming, which I’ll quickly present. Then I’ll show how it could have been used on the previous post. This method does not work for every language, so I will present another one, based on a pending addition to Jekyll.
Literate Programming
Let me copy-paste the definition from the Wikipedia article about literate programming for you:
The literate programming paradigm, as conceived by Knuth, represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts.
Basically, the documentation and the code are written in the same file, in the order of natural speech rather than the order expected by the compiler/interpreter, unlike Javadoc-like systems. A preprocessor then extracts the code and reassembles it in the order expected by the compiler/interpreter (hereafter: the processor).
To show the processor that we are serious about not caring about it, literate programming can be done with LaTeX or a word processor. Yes, I spent an internship coding (Spec#) within Word, and yes, I actually enjoyed it. More recently I discovered Sweave, which makes statistical exploration with R enjoyable (for a programmer).
Of course, the rule that any idea has already been implemented (better) and published also applies to literate programming with Markdown (Markdown being the notation used to format this article). In this case, a script comments out all the text that is not marked as code. I guess the author calls this “lightweight literate programming” because the code still has to be written in the order expected by the processor.
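To make the "comment out everything that is not code" idea concrete, here is a minimal sketch in Ruby. It is not the script mentioned above: the function name `comment_out_prose` is made up, and it assumes that code is marked the classic Markdown way (four leading spaces or a tab) and that the target language uses `#` for comments.

```ruby
# Hypothetical helper (not the script discussed in the article): turns a
# Markdown document into source code by commenting out every prose line.
# Assumption: code lines are those indented by four spaces or one tab.
def comment_out_prose(markdown, comment_prefix = "# ")
  markdown.each_line.map do |line|
    text = line.chomp
    if (m = text.match(/\A(?: {4}|\t)(.*)\z/))
      m[1]                    # code line: drop the Markdown indentation
    elsif text.strip.empty?
      ""                      # keep blank lines blank
    else
      comment_prefix + text   # prose line: hide it from the processor
    end
  end.join("\n")
end

doc = <<~MARKDOWN
  First we greet the reader.

      puts "hello"

  Then we are done.
MARKDOWN

puts comment_out_prose(doc)
```

The code lines come out in exactly the order they appear in the document, which is why this approach stays "lightweight": nothing is reordered for the processor.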
Manual Literate Programming On GitHub’s Jekyll
The idea here is to use the #include directive of the C/C++ preprocessor along with the {% include %} tag of the Liquid Extensions of Jekyll to build the “interconnected ‘webs’ of macros” of Knuth’s vision of literate programming. The extraction step is manual: each macro will be written in a separate file.
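As a rough illustration of why the fragments still add up to one compilable program, the following Ruby sketch mimics the textual substitution that `#include "file"` performs. It is only a toy model of the preprocessor (no include paths, no guards, no line markers). The file names are the ones used in this post, but the fragment contents and the `expand_includes` helper are invented for the example.

```ruby
# Hypothetical fragment contents; only the file names come from the article.
FRAGMENTS = {
  "search.cpp" => <<~CPP,
    void search() {
    #include "print-candidate-total.inc"
      // ... actual search loop ...
    #include "print-progress.inc"
    }
  CPP
  "print-candidate-total.inc" => "  // prints the number of candidates\n",
  "print-progress.inc" => "  // prints progress\n"
}.freeze

# Toy model of the preprocessor: replace each #include "name" line with
# the (recursively expanded) content of the named fragment.
def expand_includes(name, files)
  files.fetch(name).each_line.map do |line|
    if (m = line.match(/\A\s*#include\s+"([^"]+)"/))
      expand_includes(m[1], files)
    else
      line
    end
  end.join
end

puts expand_includes("search.cpp", FRAGMENTS)
```

The article side does the same thing with {% include %}: each fragment is pulled into the text where it is discussed, so the same files serve both the compiler and the reader.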
For example, the core search function seems complicated at first:
[Code listing omitted.]
Actually it is very simple if we extract the code related to printing progress:
[Code listing omitted.]
We can separately detail the formula for the total number of candidates, by showing only the content of print-candidate-total.inc:
[Code listing omitted.]
And we don’t even need to show the content of print-progress.inc, because it is just uninteresting print instructions.
The Markdown document becomes something like:
[Code listing omitted.]
This Markdown document is describing how to write a Markdown document describing other files by including them… Way too meta for me. If you too need to lower the level of abstraction, you can have a look at the files I’m talking about:
- explanation.markdown is the article describing how the program works
- search.cpp and print-candidate-total.inc are pieces of code explained in the article.
- print-progress.inc, a-lot-of-code-before.inc, and a-lot-of-code-after.inc are the rest of the code.
The whole point of all this is that the code remains compilable (e.g. with g++ -o search search.cpp) without any transformation, just like the monolithic main.cpp, but the explanation can be a lot clearer. It also means that there is no need to propagate modifications into the description should the code change, or vice versa.
Reverse Literate Programming on GitHub’s Jekyll
Now the problem is that a lot of programming languages are too elegant to dirty their parsing hands with the include of a preprocessor. Of course, most languages allow importing some notion of modules, but that is just the processor trying to dictate to us, once again, the order in which to describe a program.
Without heavy weaponry like a real literate programming preprocessor, we will have to call a truce. We’ll still write the code in the order of the processor, but explain it in a human order. Anyway, no one actually believed that I wrote the code of the previous section in the order presented.
Instead of writing a story that will be transformed into a program, we will write a program that will be reassembled into a story, hence the name reverse literate programming. The feature needed here is the ability to extract parts of the source code. Jekyll’s {% include %} can only include a whole file, but if this patch makes it through, we will be able to use blocks like:
[Code listing omitted.]
Where:
- source.file is the source to include (similarly to the {% include %} tag already in Jekyll),
- after is a tag to specify the line after which the content of the file should be included,
- before is a tag to specify the first line from which the content of the file will be ignored again.
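The after/before semantics described in this list can be sketched in a few lines of Ruby. This is not the code of the patch: `extract_between` is a made-up name, and the sketch assumes that a marker matches any line containing it as a substring, and that nothing is included when the after marker is never found.

```ruby
# Hypothetical sketch of the after/before extraction, not the actual patch.
# Assumption: a marker matches the first line that contains it as a substring.
def extract_between(content, after: nil, before: nil)
  lines = content.lines
  if after
    start = lines.index { |l| l.include?(after) }
    # Keep only what follows the marker line; nothing if it is missing.
    lines = start ? lines.drop(start + 1) : []
  end
  if before
    stop = lines.index { |l| l.include?(before) }
    # Drop the marker line and everything after it.
    lines = lines.take(stop) if stop
  end
  lines.join
end

source = <<~'SOURCE'
  require "set"
  # doc: begin greeting
  def greet(name)
    "Hello, #{name}!"
  end
  # doc: end greeting
  puts greet("world")
SOURCE

puts extract_between(source, after: "doc: begin greeting",
                             before: "doc: end greeting")
```

Because both marker lines are excluded from the result, they can be plain comments that exist only for the sake of the documentation.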
Example
For example, if I were to document the code of the patch, instead of including the whole file containing:
[Code listing omitted.]
I could simply write a markdown file containing:
[Code listing omitted.]
Which would give:
Blablabla, I just copy-pasted include.rb and added this processing to the content read from the file:
[Code listing omitted.]
Alternatives
I can think of two other ways to specify the part of the file to be included. First, specifying the numbers of the first and last line to keep would be the easiest, but it would mean that the documentation has to be updated each time the file changes.
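A small Ruby sketch shows how quickly line numbers go stale. The helper `extract_lines` and the sample source are hypothetical, chosen only to illustrate the maintenance problem.

```ruby
# Hypothetical helper: keep lines first..last (1-based, inclusive).
def extract_lines(content, first, last)
  content.lines[(first - 1)..(last - 1)].to_a.join
end

original = "a = 1\nb = 2\nc = 3\n"
# Adding a single line at the top of the file shifts every line number.
edited = "# new header comment\n" + original

puts extract_lines(original, 2, 2)  # the line the documentation meant
puts extract_lines(edited, 2, 2)    # same numbers, different line
```

The second call silently returns a different line, so the documentation would be wrong without any error being reported.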
Second, a from/until pair could specify border lines similarly to the after/before pair, except that the content would include these two lines. The problem here is that premature truncations could happen if we want to stop the inclusion on lines such as } or end.
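The premature truncation is easy to reproduce. In this Ruby sketch, the inclusive variant is a hypothetical helper (`extract_inclusive`, with `from_marker` and `until_marker` standing in for from/until, since `until` is a Ruby keyword), and markers are again assumed to match by substring.

```ruby
# Hypothetical inclusive variant: both border lines are part of the result.
# Assumption: a marker matches the first line that contains it as a substring.
def extract_inclusive(content, from_marker:, until_marker:)
  lines = content.lines
  start = lines.index { |l| l.include?(from_marker) }
  return "" unless start
  # Look for the closing border only after the opening line.
  tail = lines.drop(start + 1)
  stop = tail.index { |l| l.include?(until_marker) }
  kept = stop ? tail.take(stop + 1) : tail
  ([lines[start]] + kept).join
end

source = <<~'SOURCE'
  def search(candidates)
    candidates.each do |c|
      check(c)
    end
    report
  end
SOURCE

# We wanted the whole method, but the first inner "end" closes the excerpt.
puts extract_inclusive(source, from_marker: "def search", until_marker: "end")
```

The excerpt stops at the end of the inner block, so the call to report and the closing line of the method are lost.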
In the end, the only assumption in the after/before pair is that one can add lines only for the sake of documentation, but I have not seen a programming language which does not allow comments, so it is always possible to delimit regions.