Literate Perl is a preprocessor for perl programs that let you write in a literate style.
In the literate style, documentation is forefront and embedded within it is the actual code to run. to create a runnable code block within the documentation, which follows standard POD conventions, use =begin code and =end code directives:
=begin code my $foo = 'bar'; $self→populate( 'a' ⇒ 'b', … ); =end code
To create a code example, not indended to be run, use the normal POD verbatim style.
One of the signatures of true Literate Programming is to be able to rearrange code so that it makes sense to the human reader first, and can be manipulated into the proper order for the compiler later. A common convention for deferring code definitions is to embed a descriptive anchor surrounded by doubled angle brackets on its own line.
our $code_anchor = qr/<<(.+?)>>/; our $code_anchor_line = qr/^(\s*)$code_anchor\s*$/;
for example:
sub its_complicated { my( $self, %params ) = @_; «the messy bit» … }
The descriptive anchor is then used later as the text on a POD head directive and any code under that heading is stitched back into the code at the anchor point. When the preprocessor runs it generates two files:
Using the preprocessor on all your code files you can set a consistent policy of pragma use across your whole project. calling the use_strict or use_warnings method with a boolean value lets you tell the preprocessor whether to inject those pragmas at the head of every processed file.
BEGIN { foreach my $pragma (qw( use_strict use_warnings )) { no strict 'refs'; *{ 'Pod::Literate::' ⋅ $pragma } = sub { my ( $self, $bool ) = @_; return $self→{"_$pragma"} = $bool; }; } }
the use_css method can be used to tell the preprocesser what stylesheet to include in the header of the html document generated.
sub use_css { my ( $parser, $css ) = @_; $parser→{_css} = $css; }
The preprocessor is a subclass of Pod::Parser and as such does most of its parsing by overriding the handlers suggested by the documentation.
At the beginning of the document we simply reset our tracking variables and initialize our html code highlighter for any code we may encounter.
sub begin_input { my ($parser) = @_; $parser→{_code_sections} = []; $parser→{_last_paragraph} = ''; delete $parser→{$_} foreach (qw( _current_code _code _current_code_name _doc )); $parser→{_highlighter} ||= PPI::HTML→new(); }
at the end of the parsing cycle we write out the code and document files. The method that creates the code file behaves a bit differently based on whether we are working on a script or on a module, so we pass along the input file's extension, less the l at the front which indicated it was a literate perl file in the first place.
sub end_input { my ($parser) = @_; my ( $filename, $extension ) = ( $parser→input_file =~ m/ (.+?) \. ([^.]+) $ /x ); $extension =~ s/^l//; write_file( "$filename.$extension", $parser→dump_code($extension) ); write_file( "$filename.html", $parser→dump_docs() ); }
To process the files we override methods from our base class Pod::Parser to handle each type of POD feature.
The command paragraphs we are particularly interested in are begin code and end code, which indicate where our runnable code lives. Since Pod::Parser is a streaming parser we need to keep track of when we enter and exit these code sections. We accumulate the code until we reach the end code command. At this point we can add it to the documentation output, but we defer generating the output for the code file as we need to re-integrate code snippets in place of the code anchors, all of which we won't have collected until the end of the document.
sub command { my ( $parser, $command, $paragraph, $line_num ) = @_; $paragraph =~ s/\s+$//; $parser→close_verbatim(); if ( $command eq 'begin' and $paragraph eq 'code' ) { $parser→{_in_code_block} = 1; delete $parser→{_current_code}; } elsif ( $command eq 'end' and $paragraph eq 'code' ) { $parser→{_in_code_block} = 0; $parser→add_doc_code( $parser→{_current_code} ); } else { «normal pod commands» } $parser→{_last_paragraph} = 'command'; } sub is_source_code { my ( $parser, $paragraph ) = @_; if ( $parser→{_in_code_block} ) { if ( $paragraph =~ $code_anchor ) { $parser→{_code_names}{ lc $1 } = 1; } $parser→add_code($paragraph); return 1; } elsif ( $paragraph =~ /^>/ ) { $paragraph =~ s/^>//gm; local $parser→{_in_code_block} = 1; $parser→is_source_code($paragraph); $parser→add_doc_code($paragraph); return 1; } return 0; } sub add_doc_code { my ( $parser, $paragraph ) = @_; my $code = $paragraph; $parser→add_doc( '<div class="code"><pre>' ⋅ $parser→htmlize_code( tidy($code) ) ⋅ '</pre></div>' ); } sub add_code { my ( $parser, $code ) = @_; $parser→{_code}{ $parser→{_current_code_name} } ⋅= $code; $parser→{_current_code} ⋅= $code; } sub add_doc { my ( $parser, $text ) = @_; $parser→{_doc} ⋅= $text; }
$paragraph = $parser→interpolate( $paragraph, $line_num ); if ( $command =~ /^head/ ) { $parser→{_current_code_name} = lc $paragraph; push @{ $parser→{_code_sections} }, lc $paragraph; if ( $command =~ /head1/ ) { $parser→{_doctitle} ||= $paragraph; if ( $parser→input_file =~ /pm$/ ) { $parser→add_code("package $paragraph;\n"); } } $parser→add_doc( '<a name="' ⋅ lc $paragraph ⋅ '"></a>' ); } $parser→add_doc( htmlize_doc( $command, $paragraph ) );
sub verbatim { my ( $parser, $paragraph, $line_num ) = @_; return if $paragraph =~ /^\s*$/; unless ( $parser→is_source_code($paragraph) ) { $parser→add_doc('<pre class="sample">') unless $parser→{_last_paragraph} eq 'verbatim'; $parser→add_doc( $parser→htmlize_code($paragraph) ); } $parser→{_last_paragraph} = 'verbatim'; } sub close_verbatim { my ($parser) = @_; $parser→add_doc('</pre>') if $parser→{_last_paragraph} eq 'verbatim'; }
sub textblock { my ( $parser, $paragraph, $line_num ) = @_; $parser→close_verbatim(); unless ( $parser→is_source_code($paragraph) ) { $paragraph = $parser→interpolate( $paragraph, $line_num ); $parser→add_doc( '<p>' ⋅ $paragraph ⋅ '</p>' ); } $parser→{_last_paragraph} = 'text'; }
sub interior_sequence { my ( $parser, $seq_command, $seq_argument ) = @_; for ($seq_command) { /C/ and return "<samp>$seq_argument</samp>"; /B/ and return "<b>$seq_argument</b>"; /I/ and return "<i>$seq_argument</i>"; /L/ and return "<a href=\"\">$seq_argument</i></a>"; /F/ and return "$seq_argument"; /S/ and do { $seq_argument =~ s/ \s / /xg; return $seq_argument; }; } }
In order to generate a coherent code file we first need to replace all code anchors with the code snippets to which they refer. At this stage we prepend strict and warning pragmas if they were called for, and then assemble all the found code chunks in the order in which they were encountered (less the snippets that have been integrated already). Finally we pass the code through perltidy (via tidy) to produce a legible source file.
sub dump_code { my ( $parser, $extension ) = @_; my $source; «replace code anchors with code snippets» $source ⋅= "use strict;\n" if $parser→{_use_strict}; $source ⋅= "use warnings;\n" if $parser→{_use_warnings}; $source ⋅= join( '', map( { $parser→{_code}{$_} } grep { defined $parser→{_code}{$_} } @{ $parser→{_code_sections} } ) ); «source code amendments based on file type» return tidy($source); }
while ( my ( $name, $block ) = each %{ $parser→{_code} } ) { foreach my $match ( $block =~ / $code_anchor /xg ) { next unless defined $parser→{_code}{$match}; $block =~ s/ << \Q$match\E >> / $parser->{_code}{$match} /gx; delete $parser→{_code}{$match}; } $parser→{_code}{$name} = $block; }
To reduce boilerplate code in our perl files, the preprocessor can add back in some tedious but necessary code. For scripts it prepends the shebang line. For libraries it appends the necessary "true value" that allows your otherwise flawless code to compile properly.
for ($extension) { /^pl$/ and do { $source = '#!' ⋅ `which perl` ⋅ $source; last; }; /^pm$/ and do { $source ⋅= "1;\n"; last; }; }
We've broken out the call to perltidy because we also use it for cleaning up the code in the documentation file. The regexes around the perltidy call are neccessary to preserve the code anchors in the documentation version of the source code.
sub tidy { my ($code) = @_; my $output; #perltidy needs to see an empty ARGV local @ARGV = (); $code =~ s/$code_anchor_line/$1#b#$2#e#/gm; perltidy( source ⇒ \$code, destination ⇒ \$output ); $code =~ s/^(\s*)#b#(.+?)#e#/$1<<$2>>/gm; return $output; }
sub dump_docs { my ($parser) = @_; my $doc; $doc = '<html><head><title>' ⋅ $parser→{_doctitle} ⋅ '</title>'; $doc ⋅= '<link rel="stylesheet" type="text/css" href="' ⋅ $parser→{_css} ⋅ '" /></head><body>' if $parser→{_css}; $doc ⋅= $parser→{_doc} ⋅ "</body></html>"; return $doc; } our %pod_commands = ( 'head1' ⇒ 'h1', 'head2' ⇒ 'h2', 'head3' ⇒ 'h3', 'head4' ⇒ 'h4', 'item' ⇒ 'li', 'over' ⇒ '<ul>', 'back' ⇒ '</ul>', ); sub htmlize_doc { my ( $command, $paragraph ) = @_; if ( my $tag = $pod_commands{$command} ) { return $tag if $tag =~ /</; return "<$tag>$paragraph</$tag>"; } return ''; }
We use PPI::HTML to turn the code blocks into syntax-highlighted html. We want to leave the code anchors from the literate style intact, as the goal here is legibility, and so we have to jump through some regex hoops to keep them intact through the process.
Finally we eliminate the br tags that the highlighter inserts because we're going to wrap the output in a pre block.
«the entity list» sub htmlize_code { my ( $parser, $code ) = @_; # preserve code anchors $code =~ s/$code_anchor_line/$1#b#$2#e#/gm; # "" to get the html to stringify $code = "" ⋅ $parser→{_highlighter}→html( \$code ); «entity conversion» # restore code anchors $code =~ s/(<span\s+class\s*=\s*"comment"\s*>\s*)#b#(.+?)#e#/$1«<a href="#$2">$2<\/a>»/g; $code =~ s/ <br> //gx; return $code; }
In the name of legibility we also take the time to change some of the perl operators into html entities.
while ( my ( $repl, $match ) = each %conversions ) { $code =~ s/ <span\s+class\s*=\s*"operator"\s*> $match /<span class="operator">$repl/xg; }
Here is the list of entities. the keys and values are reversed from their sensible order so that we can use regexes to find and replace the operators:
our %conversions = ( '→' ⇒ qr/ -> /x, # -> '⇒' ⇒ qr/ => /x, # => '≥' ⇒ qr/ >= /x, # >= '≤' ⇒ qr/ <= /x, # <= '≠' ⇒ qr/ != /x, # != '…' ⇒ qr/ \.\.\. /x, # ... '⋅' ⇒ qr/ \. (?!\.) /x, # . '×' ⇒ qr/ \* /x, # * );
use base 'Pod::Parser'; use PPI::HTML; use File::Slurp qw( write_file ); use Perl::Tidy;