Wednesday, April 4, 2012

No Dart Pygments for Me

tl;dr asciidoc / a2x do not support pygmentized output in any format other than HTML.

The epub and mobi versions of Dart for Hipsters continue to lack Dart syntax highlighting. I have the correct version of Pygments installed, but git-scribe, despite throwing all kinds of warnings before the Pygments update, is still not producing epubs with syntax highlighting.

Of course, this could just be my fork of git-scribe that is causing the troubles. It has been a while, but one of the additions to my fork was a post-generate clean-up with Calibre. Perhaps this is somehow scrubbing the syntax highlighting?

The answer to that question was no. Not wanting to introduce yet another dependency to git-scribe, I made the post-generate clean-up optional by virtue of the presence of a script in the source directory for my book:
    def do_mobi
      # ....
      info "GENERATING MOBI"
      # Generate with kindlegen...
 
      cmd = @wd +  '/scripts/post-mobi.sh'
      if File.exist?(cmd) && File.executable?(cmd)
        return false unless ex(cmd)
      end
      # ...
    end
I had completely forgotten about that, which means... that Dart for Hipsters does not have the most Kindle Fire friendly mobi possible. I rectify that by copying the post-mobi.sh script from Recipes with Backbone into my dart-book repository:
➜  dart-book git:(master) ✗ mkdir -p scripts                                       
➜  dart-book git:(master) ✗ cp -p ../backbone-recipes/scripts/post-mobi.sh scripts/
➜  dart-book git:(master) ✗ cat scripts/post-mobi.sh 
#!/bin/sh

echo -n "doing post mobi things..."
ebook-convert book.mobi book_ebok.mobi --chapter-mark=none --page-breaks-before='/'
mv book.mobi book.mobi.pre-calibre
mv book_ebok.mobi book.mobi
echo "done!"
With that I can regenerate the mobi and it will now play nice with the Kindle Fire:
➜  dart-book git:(master) ✗ git-scribe gen mobi
...
GENERATING MOBI
  adding: etype (stored 0%)
  adding: META-INF/ (stored 0%)
  adding: META-INF/container.xml (deflated 33%)
...
**************************************************
* Amazon.com kindlegen(Linux)   V1.2 build 33307 *
* A command line e-book compiler                 *
* Copyright Amazon.com 2011                      *
**************************************************
...
Info(prcgen): The document identifier is: "Dart_for_Hipsters"
Info(prcgen): The file format version is V6
Info(prcgen): Saving MOBI file
Info(prcgen): MOBI File successfully generated!
doing post mobi things...
1% Converting input to HTML...
...
MOBI output written to /home/cstrom/repos/dart-book/output/book_ebok.mobi
Output saved to   /home/cstrom/repos/dart-book/output/book_ebok.mobi
done!
If you have a Kindle Fire, you can re-download Dart for Hipsters (from the same URL in the original email).

But that still leaves me with my non-syntax-highlighted problem, which clearly is not being caused by Calibre. The do_epub function in git-scribe is mercifully short:
    def do_epub
      return true if @done['epub']

      info "GENERATING EPUB"

      generate_docinfo
      # TODO: look for custom stylesheets
      cmd = "#{a2x_wss('epub')} -a docinfo -k -v #{BOOK_FILE}"
      return false unless ex(cmd)

      @done['epub'] = true
    end
Hrm... that TODO note looks promising. The a2x_wss function calls the AsciiDoc executable a2x with a hard-coded stylesheet:
    def a2x_wss(type)
      a2x(type) + " --stylesheet=stylesheets/scribe.css"
    end

    def a2x(type)
      "a2x -f #{type} -d book "
    end
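
As a rough sketch of what that TODO might turn into, the stylesheet lookup could prefer a book-specific file and fall back to the hard-coded one. The `stylesheet_for` helper and the `stylesheets/custom.css` name are my own assumptions for illustration, not anything in git-scribe. This only changes which CSS file is handed to a2x:

```ruby
# Sketch only: stylesheet_for and stylesheets/custom.css are hypothetical
# names used for illustration; they are not part of git-scribe.
DEFAULT_STYLESHEET = 'stylesheets/scribe.css'

# Same base command as the original a2x helper.
def a2x(type)
  "a2x -f #{type} -d book "
end

# Prefer a book-specific stylesheet when one exists in the book directory,
# otherwise fall back to the stylesheet that git-scribe hard-codes.
def stylesheet_for(book_dir, custom = 'stylesheets/custom.css')
  File.exist?(File.join(book_dir, custom)) ? custom : DEFAULT_STYLESHEET
end

# Build the a2x command with whichever stylesheet was found.
def a2x_wss(type, book_dir = Dir.pwd)
  a2x(type) + " --stylesheet=#{stylesheet_for(book_dir)}"
end

puts a2x_wss('epub')
```

With no custom file present, this builds the same command string as the original helper.
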
Looking through the HTML in the resulting epub, I see no evidence of syntax highlighting. The intermediary DocBook XML does, at least, mention Dart as the syntax type:
<simpara>We start our Dart application by loading a couple of Dart libraries with a <literal>main()</literal> function in <literal>scripts/comis.dart</literal>:</simpara>
<programlisting 
  language="dart" 
  linenumbering="unnumbered">
#import('dart:html');
#import('dart:json');

main() {
  load_comics();
}

load_comics() {
  // Do stuff here
}</programlisting>
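
Eyeballing the XML works for one listing. For a whole book, a quick check along these lines could list the language declared on every programlisting. This is a minimal sketch that uses a regular expression rather than a real XML parser, and the `listing_languages` name is made up for the example:

```ruby
# Quick check, not a real XML parser: collect the language attribute of
# every <programlisting> element found in a DocBook string.
# (listing_languages is a hypothetical helper name.)
def listing_languages(docbook_xml)
  docbook_xml.scan(/<programlisting\b[^>]*?\blanguage="([^"]+)"/).flatten
end

sample = <<~XML
  <programlisting
    language="dart"
    linenumbering="unnumbered">
  main() {}
  </programlisting>
  <programlisting language="javascript">var x = 1;</programlisting>
XML

p listing_languages(sample)
```
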
Taking a step back, I run the simplest asciidoc / a2x command possible (just asciidoc with no command-line switches beyond -v for verbose output) against a small, self-contained chapter from my book:
➜  tmp git:(master) ✗ asciidoc -v dom.asc
asciidoc: reading: /etc/asciidoc/asciidoc.conf
...
asciidoc: writing: /home/cstrom/repos/dart-book/tmp/dom.html
asciidoc: dom.asc: line 30: filtering: pygmentize -f html -l dart  -O encoding=UTF-8
asciidoc: dom.asc: line 42: filtering: pygmentize -f html -l dart  -O encoding=UTF-8
asciidoc: dom.asc: line 53: filtering: pygmentize -f html -l dart  -O encoding=UTF-8
asciidoc: dom.asc: line 65: filtering: pygmentize -f html -l javascript  -O encoding=UTF-8
...
That looks promising. And, in fact, it does produce Dart (and JavaScript) syntax highlighted output:


But I'm not using the asciidoc command to produce things from AsciiDoc format. Instead, as mentioned earlier, I am using the a2x wrapper for asciidoc. So I generate HTML from a2x using the simplest command-line options possible:
➜  tmp git:(master) ✗ a2x -f xhtml -v dom.asc
a2x: args: ['-f', 'xhtml', '-v', 'dom.asc']
a2x: executing: /usr/bin/asciidoc --backend docbook  --verbose  --out-file /home/cstrom/repos/dart-book/tmp/dom.xml /home/cstrom/repos/dart-book/tmp/dom.asc
asciidoc: reading: /etc/asciidoc/asciidoc.conf
asciidoc: reading: /home/cstrom/.asciidoc/asciidoc.conf
...
asciidoc: writing: /home/cstrom/repos/dart-book/tmp/dom.xml
a2x: executing: xmllint --nonet --noout --valid /home/cstrom/repos/dart-book/tmp/dom.xml
a2x: chdir /home/cstrom/repos/dart-book/tmp
a2x: executing: xsltproc  --stringparam callout.graphics 0 --stringparam navig.graphics 0 --stringparam admon.textlabel 1 --stringparam admon.graphics 0  --output /home/cstrom/repos/dart-book/tmp/dom.html /etc/asciidoc/docbook-xsl/xhtml.xsl /home/cstrom/repos/dart-book/tmp/dom.xml
a2x: chdir /home/cstrom/repos/dart-book/tmp
a2x: finding resources in: /home/cstrom/repos/dart-book/tmp/dom.html
a2x: finding resources in: /home/cstrom/repos/dart-book/tmp/dom.html
a2x: deleting /home/cstrom/repos/dart-book/tmp/dom.xml
There is no mention of pygmentize and there is no longer syntax highlighting in the output:


My guess is that the --backend docbook option that a2x supplies to asciidoc is the culprit here (since nothing else in the printed command line looks at all different from my first run). And, indeed, running the command without that switch does produce highlighted output.
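
The difference between the two verbose runs comes down to whether any "filtering: pygmentize" lines show up. A small sketch like the following could pull those lines out of captured verbose output. It assumes the log has already been read into a string and that the lines look like the ones shown above; `pygmentize_calls` is a name I made up for the example:

```ruby
# Matches asciidoc verbose lines such as:
#   asciidoc: dom.asc: line 30: filtering: pygmentize -f html -l dart ...
FILTER_LINE = /line (\d+): filtering: pygmentize\b.*?-l (\S+)/

# Return [source_line_number, language] pairs for each pygmentize filter
# invocation found in the captured log. (Hypothetical helper.)
def pygmentize_calls(verbose_log)
  verbose_log.scan(FILTER_LINE).map { |line_no, lang| [line_no.to_i, lang] }
end

html_run = <<~LOG
  asciidoc: writing: /home/cstrom/repos/dart-book/tmp/dom.html
  asciidoc: dom.asc: line 30: filtering: pygmentize -f html -l dart  -O encoding=UTF-8
  asciidoc: dom.asc: line 65: filtering: pygmentize -f html -l javascript  -O encoding=UTF-8
LOG

docbook_run = <<~LOG
  asciidoc: reading: /etc/asciidoc/asciidoc.conf
  asciidoc: writing: /home/cstrom/repos/dart-book/tmp/dom.xml
LOG

p pygmentize_calls(html_run)    # [[30, "dart"], [65, "javascript"]]
p pygmentize_calls(docbook_run) # []
```

An empty result for a run is a quick sign that the Pygments filter never fired for that backend.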

At this point, I have reached an impasse. I cannot produce pygmentized output from a2x unless it is based on HTML, and no other highlighter supports Dart. At the same time, I cannot produce epub unless it is based on DocBook. Indeed, the asciidoc documentation says as much:
You also have the option of using the Pygments syntax highlighter for xhtml11 outputs.
I do not believe that I ever paid much attention to that last caveat. I'm paying attention now.



Day #346

1 comment:

  1. Wow. Your efforts using Dart to create a Blog aggregation will be very useful to many; i am one of them. It deserves a chapter in your book with a Dart 'gadget' to transform HTML formatted contents in Blogger to a post-processed epub/mobi.
