LRF to HTML: The Rough Guide

As of this writing, calibre, which can convert many things from one format to another and comes with command-line tools, does not convert LRF to HTML, or indeed to much of anything other than LRS, an XML format. Fixing this is not a high priority for calibre itself, because calibre is aimed at converting things to LRF. (The ePub conversion is still relatively new and shiny.)

ETA: Here’s the LRS specification.

So. Heck. Why not. I’m using Ruby, by the way, because Ruby has the kick-ass REXML library, which is also the cornerstone of my ruby-epub stuff (still in the making).

The scope of this code: extremely basic. It should be run on the LRS file produced by calibre’s lrf2lrs utility. The finer details of calibre’s LRS are skipped over, and there are some hacks. It is smart enough to cope with some strange formatting, though not with illegal formatting.
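For the curious, here is a rough sketch of what the REXML side of such a converter might look like. It is not the downloadable script. The element and attribute names it reads (BookInfo/Title, Main/Page, P, and TextStyle with objid and align) are assumptions about what lrf2lrs emits, so check them against your own LRS file. The head/center/foot to left/center/right mapping is also a hypothetical choice, made because "foot" is not a valid CSS text-align value.

```ruby
require 'rexml/document'

# Hypothetical mapping from LRS TextStyle "align" values to CSS.
# Assumes LRS uses head/center/foot for start/center/end alignment.
ALIGN_TO_CSS = { 'head' => 'left', 'center' => 'center', 'foot' => 'right' }.freeze

# Book title, assumed to live at BookInformation/Info/BookInfo/Title.
def title_from_lrs(doc)
  node = REXML::XPath.first(doc, '//BookInfo/Title')
  node ? node.text.to_s.strip : 'Untitled'
end

# Map each <TextStyle objid="..." align="..."> to a CSS declaration string,
# keyed by objid. Styles without an align attribute are skipped.
def styles_from_lrs(doc)
  styles = {}
  REXML::XPath.each(doc, '//TextStyle') do |style|
    id = style.attributes['objid']
    align = style.attributes['align']
    next unless id && align
    styles[id] = "text-align: #{ALIGN_TO_CSS.fetch(align, 'left')}; "
  end
  styles
end

# Concatenate all text under a node in document order, descending into
# nested elements (e.g. <Span>) and ignoring comments and other node types.
def flat_text(node)
  node.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then flat_text(child)
    else ''
    end
  end.join
end

# One entry per <Page> under <Main>: the flattened text of each <P> in it.
def pages_from_lrs(doc)
  REXML::XPath.match(doc, '//Main/Page').map do |page|
    REXML::XPath.match(page, './/P').map { |para| flat_text(para) }
  end
end

sample = <<~LRS
  <BBeBXylog>
    <BookInformation><Info><BookInfo><Title>An Example Book</Title></BookInfo></Info></BookInformation>
    <Main>
      <Page objid="10"><TextBlock><P>Once <Span>upon</Span> a time</P></TextBlock></Page>
    </Main>
    <Style><TextStyle objid="208" align="center"/></Style>
  </BBeBXylog>
LRS

doc = REXML::Document.new(sample)
puts title_from_lrs(doc)   # prints: An Example Book
p styles_from_lrs(doc)
p pages_from_lrs(doc)
```

Walking children by hand in flat_text, rather than collecting text() nodes via XPath, keeps the paragraph text in document order without relying on how REXML orders XPath results.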

But basically it does this:

prompt% ./lrs2html AnExampleBook.lrs
Parsing XML
Done parsing
Attributes: #
Processing Styles
Styles: {"208"=>"text-align: center; ", "220"=>"text-align: center; ", "209"=>"text-align: center; ", "221"=>"text-align: foot; ", "210"=>"text-align: center; ", "213"=>"text-align: center; ", "214"=>"text-align: center; "}
Processing Pages (Sections)
Procesing text for Section 0
Title: An Example Book
Processed section An Example Book
Procesing text for Section 1
Title: Chapter 1: In the Beginning
Processed section Chapter 1: In the Beginning
Procesing text for Section 2
Title: Chapter 2: Flowering
Processed section Chapter 2: Flowering
Procesing text for Section 3
Title: Chapter 3: Autumn
Processed section Chapter 3: Autumn
Creating directory An_Example_Book
Writing sections
Writing 'An Example Book' to title.html
Writing 'Chapter 1: In the Beginning' to section-01.html
Writing 'Chapter 2: Flowering' to section-02.html
Writing 'Chapter 3: Autumn' to section-03.html
Writing TOC
DONE
prompt% ls An_Example_Book
section-01.html
section-02.html
section-03.html
title.html
toc.html
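The output layout in that run is a directory named after the book, title.html for section 0, two-digit section-NN.html files for the rest, and a toc.html. It could be produced along these lines. This is a minimal sketch, not the script itself. The helper names (dir_name_for, file_name_for, toc_html, write_book) are hypothetical, and the title-to-directory rule (runs of non-word characters become underscores) is only inferred from "An Example Book" turning into An_Example_Book.

```ruby
require 'fileutils'
require 'erb'

# Hypothetical rule: runs of non-word characters become a single underscore,
# with leading/trailing underscores trimmed ("An Example Book" -> "An_Example_Book").
def dir_name_for(title)
  title.gsub(/\W+/, '_').gsub(/\A_+|_+\z/, '')
end

# Section 0 is the title page; the rest get two-digit numbers.
def file_name_for(index)
  index.zero? ? 'title.html' : format('section-%02d.html', index)
end

# Build toc.html: one link per section, with titles HTML-escaped.
def toc_html(book_title, section_titles)
  items = section_titles.each_with_index.map do |t, i|
    %(<li><a href="#{file_name_for(i)}">#{ERB::Util.html_escape(t)}</a></li>)
  end
  <<~HTML
    <html><head><title>#{ERB::Util.html_escape(book_title)}</title></head>
    <body><ul>
    #{items.join("\n")}
    </ul></body></html>
  HTML
end

# sections: array of [section_title, rendered_html] pairs, title page first.
# Writes every section plus toc.html and returns the directory name.
def write_book(book_title, sections)
  dir = dir_name_for(book_title)
  FileUtils.mkdir_p(dir)
  sections.each_with_index do |(_title, html), i|
    File.write(File.join(dir, file_name_for(i)), html)
  end
  File.write(File.join(dir, 'toc.html'), toc_html(book_title, sections.map(&:first)))
  dir
end
```

Calling write_book('An Example Book', sections) with four [title, html] pairs would give the same five files as the listing above.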

So, here’s the code, which is cheerily commented as always. You might want to download the file, since it contains a control character that WordPress wisely does not allow me to post.

Code Download
