EBookery Workflow for Various Formats


About

This is just how I convert documents. There are several ways to do most of these conversions, but these are the ones I like best and have found most rewarding, especially when it comes to handling paragraphs correctly (e.g., discovering paragraphs and reflowing them properly, rather than simply inserting line breaks everywhere).

These instructions are for someone with an Intel Mac running Mac OS X. I haven’t got instructions for Windows; apologies.

Many of these instructions currently require command-line fu; there are tutorials out there on the web.


Summary

My basic workflow as of January 2009 is this:

  1. If it’s anything other than Epub, transform it to HTML.
  2. Create an Epub project using the HTML.
  3. Compile an Epub.
  4. Compile a MOBI file from the Epub.

This covers both recent versions of the Sony Reader and all versions of the Kindle.
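As a rough map of step 1, here is a small sketch that picks a route from a file’s extension. The function name `route_for` is made up for illustration, and the routes are simply the sections of this page; it converts nothing by itself.

```shell
#!/bin/sh
# Hypothetical helper: print which route on this page turns a source file
# into HTML. It only looks at the extension.
route_for() {
    # Lower-case the extension so FILE.PDF and file.pdf behave the same.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        epub)             echo "unzip the Epub" ;;
        doc)              echo "OpenOffice -> RTF -> TextEdit" ;;
        rtf)              echo "TextEdit" ;;
        lrf)              echo "lrf2lrs, then LRS to HTML" ;;
        pdf)              echo "Mobipocket Creator import" ;;
        # Project Gutenberg .txt files go through GutenMark instead.
        prc|mobi|lit|txt) echo "any2epub, then unzip the Epub" ;;
        html|htm)         echo "already HTML" ;;
        *)                echo "unknown"; return 1 ;;
    esac
}
```

For example, `route_for Some_Book.lrf` prints the LRF route, and an unrecognized extension returns a non-zero status.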

Tools

CrossOver

I use this to run Mobipocket Creator and Mobigen, as they’re traditionally Windows-only tools.

Mobipocket Creator

Used for generating PRC files (if I want to, which is rarely these days) or for importing PDF and turning it into a nice XML format.

Mobigen

A Windows command-line executable. I use this to quickly generate a MOBI file from an Epub.

Calibre

Calibre is an ebook library manager and sometime ebook reader, but mostly I use its command line utilities to convert various files.

GutenMark

There is nothing out there better for turning Project Gutenberg text files into HTML.

MacVim

Much fu is required here. This editor is very useful for on-the-fly scripting. And by on-the-fly, I mean applying intelligent regular expressions on file contents directly in the editor. Also extremely useful for editing Ruby scripts, of course.

REXML

A Ruby library that, these days, comes with every Ruby install, including that on Mac OS X. Knowing both this library and Ruby gives you the most flexibility in working with HTML/XML.

RubyEpub Tools (ruby-epub)

Utilities that make metadata and compilation simple. A developing library with a main executable, “epub”.

Target: HTML

This is one of the most useful sections, because once you have the HTML, almost every other format isn’t far behind.

Source: Microsoft Word Document

  1. Open in OpenOffice.
  2. Save as RTF.
  3. See Source: RTF below.

Source: RTF

  1. Open in TextEdit.
  2. Save as HTML. (This HTML is much simplified and easy to work with.)
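If you’d rather stay on the command line, Mac OS X ships a `textutil` tool that can do a similar conversion. This is a sketch, not what I described above: I’m assuming `textutil` is present and that its HTML is close enough to what TextEdit saves. The function name `rtf_to_html` is made up.

```shell
#!/bin/sh
# Hypothetical helper: convert FILE.rtf to FILE.html with textutil.
# Assumption: textutil is installed (it ships with Mac OS X).
rtf_to_html() {
    command -v textutil >/dev/null 2>&1 || {
        echo "textutil not found; use TextEdit instead" >&2
        return 127
    }
    # textutil writes the result next to the source, with an .html extension.
    textutil -convert html "$1"
}
```

Call it as `rtf_to_html Some_Book.rtf` and look for `Some_Book.html` in the same directory.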

Source: Epub

  1. Create a directory and move the epub file to it.
  2. Unzip the Epub. (You may need to rename the file extension to .zip.)
  3. All the HTML files that result are your friends. There may be images, stylesheets, and fonts as well. How you deal with this afterwards is up to you and your browser, but usually just keeping them in the directory and browsing over it is enough.
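The steps above can be sketched in the shell roughly like this. The function name `unpack_epub` is made up, and I’m assuming the usual `unzip` tool is installed; it copies the Epub rather than moving it, so the original stays put.

```shell
#!/bin/sh
# Hypothetical helper: unpack BOOK.epub into a directory named BOOK/.
# An Epub is a zip archive, and unzip opens it whatever the extension,
# so nothing is renamed here.
unpack_epub() {
    epub=$1
    dir=${epub%.epub}
    mkdir -p "$dir" || return 1
    cp "$epub" "$dir/" || return 1                    # step 1, as a copy
    unzip -q -o "$dir/$(basename "$epub")" -d "$dir"  # step 2
}
```

After `unpack_epub Some_Book.epub`, the HTML files, images, and stylesheets are under `Some_Book/`.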

Source: Sony LRF Files

You need the command line for these instructions.

  1. Using calibre:
    lrf2lrs FILE.lrf -o FILE.lrs
  2. See LRF to HTML: The Rough Guide to convert the lrs file to HTML.
    Really short answer: [download id=”29″]

Source: PDF

  1. In Mobipocket Creator, import from Adobe PDF.
  2. In the directory created, there will be an HTML file and an XML file.
  3. Usually the HTML file isn’t to most people’s liking (it still has trouble discovering paragraphs); processing the XML file with your own script can give better results.

    See Perfecting (Simple) PDF Conversion to EPub and Mobipocket.
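To give a feel for what “discovering paragraphs” means, here is a minimal sketch. It works on plain hard-wrapped text rather than on Mobipocket Creator’s XML (whose layout I’m not describing here), and the rule it uses, blank lines separate paragraphs, is the simplest possible assumption. The function name `reflow_paragraphs` is made up.

```shell
#!/bin/sh
# Hypothetical helper: read hard-wrapped text on stdin and write one <p>
# element per paragraph on stdout. Blank lines are paragraph boundaries.
reflow_paragraphs() {
    awk '
        function emit() {
            if (para != "") { print "<p>" para "</p>"; para = "" }
        }
        /^[ \t]*$/ { emit(); next }    # a blank line ends the paragraph
        {
            # Escape HTML specials, ampersand first to avoid double escaping.
            gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
            sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")    # trim the line
            para = (para == "" ? $0 : para " " $0)    # join with one space
        }
        END { emit() }
    '
}
```

Use it as a filter, e.g. `reflow_paragraphs < chapter.txt > chapter.html`. Real PDF output usually needs smarter rules (indentation, line length, hyphenation), which is why your own script tends to beat the generic one.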

Source: Project Gutenberg Text File

  1. Use GutenMark.

Source: PRC, MOBI, LIT, Plain TXT

You need the command line for these instructions.

  1. Using calibre:
    any2epub --title "Your Book's Title" \
        --authors="Authors Separated by Commas" \
        FILE
  2. See Source: Epub above.

Target: Epub

Basics

  1. Convert whatever it is to HTML (as above).
  2. See Creating eBooks: An Epub Tutorial.
  3. Remember to split huge HTML files into section- or chapter-sized chunks.
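For step 3, a sketch of one way to do the splitting. It assumes every chapter heading is an h2 that starts its own line, which won’t hold for every file; the function name `split_chapters` and the chunk-NNN.html naming are made up.

```shell
#!/bin/sh
# Hypothetical helper: split one big HTML file into chunk-NNN.html files,
# starting a new chunk at every line that contains an <h2> tag.
# Usage: split_chapters BIG.html [OUTPUT_DIR]
# The chunks are bare fragments; they still need their own
# <html>/<head>/<body> wrappers before going into an Epub.
split_chapters() {
    awk -v outdir="${2:-.}" '
        BEGIN { n = 0; out = sprintf("%s/chunk-%03d.html", outdir, n) }
        /<[hH]2[ >]/ {
            close(out)    # finish the previous chunk
            n++
            out = sprintf("%s/chunk-%03d.html", outdir, n)
        }
        { print > out }
    ' "$1"
}
```

Anything before the first heading lands in chunk-000.html, which is usually the title page and front matter.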

RubyEpub Tools

This requires command-line fu.

If you’re using Ruby Epub Tools, here are the commands I use (the dots . are important):

tufor% epub create LastName_FirstName-Title_Words --title "Title Words" --author "FirstName LastName"
tufor% cd LastName_FirstName-Title_Words
tufor% [dump all HTML files into content/ directory]
tufor% epub add-to-opf . content/*
tufor% [edit metadata.opf to re-order spine]
tufor% [edit toc.ncx]
tufor% epub compile .
tufor% ls *epub
LastName_FirstName-Title_Words.epub

Note: Editing the toc.ncx file is not yet a feature of the epub script. Most likely I’ll have it generated from the spine of the OPF file, or from a TOC HTML file if there’s additional structure (like subsections) involved.

Installing/Running EpubCheck

This requires command-line fu.

To install and run EpubCheck, you need to download the epubcheck zip file. Unzip it into a new directory (I suggest making a new directory called “EpubCheck” in your Library directory under your Home), and add the following script (name it epubcheck) to that directory (or to a directory on your PATH).

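#!/bin/sh
# Wrapper for EpubCheck; assumes the zip was unpacked into ~/Library/EpubCheck.
export EPUBCHECK_HOME="$HOME/Library/EpubCheck"
export CLASSPATH="$EPUBCHECK_HOME/lib:$CLASSPATH"

# Pass all arguments (the epub files to check) through to the jar.
java -jar "$EPUBCHECK_HOME/epubcheck.jar" "$@"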

Once you chmod 755 the epubcheck script, simply call it on any epub file:

$HOME/Library/EpubCheck/epubcheck LastName_FirstName-Title_Words.epub

Target: Mobipocket

Mobipocket files come in two flavors: PRC and MOBI. PRC is closer to a basic Palm document, while a MOBI file wraps the original PRC in another layer. However, both formats have the same capabilities with respect to meta-data, image support, CSS support, and multiple HTML files; and many readers (including the Kindle) that can read one can also read the other.

However, PRC is the more inclusive format, since PRC documents are not exclusive to Mobipocket (unlike the MOBI format).

In either case, what I usually do is first create the Epub and add Mobipocket compatibility; then the creation of either a PRC file (easy) or a MOBI file (easy if you have command line fu) naturally follows.

Adding Mobipocket Compatibility to Epub

In the Epub’s project directory:

  1. Add an HTML table of contents file explicitly for Mobipocket; I usually call it toc.html and put it in the content directory.
  2. Then add the ToC file to the manifest and spine sections of the metadata.opf file.
  3. Add a reference to your table of contents HTML in the guide section of the metadata.opf file. The end result looks something like this:
      <guide>
        <reference type="toc" title="Table of Contents" href="content/toc.html" />
      </guide>
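If you don’t feel like writing toc.html by hand, a first draft can be generated from the files in the content directory. This is only a sketch: the function name `make_toc` is made up, the link text is just the file name, and it assumes the files sort into reading order.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal HTML table of contents linking to
# every *.html file in a directory (default: content), except toc.html.
make_toc() {
    dir=${1:-content}
    echo '<html><head><title>Table of Contents</title></head><body>'
    echo '<h1>Table of Contents</h1>'
    echo '<ul>'
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue             # the directory had no HTML files
        name=$(basename "$f")
        [ "$name" = toc.html ] && continue  # do not link the ToC to itself
        # Links are relative because toc.html lives in the same directory.
        echo "  <li><a href=\"$name\">${name%.html}</a></li>"
    done
    echo '</ul>'
    echo '</body></html>'
}
```

Run it from the project directory as `make_toc content > content/toc.html`, then edit the link text into real chapter titles.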

RubyEpub Tools

If you’re using RubyEpub tools, here are the commands I use after creating toc.html:

tufor% epub add-to-opf . content/toc.html
tufor% [edit metadata.opf to reorder the spine]
tufor% epub compile .

Generating PRC with Mobipocket Creator

  1. Open Mobipocket Creator.
  2. Tell it to open an existing project, and navigate to the metadata.opf file in your Epub directory.
  3. Build the project as usual. You’ll get a PRC named metadata.prc, which you’ll need to rename.
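The rename in step 3 is easy to script. A small sketch, assuming you want the PRC named after the project directory (the same way the epub file is named above); the function name `rename_prc` is made up.

```shell
#!/bin/sh
# Hypothetical helper: rename metadata.prc after the project directory, e.g.
#   LastName_FirstName-Title_Words/metadata.prc
#   -> LastName_FirstName-Title_Words/LastName_FirstName-Title_Words.prc
rename_prc() {
    dir=${1:-.}
    # Resolve the directory first so that "." yields its real name.
    name=$(basename "$(cd "$dir" && pwd)")
    mv "$dir/metadata.prc" "$dir/$name.prc"
}
```

From inside the project directory, `rename_prc .` does the job.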

Generating MOBI with Mobigen

You definitely need CrossOver for this.

To install mobigen.exe, download it from the Mobipocket development center (right side). Once you unzip the file, there will be a mess there. Find the mobigen.exe file, whose name may be prefixed with some junk plus a backslash (\), and rename it to mobigen.exe:

tufor% mv [junk with \]mobigen.exe mobigen.exe

You likely can’t do this from Finder, because the file name is a bad translation of a DOS file system path (hence the backslashes) to a Unix file system (which uses forward slashes).

Then place the following in a script file somewhere:

#!/bin/sh
# Wrapper that runs mobigen.exe under CrossOver's wine.
# The values below were captured from one machine and session; adjust the
# /Users/ajar paths, TMPDIR, DISPLAY, and the /tmp/launch-* sockets for yours.

MOBIGEN="$HOME/Software/mobigen/mobigen.exe"

export FONT_ENCODINGS_DIRECTORY='/Applications/CrossOver.app/Contents/SharedSupport/X11/lib/X11/fonts/encodings/encodings.dir'
export CX_BOTTLE_PATH='/Users/ajar/Library/Application Support/CrossOver/Bottles'
export Apple_PubSub_Socket_Render='/tmp/launch-HtqXLr/Render'
export FONTCONFIG_ROOT='/Applications/CrossOver.app/Contents/SharedSupport/X11'
export CX_ROOT='/Applications/CrossOver.app/Contents/SharedSupport/CrossOver'
export SSH_AUTH_SOCK='/tmp/launch-JHS5Yh/Listeners'
export COMMAND_MODE='legacy'
export TMPDIR='/var/folders/Uq/UqsgL-Ju2RWNH++8ZP3Nj++++TI/-Tmp-/'
export FONTCONFIG_PATH='/Applications/CrossOver.app/Contents/SharedSupport/X11/etc/fonts'
export DISPLAY=':2'
export DYLD_FALLBACK_LIBRARY_PATH='/Applications/CrossOver.app/Contents/SharedSupport/X11/lib:/Users/ajar/lib:/usr/local/lib:/lib:/usr/lib'
export CX_BOTTLE='win2k'
export PATH="$PATH:/Applications/CrossOver.app/Contents/SharedSupport/CrossOver/bin"

# Quote the path so a home directory containing spaces still works.
wine "$MOBIGEN" "$@"

Place this script in a directory that’s on your PATH.

Once all that is over, you can simply do this in the future:

  1. tufor% mobigen -s0 -c1 EPUB-FILE

This generates a MOBI file with a similar name in the current directory.

TO DO: Additions to this document

Things I want to add:

  • Turning scripts into Platypus commands for friendliness, thus reducing the need for some of the command-line fu.
  • Expanding the RubyEpub tools. The road map is available on the home page of the main RubyEpub Tools site.