Considerations in the Making of eBooks: MRK's Fiction Sampler

I have some screenshots lying around of the eBook I did for Mary Robinette Kowal’s free fiction sampler, and I thought it would be interesting to look at the process I used to convert her .rtf files.

I have a MacBook, so it comes equipped with a lot of text-munging capabilities out of the box, thanks to pre-installed perl and python. With Crossover installed so I could leverage the Windows-only Mobipocket Creator and Reader, I have everything I need to create eBooks from text, HTML, RTF, or PDF (as long as it’s not scanned images), even Word documents.

There’ll be plenty of images, so the rest of this post is under the cut.

First Pass

The first step (after having read and enjoyed MRK’s stories) was to convert the RTF files to HTML. Mobipocket Creator doesn’t import RTF, and without an installation of Microsoft Word for Windows (via Crossover), it can’t import Word documents either. Instead, I used iWork’s Pages to export the RTF to HTML. (There are a couple of routes to victory here; I could also have used TextEdit, which comes with a standard Mac install, to Save As HTML.)

I then had to process files that looked like this:

into something that Mobipocket’s more limited HTML renderer could understand. The blue arrows on the left are my editor’s way of telling me that it’s wrapping a very long line.

The resulting files were much cleaner:

As you can see, the main goal of the first pass is to get the body text into properly opened/closed paragraph tags (<p></p>). This tells the Mobipocket renderer where a paragraph begins and ends, so that it can indent each paragraph and space it from the others appropriately.
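For illustration only, here is a minimal Python sketch of that paragraph-wrapping idea. It is not the script used for this eBook. It assumes the converter markup has already been stripped down to plain text with a blank line between paragraphs, and the function name `wrap_paragraphs` is hypothetical.

```python
import html
import re


def wrap_paragraphs(text):
    """Wrap each blank-line-separated block of plain text in <p></p> tags."""
    # Split on blank lines (empty or whitespace-only lines between blocks).
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse hard line wraps inside a paragraph into single spaces.
        body = " ".join(block.split())
        if body:
            # Escape &, < and > so stray characters cannot break the markup.
            paragraphs.append("<p>" + html.escape(body, quote=False) + "</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(wrap_paragraphs("First line\nwrapped.\n\nSecond & last."))
```

Real exports from Pages or TextEdit carry a lot more markup than this handles, so in practice a step like this would come after the converter's own tags have been removed.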

Second Pass

Each story is in its own file, which simplifies some things and makes others more complex. One important thing that’s simplified: separate files will be rendered by Mobipocket into one Mobipocket file, with page breaks between each file. I don’t have to worry about inserting <p style="page-break: always;"> in the right places of a single monster file. It’s also easier to create a table of contents for stories or chapters broken into separate files—but more on that later.

I now use header tags to set off the title, author, and in this case, original publication information of each story. Due to the Creative Commons license that Kowal’s work is under, and also out of respect for the author, I’m not changing the text—e.g., dropping her name or the publication information.

I typically use <h3> for chapter titles, and either <h2> or <h1> for story titles. The title/author/etc information was centered for the original RTF, so I also centered them here.


  

The Bound Man

by Mary Robinette Kowal

Originally published in the anthology, Twenty Epics

Screenshot from the Kindle:

Two line breaks (<br>); I don’t normally do this either, but the publication information is the same size as the body text, so I used spacing to set it off. It is a nice effect, though, so maybe I’ll do that from now on.

Third Pass

Now we go through the body of the text and make note of the following:

  • What notation is being used to denote scene breaks?
  • Are there any characters that require a specific encoding (such as the accented o in “Halldór” in “The Bound Man”)?
  • What notation is being used to denote italicized text, or bolded text?
  • Notation for em-dashes?

Then we adapt the main body formatting for Mobipocket.

Because each of these files is in standard manuscript format, scene breaks are # and underlining means italics. I replace scene breaks with

···

which is just a stylistic thing of mine, and the underlining is replaced with italics (<em></em>). This is where knowledge of CSS comes in handy, because the RTF converters use CSS to denote underlining and centering, not the usual <i></i> or such. More flexible, but harder to root out. Special characters, like -- denoting an — are replaced with the appropriate HTML entity (&mdash;).
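The replacements described above could be sketched in Python roughly as follows. Everything about the input shape is an assumption: the underline markup is taken to be an inline `style` attribute on a `<span>` (real exports may use CSS classes instead), the scene break is taken to be a paragraph containing only `#`, and the centered replacement paragraph and the name `third_pass` are hypothetical.

```python
import re

# Assumed shape of the converter's underline markup; check the actual export first.
UNDERLINE_SPAN = re.compile(
    r'<span style="text-decoration:\s*underline;?">(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)
# A paragraph holding nothing but "#" is a manuscript-format scene break.
SCENE_BREAK = re.compile(r"<p>\s*#\s*</p>")


def third_pass(html_text, scene_break='<p align="center">···</p>'):
    """Apply the italics, scene-break, and em-dash replacements to one file."""
    # Underlining in manuscript format means italics.
    out = UNDERLINE_SPAN.sub(r"<em>\1</em>", html_text)
    # Swap the "#" marker for the chosen scene-break paragraph.
    out = SCENE_BREAK.sub(lambda match: scene_break, out)
    # Double hyphens stand in for em-dashes. This would also hit HTML
    # comments, so strip those beforehand if the export contains any.
    out = out.replace("--", "&mdash;")
    return out


if __name__ == "__main__":
    sample = '<p>He <span style="text-decoration: underline">ran</span> -- fast.</p>\n<p>#</p>'
    print(third_pass(sample))
```

The non-greedy match does not cope with nested spans, which is one reason a pass like this still needs a read-through afterwards.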

Because of the presence of characters that require UTF-8 encoding, I make sure to preserve some indicator at the top of each file that the encoding is UTF-8, such as

at the very top of each file. Otherwise, accented characters and even smart quotes (if present) won’t render correctly at all.
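A quick way to catch files that lost their encoding indicator is to check each one before building. The sketch below is an assumption-laden illustration: it accepts either an XML declaration or an HTML meta tag that names UTF-8, looks only at the first kilobyte, and uses a hypothetical function name, `check_utf8`.

```python
import re

# Matches an XML declaration (encoding="UTF-8") or a meta tag (charset=utf-8).
UTF8_HINT = re.compile(rb'(encoding|charset)\s*=\s*["\']?utf-8', re.IGNORECASE)


def check_utf8(raw_bytes, head_size=1024):
    """Return (decodes_ok, declared) for the raw bytes of one story file."""
    try:
        raw_bytes.decode("utf-8")
        decodes_ok = True
    except UnicodeDecodeError:
        # The bytes are in some other encoding, whatever the file claims.
        decodes_ok = False
    # Only the start of the file counts as "the very top".
    declared = UTF8_HINT.search(raw_bytes[:head_size]) is not None
    return decodes_ok, declared


if __name__ == "__main__":
    good = '<?xml version="1.0" encoding="UTF-8"?>\n<p>Halldór</p>'.encode("utf-8")
    print(check_utf8(good))
```

A file that decodes cleanly but has no declaration is the case to watch for, since that is where accented characters and smart quotes go wrong.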

Fourth Pass

This is a more specialized pass, to pick out semantic clues as to how text should be formatted in certain cases. For instance, Death Comes But Twice is a special case, as a letter:

The initial address at the start of the body of a letter isn’t indented; we have to tell Mobipocket that it shouldn’t be. We do this via a width=0 attribute to the paragraph tag:

<p width="0">My dearest Lily,</p>

(Naturally, this is where having read the text is helpful. I am fussy, so I do this sort of thing.)

Also note that the Creative Commons license at the end of each file needs to be set off from the text (easy to do with a rule, <hr>). It also looks ugly when justified—a concern since all Mobipocket readers tend to default to justifying text in paragraphs. Instead, we explicitly set the justification to be left (aka ragged-right), and also knock off the implicit indent:

This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Renders as:

Note: Publishers should let readers decide, except in special cases like above, whether or not to justify text or allow indentation for the main body of the work. Yes, this is a change from traditional book publishing, where all formatting decisions are made by the publisher, but eBooks need to be a bit more flexible with respect to font family, font size, indentation, and justification. Whoever eBook-ified Dust and other such books, I’m looking at you.

Fifth Pass

Each story file is processed now, so now we look at structuring the eBook as a whole: putting together all the stories, with a table of contents that allows a reader to jump to each one rather than paging through. eBooks do not, as a rule, page as well as real books do. No, they really don’t. Especially since the concept of a page as a fixed-size element is not present for eBook formats like Mobipocket or ePub, which depend on reflowable text so that users who change font sizes won’t get nasty formatting when they do so.

Any other guide elements that we should have, we’ll also do. A title page is usually nice; if any dedications are present, a dedication is nice. Similarly for preface, colophon, acknowledgments, and so on. Each such guide element is usually a separate file in itself.

A note on cover images: these are very nice when you can get them, but the image rights belong to the artist, not the author. You’d have to ask. Signs typically point to “no” if you’re not an actual publisher.

My title pages are very simple: <h1> for the title, <h2> for the byline. Here’s a token screenshot (not very exciting):

The table of contents page is only slightly more complicated. Here you need to have a good working knowledge of <a href... and <a name.... We’d need both if we were working with a super-linked work (many links within a chapter point to other parts of the work) or with a single big file for the stories. Here, because the stories are separate files, we only need the former, and href points to the filenames of the stories.

The links in a table of contents are ones you don’t want justified or indented (most story or chapter titles don’t reach all the way across a screen nicely).

Contents

The Bound Man

Cerbo en Vitra Ujo

...

Renders on the Kindle as:

Now you need to order all the files in the Mobipocket file list in the order that you’d like a reader to page through them, if the reader were to go through the book from start to finish without any shortcuts. Some of them do that.
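Since the contents page is just a list of links to the story files in reading order, it can be generated from that ordered list. The Python below is a rough sketch under stated assumptions: the filenames are made up, the function name `build_toc` is hypothetical, and the `align="left"` and `width="0"` attributes are my guess at one way to switch off justification and indentation for each entry.

```python
import html


def build_toc(entries, heading="Contents"):
    """Build a contents page from (filename, title) pairs in reading order."""
    lines = ["<html>", "<body>", "<h2>" + html.escape(heading) + "</h2>"]
    for filename, title in entries:
        # href points at the story's own file, so no <a name> anchors are needed.
        # The align and width attributes are assumptions, not confirmed markup.
        lines.append(
            '<p align="left" width="0"><a href="%s">%s</a></p>'
            % (html.escape(filename, quote=True), html.escape(title))
        )
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    stories = [
        ("bound_man.html", "The Bound Man"),
        ("cerbo.html", "Cerbo en Vitra Ujo"),
    ]
    print(build_toc(stories))
```

Keeping one ordered list of files as the source for both the contents page and the build order avoids the two drifting apart.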

Metadata and Guide Data

And here is where I see a lot of publishers, professional and non-professional, make mistakes. Which isn’t great, considering how important metadata is to catalogs and, in particular, content organization on a Kindle or other Mobipocket device:

What’s wrong with this picture?

  • The list is ordered by author last name. Unfortunately, “After the Coup” has incorrect author metadata, indicating that “John Scalzi” is the last name, rather than just “Scalzi”.

  • Similarly for

    • In the Garden of Iden by Kage Baker,
    • Spirit Gate by Kate Elliott,
    • Flash by L. E. Modesitt.
  • The Birthday of the World has its title correct for a library card, incorrect for a Mobipocket reader (which can library-alphabetize without people re-ordering beginning articles to the end).

  • Also, The Birthday of the World has incorrect author metadata; a space was missing after the comma between last name, first name, so it’s displayed incorrectly here.

  • You can’t see it, but Farthing on page 10 is missing author metadata entirely.

Important elements of metadata include:

  • Title. This should be without the author name, and in normal title order. You do not need to put the beginning “The” or “A” at the end with a comma, because Mobipocket readers are smart enough to alphabetize library-style.

  • Author. Do not just paste the author’s name in; you must put it in last-name, first-name middle-name. And a space after that comma, folks. This is so the Mobipocket reader can alphabetize by author last name, library-style. Multiple authors are separated by semicolons.
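The author rule above is mechanical enough to script. This is a naive Python sketch with hypothetical function names. It treats the final word as the last name, so multi-word surnames and suffixes need correcting by hand, and the space after the semicolon between authors is an assumption.

```python
def author_sort_name(name):
    """Turn "First Middle Last" into "Last, First Middle"."""
    parts = name.split()
    if len(parts) < 2:
        # Single-word names are left alone.
        return name.strip()
    # Note the space after the comma; readers display the name badly without it.
    return parts[-1] + ", " + " ".join(parts[:-1])


def author_metadata(names):
    """Join several authors with semicolons, each in last-name-first order."""
    return "; ".join(author_sort_name(name) for name in names)


if __name__ == "__main__":
    print(author_sort_name("Mary Robinette Kowal"))
    print(author_metadata(["John Scalzi", "Kage Baker"]))
```

Running something like this over a batch of names at least guarantees the comma-plus-space convention is applied consistently.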

Important guide data:

  • Marking which file is the title page.
  • Marking which file is the table of contents.
  • Marking which file is the [name of guide, like preface/introduction/etc].

It’s especially silly not to mark which page is the table of contents, since then the reader can’t easily access it when they’re deep in the text.

And remember: scroll all the way down and press the Update button, because the “Save” icon doesn’t save metadata or guide data in Mobipocket Creator!

Conclusions

… gaah, like an essay. But anyways, these were my considerations while I was creating the eBook for MRK’s fiction sampler.

At some future point in time I may cover the considerations that went into Psmith in the City. That was a bit more complicated, since I did not get to draw on the electronic copy of an author’s manuscript, and really did require perl scripts for major text massaging and processing.

13 thoughts on “Considerations in the Making of eBooks: MRK's Fiction Sampler”

  1. That is totally fascinating. I had no idea how much work went into converting something for e-book. Many thanks, not just for the work, but for explaining how you do it.

    Spontaneous derivation? Ha.

  2. You’re welcome!

    I’m pretty fussy, which of course generates more work. I share this characteristic with many other non-professional eBook creators (and pros, too, I imagine). Some of the best formatted stuff out there is free, because so many do it out of love for the text.

    (And we don’t work on the deadlines that pros must do right now, when the huge backlist demands speed more than perfection.)

    I’m currently reading the Avram Davidson special story collection, and I cringe at the formatting. Margins are set in too much (assuming a thinner screen like the Sony eReader, punishing people using the Kindle or Mobipocket Reader on their computer for having a wider screen). Specific font chosen which is not as readable as the native fonts of each reader; at least the font size doesn’t seem to be fixed. Messy scans of the chapter titles. It makes me want to cry, especially since I paid for this mess, but what can you do? At least Tor chose a font with serifs.

    There are so many eBooks out there that just have issues. I want to smack multiple publishers…. I’ve thought about writing posts berating them for the things they do to the text, and also praising the times they get things right. We’re at the age of progressing from paper to digital, and digital formatting hasn’t got the history yet that paper does.

  3. Oh, and Spontaneous Derivation — yes, a bit of a conceit of mine. If I were smart I’d rename this something SEO-friendly, like “SF Kindle” and have a hostname to match. But I don’t. Ah well. It’s more fun this way.

  4. So, all in all it is a website with plenty of metatags built in and a specific format? I bet it takes you quite a while perusing code to make sure everything is in order even after going over the code hundreds of times. I agree wholeheartedly on the point regarding the effort put into professional, for-sale e-books. It makes me think that whoever is hired for creating a lot of these books might be more focused on playing solitaire than on the job. :D

  5. Dianne, yup, it’s a website, all archived and scrunched up, with some extra data. And yeah, even with scripting, there’s always something you miss. Some little detail. In some 80,000 – 100,000 words.

    Even when print goes digital, it’s really still a perfectionist’s paradise. (Or at least a type-A perfectionist’s paradise.)

    I pity the folks creating eBooks for the backlist once they start getting in the parts that no longer have the original manuscript lying around in some archive. OCR software is getting very good at scanning text in, but scanning in a thousand pages a day would really mess up my wrists….

  6. “I’m currently reading the Avram Davidson special story collection, and I cringe at the formatting. Margins are set in too much (assuming a thinner screen like the Sony eReader, punishing people using the Kindle or Mobipocket Reader on their computer for having a wider screen). Specific font chosen which is not as readable as the native fonts of each reader; at least the font size doesn’t seem to be fixed. Messy scans of the chapter titles. It makes me want to cry, especially since I paid for this mess, but what can you do? At least Tor chose a font with serifs.”

    That’s an unhappiness of mine with digital books, period. You are stuck with the fonts available on your device, which may bear scant resemblance to the font in which the paper edition was set. As someone who can tell the difference between, say, Baskerville and Garamond (or Arial and Helvetica, for that matter), and loves typography, this is a loss.

    I read Mobi titles on a Palm OS PDA, and did some conversions of True Type fonts to a form the PDA could use to provide better selections for Mobi than the stock Palm fonts. (Mobi supplies their own custom fonts for Palm devices, but they are sans serif and either too small or too big.)

    Macmillan is the one digitizing the Tor catalog. I suspect scanning is part of the problem. Avram’s work has been out of print for years, and I’d be surprised if electronic copies of the work existed to start from.

  7. Dennis, I see your point about publishers providing specific fonts along with their eBooks. (I really should do a screen cap of the Davidson some time.) Some readers may have quite awful fonts. The default font on the Kindle is a serif that’s rather readable, with good anti-aliasing so that it’s not the sharp badness of some older PDAs. Even its sans-serif is happily readable on the device itself. (The screenshots don’t have the lovely anti-aliasing… seems to be a hardware thing.)

    I’m fine with that, but I did read about the terrible sans-serif font for some files that the eReader has on it if you don’t somehow switch to a specific font.

    I agree that some of the problems with the Davidson collection are probably due to scanning. All in all, I’ve gotten used to the font for the main text; it just looks so unnatural compared to the Kindle’s serif. Or perhaps it’s more the spacing issue with full justification due to the margins being smaller. I just know when I read with my particular size preference (larger than most), the spacing between words becomes strange. It’s fine if I switch to left justification.

    However the developers for the Kindle worked it out, the font/margin default tends not to have justification issues except in extreme cases.

  8. You confused me for a moment. My referent for “ereader” is a program by that name, formerly called “PalmReader”, and before that “Peanut Reader”, after Peanut Press, an early ebook publisher targeting Palm devices who created a markup language called PML and a reader that could read it.

    Palm bought Peanut Press and made them the Palm Digital Media division, then sold that to Motricity, a supplier of mobile content delivery solutions, who renamed it to eReader. Motricity recently sold eReader to Fictionwise, who just released a port of the reader to the Apple iPhone/iTouch platforms.

    So it’s likely a registered trademark, and really shouldn’t be used as a generic descriptor.

    I see examples of spacing issues created by font size. I created fonts to use with Mobi on my Palm OS device because I wanted *smaller* fonts, to get more words on a page while remaining readable. An example of the results looks like this:

    But depending on font size, screen size, and whether justification is turned on, things can look odd.

  9. Good point about “eReader”. E-reader it is.

    Your link didn’t make it through, unfortunately. I am curious about what the results look like.

  10. “Your link didn’t make it through, unfortunately. I am curious about what the results look like.”

    Yes, I noticed. But there was no way to go back and edit the post. Let’s try again.

    Here is Mobi in landscape format on my device displaying a file using one of Mobi’s supplied fonts:

    And here it is using a converted True Type font:

    (For the technically minded, I convert the fonts on the desktop using iSiloX to put them into iSilo for the PDA format, then use a freeware app on the PDA called Font Collector to convert them to a form Mobi will see, and enable them with an OS5 version of Fonthack.)

    Hopefully, these will show up…

  11. Thanks, Dennis!

    I’ve got to look into adding better fonts for my MobiReader. Just because I use it only for previews doesn’t mean I must suffer through non-anti-aliased font horror…

  12. You’re welcome. I may be able to offer suggestions. If you run Mobi Reader on the PC to preview output from Creator (as I do), there are a plethora of True Type fonts available, including a fair number of decent free ones. (My rule of thumb in evaluating a font is “Would I set *body* copy in it?”. If the answer is No, I think hard about whether to bother. Most of the fonts I see are “headline” fonts I might use *once*. If I can set body copy in it, I can set headlines, too…)

    I’m a SysAdmin these days, but I was a designer and production geek back in the Old Stone Age before DTP (because neither DTP software nor systems that could run it existed), and I developed an enduring interest in typography and love of good fonts. The best I can say for dedicated readers and devices like my handheld is that the default choices are readable. I’d hesitate to call them “good”.

    My font hacking on my Palm device was as much to try to add fonts I thought were aesthetically pleasing as to get smaller fonts that would let me get more text on screen.

  13. Suggestions would be cool. I only really know the free fonts from Microsoft (who stopped distributing them… but the .bins are around for anyone to find).
