From the eBookery: Update to My Man Jeeves, Psmith beta versions, Single Quotes

There was a minor flummox in My Man Jeeves: when you visited a chapter from the table of contents, the chapter title unbolded/uncentered/unbigged itself. When you next-paged to the beginning of the chapter, on the other hand, the font stayed bolded/centered/big. It had to do with the order in which HTML tags are nested. I fixed it.

If you downloaded version 1.2 of My Man Jeeves, you have the fix already. Here’s the download link again for anyone who missed out:

[download id=”15″]

To make up for the late notification, I have beta versions of some Psmith books, which I have never read before. Here are the files, to download in the US and everywhere else that the copyright has expired:

Update: Finals produced, betas removed. See the downloads page for the finals.

The only reason I call them beta is that, while I think I’ve straightened out all the formatting hijinks I ran into with going from plain text to Mobipocket Format With Nice Features, I won’t conclude that I’ve done so until I finish reading Psmith in the City as a sort of smoke test for the other two. In the time between creating the first Psmith book and the third Psmith book, I’ve corrected the first version some dozen times. Also, I’ve found that I do not understand cricket scoring.

I think after this I’ll be geared up to generate nice eBooks from Project Gutenberg text. Most of them probably don’t involve cricket either!

By the way, a little knowledge and elbow grease and Perl and experience nowadays means that I can generate a (potentially) nice eBook for Psmith, Journalist in under an hour. Generally I aim for Feedbooks quality: that is, tables of contents, flowable text, emdashes/endashes and nice scene breaks, etc. Manybooks quality, I’ve found, is just a straight text conversion with nothing else. Some people, and here I’m talking about people who aren’t even related to Wodehouse, will actually try to sell you a Manybooks-level version for $4 (and sometimes it looks suspiciously like something dredged up from Manybooks without permission… but it’s hard to tell for sure, since there’s so little formatting done in either case).

And now: a rant about single quotes.

The version of Psmith in the City is processed from the original Project Gutenberg .txt (no .html available), which is rife with single quotes where double quotes should be, an old-fashioned convention. The trouble is that apostrophes and single quotes are the same character in ASCII text (and in ISO-Latin-1 text too). So turning thousands of instances of lines like

'I say,' interrupted Mike, eyeing Psmith's movements with apprehension, 'you aren't going to drive, are you?'

to the polished

“I say,” interrupted Mike, eyeing Psmith’s movements with apprehension, “you aren’t going to drive, are you?”

was kind of a headache. And that’s just a normal instance. Then there are contractions that have apostrophes at the beginning of the word…

'I suppose you're going to the 'Varsity?' he said.

Changed to

“I suppose you’re going to the ’Varsity?” he said.

’Varsity is used a lot. Thank goodness for powerful editing software.
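
A rough sketch of this kind of conversion, in Python rather than the Perl mentioned above, and not the actual script used for these books. It assumes dialogue quotes open at the start of a line or after whitespace, that apostrophes inside a word sit between two letters, and that elided words such as 'Varsity come from a hand-maintained list. The names `curl_quotes` and `LEADING_ELISIONS` are made up for illustration.

```python
import re

# Hypothetical, hand-maintained list of words that begin with an
# elision apostrophe in the source text (extend as needed).
LEADING_ELISIONS = ("Varsity", "em", "tis", "twas")

APOSTROPHE = "\u2019"    # right single quotation mark, used as apostrophe
OPEN_DOUBLE = "\u201c"   # left double quotation mark
CLOSE_DOUBLE = "\u201d"  # right double quotation mark


def curl_quotes(line: str) -> str:
    """Turn ASCII single-quoted dialogue into curly double quotes."""
    # 1. Apostrophes inside a word: Psmith's, aren't, you're.
    line = re.sub(r"(?<=[A-Za-z])'(?=[A-Za-z])", APOSTROPHE, line)

    # 2. Apostrophes that start a known elided word: 'Varsity, 'em.
    elided = "|".join(LEADING_ELISIONS)
    line = re.sub(rf"(?<![A-Za-z])'(?=(?:{elided})\b)", APOSTROPHE, line)

    # 3. A remaining quote at line start, or after whitespace or an
    #    opening bracket, is treated as opening dialogue.
    line = re.sub(r"(?:^|(?<=[\s(\[]))'", OPEN_DOUBLE, line)

    # 4. Anything still left is treated as closing dialogue.
    return line.replace("'", CLOSE_DOUBLE)


print(curl_quotes("'I suppose you're going to the 'Varsity?' he said."))
```

The heuristic is deliberately crude. A plural possessive such as boys' would be curled as a closing quotation mark, a line of dialogue that happens to open with a listed word would lose its opening quote, and genuine quotes-within-quotes are not handled, so the output still needs a read-through.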

I hereby declare single quotes for first-level quotations to be, not just wrong, but evil and wrong.

Thankfully the other two Psmith Gutenberg texts did not suffer from this evil wrongness.

PS: To anybody who’s here from googling something like “pg wodehouse evil” or “pg wodehouse wrong”, here’s George Orwell’s “In Defence of P. G. Wodehouse”.

7 thoughts on “From the eBookery: Update to My Man Jeeves, Psmith beta versions, Single Quotes”

  1. Inverted commas are standard in British usage. Personally I blame lazy Gutenberg typists who can’t tell an apostrophe from a prime, but then I’ve got kind of a type fetish…

  2. They’re not lazy, they just don’t have the full character set to work with. ASCII contains no inverted commas; you’re left with apostrophes and backticks. You would need one of the ISO Latin encodings.

    Since the first cut of most Project Gutenberg books is in ASCII, you usually lose the inverted commas. Although I’ve run into “ASCII” with ISO-Latin-1 characters, so it’s weird…

    I hate encodings. Everything ever should always have been in UTF-8.

    • Sam, I meant that I hate the world of multiple encodings left and right. Translating between them is a nightmare, especially when such translations are actually lossy.

      I’m also quite aware that UTF is a family of encodings, including things like UTF-16, which is a joy to run into at random (not).

      There should only be ONE. And it should be UTF-8. Or UTF-16. (Someone please make up their minds about that, though I think it’s settled on more or less UTF-8 as the one best/standard-sort-of encoding these days).

  3. Yeah, I was just being a pedant. I quite agree that we should just settle down with UTF-8.

    Of course, it’s a pain when one is forced to use Windows, and can’t type much beyond what’s printed on the keyboard (cf. typing en and em dashes on OSX vs. Windows).

    On a related note, I’ve never understood why Project Gutenberg still seems so keen to do away with original formatting — I print and bind LaTeX editions of PG books, and it’s a huge amount of work to get them ready (tables, images, etc., not to mention typographical matters). I wonder if there’s a wiki repository of some sort somewhere that I can upload the texts to….

  4. PG’s mission has always been to provide public domain texts across the widest spectrum possible in the language of the original text—and ASCII is the common denominator for North America/English/Australia text, followed by the various ISO Latin-1’s for Western Europe/South America/Africa, followed by UTF or other large encodings (such as JIS for Japanese text) for East Europe/Mediterranean (which I can’t spell…)/Asia.

    Thus we get what we get. PG was also born before UTF-8, and with thousands of texts available, converting all that to either LaTeX or XHTML just has to be a project on its own somewhere… and as TeX is the common denominator for scientific texts in academia and research, and XHTML 1.1 with entities is renderable across all modern web browsers plus is necessary for modern ebook formats such as ePub, Mobipocket, and LIT….

    There are actually, though, HTML formats for a fair amount of PG’s texts these days; perhaps a petition for LaTeX? Since Plucker formats are also present and all…
