Most days I’m filled with desire to import content for my Kindle. Sometimes I think about sharing it with the world, but most times I can’t. It’s the nature of copyright, you see.
I’m not talking about digitizing my library via painfully slow methods (though I am contemplating that); that’s stuff I very obviously can’t share. What I am talking about is text that’s available on the web, but covered by copyright. Despite the fact that it’s up there for all to see without logging in, legally speaking you aren’t allowed to create eBooks and distribute them unless you have permission from the copyright owner. Certain Creative Commons licenses don’t allow this either: if the license says “no derivative works,” nothing can be done about it, because creating an eBook counts as a derivative work. And, given that all content is copyright-protected unless stated otherwise, this is something of a barrier to the would-be eBook creator.
I’m not going to argue against the state of the world, however. People own their content, and they should be able to choose the distribution of that content themselves. Anything else is disrespecting that ownership, and the effort that went into creating it. If someone doesn’t want their stuff as an eBook, even though it’s on the web, that’s their prerogative. “It makes no sense!” you may cry. Well, it makes sense if they want to pull that content eventually and sell it (and yes, the Internet Archive respects robots.txt). That their work lives on in your browser cache doesn’t mean it lives on in everybody else’s.
So. Now that that’s cleared out of the way.
At the moment, I can easily create eBooks from web content. I have scripts tailored for particular websites that strip away extraneous HTML, educate quotes and other punctuation, create clickable tables of contents, and aggregate even hundreds of entries into a single collection. Digesting PDFs efficiently is more of a problem, but I already have scripts that take care of the hard page breaks you run into when converting a PDF document.
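To give a flavor of what “educating quotes” means, here is a minimal sketch in Python. It is purely illustrative and not one of the scripts mentioned above: the function name `educate_quotes` and its rules are assumptions made for this example, and a real converter needs more cases (a leading apostrophe as in 'tis, for instance, will wrongly come out as an opening quote).

```python
import re


def educate_quotes(text: str) -> str:
    """Turn straight quotes and three-dot ellipses into typographic forms.

    Hypothetical helper for illustration; the rules are deliberately simple.
    """
    # Three periods become a single ellipsis character.
    text = text.replace("...", "\u2026")
    # A double quote at the start of the text, or right after whitespace
    # or an opening bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    # Every double quote still left is treated as a closing quote.
    text = text.replace('"', "\u201d")
    # Same idea for single quotes: opening ones first...
    text = re.sub(r"(^|[\s(\[{\u201c])'", "\\1\u2018", text)
    # ...and whatever remains is an apostrophe or a closing single quote.
    text = text.replace("'", "\u2019")
    return text
```

Calling `educate_quotes` on a line of dialogue with straight quotes should give back the same line with curly quotes and a proper apostrophe. The order matters: opening quotes are picked out by context first, so that everything left over can safely be treated as closing.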
It’s perfectly legal for me to create eBooks for myself and my Kindle. It’s not legal to distribute them, in that form, to everybody else. This frustrates me, but all the same, I do respect the whys behind the restrictions.
The knowledge of how to make eBooks from this content, though: that can be distributed. And it already has been on the web, to some extent.
It’s a difficult business. Almost every story needs its own care with certain aspects, even stories hosted on the same site. Not only do you have non-standard and, in some cases, extremely broken HTML to deal with; you also have story-specific formatting, and any of these can be inconsistent throughout the text. You have text that’s split across different pages; you may have 300+ entries to deal with, with slightly different formatting conventions applied over five years. There’s no one-size-fits-all script to chomp every HTML page into a decent-quality eBook, though there are scripts that will let you do it easily at bad quality.
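The easy part, stitching already-cleaned entries into one document with a clickable table of contents, can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not an actual conversion script: `build_collection`, the `entry-N` anchor scheme, and the input format (a list of title and cleaned-HTML pairs) are all made up for the example. The per-site cleanup that produces those pairs is the hard part, and is not shown.

```python
from html import escape


def build_collection(title, entries):
    """Combine entries into one HTML document with a linked table of contents.

    `entries` is a list of (entry_title, body_html) tuples, where body_html
    is assumed to be already cleaned. Hypothetical helper for illustration.
    """
    toc_items = []
    sections = []
    for number, (entry_title, body_html) in enumerate(entries, start=1):
        anchor = f"entry-{number}"  # assumed anchor naming scheme
        safe_title = escape(entry_title)
        # One table-of-contents line per entry, linking to its heading.
        toc_items.append(f'<li><a href="#{anchor}">{safe_title}</a></li>')
        # The heading carries the id that the link points at.
        sections.append(f'<h2 id="{anchor}">{safe_title}</h2>\n{body_html}')
    return "\n".join(
        [
            "<html><head>",
            f"<title>{escape(title)}</title>",
            "</head><body>",
            f"<h1>{escape(title)}</h1>",
            "<ol>",
            *toc_items,
            "</ol>",
            *sections,
            "</body></html>",
        ]
    )
```

Numbering the anchors rather than deriving them from titles keeps them unique even when two entries share a title, which tends to happen once a collection runs into the hundreds.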
I’m on the good quality side, obviously. Some people will say it doesn’t really matter, and in a sense, it doesn’t. Words are words. On the other hand, there’s something to be said for tables of contents, text that reflows properly, paragraphs that aren’t broken inappropriately, italics that are applied, spacing that depends on context… these little things have plagued typesetters for hundreds of years, and they’re not about to stop just because the digital age is here. When I turn text into eBooks, I also study the visual formatting of the original source so I can replicate it as best I can.
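As one concrete example of “paragraphs that aren’t broken inappropriately”: text extracted from a PDF or a fixed-width page often has a hard line break at the end of every line. A rough Python sketch of undoing that is below. It is illustrative only; `rejoin_paragraphs` is a hypothetical name, and it assumes that blank lines mark the real paragraph boundaries and that a trailing hyphen followed by a lowercase letter means a word was split, which will occasionally eat a hyphen that belonged there.

```python
import re


def rejoin_paragraphs(raw: str) -> str:
    """Merge hard-wrapped lines back into flowing paragraphs.

    Assumes blank lines separate paragraphs. Hypothetical helper for
    illustration, not a complete solution.
    """
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    fixed = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        text = ""
        for ln in lines:
            split_word = (
                len(text) > 1
                and text.endswith("-")
                and text[-2].isalpha()
                and ln[:1].islower()
            )
            if split_word:
                # Word was hyphenated across the break: drop the hyphen.
                text = text[:-1] + ln
            elif text:
                # Ordinary wrapped line: join with a single space.
                text += " " + ln
            else:
                text = ln
        fixed.append(text)
    return "\n\n".join(fixed)
```

This is exactly the kind of step where a quick manual pass afterwards still pays off, since no simple rule can tell a compound like “well-known” broken at the line end from a word that was merely split.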
I’m also on the side of preparing good quality quickly. This requires scripting knowledge, usually Perl in my case; HTML and CSS knowledge (not huge amounts, but beyond simple websites); knowledge of wget or curl to quickly download and sanitize references for parts of websites; and knowledge of the ins and outs of extremely powerful text editors, like Vim, that pretty much qualify as on-the-fly scripting. And also the wisdom to know when you have to do something manually, although even that can be sped up with the right knowledge of the right tools. Someone commented on how many good eBooks I’ve done in what can be thought of as a short period of time; I often forget this, because the speed is now natural for me. And maybe there is some distinction here, since what I can offer up for download is dwarfed by what I make for myself.
The world online is my oyster. I just can’t share most of it with you all. This is not something I would do just because I can, legally or not; even for Creative Commons works that allow derivative work, I still ask for permission for distribution, and even so I hand over control of distribution to them if they want it. It’s enough that works can get out there legally at all for Kindles worldwide.