Calibre, Word and MOBI: A Tale of Three Programs

(Yes, I know, MOBI is not a program, but my blog, my headlines…)

Ever since I started blogging about ebooks, I’ve cautioned people against using Microsoft Word to format their ebooks. Not because Word is a bad program and not because it’s impossible to create ebooks with it. It’s because it’s not quite the right tool. Word’s strength lies in creating print documents or PDFs.

Recently, I’ve been cautioning people to not use Calibre to convert their Word files into MOBI files in order to sell them on Amazon. Not because Calibre is a bad program and not because it’s impossible to create MOBI files with it. It’s because it’s not quite the right tool. Calibre’s strength lies in managing a person’s digital library. It was not created to convert commercial ebook files.

EPUB files are not as troublesome as MOBI files. EPUB is EPUB is EPUB, and while each device has its own special way of rendering the file to fit the platform, the differences between devices aren’t big enough for most people to notice. A single EPUB file will work pretty much the same on a Nook as it does on an iPad.

Calibre is set up for optimum use with EPUB files. If a publisher converts a Word (html) file into an EPUB file using Calibre, then what they see there is pretty close to what a Nook or iPad reader will see.

This is not true with MOBI files. The reason is Amazon. You see, EPUB devices have evolved and changed and upgraded and gone the way all technology goes, ever upward and onward. But the device makers built the newer devices around the existing ebook platform. So an EPUB ebook formatted five years ago will work pretty much the same on a new iPad as it did on a first-generation Nook. Amazon went bass-ackwards. They built the new devices, then tinkered and recreated entirely new ebook platforms to fit the new devices. So a MOBI file being sold on Amazon isn’t just a MOBI file. It’s also a KF8 file and an iOS file and an AZW3 file and god knows what else. I don’t quite get all the technical stuff. What I do get is that the same ebook can work fine on a Kindle Fire, but go to hell on a Paperwhite and look okay on a Kindle Keyboard and turn into gibberish if an iPad user gets hold of it.

The whys and wherefores don’t matter as much as the fact that a file formatted in a program which is optimal for printing documents and then converted with a program that is at its best with EPUB files, is going to have trouble meeting the very odd demands of Kindles.

(By the way, if you are using Scrivener or InDesign to create your ebooks for sale on Amazon, you will run into the exact same problems because Amazon is constantly tweaking and fiddling with the platform(s) and updating devices and they don’t necessarily share what they’ve done with the rest of the world.)

I realize that none of what I just wrote is going to dissuade people from using Calibre to convert their Word docs into MOBI files to sell on Amazon. I know this because people are using Word because that’s the program they know and love(hate) and they need a way to convert those Word files and Calibre is the shortest distance between A and B.

So instead of wagging my finger and clucking my tongue, I did some research. Question: Is it possible to format a file in Word and convert it with Calibre and create a MOBI file good enough to sell on Amazon? (Here, I make a very clear distinction. If your Nook died and you bought a Kindle, and you want to convert all your Nook books into MOBI files you can load onto your Kindle, Calibre is a great tool. That’s personal use. You expect that the ebook might not work completely right, but that’s okay, at least you have it. You can’t ask your paying customers to accept that standard.)

What I discovered is: Yes, it is possible.

I managed to fix the worst problems I see with Calibre-converted ebooks. I managed to create ebooks that respond properly to all the user preferences in three generations of Kindles (Kindle Keyboard, Paperwhite and Fire). I almost got Calibre to build a toc.ncx (what the user sees in the Go To features on Fires and Paperwhites) the way I want it to. I think with some more tinkering and fiddling around inside the opf file, I can fix that problem. I couldn’t get the cover to display on the bookshelf in my Paperwhite, but that’s kind of a non-issue, since Amazon will handle that when the book is uploaded. (It is only a big deal if a publisher is selling direct.)

Even though the ebooks I created this way aren’t up to my standards, they will respond to user preferences and they will look fine and read fine, and thus, they are good enough for uploading to Amazon.

There is a caveat. If you format your document, save it as an html file and convert it as is with Calibre, your ebook will be broken. It will be a substandard product you should not ask people to pay for. What you have to do first and foremost is format your Word file so it works within Calibre’s parameters, and secondly, you have to fix the html coding in the Word file.

Sound scary? It is, kind of. Word’s html coding is a nightmare, full of mso odd bits that give Kindles the hiccups. The good news is, all you really need to do is remove some very specific lines of code and rearrange a few others.
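To give a feel for what "removing the mso odd bits" can look like, here is a rough Python sketch. It is not the exact cleanup I did (that comes in the next post), and the function name `strip_word_markup` is made up for illustration. It assumes the usual suspects in Word-saved html: conditional comments, Office namespace tags such as `<o:p>`, and `mso-` declarations inside style attributes. Pattern matching on html is fragile, so treat this as a starting point and check the result by eye.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove some common Word-only markup (hypothetical helper, not a full cleaner)."""
    # Drop Word's conditional comments: <!--[if ...]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=re.DOTALL)

    # Drop Office namespace tags such as <o:p> and </o:p>, keeping any text inside
    html = re.sub(r"</?o:[^>]*>", "", html)

    def clean_style(match: re.Match) -> str:
        quote = match.group(1)
        declarations = [d.strip() for d in match.group(2).split(";")]
        # Keep only the declarations that are not Word-specific
        kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
        if not kept:
            return ""  # nothing useful left, so drop the whole style attribute
        return " style=" + quote + "; ".join(kept) + quote

    # Rewrite every style attribute without its mso-* declarations
    return re.sub(r"\s+style=([\"'])(.*?)\1", clean_style, html, flags=re.DOTALL)
```

Run it on a copy of the file, never the original, and compare before and after in a browser.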

Since this post is running long and I don’t even have any pretty pictures to enliven it (plus I have a buttload of Christmas gifts to wrap), I am going to explain how I did it in my next post. It’ll have pictures. In the meantime, if any of you, Dear Readers, have figured this out and feel like sharing in the comments, feel free.

And Yet Another Post on That Pest, The Em Dash

Pardon my obsession, folks, but it’s the little things that drive me nuts. The lowly em dash, one of my favorite punctuation marks, drives me nuts in ebooks.

Kindle mobi files are lovely things. You can read them on your Kindle, Kindle Fire, computer, tablet, phone, or whatever your preferred device. The device will helpfully fit the text to your screen, and on the Kindle (don’t know about other readers) it makes a fair attempt at justifying the text. Look at the above image and see what happens when the file runs into an em dash that it believes is part of the words it connects. A monster space.

I read a lot of ebooks. Improper formatting can hurt you, the self-publisher. Oh sure, the weird spacing, font size jumps, orphaned punctuation, blank pages and other little irritants are only that, irritants. It’s not often I run into something that makes the text unreadable–it has happened, though. Sometimes I have to just grit my teeth and ignore the errors. Sometimes the formatting errors are so bad I will refuse to purchase from that particular publisher (or writer) again.

Now that I am learning how to format ebooks, those little details obsess me. Then I began to notice something. The majority of orphaned punctuation and monster spaces were showing up in some ebooks, but not in others. The problem is most prevalent in reissued back list titles. Ah ha, I thought, OCR–Optical Character Recognition. Publishers were scanning printed books and converting them to ebook files. That’s all well and good, except OCR files have to be proofread with extreme care because the print doesn’t always translate properly. Plus, OCR reads

happy–unhappy

as one word. Thus the ereader treats it as one word, too, so if it comes at the end of a line, you end up with a monster space. In order to prevent that, the formatter needs to go in and manually insert a “No-Width Optional Break.”

So, that led to me experimenting with Word to see how it handles the em dash.

In the version of Word I use (Word 2000), you will notice that Word has decided that between the first word and the em dash there is a No-Width Non-Break, meaning the first word and the em dash are forever joined. Between the em dash and the second word there is a No-Width Optional Break. There is no space between the em dash and the words it connects, but when it comes time to wrap to fit the screen, the break occurs and thus, there is no monster space.

happy– unhappy <–How Word actually sees the em dash

This is also quite elegant because it never allows the em dash to occur at the beginning of the line (which is nitpicky, but I’m a nitpicky person who believes punctuation should always be presented in context). Problem solved, right? Not right. Look at my poor little orphan quote mark. Word treats the quote mark as a word so the No-Width Optional Break rule is applied.
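For anyone comfortable with a little scripting, the same rule can be sketched in Python, with one extra exception for the orphan quote mark. This is only a sketch under assumptions: I am using the Unicode WORD JOINER (U+2060) as a stand-in for the Non-Break and ZERO WIDTH SPACE (U+200B) as a stand-in for the Optional Break. Word's own special characters may map to different code points, and not every ereader honors these characters, so test on real devices. The function name `add_dash_breaks` is made up.

```python
import re

EM_DASH = "\u2014"
WORD_JOINER = "\u2060"        # zero width, forbids a line break (assumed stand-in)
ZERO_WIDTH_SPACE = "\u200b"   # zero width, allows a line break (assumed stand-in)
CLOSING_QUOTES = "\u201d\u2019\"'"

# An em dash with visible, non-space text on both sides. Break characters that
# are already present are excluded, so running the function twice is harmless.
DASH_PATTERN = re.compile(r"(?<=[^\s\u2060\u200b])\u2014(?=([^\s\u2060\u200b]))")


def add_dash_breaks(text: str) -> str:
    """Glue each em dash to the word before it and allow a break after it."""
    def fix(match: re.Match) -> str:
        next_char = match.group(1)
        # Keep a closing quote attached to the dash so it is never orphaned
        right = WORD_JOINER if next_char in CLOSING_QUOTES else ZERO_WIDTH_SPACE
        return WORD_JOINER + EM_DASH + right

    return DASH_PATTERN.sub(fix, text)
```

Em dashes that already have ordinary spaces around them are left alone, since the ereader can break at those spaces anyway.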

If I were a techno-geeky kind of person, I could fix that. I’m not. My version of Word does not allow me to insert No-Width Optional Breaks or Non-Breaks. Since this is standard formatting language, you can find out if your word processor or version of Word allows you to manually insert those commands. Find SPECIAL CHARACTERS (in Word it is under INSERT and then SYMBOL, which opens a box that will let you find SPECIAL CHARACTERS). If there is a shortcut code (Ctrl + Whatever + Whatever) next to the special character, you can insert the code. If not, your version doesn’t support it. I’m pretty sure there are updates or special files that can be downloaded to allow for the characters.

Moving on… Since I’m using Scrivener to format mobi files, I wanted to see how the program handles em dashes.

Scrivener inserts the No-Width Optional Break before and after the em dash. That’s not wonderful. It’s not nearly as bad as the monster space, but an em dash at the beginning of a line is out of context. Not much, only a smidge, but it’s enough to give sensitive readers a slight pause as they figure out the meaning of the punctuation. And because of that, if the em dash is at the end of a piece of dialogue next to a quote mark, you end up with an orphan.

So I went looking in Scrivener’s CHARACTER MAP. This is what I found.

Those little blank boxes are actually codes. You select one, copy it and then paste it into the text where you want. If you’ll look at the Scrivener text image at the bottom you will see that by using the Narrow No-Break Space I hooked up the “else” with the em dash and quote mark. No orphan. This means I can go through the manuscript with the Search function and customize the em dashes. This requires patience and attention to detail. This code does NOT show up on the screen. Your text will look the same with or without the inserted code.

Also, Scrivener sort of freaked me out by inserting a paragraph return along with the code, which makes no sense, but then that’s why I’m NOT getting the big bucks. I just backspaced and it worked fine.

So, here is what I have learned so far.

  • If you are using an OCR file, you need to go in and manually insert either Optional Breaks or Non-Breaks between the em dashes and the words they are connected to.
  • Test whatever program you are using to see how it handles breaks. If the default set-up is screwing up your formatting, you need to manually insert Optional Breaks and Non-Breaks. Watch out for orphans.
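Since these break codes are invisible on screen, one way to test what a program actually did is to export the text and have a script report what sits on either side of each em dash. Here is a small Python sketch of that idea. The function names are made up, and it only names invisible format characters and spaces. Everything else is shown as the character itself.

```python
import unicodedata

EM_DASH = "\u2014"


def label(ch: str) -> str:
    """Name invisible format characters and spaces; show other characters as-is."""
    if not ch:
        return "(none)"
    if unicodedata.category(ch) in ("Cf", "Zs"):
        return unicodedata.name(ch, "UNNAMED")
    return ch


def describe_dash_neighbors(text: str) -> list:
    """Return (position, before, after) for every em dash in the text."""
    report = []
    for i, ch in enumerate(text):
        if ch != EM_DASH:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        report.append((i, label(before), label(after)))
    return report
```

If an entry shows plain letters on both sides of the dash, there is no break code there, and that dash is a candidate for a monster space.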

Is this important? I vote yes.