Pardon my obsession, folks, but it’s the little things that drive me nuts. The lowly em dash, one of my favorite punctuation marks, does exactly that in ebooks.
Kindle mobi files are lovely things. You can read them on your Kindle, Kindle Fire, computer, tablet, phone, or whatever device you prefer. The device will helpfully fit the text to your screen, and on the Kindle (I don’t know about other readers) it makes a fair attempt at justifying the text. Look at the above image and see what happens when the file runs into an em dash that it believes is part of the words it connects: a monster space.
I read a lot of ebooks, and improper formatting can hurt you, the self-publisher. Oh sure, the weird spacing, font size jumps, orphaned punctuation, blank pages and other little irritants are only that: irritants. It’s not often I run into something that makes the text unreadable, but it has happened. Sometimes I just have to grit my teeth and ignore the errors. Sometimes the formatting errors are so bad that I refuse to buy from that particular publisher (or writer) again.
Now that I am learning how to format ebooks, those little details obsess me. Then I began to notice something: most of the orphaned punctuation and monster spaces showed up in some ebooks but not in others. The problem is most prevalent in reissued backlist titles. Aha, I thought: OCR (Optical Character Recognition). Publishers were scanning printed books and converting them to ebook files. That’s all well and good, except OCR files have to be proofread with extreme care because the print doesn’t always translate properly. Plus, OCR reads two words joined by an em dash as one word. The ereader then treats them as one word, too, so if they come at the end of a line, you end up with a monster space. To prevent that, the formatter needs to go in and manually insert a “No-Width Optional Break.”
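For anyone cleaning up an OCR file outside a word processor, here is a minimal Python sketch of that fix. It assumes the text is plain Unicode and that U+200B ZERO WIDTH SPACE is an acceptable stand-in for Word’s No-Width Optional Break; whether a given ereader honours it as a wrap point is something to test on the device. The function name `add_optional_breaks` is made up for illustration.

```python
import re

EM_DASH = "\u2014"
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but marks a spot where a line may wrap

# An em dash immediately followed by a word character (letter, digit, underscore).
_DASH_BEFORE_WORD = re.compile(EM_DASH + r"(?=\w)")


def add_optional_breaks(text: str) -> str:
    """Put a zero-width break opportunity after every em dash that runs into a word.

    Safe to run twice: once a ZWSP follows the dash, the dash is no longer
    directly followed by a word character, so it is left alone.
    """
    return _DASH_BEFORE_WORD.sub(EM_DASH + ZWSP, text)


if __name__ == "__main__":
    ocr_line = "She was happy\u2014unhappy, really\u2014and tired."
    fixed = add_optional_breaks(ocr_line)
    print(fixed.count(ZWSP))  # 2
    print(fixed.replace(ZWSP, "") == ocr_line)  # True: only invisible characters were added
```

It only adds a break where a word follows the dash, so a closing quote mark right after a dash is left untouched.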
So that led me to experiment with Word to see how it handles the em dash.
In the version of Word I use (Word 2000), you will notice that Word has placed a No-Width Non-Break between the first word and the em dash, meaning the first word and the em dash are forever joined. Between the em dash and the second word, it places a No-Width Optional Break. There is no visible space between the em dash and the words it connects, but when the text wraps to fit the screen, the line breaks after the dash, so there is no monster space.
happy[No-Width Non-Break]–[No-Width Optional Break]unhappy  <– how Word actually sees the em dash
This is also quite elegant because it never allows the em dash to appear at the beginning of a line (which is nitpicky, but I’m a nitpicky person who believes punctuation should always be presented in context). Problem solved, right? Not quite. Look at my poor little orphaned quote mark. Word treats the quote mark as a word, so the No-Width Optional Break rule is applied and the quote mark can wrap onto the next line by itself.
If I were a techno-geeky kind of person, I could fix that. I’m not. My version of Word does not allow me to insert No-Width Optional Breaks or Non-Breaks. Since these are standard formatting characters, you can check whether your word processor or version of Word lets you insert them manually. Look for SPECIAL CHARACTERS: in Word, go to INSERT and then SYMBOL, which opens a box where you can find SPECIAL CHARACTERS. If a shortcut code (Ctrl + Whatever + Whatever) is listed next to the special character, you can insert it. If not, your version doesn’t support it. I’m pretty sure there are updates or special files that can be downloaded to add these characters.
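Word’s arrangement (a No-Width Non-Break before the dash, a No-Width Optional Break after it) can be reproduced in plain Unicode, along with the quote-mark fix Word misses. This is a sketch, not what Word does internally: it assumes U+2060 WORD JOINER for the non-break and U+200B ZERO WIDTH SPACE for the optional break, and `mark_em_dashes` is a hypothetical helper name. As with any of these characters, check the result on an actual Kindle.

```python
import re

EM_DASH = "\u2014"
WJ = "\u2060"    # WORD JOINER: zero width, forbids a line break at this spot
ZWSP = "\u200b"  # ZERO WIDTH SPACE: zero width, allows a line break at this spot

# Any joiners or break markers already hugging an em dash, so reruns start clean.
_OLD_MARKS = re.compile("[" + WJ + ZWSP + "]*" + EM_DASH + "[" + WJ + ZWSP + "]*")


def mark_em_dashes(text: str) -> str:
    """Glue each em dash to the word before it; after it, allow a break only before a word."""
    text = _OLD_MARKS.sub(EM_DASH, text)
    out = []
    for i, ch in enumerate(text):
        if ch != EM_DASH:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if not prev.isspace():
            out.append(WJ)    # like Word: the dash never starts a line on its own
        out.append(EM_DASH)
        if nxt.isalnum():
            out.append(ZWSP)  # like Word: the line may wrap before the next word
        elif not nxt.isspace():
            out.append(WJ)    # unlike Word: keep a following quote mark on this line
    return "".join(out)


if __name__ == "__main__":
    line = "\u201cWho else\u2014\u201d she asked, happy\u2014unhappy."
    marked = mark_em_dashes(line)
    print(marked.count(WJ), marked.count(ZWSP))  # 3 1
```

Running it twice gives the same result, because it strips the markers it finds around each em dash before adding fresh ones.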
Moving on… Since I’m using Scrivener to format mobi files, I wanted to see how the program handles em dashes.
Scrivener inserts a No-Width Optional Break both before and after the em dash. That’s not wonderful. It’s not nearly as bad as the monster space, but an em dash at the beginning of a line is out of context. Not much, only a smidge, but it’s enough to give sensitive readers a slight pause as they figure out what the punctuation means. And because a break is allowed after the dash, if the em dash comes at the end of a piece of dialogue next to a quote mark, you end up with an orphan.
So I went looking in Scrivener’s CHARACTER MAP. This is what I found.
Those little blank boxes are actually codes. You select one, copy it, and paste it into the text where you want it. If you look at the Scrivener text image at the bottom, you will see that by using the Narrow No-Break Space I joined “else” to the em dash and the quote mark. No orphan. This means I can go through the manuscript with the Search function and customize the em dashes, which requires patience and attention to detail. The code does NOT show up on the screen; your text will look the same with or without it.
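If the manuscript can be exported as plain text, the search step can be partly automated. This minimal Python sketch covers only the dialogue case (an em dash sitting directly against a closing quote): one function lists where those spots are, the other inserts the Narrow No-Break Space (U+202F) between dash and quote. Both function names are hypothetical. In most fonts U+202F renders as a thin space, so if the join should be completely invisible, passing U+2060 WORD JOINER as `glue` is an alternative; that choice is an assumption to check on the device.

```python
import re

EM_DASH = "\u2014"
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, the character picked from Scrivener's map
CLOSING_QUOTES = "\u201d\u2019\"'"  # curly double, curly single, straight double, straight single

# An em dash sitting directly against a closing quote, with nothing holding them together.
_UNGLUED = re.compile(EM_DASH + "(?=[" + re.escape(CLOSING_QUOTES) + "])")


def find_unglued_dashes(text: str) -> list[tuple[int, int]]:
    """Return (line, column) pairs, both 1-based, for each dash that could orphan its quote."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in _UNGLUED.finditer(line):
            hits.append((lineno, m.start() + 1))
    return hits


def glue_dialogue_dashes(text: str, glue: str = NNBSP) -> str:
    """Insert `glue` between each em dash and the closing quote right after it."""
    return _UNGLUED.sub(EM_DASH + glue, text)


if __name__ == "__main__":
    draft = "\u201cWhat else\u2014\u201d\nNo dash here.\n\u201cAnd\u2014\u201d she said."
    print(find_unglued_dashes(draft))  # [(1, 11), (3, 5)]
    print(find_unglued_dashes(glue_dialogue_dashes(draft)))  # []
```

Listing the spots first and gluing second keeps the patience-and-attention part in human hands: you can review each hit before deciding to change it.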
Also, Scrivener sort of freaked me out by inserting a paragraph return along with the code, which makes no sense, but then that’s why I’m NOT getting the big bucks. I just backspaced and it worked fine.
So, here is what I have learned so far:
- If you are using an OCR file, you need to go in and manually insert either Optional Breaks or Non-Breaks between the em dashes and the words they are connected to.
- Test whatever program you are using to see how it handles breaks. If the default set-up is screwing up your formatting, you need to manually insert Optional Breaks and Non-Breaks. Watch out for orphans.
Is this important? I vote yes.