Pardon my obsession, folks, but it’s the little things that drive me nuts. The lowly em dash, one of my favorite punctuation marks, drives me nuts in ebooks.
Kindle mobi files are lovely things. You can read them on your Kindle, Kindle Fire, computer, tablet, phone, or whatever your preferred device. The device will helpfully fit the text to your screen, and on the Kindle (don’t know about other readers) it makes a fair attempt at justifying the text. Look at the above image and see what happens when the file runs into an em dash that it believes is part of the words it connects. A monster space.
I read a lot of ebooks. Improper formatting can hurt you, the self-publisher. Oh sure, the weird spacing, font size jumps, orphaned punctuation, blank pages and other little irritants are only that, irritants. It’s not often I run into something that makes the text unreadable–it has happened, though. Sometimes I have to just grit my teeth and ignore the errors. Sometimes the formatting errors are so bad I will refuse to purchase from that particular publisher (or writer) again.
Now that I am learning how to format ebooks, those little details obsess me. Then I began to notice something. The orphaned punctuation and monster spaces were showing up in some ebooks, but not in others. The problem is most prevalent in reissued backlist titles. Ah ha, I thought, OCR–Optical Character Recognition. Publishers were scanning printed books and converting them to ebook files. That’s all well and good, except OCR files have to be proofread with extreme care because the print doesn’t always translate properly. Plus, OCR reads
happy–unhappy
as one word. Thus the ereader treats it as one word, too, so if it comes at the end of a line, you end up with a monster space. In order to prevent that, the formatter needs to go in and manually insert a “No-Width Optional Break.”
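For anyone who prefers scripting to hand-editing, that manual pass could be roughed out in Python. This is only a sketch under assumptions: the text uses a true em dash (U+2014), and the Unicode ZERO WIDTH SPACE (U+200B) stands in for the "No-Width Optional Break." The function name is made up for illustration, and whether a given ereader honors the character still needs testing on the device.

```python
import re

EM_DASH = "\u2014"
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break (assumed stand-in)

def add_optional_breaks(text: str) -> str:
    """Add an invisible break opportunity after each em dash that runs straight into a word."""
    # The lookahead only matches a letter or digit, so a dash that already
    # has a break character after it is left alone on a second pass.
    return re.sub(EM_DASH + r"(?=\w)", EM_DASH + ZWSP, text)

sample = "happy" + EM_DASH + "unhappy"
fixed = add_optional_breaks(sample)
print(len(sample), len(fixed))  # the fixed text is one invisible character longer
```

The text looks identical on screen afterward; the only difference is that the line is now allowed to wrap after the dash.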
So, that led to me experimenting with Word to see how it handles the em dash.
In the version of Word I use (Word 2000), you will notice that Word has decided that between the first word and the em dash there is a No-Width Non-Break, meaning the first word and the em are forever joined. Between the em dash and the second word there is a No-Width Optional Break. There is no space between the em dash and the words it connects, but when it comes time to wrap to fit the screen, the break occurs and thus, there is no monster space.
happy– unhappy <–How Word actually sees the em dash
This is also quite elegant because it never allows the em dash to occur at the beginning of the line (which is nitpicky, but I’m a nitpicky person who believes punctuation should always be presented in context). Problem solved, right? Not right. Look at my poor little orphan quote mark. Word treats the quote mark as a word so the No-Width Optional Break rule is applied.
If I were a techno-geeky kind of person, I could fix that. I’m not. My version of Word does not allow me to insert No-Width Optional Breaks or Non-Breaks. Since these are standard formatting characters, you can find out whether your word processor or version of Word allows you to insert them manually. Find SPECIAL CHARACTERS (in Word it is under INSERT and then SYMBOL, which opens a box that will let you find SPECIAL CHARACTERS). If there is a shortcut code (Ctrl + Whatever + Whatever) next to the special character, you can insert the code. If not, your version doesn’t support it. I’m pretty sure there are updates or special files that can be downloaded to allow for the characters.
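If the word processor won't cooperate, the Word-style rule described above (no break before the em dash, an optional break after it) can be applied to plain text outside of Word. The Python sketch below is one possible take, not Word's actual mechanism. It assumes WORD JOINER (U+2060) for the non-break and ZERO WIDTH SPACE (U+200B) for the optional break, and it adds one rule Word lacks: a closing quote stays glued to the dash. The function name is hypothetical.

```python
import re

EM_DASH = "\u2014"
WORD_JOINER = "\u2060"  # invisible; forbids a line break at this spot (assumed stand-in)
ZWSP = "\u200b"         # invisible; allows a line break at this spot (assumed stand-in)
CLOSING_QUOTES = "\u201d\u2019"  # curly closing double and single quotes

def word_style_breaks(text: str) -> str:
    # Dash followed by a closing quote: glue them so the quote is never orphaned.
    text = re.sub(EM_DASH + "(?=[" + CLOSING_QUOTES + "])", EM_DASH + WORD_JOINER, text)
    # Dash followed by a letter or digit: let the line wrap after the dash.
    text = re.sub(EM_DASH + r"(?=\w)", EM_DASH + ZWSP, text)
    # Letter or digit before a dash: keep the dash attached to that word.
    text = re.sub(r"(?<=\w)" + EM_DASH, WORD_JOINER + EM_DASH, text)
    return text

mid_sentence = word_style_breaks("happy" + EM_DASH + "unhappy")
end_of_dialogue = word_style_breaks("else" + EM_DASH + "\u201d")
```

Run it on a copy of the manuscript, not the original, and check the result on an actual device, since readers differ in how they treat these characters.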
Moving on… Since I’m using Scrivener to format mobi files, I wanted to see how the program handles em dashes.
Scrivener inserts the No-Width Optional Break before and after the em dash. That’s not wonderful. It’s not nearly as bad as the monster space, but an em dash at the beginning of a line is out of context. Not much, only a smidge, but it’s enough to give sensitive readers a slight pause as they figure out the meaning of the punctuation. And because a break is allowed on both sides, if the em dash is at the end of a piece of dialogue next to a quote mark, you end up with an orphan.
So I went looking in Scrivener’s CHARACTER MAP. This is what I found.
Those little blank boxes are actually codes. You select one, copy it and then paste it into the text where you want. If you’ll look at the Scrivener text image at the bottom you will see that by using the Narrow No-Break Space I hooked up the “else” with the em dash and quote mark. No orphan. This means I can go through the manuscript with the Search function and customize the em dashes. This requires patience and attention to detail. This code does NOT show up on the screen. Your text will look the same with or without the inserted code.
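Because the inserted code is invisible, it helps to have a way to see where it ended up. The Python sketch below is one reading of the fix, not Scrivener's own behavior: it assumes the character pasted in is NARROW NO-BREAK SPACE (U+202F) and places it between an em dash and a closing curly quote. The second helper spells out the invisible characters by their Unicode names so they can be proofread. Both function names are invented for illustration.

```python
import re
import unicodedata

EM_DASH = "\u2014"
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE (assumed to be the character map entry used)
CLOSING_QUOTES = "\u201d\u2019"
# Characters that are hard or impossible to see on screen.
INVISIBLES = {"\u00a0", "\u200b", "\u200c", "\u200d", "\u2060", "\u202f"}

def bind_dash_to_quote(text: str) -> str:
    """Put a no-break character between an em dash and the closing quote after it."""
    return re.sub(EM_DASH + "(?=[" + CLOSING_QUOTES + "])", EM_DASH + NNBSP, text)

def reveal(text: str) -> str:
    """Replace each invisible character with its Unicode name in angle brackets."""
    return "".join(
        "<" + unicodedata.name(ch) + ">" if ch in INVISIBLES else ch
        for ch in text
    )

line = bind_dash_to_quote("something else" + EM_DASH + "\u201d")
print(reveal(line))
```

Running `reveal` over a chapter before and after the search-and-paste pass shows exactly which em dashes were customized and which were missed.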
Also, Scrivener sort of freaked me out by inserting a paragraph return along with the code, which makes no sense, but then that’s why I’m NOT getting the big bucks. I just backspaced and it worked fine.
So, what I have learned so far.
- If you are using an OCR file, you need to go in and manually insert either Optional Breaks or Non-Breaks between the em dashes and the words they are connected to.
- Test whatever program you are using to see how it handles breaks. If the default set-up is screwing up your formatting, you need to manually insert Optional Breaks and Non-Breaks. Watch out for orphans.
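One way to run that kind of test is to export the text and list what sits on either side of every em dash. The Python sketch below assumes the exported text is available as a plain string; the function names are hypothetical and the report format is just one convenient choice.

```python
import unicodedata

EM_DASH = "\u2014"

def describe(ch: str) -> str:
    """Give a readable label for the character next to an em dash."""
    if ch == "":
        return "(edge of text)"
    if ch.isalnum():
        return "letter/digit"
    # Fall back to the code point if the character has no Unicode name.
    return unicodedata.name(ch, "U+%04X" % ord(ch))

def audit_em_dashes(text: str) -> list:
    """Return (position, what comes before, what comes after) for each em dash."""
    report = []
    for i, ch in enumerate(text):
        if ch == EM_DASH:
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            report.append((i, describe(before), describe(after)))
    return report

for entry in audit_em_dashes("happy" + EM_DASH + "\u200bunhappy"):
    print(entry)  # (5, 'letter/digit', 'ZERO WIDTH SPACE')
```

An em dash reported with "letter/digit" on both sides has no break characters at all and is a candidate for a monster space. One followed directly by a quotation mark is a candidate orphan.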
Is this important? I vote yes.
Why not save all the hassle and use space N-dash space as I do? No problems on the Kindle with awkward line breaks if you do that.
Is that in Word, Lexi? Can you demonstrate? I am looking for less hassle, yes indeed.
This is an amazing (to me) discovery. I have no idea if my em-dashes are FUBAR on a reader screen–they looked fine in preview but, you know, I don’t know’s as I trust a preview to be WYSIWYG.
N dash should be a special character (and you should be able to create a special macro ctrl + key for any special character you want). Also in most versions of Word if you type space hyphen space it should auto-correct to an N-dash.
Lily, I went round and round with my version of Word trying to figure out a way to use all the advanced special characters. (I swear those help pages were written by someone who time traveled from Ancient Greece!) In order to do that I have to do weird things to my program and that, quite frankly, scares me (my computer and I have a love/hate relationship and it is always daring me to go on, try something stupid, see what happens). Other versions of Word do not have the same problem. Hence the jump to Scrivener. I’m finding out that combining the features of the two programs gives me pretty much everything I want (along with more opportunities to screw up, sigh…). It involves a few extra steps, but the results are worth it.
Sorry, it just clicked that you all are talking about the en dash. I’ve seen that technique in several ebooks and I don’t care for it. It looks like wayward hyphens to me. I honestly don’t know how many readers notice the difference, or if it matters to them if they do notice. It bugs me, though, so I’m going to keep screwing around with em dashes.
I have tried using “word space em dash space word” or “word em dash space word” but that still leaves orphans floating and it can look really bad if the justification pulls everything apart.
I know it is not possible right now to overcome every formatting liability or to make ebooks as beautiful as print books, but anything that makes them more of a pleasure to read is worth obsessing about. (to me, because I’m OCD that way)
One thing I have discovered about the previewer is that you can adjust the size of the image and get a better idea about the layout. I have a couple of projects I am going back to reformat and reload to make them look better. I have been learning tons about this stuff.
ISHBEL – Where’s my Aspirin?
xx
What? You’re not utterly fascinated by my weird little obsessions, Tom? I’m astonished. Heh.
I know, Tom. This is like calculus. Can’t do it. But this is also why some ebooks have totally weird dead space.
You know, one of these days I might start obsessing about something REALLY important. Then the world better watch out.
Oh my, I didn’t realize there was so much involved…I thought I would just copy/paste the whole thing…
I remember checking out one of the links you provided in another post that went to someone who codes ebooks. It scared me.
Well, I guess I’ll jump that hurdle when it comes along. Thanks for alerting us to the potential pitfalls of using one program over another. It seems Scrivener has advantages. One more reason to get it.
I’m currently putting together a printable cheat sheet. Formatting for ebooks isn’t difficult at all. It’s just that there are a whole bunch of little details that must be attended to. I run into problems when I overlook or mishandle one, or misinterpret how the programs talk to each other. The nice thing about it is if it’s screwed up, there’s always a chance to do over.
Try using the html code…
em dash is and em’s smaller sibling en dash is
I used © for copyright until I discovered © gives the symbol and is easy to remember.
Whoops, forgot to format the text so it will show properly:
em dash is &mdash; and em’s smaller sibling is &ndash;
The copyrights were &#169; and &copy;
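For anyone who, like the rest of us, can't keep the HTML codes in their head, Python's standard library can look them up. The sketch below uses the real `html.entities.codepoint2name` table; the helper function name is made up for illustration, and it falls back to a numeric code when a character has no named entity.

```python
import html
from html.entities import codepoint2name

def entity_for(char: str) -> str:
    """Return the named HTML entity for a character, or a numeric one if it has no name."""
    name = codepoint2name.get(ord(char))
    return "&" + name + ";" if name else "&#" + str(ord(char)) + ";"

print(entity_for("\u2014"))  # &mdash;
print(entity_for("\u2013"))  # &ndash;
print(entity_for("\u00a9"))  # &copy;

# Going the other way: both spellings of the copyright code give the same symbol.
same_symbol = html.unescape("&#169;") == html.unescape("&copy;")
```

Typing the code into a comment box often just shows the symbol, which is why writing it out usually means escaping the ampersand itself as `&amp;`.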
Thanks, S.C. I’ve been playing with HTML, but it is not locking in my brain and I have to keep looking stuff up. It’s been frustrating, but I’ll hang with it and maybe learn something useful.
Jaye, you are trying to woo me. One of the things that drives me buggy about ePubs is the terrible formatting. I look forward to learning more of your tricks to hack the apparent limitations of digital formats.
What amazes me (though it probably shouldn’t) is that the indies are doing a much better job overall than traditional publishers at finding and fixing the glitches, and learning how this stuff works. Indies seem to care a whole lot more about the reader experience. That’s a very good thing.
The codes in the character map in Scrivener (Windows) were EXACTLY what I needed. I was having trouble representing a suffix and wanted the hyphen to stay together with the letters rather than get separated on a soft line break. I had to flail around with the Zero Width Joiner and the Narrow No-break Space, since just pasting in the former kept getting me a hard line break–I think I ended up pasting in both and then deleting one of them and then it magically WORKED–but I finally got the result I was looking for. Thank you!