Why You Shouldn’t Format Your Word Docs

There’s a reason my ebooks are superior–two reasons, actually–and neither has anything to do with my technical prowess (I don’t have much) or talent (anyone can do what I’m about to tell you).

Reason Number One: Pre-production, I clean the text. As soon as a document comes up in the queue, I open it and start stripping it of everything that can mess up an ebook: extraneous paragraph returns, extra spaces, and tabs. I tidy up punctuation, tag areas that require special coding, neaten italics and check for special characters that won’t translate. As a writer and editor myself, I know most of the writer tricks and have a rather lengthy list of things to look for. By the time I’m ready to start coding, the text is so clean it squeaks.

Reason Number Two: Post-production, the ebook is proofread. I don’t care who proofreads the ebook. I can do it, the writer can do it, the writer can hire the job out to someone else. I give the writer a proof copy of the ebook and a mark-up document and encourage them to be as picky as they can stand. Even if they hire me to proofread, they still get the proof copy to load on a device or their computer so they can check the formatting and layout. The point is to find mistakes before the readers do. The point is to make sure the ebook works properly.

I am shocked and appalled that not everyone who produces ebooks does the exact same thing. They don’t, and I know they don’t because I read ebooks that are filled with the types of errors and hiccups that text cleaning and proofreading would have rooted out.

The trad pubs are actually worse offenders than indies are, especially when it comes to backlist. I can see it with my own eyes, but it’s amusing to see a publisher admit it publicly on The Passive Voice blog:

J.A. Our experience with Kindle is that as soon as a customer complains they take down the file and send the publisher a takedown notice. It’s actually a real pain in the neck. It could be one person complained and something very minor. We get them occasionally and we fix them right away. They give the reader a credit for the download. I should add that when files are converted they generally aren’t checked page for page like a print book might normally be. We rely on the conversion house to do a good job. If we keep catching errors or getting complaints we would change vendors. We pay pretty good money for these conversions. Our books are almost all straight text so conversions aren’t generally a major issue, but books with columns or charts, or unusual layouts do cause problems and need to be checked carefully. –Steven Zacharius, CEO, Kensington Books

Emphasis mine.

I have personally cleaned up well over a million words of scanned and OCR’d text, and that statement offends the shit out of me. Writers deserve better. Readers deserve better.

So what’s that got to do with formatting Word docs? Everything.

If you’re a Do-It-Yourselfer and are formatting your own ebooks, you cannot skip these steps. (On a side note, my biggest gripe with Smashwords is how difficult they make it to proofread an ebook. An upload has to go through the whole publishing process before you can look at it live on a device. Depending on how fast you are at proofreading, the ebook can be live–all goofs intact–for weeks before you can fix them and go through the process again.) My suggestion for the indie formatting Word docs for Smashwords (or any other distributor that accepts Word docs) is to convert them first with a program like Calibre and proofread the results. Find and fix problems before uploading the Word doc to Smashwords.
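For anyone comfortable with a little scripting, that conversion step can be automated. What follows is only a sketch: it assumes Calibre is installed with its ebook-convert command-line tool on the system path and that the manuscript is saved as .docx. The function names and the file name are made up for illustration.

```python
import subprocess
from pathlib import Path

def build_convert_command(source, output_format="epub"):
    """Build the ebook-convert command line for a manuscript file."""
    source_path = Path(source)
    # The proof copy gets the same name with a new extension, e.g. .epub
    output_path = source_path.with_suffix("." + output_format)
    return ["ebook-convert", str(source_path), str(output_path)]

def convert_for_proofing(source, output_format="epub"):
    """Run the conversion and return the path of the proof copy."""
    command = build_convert_command(source, output_format)
    subprocess.run(command, check=True)  # raises if the conversion fails
    return command[2]

# Hypothetical usage (requires Calibre to be installed):
# proof_copy = convert_for_proofing("my_novel.docx")
```

Load the resulting file on a device or reading app and proofread it there, not in Word.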

If you’re hiring a formatter, find out first if they clean up your file pre-production. Many do not. If that’s the case, you need to do the cleaning. Some pros charge by the hour to clean up the Word doc. The more elaborately you’ve formatted your document, the longer it will take to clean it up and the more expensive it will be. (Not to mention wasting your own time on needless work.) My suggestion: if you have special requirements, arrange for a system of tags to let the formatter know what you want. I ask writers to put instructions inside square brackets, e.g. [HEADLINE, PUT IN SMALL CAPS, CENTERED, EXTRA SPACE ABOVE AND BELOW].
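One advantage of bracketed tags is that they are easy to find by machine. Here is a minimal sketch of how a formatter might list every tag in a manuscript before coding. It assumes the text has been saved as plain text and that square brackets are used only for instructions; the function name is hypothetical.

```python
import re

# Anything between a pair of square brackets is treated as an instruction.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def find_tags(text):
    """Return (line_number, instruction) pairs for every bracketed tag."""
    results = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TAG_PATTERN.finditer(line):
            results.append((line_number, match.group(1).strip()))
    return results

sample = "[HEADLINE, CENTERED]\nChapter One\nIt was a dark night.\n[SCENE BREAK]"
for line_number, instruction in find_tags(sample):
    print(line_number, instruction)
```

A list like this also makes it easy to confirm that no stray tag survives into the finished ebook.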

Find out, too, the professional’s policy on proofreading. Do you get a proof copy? Does the formatter charge extra to input changes and corrections? (I charge for actual proofreading, but I don’t charge to input changes and corrections from somebody else’s proofread.) If you are not allowed to make post-production changes to your ebook, find another service. Trust me, no matter how well edited, cleaned and formatted the file is going in, you will find something to fix while proofreading. (Gremlins!)

So, for you writers working in Word, one final suggestion: Post the following where you can see it while you work and keep repeating it until it sinks in:

What I see on the computer screen is NOT how my text will look, or act, in an ebook.

Find and Replace: Do It Once, Do It Twice

Out of all the small jobs that make up the big job of getting a book ready for publication, proofreading is the job nobody wants. It is NO FUN.

It’s exacting, it’s painstaking, it reduces an otherwise interesting piece of writing into boring little components that must be examined individually. If your attention wanders or if you get caught up in the story (it’s harder to proofread a rousing good story than a so-so one), you can miss errors. Ideally, any project should have at least two proofreaders. This isn’t an ideal world, however, and not everybody has the funds or the qualified (and indulgent) friends to get two reads.

When I build an ebook, I either proofread it myself or send a proof copy to the writer to proofread. Sometimes we both proofread it. All in the hopes of rooting out the boo-boos and gremlins before a paying customer does.

I have, of course, learned a few tricks along the way. One of the most valuable tools in my arsenal (second only to Webster’s 9th) is the Find/Replace function. This is especially true since I have found that most writers have a tendency to repeat mistakes. One does need to be careful, though, about global FIND/REPLACE. Or you might end up with something like this:

Barnes & Noble was briefly suspected of employing an outrageous anti-Amazon marketing strategy in May after blogger Philip Howard noticed that a version of Tolstoy’s “War and Peace” sold by the chain store had substituted “nook” for every instance of the word “kindle” throughout the text, resulting in sentences like, “It was as if a light had been Nookd in a carved and painted lantern….” The e-book turned out to have been published by a third-party company, Superior Formatting Publishing, who issued an apology (still posted on the company’s Web home page) explaining that it had accidentally applied the “find and replace” function to the entire text when reformatting the Kindle version of the book for the Nook platform.

The stuff of a proofreader’s nightmares.
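The mechanics of that accident are easy to reproduce. The sketch below is an illustration in Python, not a reconstruction of what that company actually ran. It contrasts a blind replace with a whole-word replace that also reports how many changes it made, so the number can be checked before the result is trusted.

```python
import re

def replace_whole_word(text, old, new):
    """Replace 'old' only where it stands alone as a word.

    Returns the new text and the number of replacements made.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b", flags=re.IGNORECASE)
    # The lambda keeps 'new' from being read as a regex replacement template.
    return pattern.subn(lambda match: new, text)

sample = "It was as if a light had been kindled. Read it on your Kindle."

# Blind replace: "kindled" loses its first six letters and becomes "Nookd".
print(sample.replace("kindle", "Nook"))

# Whole-word replace: "kindled" is left alone, only the device name changes.
fixed, count = replace_whole_word(sample, "kindle", "Nook")
print(count, fixed)
```

Even the safer version deserves a look at each change. Stepping through matches one at a time is slower than Replace All, but it is the only way to be sure.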

Every text handling program has its own set of rules and functions. I can’t possibly cover them all here. I suggest you play with your program’s FIND/REPLACE function and figure out what it can and cannot do. The one thing all of these programs have in common is that they search for an exact string of characters. That string can include spaces and punctuation.

There are some F/R searches I do as a matter of course. The first is for extra spaces. Extra spaces are the bane of ebooks. They all need to be rooted out. I run searches for double spaces between sentences within paragraphs, and for extra spaces at the beginnings and ends of paragraphs. I also run searches for extra paragraph returns.
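For those who would rather script these searches than run them one at a time, here is a minimal sketch. It assumes the manuscript has been exported as plain text with one paragraph per line, so any blank line counts as an extra paragraph return. The function name is made up.

```python
import re

def clean_spacing(text):
    """Remove extra spaces and extra paragraph returns."""
    # Two or more spaces in a row become one.
    text = re.sub(r" {2,}", " ", text)
    # Drop spaces at the beginning and end of every paragraph.
    text = re.sub(r"^ +| +$", "", text, flags=re.MULTILINE)
    # Collapse runs of paragraph returns into a single return.
    text = re.sub(r"\n{2,}", "\n", text)
    return text

messy = "  First sentence.  Second sentence. \n\n\nNext paragraph.  "
print(repr(clean_spacing(messy)))
```

If blank lines mark scene breaks in your manuscript, tag those breaks first, or the last step will erase them.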

The second routine search I do is for backward quote marks and apostrophes. MS Word, in particular, has a bad habit of turning quote marks the wrong way, especially when the quote marks are connected to em or en dashes or sit at the beginning of truncated words. Here the basic rules of punctuation are useful. For instance, the left double quote belongs at the beginning of a quoted passage, so a right double quote that follows a space, a paragraph return or a new line is probably backward, and I search for each of those combinations. I run the opposite search, for left double quotes sitting where right ones belong, by looking for left double quotes at the end of sentences.
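The same searches can be sketched in a few lines of Python. This version only flags suspects for a human to check; it does not change anything. It assumes curly double quotes and plain text, and the names are invented for the example. Quotes attached to dashes still need a manual look.

```python
import re

LEFT_QUOTE = "\u201c"   # left (opening) double quote
RIGHT_QUOTE = "\u201d"  # right (closing) double quote

# A right quote at the start of a line or just after whitespace is probably backward.
BACKWARD_OPENING = re.compile(r"(?:^|(?<=\s))" + RIGHT_QUOTE)
# A left quote followed by whitespace or the end of a line is probably backward.
BACKWARD_CLOSING = re.compile(LEFT_QUOTE + r"(?=\s|$)")

def find_backward_quotes(text):
    """Return (line_number, position) pairs, where position is 'opening' or 'closing'."""
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if BACKWARD_OPENING.search(line):
            suspects.append((line_number, "opening"))
        if BACKWARD_CLOSING.search(line):
            suspects.append((line_number, "closing"))
    return suspects
```

Single quotes and apostrophes are harder, because a right single quote at the start of a truncated word is correct. Those are better checked by eye.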

Another routine search is for proper names and place names. When I proofread I make a list of preferred spellings. Flying fingers or attention lapses trip up writers. Sometimes the misspellings look right and are easily missed. Take my name for instance. “Jay” looks right, but I spell it “Jaye.” I’ll do a search for “Jay” and “Jay’s” to catch any instances where the “e” was dropped.
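In script form, the trick is to search for the short form as a whole word, so the correct spelling is not flagged along with the mistake. A minimal sketch, with a made-up function name:

```python
import re

def find_dropped_letter(text, short_form):
    """List the lines where short_form appears as a complete word."""
    # \b stops "Jay" from matching inside "Jaye", but still catches "Jay's".
    pattern = re.compile(r"\b" + re.escape(short_form) + r"\b")
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((line_number, line.strip()))
    return hits

sample = "Jaye wrote this.\nJay forgot the e.\nIt is Jay\u2019s book, not Jaye\u2019s."
for line_number, line in find_dropped_letter(sample, "Jay"):
    print(line_number, line)
```

The search is case-sensitive on purpose, since proper names are capitalized.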

The same thing goes for preferred spellings. A word such as “judgment” is also correctly spelled as “judgement.” It doesn’t matter to me what the writer prefers–consistency is my fallback. If the writer prefers the former, I will do a search for the latter and change any instances I find.
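When the writer has not stated a preference, counting each variant shows which one the manuscript already favors. A small sketch along those lines follows. The function name is hypothetical, and it counts only the exact forms given, so plurals would need their own entries.

```python
import re

def count_spellings(text, variants):
    """Count whole-word occurrences of each spelling, ignoring case."""
    counts = {}
    for variant in variants:
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", flags=re.IGNORECASE)
        counts[variant] = len(pattern.findall(text))
    return counts

sample = "Her judgment was sound. His judgement was not. Judgment day came."
print(count_spellings(sample, ["judgment", "judgement"]))
```

Whichever spelling wins, the loser goes into the search box.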

I’ve worked on quite a few backlist books that have been scanned and run through OCR. Do enough of them and you start recognizing common OCR errors. For instance, misreading the letter “e” as a “c”. Spell check will catch the most egregious errors, but if the text is supposed to be “eat” and the OCR reads it as “cat” then spell check is useless. It doesn’t take much time to run a search for the word “cat” to make sure each usage is what the writer intended. Another common problem with scanned books is that typesetters often use hyphens and en dashes to space text on a line. Finding those is a bear, but F/R is a big help in rooting out the many permutations that end up as errors in an ebook.
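The “e” read as “c” problem can be partly automated. The sketch below flags any word that turns into another known word when one “c” is swapped for an “e”. The word list is an assumption: here it is a tiny hand-built set, where a real run would need a proper dictionary file. Every hit still needs a human to decide what the writer intended.

```python
import re

def flag_c_for_e(text, vocabulary):
    """Flag words that become a different known word when one 'c' is read as 'e'."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"[A-Za-z]+", line):
            word = match.group(0)
            lower = word.lower()
            for index, letter in enumerate(lower):
                if letter == "c":
                    # Swap this one 'c' for an 'e' and see if a real word results.
                    candidate = lower[:index] + "e" + lower[index + 1:]
                    if candidate in vocabulary:
                        flagged.append((line_number, word, candidate))
    return flagged

vocabulary = {"eat", "ear"}  # stand-in for a real word list
sample = "We sat down to cat dinner.\nThe car was red."
for hit in flag_c_for_e(sample, vocabulary):
    print(hit)
```

Note that “car” is flagged even though it is correct in the sample. False alarms like that are the price of catching the real errors.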

I can’t possibly cover every F/R trick. If you, while you are proofreading your own work, get into the habit of assuming you have a tendency to repeat certain errors, you can use F/R to help you create a cleaner ebook. If you find a goof, run a quick search to see if you repeated it elsewhere.

Check List of Common Errors That Can Be Found with FIND/REPLACE:

  • Extra Spaces
  • Extra Paragraph Returns
  • Proper Names
  • Place Names
  • Quote Marks (single and double)
  • Hyphenated Words
  • Preferred Spellings
  • Italicized Foreign Words (yes or no, but be consistent)
  • Em and en dashes, and hyphens