Pre-Production Check List: Cleaning Text

Hi, folks! Popping out of my mole hole for a breather. With so many writers getting ebooks ready for the holiday rush, it’s time for a quick refresher course in that most essential step: Cleaning up the text to get it ready for formatting.

Clean text is the key ingredient for a good-looking ebook that works the way it’s supposed to. Over the course of doing a LOT of ebooks (seems like a thousand this week alone, heh) I’ve come up with a little checklist to take me through the steps.

  • CHECK: Copy the file
  • CHECK: Tag special formatting–italics, underlining, bolding
  • CHECK: Scan for special styling–quotes, song lyrics, poetry, letters, etc.–tag those instances

Tagging. Because I use several different programs when working on one project, I’ve come up with tags that transfer from program to program without giving search functions fits. It doesn’t matter what tags you use as long as they are easy to find and don’t contain any characters that cause program meltdowns.
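
For the curious, here is a rough Python sketch of what a tagging scheme like this can look like once it reaches the html stage. The tag names (-STARTI-, -ENDI-, and so on) are made up for the example, not the tags I actually use, and the function names are hypothetical too. The point is only that plain-letter tags are easy to search for and easy to swap out later.

```python
# Hypothetical placeholder tags: plain letters and hyphens only, so they
# survive copy/paste and never collide with search wildcards.
TAGS = {
    "i": ("-STARTI-", "-ENDI-"),
    "b": ("-STARTB-", "-ENDB-"),
    "u": ("-STARTU-", "-ENDU-"),
}


def tags_to_html(text):
    """Swap each placeholder tag for the matching html element."""
    for element, (start, end) in TAGS.items():
        text = text.replace(start, "<" + element + ">")
        text = text.replace(end, "</" + element + ">")
    return text


def unbalanced_tags(text):
    """Return the elements whose start and end tag counts differ."""
    return [
        element
        for element, (start, end) in TAGS.items()
        if text.count(start) != text.count(end)
    ]
```

Running unbalanced_tags before converting is a cheap way to catch a tag that was opened and never closed.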

  • CHECK: Kill any tabs (I do this in Word because it’s so easy–one global search for ^t and a global replace with nothing. All done.)
  • CHECK: Turn ‘soft’ returns into ‘hard’ returns. Soft returns do funny things when copy/pasted back and forth. Easier to deal with them now. (UPDATE: It was pointed out that I didn’t explain how to do this. Oops. In Word, it is very easy. Search for ^l (lower case L) and do a global replace with ^p.)
  • CHECK: Copy/paste the entire file into a text editor
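
If the text is already out of Word, the tab and soft-return steps above can be approximated with a few lines of Python. This is a minimal sketch that rests on an assumption: that the soft returns arrive as a vertical tab character (U+000B) or a Unicode line separator (U+2028). Check what your own file actually contains before trusting it. The function name is hypothetical.

```python
def kill_tabs_and_soft_returns(text):
    """Delete tabs and turn soft returns into ordinary line breaks."""
    # Same idea as searching for ^t in Word and replacing with nothing.
    text = text.replace("\t", "")
    # Assumption: soft returns show up as one of these two characters.
    for soft_return in ("\x0b", "\u2028"):
        text = text.replace(soft_return, "\n")
    return text
```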

Why a text editor? Unlike a word processor, a text editor doesn’t add anything to the file unless I specifically tell it to. No hidden codes, no surprises. I use Notepad++, freeware that is powerful, easy to learn, and makes formatting ebooks in html a breeze.

  • CHECK: Eliminate extra spaces. Between sentences, after paragraphs, before paragraphs, between words. All must go.
  • CHECK: Tag scene breaks. Blank lines show up in manuscripts, often for no reason at all. I want to make sure a blank line is supposed to be there, so I tag all deliberately blank lines.
  • CHECK: Eliminate extra paragraph returns. Don’t need them, don’t want them, make them all go away. I usually leave a blank line where there is supposed to be a page or section break. All the rest go.
  • CHECK: Clean up special formatting tags. Rewriting and revising often leaves artifacts–italicized blank spaces, for instance. Also, when formatting with html, styling should be within a paragraph. There are rules. Making sure all the special formatting follows the rules makes my life easier.
  • CHECK: Search for inappropriate paragraph breaks. This is a real problem with books that have been scanned from print and restored via OCR. I search for paragraphs that begin with lower case characters or end without punctuation, and that finds most of the inappropriate breaks. (The rest are found in the final proofread.)
  • CHECK: Search for reserved characters: straight quotes, straight apostrophes, ampersands, greater-than and less-than signs. These don’t always cause problems, but sometimes they do and that can cause interesting hiccups in an ebook. Easier to just turn them into named entities.
  • CHECK: Seek out non-ASCII characters and symbols. These can turn into question marks or bizarre symbols in the text editor when the file’s encoding gets mangled, and some ebook readers will not render them reliably. The safe fix is to turn them into named entities.
  • CHECK: Standardize punctuation. Ebooks are real books, and require real printer’s punctuation. I go through and make sure em dashes are em dashes and not quickie writer shorthand, that ellipses all look the same, that apostrophes and quote marks are turned in the correct direction.
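
The space, blank-line, and bad-paragraph-break steps in the list above can be sketched in Python along these lines. It assumes one paragraph per line and that scene breaks have already been tagged (the -SCENEBREAK- tag is invented for the example), so every remaining blank line is safe to delete. The function names are hypothetical, and the break finder only flags suspects for a human to review.

```python
import re

SCENE_BREAK = "-SCENEBREAK-"  # hypothetical tag for deliberate blank lines


def squeeze_spaces(text):
    """Collapse runs of spaces, then trim spaces at both ends of each line."""
    text = re.sub(r" {2,}", " ", text)
    return re.sub(r"^ +| +$", "", text, flags=re.MULTILINE)


def drop_blank_lines(text):
    """Remove empty paragraphs; tagged scene breaks are not blank, so they stay."""
    return "\n".join(line for line in text.split("\n") if line.strip())


def suspicious_breaks(text):
    """List (line number, paragraph) pairs that look like broken paragraphs.

    A paragraph is flagged if it starts with a lower case letter or does
    not end with closing punctuation.
    """
    closers = ".!?:\"'\u201d\u2019"
    hits = []
    for number, para in enumerate(text.split("\n"), start=1):
        if not para or para == SCENE_BREAK:
            continue
        if para[0].islower() or para[-1] not in closers:
            hits.append((number, para))
    return hits
```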

That checklist takes care of almost everything. Even though it sounds like a lot, most of the steps can be taken care of in one or two Find/Replace operations. Most manuscripts I work on can be cleaned up in less than an hour.
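
To show how little is involved, here is a small Python sketch of the entity and punctuation steps. It uses the standard library's html.entities.codepoint2name table for the entity names and falls back to a numeric entity when a character has no name in that table. Two loud assumptions: it runs on text that contains no html tags or entities yet (otherwise it would escape them a second time), and it leaves straight quotes and apostrophes alone, since turning those in the correct direction needs a human eye. The function names are hypothetical.

```python
import re
from html.entities import codepoint2name


def standardize_punctuation(text):
    """Turn writer shorthand into real printer's punctuation."""
    # Two or more hyphens in a row become a single em dash (U+2014).
    text = re.sub(r"-{2,}", "\u2014", text)
    # Three periods, with or without single spaces between, become U+2026.
    return re.sub(r"\. ?\. ?\.", "\u2026", text)


def to_named_entities(text):
    """Replace &, <, > and every non-ASCII character with an entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>" or code > 127:
            name = codepoint2name.get(code)
            # Fall back to a numeric entity if the table has no name.
            out.append("&" + name + ";" if name else "&#" + str(code) + ";")
        else:
            out.append(ch)
    return "".join(out)
```

Run standardize_punctuation first and to_named_entities second, so the new dashes and ellipses get converted along with everything else.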

Even if you are formatting your ebook in a word processor or in Scrivener, this is good practice for every project. (Skip the steps about using named entities, but do check for non-ASCII characters.) It will clean out the junk the programs put in and go a long, long way toward making your ebook look professional.

Have fun! (I’m headed back to the mole hole.)