Indie Writers: Make MS Word Work for You Instead of Against You

A Quick Primer for Fiction Writers on Using Microsoft Word in the Digital Age

It always saddens me a little when a writer sends me an overly formatted Word doc to turn into an ebook or print-on-demand. It’s not that I have to clean it up–I can strip and flip the messiest files in less than an hour. What bugs me is how much thought and effort the writer wasted on utterly useless manuscript styling.

Example of a Word doc that has been overstyled.

The majority of writers I work with use Word. The vast majority have no idea how to use Word for their own benefit. I understand. I was a fiction writer for over two decades and even though I have been using computers and a variety of word processing programs since the late ’80s, it wasn’t until I started learning book production that I figured out how those programs worked. Why would I? All I needed was a printed manuscript in standard format to mail to my editor. Word processors made that easy.

Now I produce books for digital and print, and those old ways of “thinking print” make the writer’s job harder. Especially indie writer/publishers who might be doing it all alone or working with contractor editors and proofreaders and formatters.

Since it would take a full book–or volumes–to explain how word processors work, I’m going to urge you all to take what I tell you in this post and play around in your word processor. I will be talking about MS Word, but much of what I show you will apply to almost any word processor.

STUFF YOU DON’T NEED AND NEED NEVER USE AGAIN

  • Tabs
  • Page breaks
  • Headers
  • Footers
  • Page Numbers
  • More than one space for any reason
  • More than two hard returns for any reason
  • Multiple fonts
  • Text boxes
  • Justification
Example of a manuscript that uses NONE of the above.

STUFF THAT MAKES WORD “WORK” FOR YOU

  • Style sheets (fiction writers can get away with using only two or three, four at the most)
  • Find/Replace
  • Save As
  • Web View
  • “Show” feature
  • Formatting tags
(Left) Basic manuscript formatting; (Right) Overly formatted manuscript.

See that backward P-looking icon I’ve circled? That’s the “show” feature. Toggle it on and you can see paragraph returns, spaces, tabs and a few other formatting features. With the basic formatting on the left, all I had to do was apply one style (Normal) to the entire manuscript, then apply heading styles to the chapters and sections, and done. To style an entire manuscript takes minutes this way. The manuscript on the right is an entirely different matter. To get it looking the way I want would take hours, if not days, manually lining everything up, trying to get it to look the way I want it. Worse, I have to remember what I’ve done so I can remain consistent throughout. When I’m done, I still have to scroll endlessly through the entire document to find whatever I might need to find.

And what about what is happening behind the scenes? When a Word doc is converted to html, which is what happens on the way to an ebook, every one of those features turns into code. If you’re printing a document, the only true concern you have is making sure your fonts print properly. If you’re turning your work into an ebook, all that hard work (and useless effort) works against you.

The html in the basic Word doc and how it displays in Firefox.

The overly formatted file in html and how it displays in Firefox.

So let’s make Word work for you. The NUMBER ONE thing (print it out and blow it up to poster size and post it where you can see it while you work) is:

IT DOESN’T MATTER A RAT’S PATOOT WHAT YOUR WORKING DOCUMENT/SOURCE FILE LOOKS LIKE

(Seriously, if your Happy Place while composing fiction involves Comic Sans font, 22pts, with 2 inch margins, triple spaced, then go for it. The only time it matters what your document looks like is when you intend to print.)

STYLE SHEETS

Set ‘em and forget ‘em; the best tool in MS Word

Every version of Word has a style sheets feature. If you’re using Word 2010, you’ll find them in the “Home” tab. Word comes with a huge variety of pre-built style sheets. You can use them as-is or modify them. You can create your own style sheets. The most useful styles for the fiction writer are: Normal, Heading 1, Heading 2.

  • Normal: apply to the body of your text. Set your paragraph indents, line spacing, and font. Never worry about spacing, margins and indents again.
  • Heading 1 & 2: apply to titles, chapter heads or sections. Bonus: Word will automatically list your headings in the navigation window. No more scrolling through a long document to find a specific chapter or section. Another Bonus: Ebook conversion programs recognize heading styles. Some, like Calibre, will automatically build a table of contents for you based on headings 1 & 2.
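To see why heading styles are so handy for building a table of contents, here is a minimal sketch in Python of how a program could collect the headings from an html file. It is an illustration only, not how Calibre or any other converter actually does it; the names HeadingCollector and outline are mine.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collects the text of every <h1> and <h2> in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (level, text) pairs
        self._level = None    # heading level we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None

def outline(html):
    """Return the (level, text) pairs for all h1/h2 headings in html."""
    parser = HeadingCollector()
    parser.feed(html)
    return parser.headings

print(outline("<h1>Part One</h1><p>Text.</p><h2>Chapter 1</h2>"))
```

Body text styled by hand to look like a heading would be invisible to a scan like this, which is the whole argument for using the real heading styles.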

Additional styles fiction writers might find useful:

  • Emphasis: Remember, styles apply to paragraphs. “Emphasis” is italics. If your entire paragraph is italicized, use “emphasis”.
  • Strong: “Strong” is bold.
  • Custom style–“Center”: Instead of clicking on the icon for centering, create a style sheet. Makes life easy.
  • Poetry: For poetry, quotes, lyrics, anything you want with different margins and font style.

FIND/REPLACE

This is the most useful and the most underused tool in MS Word. You can use it not only to find words, but also to find special characters, styles, highlighting, and special formatting (such as italics or bold).

Click on the dropdown menus and you can look for anything that appears.

A few useful search terms:

  • ^& (caret ampersand): Stands for whatever text the search found. Say I want to tag my italics. I would leave the Find box blank, but ask it to search for italics. In the Replace box I’d type -STARTI-^&-ENDI-, do a Replace All and Word will wrap all my italicized text in tags.
  • ^p : Hard return. You can search for them or insert them.
  • ^l  (caret lower case L): Soft return (shift enter)
  • ^t : Tab. Working on a document in which you or someone else used tabs and want to kill them all? Type ^t in the Find box, leave the Replace box blank, and do a Replace all. Done.
  • * (asterisk): A string of text. Use as a ‘wild card’ when you’re restoring your special formatting. Say I want to restore my italics. In the Find box type -STARTI-*-ENDI-, click the ‘wild card’ box, and leave the Replace box blank but ask it to replace text with italics. Do a Replace All and all your tagged text is italicized. Then use Find/Replace to get rid of the tags.
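If you ever do this kind of cleanup outside of Word, the same two ideas carry over to regular expressions. A minimal sketch in Python, assuming the -STARTI-/-ENDI- tags I describe in this post; the function names are made up for illustration. Here \g<0> plays roughly the role of ^& (the found text) and .*? plays roughly the role of the * wild card.

```python
import re

def wrap_matches(text, pattern):
    """Wrap every match of pattern in italics tags (the ^& idea)."""
    return re.sub(pattern, r"-STARTI-\g<0>-ENDI-", text)

def tagged_spans(text):
    """Return the text inside each -STARTI-...-ENDI- pair (the * idea).

    The non-greedy .*? stops at the first -ENDI- it meets, so two tagged
    phrases in one paragraph come back as two separate items.
    """
    return re.findall(r"-STARTI-(.*?)-ENDI-", text, flags=re.DOTALL)

tagged = wrap_matches("The Titanic sank.", r"Titanic")
print(tagged)
print(tagged_spans(tagged))
```

Plain text has no italics to search for, so the sketch searches for a word instead. In Word you would search for the italic formatting itself, as described above.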

SAVE AS

When I’m working on a project, I might have four, five, ten versions of a file. If I’m making major formatting changes, I NEVER EVER mess with my source file. Let’s say I want a printed version. I do a Save As to make a new version that is named Print_Docname_date. Then I apply headers/footers, page numbers, page breaks and modify my styles to make it suitable for printing. My original source file remains unchanged and ready to use. Using Save As is the best habit you can get into while you’re working. (And it’s not like you’re having to save your work to floppy disks–your computer has lots of space. Use it!)
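For anyone who likes to script their housekeeping, the Save As habit looks something like this in Python. It is only a sketch: the save_as name, the ISO date format and the idea of copying the file on disk (rather than using Word’s own Save As) are my assumptions for the example.

```python
import shutil
from datetime import date
from pathlib import Path

def save_as(source, purpose, today=None):
    """Copy source to Purpose_Docname_YYYY-MM-DD.ext in the same folder.

    The source file is never modified. Returns the path of the new copy.
    """
    source = Path(source)
    stamp = (today or date.today()).isoformat()
    target = source.with_name(f"{purpose}_{source.stem}_{stamp}{source.suffix}")
    shutil.copy2(source, target)
    return target

# Example (hypothetical file name):
# save_as("MyNovel.doc", "Print")
```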

WEB VIEW

Forsake print view and get used to web view while you work. This view is flexible (flow text) and enables you to easily display multiple screens and compare text while you work. You can adjust the width of your screen, too, and not lose chunks of text or reduce the image size in order to see everything.

FORMATTING TAGS

Because I use a variety of programs, and I dislike intensely losing formatting such as italics or trying to remember where I want a block of offset text, I tag my formatting. Now, because Word is html-based, you do NOT want to use html tags in your text. It’s okay if you’re outputting a file to a text editor, but if you’re going to a program that is html-based such as Scrivener or InDesign, or if you intend to bring the text back (you’re ‘nuking’ it, according to the Smashwords style guide), then those html tags are going to seriously mess things up.

My tags are arbitrary. I’ve come up with them because they are unique and easy to search for; they don’t show up in text (normally). Feel free to use mine if you want or come up with something that makes sense to you to use. IMPORTANT TO REMEMBER: Special formatting such as italics or bolding requires OPEN and CLOSE tags.

  • Italics: -STARTI- (open) -ENDI- (close)
  • Bold: -STARTB- -ENDB-
  • Underline: -STARTU- -ENDU-
  • For any special formatting such as headlines, poetry, etc: -SPECIAL- (this tag is a note to myself)
  • Placing Images: -IMAGE-
  • Scenebreaks or deliberate blank lines: ##

That’s it. Simple, no? This is MS Word in the digital age, a writing tool you can make work for you instead of against you.

 

Managing File Sizes for Ebooks

The majority of fiction writer/publishers will not run into overall file size problems. Text doesn’t create monster files. Using graphics or illustrations can add significantly to the overall file size, but I’ve yet to create an ebook that exceeds–or even comes close to–Amazon’s 50MB limit (which may be changing due to the introduction of the new Fire HD tablets). Even with illustrations and graphics, I do my best to keep the overall file size under 5MB because of Amazon’s delivery fees ($0.15 per MB). Those fees are charged against the publisher and can eat up royalties quickly.
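To put numbers on how quickly those fees add up, here is the arithmetic as a small Python sketch. It uses only the $0.15 per MB figure quoted above; the function names and the rounding to the nearest cent are my assumptions, and Amazon’s actual calculation may differ.

```python
FEE_PER_MB = 0.15  # delivery fee in dollars per megabyte, as quoted above

def delivery_fee(size_mb, fee_per_mb=FEE_PER_MB):
    """Fee charged against the publisher for one sale of a file this size."""
    return round(size_mb * fee_per_mb, 2)

def total_fees(size_mb, copies_sold):
    """Fees across a number of sales."""
    return round(delivery_fee(size_mb) * copies_sold, 2)

print(delivery_fee(5))       # one sale of a 5MB ebook
print(total_fees(5, 1000))   # a thousand sales of the same file
```

At 5MB that is 75 cents per sale, or $750 over a thousand sales.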

As I said, most fiction writer/publishers will not run into problems with overall file size.

Where fiction writer/publishers do run into problems is with the size of individual chapter files within the ebook. When you use <h1> or <h2> tags in html, or the Heading 1 or Heading 2 style in a word processor, you are alerting the conversion programs (such as Calibre or KindleGen) that this is a new chapter and should be split into a new file.* If you don’t use the headings or tags, the conversion programs look for certain words–Chapter, Part, Section, etc.–to determine where the file should be split. What is NOT reliable at all is using page breaks (in a word processor) or the “page-break-before” command in html/CSS. (I have absolutely no idea why those work sometimes, but sometimes they don’t–my best guess is the whims or moods of the Digital God.)
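The idea of splitting at headings can be sketched in a few lines of Python. This is an illustration of the principle only. I don’t know how Calibre or KindleGen implement it internally, and the function name is mine.

```python
import re

def split_at_headings(body_html):
    """Split html into chunks that each begin at an <h1> or <h2> tag.

    Anything before the first heading (a title page, say) becomes
    its own chunk.
    """
    chunks = re.split(r"(?i)(?=<h[12]\b)", body_html)
    return [chunk for chunk in chunks if chunk.strip()]

book = "<p>Title page</p><h1>One</h1><p>a</p><h2>Two</h2><p>b</p>"
for chunk in split_at_headings(book):
    print(chunk)
```

A file with no headings at all comes back as one single chunk, which is exactly the situation that causes trouble.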

I always split html (text) files into chapters or parts, which manages the overall ebook very nicely. Even though this example is from a novel (Prophet of Paradise by J. Harris Anderson) that is almost 200,000 words long, notice the size of the individual chapters:

File Size

What happens if you don’t use tags or headings and your chapters have titles the conversion programs don’t recognize? What happens if you don’t have chapters at all and your ebook is deliberately one long tract? If it runs up against the 300KB file size limit (approximately 45,000 words), several things could happen:

  • Your file fails to convert
  • The conversion program inserts page breaks whether they are appropriate or not
  • The file converts, but some devices tell the user the ebook can’t be loaded

If your files are less than 300KB, but still largish (over 150KB) your readers could experience serious screen lag as they page through your story. This is an important consideration for genre fiction writers since the chances are your readers are Super-Readers and might have hundreds or even thousands of ebooks loaded on their devices. They will not be happy if your file sizes and their addiction cause several seconds of lag every time they “turn” the page.

What to do?

  • If you are using a word processor to style your ebooks, use the Heading 1 and Heading 2 styles for your chapters, parts and sections. (Do NOT depend on the conversion programs to recognize your inserted page breaks!)
  • If you are styling in html, use the <h1> and <h2> tags.
  • If your project does not have natural breaks such as chapters or parts (it’s a long short story or novella), consider a minor restructure. Use the word count as your guide and try to find natural breaks around the 15,000 word mark–a scene break or time or pov shift or even an illustration that sits on its own “page”.
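If you have already split your book into chapter files, a quick size check tells you which ones need attention. A minimal sketch in Python using the two thresholds from this post (300KB and 150KB); the function names and the verdict wording are mine.

```python
LIMIT_KB = 300  # per-file ceiling discussed above
LAG_KB = 150    # size above which readers may notice lag

def classify(size_bytes):
    """Return a verdict for one chapter file of the given size."""
    kb = size_bytes / 1024
    if kb >= LIMIT_KB:
        return "split it"
    if kb > LAG_KB:
        return "may lag"
    return "ok"

def report(chapters):
    """chapters maps a file name to its html text; returns name -> verdict."""
    return {name: classify(len(text.encode("utf-8")))
            for name, text in chapters.items()}

print(report({"chapter01.html": "<h1>One</h1><p>Short chapter.</p>"}))
```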

* If you are using Calibre to convert your ebooks, you can check the file splits in Calibre’s EPUB editor. You’ll see the list of individual text/html files and can open each one on the viewer/edit screen. If you are experiencing inappropriate page breaks, you can manage the fixes in the editor.

 

 

Why You Shouldn’t Format Your Word Docs

There’s a reason my ebooks are superior–two reasons, actually–and neither has anything to do with my technical prowess (I don’t have much) or talent (anyone can do what I’m about to tell you).

Reason Number One: Pre-production, I clean the text. As soon as a document comes up in the queue, I open it and start stripping it of everything that can mess up an ebook: extraneous paragraph returns, extra spaces, and tabs. I tidy up punctuation, tag areas that require special coding, neaten italics and check for special characters that won’t translate. As a writer and editor myself, I know most of the writer tricks and have a rather lengthy list of things to look for. By the time I’m ready to start coding, the text is so clean it squeaks.

Reason Number Two: Post-production, the ebook is proofread. I don’t care who proofreads the ebook. I can do it, the writer can do it, the writer can hire the job out to someone else. I give the writer a proof copy of the ebook and a mark-up document and encourage them to be as picky as they can stand. Even if they hire me to proofread, they still get the proof copy to load on a device or their computer so they can check the formatting and layout. The point is to find mistakes before the readers do. The point is to make sure the ebook works properly.

I am shocked and appalled that every single person who produces ebooks doesn’t do the exact same thing. They don’t and I know they don’t because I read ebooks that are filled with the types of errors and hiccups that text cleaning and proofreading would have rooted out.

The trad pubs are actually worse offenders than are indies, especially when it comes to back list. I can see it with my own eyes, but it’s amusing to see a publisher admit it publicly on The Passive Voice blog:

J.A. Our experience with Kindle is that as soon as a customer complains they take down the file and send the publisher a takedown notice. It’s actually a real pain in the neck. It could be one person complained and something very minor. We get them occasionally and we fix them right away. They give the reader a credit for the download. I should add that when files are converted they generally aren’t checked page for page like a print book might normally be. We rely on the conversion house to do a good job. If we keep catching errors or getting complaints we would change vendors. We pay pretty good money for these conversions. Our books are almost all straight text so conversions aren’t generally a major issue, but books with columns or charts, or unusual layouts do cause problems and need to be checked carefully. –Steven Zacharius, CEO, Kensington Books

Emphasis mine.

Having personally cleaned up well over a million words of scanned and OCR’d text, that statement offends the shit out of me. Writers deserve better. Readers deserve better.

So what’s that got to do with formatting Word docs? Everything.

If you’re a Do-It-Yourselfer, and are formatting your own ebooks, you cannot skip these steps. (On a sidenote, my biggest gripe with Smashwords is how difficult they make it to proofread an ebook. An upload has to go through the whole publishing process before you can look at it live on a device. Depending on how fast you are at proofreading, the ebook can be live–all goofs intact–for weeks before you can fix them and go through the process again.) My suggestion for the indie formatting Word docs for Smashwords (or any other distributor who accepts Word docs) is to convert them first with a program like Calibre and proofread the results. Find and fix problems before uploading the Word doc to Smashwords.

If you’re hiring a formatter, find out first if they clean up your file pre-production. Many do not. If that’s the case, you need to do the cleaning. Some pros charge by the hour to clean up the Word doc. The more elaborately you’ve formatted your document, the longer it will take to clean it up and the more expensive it will be. (Not to mention wasting your own time on needless work.) My suggestion: if you have special requirements, arrange for a system of tags to let the formatter know what you want. I ask writers to put instructions inside square brackets, e.g. [HEADLINE, PUT IN SMALL CAPS, CENTERED, EXTRA SPACE ABOVE AND BELOW].

Find out, too, the professional’s policy on proofreading. Do you get a proof copy? Does the formatter charge extra to input changes and corrections? (I charge for actual proofreading, but I don’t charge to input changes and corrections from somebody else’s proofread.) If you are not allowed to make post-production changes to your ebook, find another service. Trust me, no matter how well edited, cleaned and formatted the file is going in, you will find something to fix while proofreading. (Gremlins!)

So, for you writers working in Word, one final suggestion: Post the following where you can see it while you work and keep repeating it until it sinks in:

What I see on the computer screen is NOT how my text will look, or act, in an ebook.

Word to Calibre to MOBI: Part 2: The html File

You finished Part 1 of this tutorial. Now on to Part 2. If you’re not familiar with html, what happens next is going to be freaky. But trust me, if you can copy/paste, you can do this.

NOTE: If your ebook is as simple as the one I’m using as an example, with no images and limited styles, you can stop right now and directly upload your Word file to Amazon. It will convert just fine and work well.

STEP 1: Do a Save As of your styled .doc file as an html file. It will look something like this:

Now you are done with Word.

STEP 2: Open your html file in Notepad++

Holy Moley! This is what it looks like?!?

STEP 3: Turn your special formatting tags into proper html tags

  • Italics <i> </i>
  • Bold <b> </b>
  • Underline <u> </u>

Easy to do with Find/Replace in Notepad++.

Very important. ALL tags that are open must be closed. So if you have <i> for italics, then you must have </i> to close the tag. So use Find/Replace and make sure your numbers match up (Notepad++ will tell you how many items it replaced).
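The same swap-and-count check can be sketched in Python, for anyone who would rather script it than click through Find/Replace six times. It assumes the tags from Part 1 of this tutorial; the names TAG_MAP, tags_to_html and mismatched are mine.

```python
TAG_MAP = {
    "-STARTI-": "<i>", "-ENDI-": "</i>",
    "-STARTB-": "<b>", "-ENDB-": "</b>",
    "-STARTU-": "<u>", "-ENDU-": "</u>",
}

def tags_to_html(text):
    """Swap each formatting tag for its html tag.

    Returns the new text plus a count of replacements per html tag,
    the same numbers Notepad++ reports after each Replace All.
    """
    counts = {}
    for tag, html in TAG_MAP.items():
        counts[html] = text.count(tag)
        text = text.replace(tag, html)
    return text, counts

def mismatched(counts):
    """List the open tags whose count differs from their close tag."""
    return [open_tag for open_tag in ("<i>", "<b>", "<u>")
            if counts[open_tag] != counts[open_tag.replace("<", "</")]]

html, counts = tags_to_html("-STARTI-Never-ENDI- say -STARTB-never")
print(html)
print(mismatched(counts))
```

Anything listed by mismatched means a tag was opened and never closed (or the reverse), so go find it before you convert.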

STEP 4 (Optional): Get rid of soft returns. Word has a nasty habit of inserting soft returns at the end of lines in paragraphs. In theory, they are meaningless. If you leave them in, they won’t affect your ebook very much. I have noticed, however, that they cause a wobbly quality to the justified text and some unusual behavior in line spacing. Not enough to affect reading quality, but enough to bug hyper-sensitive readers (like me). I prefer to remove them. If they bug you, too, let me know and I’ll show you how to use Find/Replace in Notepad++  to quickly remove them.

STEP 5: Get rid of the Section junk. If you styled your document the same way I did, you will have two lines of code–one at the beginning that says something like <div class=Section1> and a closing tag at the end of the document, </div>. They are extraneous. Delete them.

(By the way, if your Notepad++ file doesn’t look the same as mine, it’s because I have turned off word wrap and eliminated the extra soft returns.)
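Deleting those two lines by hand is easy enough, but here is the same step as a Python sketch. It assumes there is exactly one Section wrapper and that its closing tag is the last </div> in the file, as in my example. The function name is mine, and the pattern also allows for the WordSection1 spelling some versions of Word write.

```python
import re

def strip_section_div(html):
    """Remove the opening Section <div> and the last </div> in the file."""
    pattern = r"<div class=\"?(?:Word)?Section\d+\"?>\s*"
    html, found = re.subn(pattern, "", html, count=1)
    if found:
        # Only remove a closing tag if an opening tag was actually removed.
        head, sep, tail = html.rpartition("</div>")
        if sep:
            html = head + tail
    return html

page = "<body>\n<div class=Section1>\n<p>Hi</p>\n</div>\n</body>"
print(strip_section_div(page))
```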

STEP 6: Extract your styles. In my example there are three: MsoNormal, Center, and h1. Select them, copy them and paste them into a new text file.

This is what they look like. Comments in parentheses are mine.

h1
{mso-style-next:Normal; (Word junk, delete)
margin-top:48.0pt; (We are going to change this)
margin-right:0in;
margin-bottom:48.0pt;
margin-left:0in;
text-align:center;
page-break-before:always;
mso-pagination:none; (Word junk, delete)
mso-outline-level:1; (Word junk, Delete)
font-size:14.0pt; (We are going to change this)
mso-bidi-font-size:16.0pt; (Word junk, delete)
font-family:"Times New Roman"; (Delete)
mso-bidi-font-family:Arial; (Delete)
mso-font-kerning:0pt;} (Delete)

p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:""; (Junk, Delete)
margin:0in; (Delete)
margin-bottom:.0001pt; (Delete)
text-indent:.3in; (Change)
mso-pagination:none; (Delete)
font-size:12.0pt; (Delete)
font-family:"Times New Roman"; (Delete)
mso-fareast-font-family:"Times New Roman";} (Delete)

p.Center, li.Center, div.Center
{mso-style-name:Center; (Delete)
margin-top:6.0pt; (Change)
margin-right:0in;
margin-bottom:6.0pt;
margin-left:0in;
text-align:center;
mso-pagination:none; (Delete)
font-size:12.0pt; (Delete)
font-family:"Times New Roman"; (Delete)
mso-fareast-font-family:"Times New Roman";} (Delete)
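Every line that starts with mso- is Word talking to itself, which makes that part of the cleanup easy to automate. A rough Python sketch, assuming you pass in only what sits between the curly braces of one style; the function name is mine, and it splits on semicolons, so it would trip over a value that contains one.

```python
def drop_word_junk(css_body):
    """Return the declarations whose property does not start with mso-."""
    kept = []
    for declaration in css_body.split(";"):
        declaration = declaration.strip()
        if declaration and not declaration.lower().startswith("mso-"):
            kept.append(declaration)
    return kept

h1_body = ("mso-style-next:Normal; margin-top:48.0pt; text-align:center; "
           "page-break-before:always; mso-pagination:none;")
for line in drop_word_junk(h1_body):
    print(line + ";")
```

This only removes the mso- lines. The fonts and point sizes marked (Delete) or (Change) above still need a human decision.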

STEP 7: Modify the styles. The coding in an ebook is actually quite simple. The major bits for your css stylesheet are as follows and most are self-explanatory:

  • margin /This is the margin for each paragraph block. This controls the top, bottom, right and left
  • text-indent /This is for paragraph indents
  • font-size /Kindle books render in either “ems” or percentages. Converters do their best to recognize points (pts) and inches, but results are iffy. That is why we’re going to change them.
  • font-style /For italics
  • font-weight /For bold

We are going to keep this very, very simple. Because there will be some coding for the body text, you don’t need much in these paragraph styles. Basically, we will whittle and adjust so they look like this (feel free to copy/paste these):

p.MsoNormal
{text-indent: 1.4em;}

h1
{margin: 2em 0;
text-indent: 0;
text-align:center;
page-break-before:always;
font-size: 1.4em;
font-weight: bold;}

p.Center
{margin: 0.5em 0;
text-indent: 0;
text-align:center;}

If you want to play with the styling, go to the w3schools website. To know what works in a Kindle book, you can look at their “approved” list (which often seems to change on a whim).

STEP 8: Replace the header. Copy the text that follows:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<title>BOOK TITLE</title>
<style>
/*===Reset===*/
html, body, div, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, acronym, address, code, del, dfn, img, ins, kbd, s, samp, small, strike, strong, sub, sup, tt, var, center, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, article, aside, canvas, details, embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby, section, summary, time, mark, audio, video
{margin: 0; padding: 0; border: 0; font-size: 100%; vertical-align: baseline;}
body {text-align: justify; line-height: 120%;}

/* Insert your paragraph styles here */
</style>
</head>
<body>

Paste it in your file as follows:

New header and styles pasted in:

STEP 9: In the menu bar in Notepad++ find Encoding and click it. In the drop down menu it will say: Convert to UTF-8 without BOM. Click that.
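For the curious, here is roughly what that menu item does, sketched in Python. I am assuming the html file Word saved is in the Windows-1252 encoding (cp1252 in Python), which is common but not guaranteed; the function name is mine.

```python
from pathlib import Path

def to_utf8_without_bom(path, source_encoding="cp1252"):
    """Rewrite the file at path as UTF-8 with no byte-order mark."""
    path = Path(path)
    raw = path.read_bytes()
    try:
        # Already UTF-8? Decode it, dropping a BOM if one is present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Otherwise fall back to the assumed Word encoding.
        text = raw.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))

# Example (hypothetical file name):
# to_utf8_without_bom("MyNovel.html")
```

Run it on a copy first. If curly quotes turn into gibberish afterward, the source encoding guess was wrong.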

To see your styling live, in the menu bar you will see “Run.” Click it and in the drop down menu choose “Launch in (whatever browser you use)” Here is mine in Firefox:

See, that wasn’t so hard, was it? Now you have a serviceable html file you can convert into an ebook. BUT, your job isn’t done quite yet. In Part 3 I’ll show you how to convert your file into a MOBI file that works.

_________________________________________

Styling ebooks isn’t difficult. Armed with only a few lines of code, you can create beautiful ebooks and some very interesting text effects. If anyone is having trouble getting their styles just right, feel free to email me–jayewmanus at gmail dot com–and I can probably come up with just the paragraph style you need.

 

 

Word to Calibre to MOBI: Part 1: Styling in Word

So, I’ve been obsessanating–again. In my last post I promised that there was a way to convert Word files in Calibre into ebooks that work perfectly on Kindles. That is true. It can be done. I was looking for a quick and dirty hack that worked every time. That is not possible.

Here’s the real problem. You got your indie writer who has put her heart and soul into writing her story. She’s not technical. She’s not a computer geek. She just wants readers to find and love her stories. Problem: How to get the story from Word onto a reader’s Kindle? Enter Calibre. Just save your Word file as an html file, load it into Calibre, convert it into a mobi file and upload it to Amazon. Done!

The problem with that? Calibre mobi files don’t quite work right when uploaded to Amazon. Period. They can work, at best, almost right. For the writer who’s eager to get back to writing her next story, that’s good enough.

As a reader, that attitude pisses me off. I buy and read a lot of ebooks. It pisses me off when the user preference controls don’t work. It pisses me off when I can’t navigate an ebook. (It’s not just indie publishers, folks. I get pissed off by the Big Pubs who can’t bother proofreading the ebooks and by the nastiness that turns up in ebooks built with InDesign, and don’t even get me started on the crap that happens when they turn scanned backlist books into ebooks.) A poorly produced ebook is equivalent to a writer using a mimeograph and newsprint, stapling the pages together and saying, “Here you go. That’ll be five bucks.” I’m insulted.

As an ebook producer, I get it. Amazon doesn’t make it easy. It’s next to impossible to break open a mobi file to tinker around in the code and fine tune it. Plus, as I explained before, Amazon has… quirks. They build their devices, then create the platforms, then play catch up with updates to older models, and it’s not easy keeping up.

NOTE: The last time I bitched about Calibre being the wrong tool, Calibre’s creator informed me that the “line-squish” problem could be solved by converting the ebooks into azw3. That works. Except… I didn’t explore far enough. Amazon rejects azw3 files, so they are useless for distribution through Amazon.

The easiest thing a writer can do to ensure having a perfect ebook to sell on Amazon is to hire someone who knows what they are doing. For any number of reasons, that isn’t always realistic. I’m a realist. Hence, this series of posts that will take you step-by-step through the process of turning a Word file into a commercial-quality ebook to sell on Amazon. The beauty of this is, you don’t really need to understand html or how ebooks work or anything technical at all. All you have to know is how to Copy/Paste.

Before you begin, you will need four–FOUR!–programs on your computer.

Microsoft Word
Notepad++
Calibre
Kindle Previewer

I assume since you are using Word, you have Word. The other three are freeware. A note about Word. You do not want to do this with .docx files. You want .doc files. Older versions of Word actually work a lot better for making ebooks than do later versions of Word.

Ready? Let’s begin.

PART 1: STYLING IN WORD

STEP 1: Do a Save As so your original stays intact.

STEP 2: Tag your special formatting (italics, bolding, underlining). A word about “special formatting.” This only applies to words or passages that are italicized, bolded or underlined in the body text. Such things as headers and sub-heads will be dealt with later.

I use a simple tagging system for special formatting.

  • Italics: -STARTI- -ENDI-
  • Bold: -STARTB- -ENDB-
  • Underline: -STARTU- -ENDU-

STEP 3: Turn “manuscript” punctuation into “printer” punctuation.

  • “Curly” or “Smart” quotes, not straight quotes (and apostrophes). Do make sure your quote marks and apostrophes are turned in the proper direction–Word has a bad habit of reversing them.
  • Proper em dashes, not two hyphens or en dashes or spaced hyphens
  • Proper ellipses
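The dashes and ellipses are mechanical enough to script. A minimal Python sketch under my own assumptions: a “proper” ellipsis is the single ellipsis character, and a hyphen with spaces on both sides was meant as a dash. Curly quotes are left out on purpose because getting their direction right takes more care than a two-line rule.

```python
import re

EM_DASH = "\u2014"
ELLIPSIS = "\u2026"

def printer_punctuation(text):
    """Turn typed dashes and dots into their typeset characters."""
    # Two or more hyphens, spaced or not, become one em dash.
    text = re.sub(r"\s*-{2,}\s*", EM_DASH, text)
    # A single hyphen with space on both sides becomes an em dash too.
    text = re.sub(r"\s+-\s+", EM_DASH, text)
    # Three or more periods become one ellipsis character.
    text = re.sub(r"\.{3,}", ELLIPSIS, text)
    return text

print(printer_punctuation("Wait -- what... no - really"))
```

Hyphenated words such as well-known are untouched, since their hyphen has no spaces around it.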

STEP 4: Kill “soft” returns and tabs, and eliminate extra spaces

  • To turn “soft” returns into hard returns: In Find/Replace search for ^l (that’s a caret mark and lower case L) and replace with ^p (caret mark and lower case P)
  • To get rid of tabs: In Find/Replace, search for ^t (caret and lower case T) and replace with nothing
  • Don’t forget to get rid of extra spaces before and after paragraphs
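Here is what the tab and space cleanup amounts to, as a Python sketch working on plain text where each line is one paragraph. The function name is mine. Soft returns are left out because I can’t promise how they arrive once the text leaves Word.

```python
import re

def clean_text(text):
    """Remove tabs, collapse runs of spaces, trim spaces around paragraphs."""
    text = text.replace("\t", "")        # same as Find ^t, Replace with nothing
    text = re.sub(r" {2,}", " ", text)   # never more than one space
    paragraphs = [line.strip(" ") for line in text.split("\n")]
    return "\n".join(paragraphs)

print(clean_text("\tOne.  Two. \n Three."))
```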

STEP 5: Select all, copy and paste entire file into Notepad++

Yes, that is what it looks like. That’s what it is supposed to look like. This is a straight text file.

STEP 6: Finish cleaning up the file

  • Delete blank lines
  • Tag scene breaks (I use ## because it is easy to find)
  • Search for and clean up special formatting tags. Word is very sloppy and you’ll find tags around empty spaces and jumping paragraphs and other untidiness.
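Two of those chores, empty tag pairs and blank lines, can be sketched in Python like so. It assumes your scene breaks are already marked with ## (otherwise deleting blank lines would erase them) and uses the tags from this tutorial; the function name is mine.

```python
import re

def tidy(text):
    """Drop tag pairs that wrap only whitespace, then delete blank lines."""
    # -STARTI- -ENDI- (or B, or U) around nothing: keep just the whitespace.
    text = re.sub(r"-START([IBU])-(\s*)-END\1-", r"\2", text)
    lines = [line for line in text.split("\n") if line.strip()]
    return "\n".join(lines)

print(tidy("One-STARTI- -ENDI-two\n\n##\n\nThree"))
```

Tags that open in one paragraph and close in the next still need a human eye. This sketch only catches the pairs with nothing inside them.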

STEP 7: Back in Word, open a New Document and set your Styles. (I am going by the assumption that you know how to use style sheets in Word.) For the purposes of this tutorial, I used three styles for my ebook:

  • Normal (built in style in Word, modify as you wish)
  • Heading 1 (built in, also modified)
  • Center (user-defined style)

It doesn’t matter much what font you choose. Times New Roman is fine.

This will be used for your chapter heads. Again, font doesn’t matter much.

STEP 8: Apply the “Normal” style to the new document. Select all and copy the text file in Notepad++ and paste the entire document into Word.

STEP 9: Style the document.

  • Apply the Heading 1 style to all chapter/story headings
  • Apply the Center style to any text you want centered (in this case, I applied it to the scene break indicators, THE END and table of contents entries)

STEP 10: Bookmark all your Heading 1 entries. (Word automatically bookmarks Heading entries, but those will not transfer over so you need to insert bookmarks manually.)

STEP 11: Link your bookmarks in the table of contents.

That’s it for Part 1. Your document is now clean and styled and ready for Part 2: turning your .doc file into a proper html file.

_____________________________________

A word about styles. Like I said, for this tutorial I am using only three styles. You can use all sorts of styles to create visually pleasing ebooks–just remember one very important thing: Word is a program whose main purpose is to create print documents. What you see on the screen is pretty much what you will get on a sheet of paper, but it is not at all what you would get in an ebook. I suspect after you finish this full tutorial you will have a better understanding of how ebooks work and how Word works, and you will understand why it is so important to use style sheets religiously.

A word about questions. I know you have them. Let’s make them useful for everybody. If you have a question about this tutorial, especially if it is a “How do I do this…?” type of question, email it to me at

jayewmanus at gmail dot com

I’ll put together a post with questions and answers.

 

 

 

 

Calibre, Word and MOBI: A Tale of Three Programs

(Yes, I know, MOBI is not a program, but my blog, my headlines…)

Ever since I started blogging about ebooks, I’ve cautioned people against using Microsoft Word to format their ebooks. Not because Word is a bad program and not because it’s impossible to create ebooks with it. It’s because it’s not quite the right tool. Word’s strength lies in creating print documents or pdfs.

Recently, I’ve been cautioning people to not use Calibre to convert their Word files into MOBI files in order to sell them on Amazon. Not because Calibre is a bad program and not because it’s impossible to create MOBI files with it. It’s because it’s not quite the right tool. Calibre’s strength lies in managing a person’s digital library. It was not created to convert commercial ebook files.

EPUB files are not as troublesome as MOBI files. EPUB is EPUB is EPUB, and while each device has its own special way of rendering the file to fit the platform, the differences between devices aren’t big enough for most people to notice. A single EPUB file will work pretty much the same on a Nook as it does on an iPad.

Calibre is set up for optimum use with EPUB files. If a publisher converts a Word (html) file into an EPUB file using Calibre, then what they see there is pretty close to what a Nook or iPad reader will see.

This is not true with MOBI files. The reason is Amazon. You see, EPUB devices have evolved and changed and upgraded and gone the way all technology goes, ever upward and onward. But the device makers built the newer devices around the existing ebook platform. So an EPUB ebook formatted five years ago will work pretty much the same on a new iPad as it did on a first generation Nook. Amazon went bass-ackwards. They built the new devices then tinkered and recreated entirely new ebook platforms to fit the new devices. So a MOBI file being sold on Amazon isn’t just a MOBI file. It’s also a KF8 file and an iOS file and an AZW3 file and god knows what else is there. I don’t quite get all the technical stuff. What I do get is that the same ebook can work fine on a Kindle Fire, but go to hell on a Paperwhite and look okay on a Kindle Keyboard and turn into gibberish if an iPad user gets hold of it.

The whys and wherefores don’t matter as much as the fact that a file formatted in a program which is optimal for printing documents, and then converted with a program that is at its best with EPUB files, is going to have trouble meeting the very odd demands of Kindles.

(By the way, if you are using Scrivener or InDesign to create your ebooks for sale on Amazon, you will run into the same exact problems because Amazon is constantly tweaking and fiddling with the platform(s) and updating devices and they don’t necessarily share what they’ve done with the rest of the world.)

I realize that none of what I just wrote is going to dissuade people from using Calibre to convert their Word docs into MOBI files to sell on Amazon. I know this because people use Word, the program they know and love(hate), and they need a way to convert those Word files, and Calibre is the shortest distance between A and B.

So instead of wagging my finger and clucking my tongue, I did some research. Question: Is it possible to format a file in Word and convert it with Calibre and create a MOBI file good enough to sell on Amazon? (Here, I make a very clear distinction. If your Nook died and you bought a Kindle, and you want to convert all your Nook books into MOBI files you can load onto your Kindle, Calibre is a great tool. That’s personal use. You expect that the ebook might not work completely right, but that’s okay, at least you have it. You can’t ask your paying customers to accept that standard.)

What I discovered is: Yes, it is possible.

I managed to fix the worst problems I see with Calibre-converted ebooks. I managed to create ebooks that respond properly to all the user preferences in three generations of Kindles (Kindle Keyboard, Paperwhite and Fire). I almost got Calibre to build a toc.ncx (what the user sees in the Go To features on Fires and Paperwhites) the way I want it to. I think with some more tinkering and fiddling around inside the opf file, I can fix that problem. I couldn’t get the cover to display on the bookshelf in my Paperwhite, but that’s kind of a non-issue, since Amazon will handle that when the book is uploaded. (It is only a big deal if a publisher is selling direct.)

Even though the ebooks I created this way aren’t up to my standards, they will respond to user preferences and they will look fine and read fine, and thus, they are good enough for uploading to Amazon.

There is a caveat. If you format your document, save it as an html file and convert it as is with Calibre, your ebook will be broken. It will be a substandard product you should not ask people to pay for. What you have to do first and foremost is format your Word file so it works within Calibre’s parameters, and secondly, you have to fix the html coding in the Word file.

Sound scary? It is, kind of. Word’s html coding is a nightmare, full of mso odd bits that give Kindles the hiccups. The good news is, all you really need to do is remove some very specific lines of code and rearrange a few others.
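To give you a feel for what that kind of cleanup looks like, here is a rough Python sketch. It is not my exact list of lines to remove (that is for the next post). It only assumes two kinds of Word debris for the sake of the example: conditional comments of the form `<!--[if ...]>...<![endif]-->` and `mso-` declarations inside inline style attributes. The function name `strip_mso` is hypothetical.

```python
import re

# Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
COND_COMMENT_RE = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# A quoted inline style attribute, e.g. style="..." or style='...'
STYLE_ATTR_RE = re.compile(r"""\sstyle=(["'])(.*?)\1""", re.DOTALL | re.IGNORECASE)
# One mso-* declaration at the start of the style body or right after a ';'
MSO_DECL_RE = re.compile(r"(?:^|(?<=;))\s*mso-[^;]*;?", re.IGNORECASE)


def _clean_style(match):
    """Drop mso-* declarations from one style attribute; drop it if empty."""
    quote, body = match.group(1), match.group(2)
    kept = MSO_DECL_RE.sub("", body).strip()
    if not kept:
        return ""
    return f" style={quote}{kept}{quote}"


def strip_mso(html):
    """Remove conditional comments and inline mso-* declarations."""
    html = COND_COMMENT_RE.sub("", html)
    return STYLE_ATTR_RE.sub(_clean_style, html)


if __name__ == "__main__":
    sample = '<p style="mso-pagination:none;text-indent:1em">Hi</p>'
    print(strip_mso(sample))
```

This leaves Word’s `<style>` block and class names alone, so treat it as an illustration of the idea, not a finished cleaner. Always run it on a copy of your file.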

Since this post is running long and I don’t even have any pretty pictures to enliven it (plus I have a buttload of Christmas gifts to wrap), I am going to explain how I did it in my next post. It’ll have pictures. In the meantime, if any of you, Dear Readers, have figured this out and feel like sharing in the comments, feel free.

Quick Tip: Tag and Restore Italics in Word

TRY THIS AT HOME

You all know that the key to a good ebook format is a squeaky clean source file, right? Word doesn’t produce particularly clean documents. For best results, you should strip out extraneous codes before you begin to format. Mark Coker of Smashwords calls it the “Nuclear Option.” You copy/paste your document into a text editor and that will remove all the unwanted coding. Then you copy/paste the clean text back into Word and you are ready to format.

Anyone who has tried this knows that doing so will not only remove unwanted coding, it’ll nuke your italics, too (and other special formatting and styles). Here is an easy way to tag all your special formatting and then restore it. (What I will show you applies to bolding, underlining, different sized fonts, etc., too.)

Here is a document in need of a good cleaning:

Open the search box and make it look like this:

If you open the “Format” box you’ll see a drop down menu that gives you a “Font” option. Open that.

Notice the many, many options you can search for. Cool, huh?

I have come up with tags through trial and error. I use several different programs when I format ebooks, so I needed something unique for search purposes that didn’t make any of the programs say, “Oh no you don’t!” and crash the search box. I use all caps and hyphens to make sure they don’t get mixed up in the text. The most common tags I use are:

  • -STARTI- for italics
  • -STARTB- for bold
  • -STARTU- for underline
  • -END- to close the tag

Back to the document. Click Replace All.

Now all your italics are wrapped in tags. This is a good time to go through and make sure your tags are in the right place and that you don’t have any blank space tagged.
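If the file is long, checking every tag by eye gets tedious. Once the tagged text is sitting in a text editor, a small script can do the looking for you. The following is a minimal Python sketch, assuming the four tags listed above and no nesting of one tag inside another. The function name `check_tags` is made up for illustration.

```python
import re

# Matches any opening tag (-STARTI-, -STARTB-, -STARTU-) or the closing -END-
TOKEN_RE = re.compile(r"-START[IBU]-|-END-")


def check_tags(text):
    """Return a list of human-readable problems found in tagged text."""
    problems = []
    open_tag = None  # (tag text, start offset, offset where its content begins)
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        if token == "-END-":
            if open_tag is None:
                problems.append(f"-END- at {match.start()} has no opening tag")
            else:
                tag, start, content_start = open_tag
                # Flag tags that wrap nothing, or only spaces and line breaks.
                if text[content_start:match.start()].strip() == "":
                    problems.append(f"{tag} at {start} wraps only blank space")
                open_tag = None
        else:
            # A new opening tag while one is still open means a missing -END-.
            if open_tag is not None:
                problems.append(f"{open_tag[0]} at {open_tag[1]} is never closed")
            open_tag = (token, match.start(), match.end())
    if open_tag is not None:
        problems.append(f"{open_tag[0]} at {open_tag[1]} is never closed")
    return problems


if __name__ == "__main__":
    for problem in check_tags("Oops -STARTI- -END-here, and -STARTB-this"):
        print(problem)
```

The numbers in the messages are character positions counted from the start of the text, which most text editors can jump to.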

Now copy/paste into a text editor:

All your formatting is gone.

Now open a new file in Word and apply your main style sheet. Copy/paste your text into the new file. Open the search box and make it look like this:

Do a Replace All and… ta da!

I generally wait until I’ve formatted all my headers and centering and any other styling necessary before I restore special formatting. Once done, all that’s left to do is to get rid of the tags.

Replace All and done!
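For anyone who thinks better in code, the whole tag, nuke and restore cycle can be modeled in a few lines. This is a minimal Python sketch of the logic only. It does not touch Word, and it assumes the text is a simple list of (text, style) pieces with no nested formatting. The names `tag_runs`, `restore_runs` and `strip_tags` are made up for illustration.

```python
import re

OPEN_TAGS = {"italic": "-STARTI-", "bold": "-STARTB-", "underline": "-STARTU-"}
STYLE_BY_LETTER = {"I": "italic", "B": "bold", "U": "underline"}
CLOSE_TAG = "-END-"
TAGGED_RE = re.compile(r"-START([IBU])-(.*?)-END-", re.DOTALL)


def tag_runs(runs):
    """Flatten (text, style) pieces to plain text, wrapping styled ones in tags."""
    parts = []
    for text, style in runs:
        if style is None:
            parts.append(text)
        else:
            parts.append(OPEN_TAGS[style] + text + CLOSE_TAG)
    return "".join(parts)


def restore_runs(text):
    """Rebuild (text, style) pieces from tagged plain text."""
    runs, pos = [], 0
    for match in TAGGED_RE.finditer(text):
        if match.start() > pos:
            runs.append((text[pos:match.start()], None))
        runs.append((match.group(2), STYLE_BY_LETTER[match.group(1)]))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], None))
    return runs


def strip_tags(text):
    """Remove every tag, leaving only the words."""
    return re.sub(r"-START[IBU]-|-END-", "", text)


if __name__ == "__main__":
    original = [("She said ", None), ("never", "italic"), (" again.", None)]
    tagged = tag_runs(original)      # survives a trip through a text editor
    print(tagged)
    print(restore_runs(tagged) == original)
    print(strip_tags(tagged))
```

The tagged string is plain text, which is why it comes through the text editor untouched while Word’s own formatting does not.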

In the time it took you to read this blog post, you could have tagged and restored six files. It really is that easy.