Two Quick, Easy No-Cost Ways to Convert a PDF into a Word Doc

There are two types of PDF files that concern writers and from which writers would like to extract editable text.

The first is created by exporting a text document from a word processor or publishing program into a PDF file. The second type is created by scanning printed material and producing a PDF file.

(The second type, the scan, is actually an image file that requires further conversion via OCR (optical character recognition). OCR conversion requires special software, and it falls into the category of “you get what you pay for” and will be the subject of another blog post.)

This post concerns the first type of PDF. A common request I get is: “I had someone do a print layout for my book and it’s been edited and updated, but it’s in a PDF and I need a final copy as a Word doc. Can you help?”  No problem. It takes just a minute, so I don’t charge people to do it. (I do, however, charge an arm and a leg to clean up conversions. Just kidding, only an arm.)

The good news: Converting a PDF file into a Word doc is easier than ever and the results are better, too. And, you probably have the tools on your computer already.

The bad news: Conversion is always a mixed bag—some results are vastly superior and some will make you tear your hair out.

The good news about the bad news is that if you know what is happening, you can fix it without ending up in a weepy, shivering, fetal ball. Or sending people like me an anxious email saying, “I’ve spent months trying to fix this fripping’ Word doc and I’ve torn all my hair out and can you please, please, please help meeeeee!” Then wondering what is wrong with you when in a couple of hours I send you a fully restored Word doc—nothing wrong with you, but I’ve recovered millions of words from PDF files and pretty much know what I’m doing. 😉

Use MS Word to Convert the PDF

If you have a version of MS Word that is capable of exporting a PDF file then it is capable of importing a PDF file. How to know? Open a doc in Word and click Save As. In the tool box is a dropdown menu of different file types: .doc, .docx, .rtf, .txt, and a bunch of others. If the list includes PDF, you’re golden. Conversion is as easy as opening a document.

In Word, click on File > Open and select the PDF file you want to open. (Be patient. Depending on how fast your computer is and how large the PDF file is, conversion may take several minutes.)

Once it is open on your computer do a SAVE AS into the DOCX file format.

In the example the Show feature is activated so you can see the paragraph returns and other formatting.

What I like about this method:

  • Headers and footers are rendered as headers and footers (for the most part, depending on how the original PDF was created), meaning they can be quickly deleted or safely ignored.
  • It’s not horrible about retaining paragraphs.

The disadvantages:

  • It can hide hyphenation. (Sometimes the hyphenation is there but invisible and Word will not allow a search for them—if this occurs, you’ll need a text editor to clean them up. See below.)
  • If the fonts used in the PDF are not available on your computer, Word will substitute fonts. If Word is unable to read the font, it will insert black boxes, pink boxes or gibberish.
  • Images and other graphics can make the file difficult or impossible to open. This works best for a text-only document.
  • Depending on the source PDF, Word can go into overdrive attempting to retain the formatting. That can result in massive (and slow!) files.

Use Google Drive to Convert the PDF

You may have to create a Google account (gmail account) in order to use Google Drive, but it’s free and widely available.

  1. Go to Google Apps > Drive
  2. Click New > File Upload
  3. Select the PDF file you want to convert
  4. When the box opens saying “1 Upload Complete”, click on the file name
  5. Tell it to “Open with Google Docs”
  6. File > Download As > Microsoft Word (docx)
  7. Open the downloaded file in Word
  8. Save As to make sure the new Word doc is on your computer.

The advantages:

  • The PDF file is editable in Google Docs, so if you don’t have Word or don’t want to use it, you can work on the PDF directly. VERY IMPORTANT!: This version remains on the cloud, not your computer, so if you want it saved on your computer you will have to download it.
  • No real formatting to fight with.
  • It makes very little effort to convert images and graphics during conversion, so it rarely chokes up or crashes because of it.

The disadvantages:

  • Headers and footers will have to be removed manually.
  • Hyphenation will have to be cleaned up manually.
  • Spacing issues.
  • Not fabulous about retaining paragraphs.

Tips for Making Clean Up Merely Mildly Annoying (as opposed to having you curled up in a fetal ball, quietly weeping)

  • Forget trying to retain the formatting from the PDF file. The text is what matters, focus on it.
  • Work in Web Layout view rather than Print Layout view so that you can adjust the width of the screen to approximate the width of the PDF text. This will make checking for and fixing wayward paragraphs easier.
  • Make sure all scene breaks, page breaks and deliberate blank lines are clearly tagged with some kind of marker so you know exactly where they are. Don’t use extra hard returns or actual page breaks to mark them—you’ll regret it.
  • If possible, work with the Word doc and the PDF open on the screen side by side so you can see scene breaks, page breaks, deliberate blank lines and special formatting such as italics.
  • Activate the Show feature (click the pilcrow icon ¶ in the Home Ribbon menu) so you can see such things as paragraph returns, soft returns, tabs and spaces.
  • If Word is having trouble reading a font, you will need to try another method. Contact me (see below) and I’ll see if I can find a solution for you.
  • Clear the formatting. First, make sure all your scene breaks, page breaks and deliberate blank lines are clearly marked. Second, tag your italics (easy way: https://jwmanus.wordpress.com/tag/italics-in-ebooks/). Third, press Ctrl+a to select all text, then click the Clear All Formatting icon in the Home Ribbon. This will leave you with a blank slate, essentially, and remove any unwanted formatting Word has applied. Apply the Normal style to the selected text, then modify the style so it suits you. Restore the italics.

Quick Find/Replace terms useful for clean up:

Get rid of unwanted page breaks:
In the Find field: ^m
In the Replace field: leave blank
Replace All

Get rid of unwanted section breaks:
In the Find field: ^b
In the Replace field: leave blank
Replace All

Turn soft returns into hard returns:
In the Find field: ^l
In the Replace field: ^p
Replace All

To find and delete unwanted hyphens (in most cases, discretionary hyphens that are turned into single dashes have a space after them):
In the Find field: -(hit the space bar once to create a blank space)
In the Replace field: leave blank
Replace All
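If you would rather script these passes than click through Find/Replace, here is a minimal Python sketch of the same idea. It is not how I do it in Word, and it rests on assumptions: the text has been pulled out of Word as plain text, manual page breaks and section breaks survive as form feed characters, and soft returns survive as vertical tab characters (check your own export before trusting that). The function name is made up for illustration. The hyphen pass is also deliberately narrower than the Word search above: it only removes a hyphen-plus-space that sits between two lowercase letters.

```python
import re

# Assumed plain-text stand-ins for Word's special characters.
PAGE_OR_SECTION_BREAK = "\x0c"  # form feed
SOFT_RETURN = "\x0b"            # vertical tab


def clean_converted_text(text: str) -> str:
    """Apply the clean-up passes to plain text extracted from a converted doc."""
    # Get rid of unwanted page and section breaks.
    text = text.replace(PAGE_OR_SECTION_BREAK, "")
    # Turn soft returns into hard returns.
    text = text.replace(SOFT_RETURN, "\n")
    # Delete leftover line-end hyphens: a hyphen followed by a space,
    # but only when it falls between two lowercase letters.
    text = re.sub(r"(?<=[a-z])- (?=[a-z])", "", text)
    return text


if __name__ == "__main__":
    sample = "The conver- sion went fine.\x0bNext line.\x0c"
    print(repr(clean_converted_text(sample)))
```

Restricting the hyphen pass to lowercase letters on both sides is a design choice to avoid eating deliberate dashes; it will still join a legitimately hyphenated word that happened to break at the end of a line, so spot-check the results.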

What if Word has hidden the hyphens?

It’s a common problem. It’s frustrating because you might never know it happened until you format your book as an ebook or send it in an email to someone. To find out if Word has done this, you will need a text editor. On a Windows machine, Notepad works fine. Open a blank document in the text editor. Use Ctrl+a to select all the text in the Word doc. Copy it, then paste it into the text editor. If you see this character ¬ then Word has replaced the hyphenation with “non-characters” that will cause trouble down the line. Word’s Find/Replace won’t do you any good. You will need to tag your italics, copy/paste the entire document into the text editor then use the editor’s Find/Replace function to delete the hyphenation.
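For those comfortable with a little scripting, the check described above can be sketched in Python. Treat it as an illustration under assumptions: I am assuming the hidden hyphens arrive in the pasted text either as the not sign (¬, U+00AC) you see in the text editor or as the Unicode soft hyphen (U+00AD). The function name is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # invisible in most programs
NOT_SIGN = "\u00ac"     # the visible "¬" character


def strip_hidden_hyphens(text: str) -> tuple[str, int]:
    """Return the text with hidden hyphens removed, plus how many were found."""
    count = text.count(SOFT_HYPHEN) + text.count(NOT_SIGN)
    cleaned = text.replace(SOFT_HYPHEN, "").replace(NOT_SIGN, "")
    return cleaned, count


if __name__ == "__main__":
    cleaned, found = strip_hidden_hyphens("con\u00adver\u00acsion")
    print(cleaned, found)
```

A count of zero means the doc is clean; anything else tells you how much hidden hyphenation was lurking.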

_______________________________

If neither of these conversion methods works for you, feel free to contact me at jayewmanus at gmail.com. I have other tools on hand that can convert difficult files. If the conversion does work for you, but you’re struggling with restoring the text, explain your problem in the comments and let’s figure it out.

 

 

 

 


MS WORD for Writers: Working with Styles

The very best feature of Microsoft Word—or of any word processor or writing program—is styles. Open up Word and you’ll see the Styles handily placed in the Home Ribbon. Many writers have no idea what those do or why they are there. You may have clicked on one out of curiosity and oddball things happened in your doc and it freaked you out.

Fear not. Styles are easy. Easy to use, easy to create, easy to modify. They make writing easier, reduce errors, and prevent destructive coding caused by over-formatting. Trust me. If you’re writing fiction, you only need two: Normal and Heading 1.

(Nonfiction sometimes requires extra heading styles. Formatting ebooks or a print-on-demand edition means using multiple styles. Once you see how easy it is to use Normal and Heading 1 in your daily writing, you’ll have no problem if/when there comes a time when you need multiple styles.)

Normal Style

If the Tool Ribbon is not displaying in your Word setup, click Home. When the Ribbon opens you will see a pin icon in the lower right corner. Click that and the Ribbon will stay open.

[Image: Word Style Ribbon]

How it works: Open a new blank Word doc. In the Home Ribbon, hover your cursor over the style that says “Normal”. Right click and select “Modify”. A “Modify Style” tool box will open. At the bottom, select “Add to the Styles gallery” and “New documents based on this template”. Under that, click the arrow to open the options for Format. Select “Paragraph”. A Paragraph tool box will open.

[Image: Word Modify Styles]

In “Indents and Spacing” select:

Alignment: Left
Outline Level: Body Text
Indentation Left: 0”
Indentation Right: 0”
Special: First Line
By: 0.5”
Spacing Before: 0 pt
Spacing After: 0 pt
Line spacing: 1.5 lines
At: (leave blank)
In “Line and Page Breaks” clear all boxes.
Click OK

Next, from the Modify Style box Format dropdown menu select “Font” and a Font tool box will open.

Font: Times New Roman
Font Style: Regular
Size: 12
Font color: Automatic
Underline style: (none)
Effects: clear all boxes
Click OK

In the Modify Style box click OK.

[Images: Word Modify Paragraph; Word Modify Font]

Now type a few paragraphs. Every one of them, without you doing anything except type words and hit Enter, will look the same. Same indent, same font, same spacing. From this point forward, when you open a new blank doc, the Normal style will be your default. Unless you modify the style, a doc you write today will look exactly like anything you’ll write years from now.

The set up above is a suggestion for writing fiction. If you prefer different line spacing or another font or narrower or wider indents, modify the style. Find a look that is comfortable for you and works well with your creative process.

A few cautions:

  • In Line spacing select single, 1.5 lines or double; avoid At Least (it has a specific purpose when creating print layouts). If you prefer no paragraph indents, you will need to add spacing either before or after each paragraph so they don’t all run together. With 12-point type, 6 pt adds about half a line of space and 12 pt approximates a full line.
  • When selecting a font, stick to the “common” fonts such as Times New Roman, Garamond, Arial or Courier. Some of the fonts included on your computer or imported from other sources are not recognized by other programs or operating systems. If another program can’t substitute its own fonts for your fonts, the recipient will see gibberish.

Heading 1

Heading styles (Word has them built in up to nine levels) affect more than just the look of the text. Headings create navigation in a doc. To see this in action, type a list of headings in your Word doc. Chapter 1, Chapter 2, and so on. Without modifying anything, set your cursor at the beginning of a line and click on Heading 1. (Don’t worry about what it looks like right now.) Next, in the Home Ribbon click “Find”. A Navigation pane will open on the left side of the screen. Click Headings and you will see every heading to which you’ve applied a heading style.

[Image: Word Styles Pane]

Using heading styles:

  • Eliminates the need for page breaks (unnecessary in a work in progress);
  • Allows you to easily navigate through a long doc without scrolling or paging—click an entry in the Navigation pane and Word will take you right to it;
  • Makes it easier to see and repair the common error of incorrectly numbering chapters;
  • Ensures consistency in that you’ll never have to remember the font size or effects, or worry about placement.
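To see that a heading style is structure and not just decoration, here is a small Python sketch that lists the headings stored inside a .docx file, roughly the information the Navigation pane displays. It relies on a .docx being a zip archive with the text in word/document.xml. The assumption to watch: it treats any paragraph whose style ID starts with "Heading" as a heading, which matches the built-in style IDs in English-language Word but may not match yours. The function name is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_headings(docx_file):
    """Return (style_id, text) for every paragraph tagged with a heading style."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    headings = []
    for para in root.iter(f"{W}p"):
        style = para.find(f"{W}pPr/{W}pStyle")
        if style is None:
            continue  # no named style applied: ordinary body text
        style_id = style.get(f"{W}val", "")
        if style_id.startswith("Heading"):  # assumption: English style IDs
            text = "".join(t.text or "" for t in para.iter(f"{W}t"))
            headings.append((style_id, text))
    return headings
```

Calling `list_headings("my-novel.docx")` (a hypothetical file name) on a doc that uses Heading 1 for chapters returns the chapter list in order, which makes skipped or duplicated chapter numbers easy to spot.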

Heading styles can be modified the same as the body styles. Hover the cursor over the style, right click, and the Modify tool box will open.

Styles Pane

The easiest way to track styles is to open the Styles pane. (See image above) In the Ribbon Styles command box, click the arrow in the lower right corner of the box. A pane will open. The first time it’s opened it will display a list of every built-in style offered in the template you’re using. The Styles pane can be customized so it only displays the styles in use. Click on “Options…” (at the bottom right of the Styles pane) and a tool box will open to customize the display.

The Styles pane has three Quick Access icons at the bottom: “New Style”, “Style Inspector”, and “Manage Styles”. Clicking any of them opens a tool box for a specific task.

Create New Styles

Select the text you want styled. Click the “New Style” icon in the Styles pane or click the down arrow on the styles display box in the Ribbon and click “Create a Style”. Either way will open a New Style tool box. Give the new style a name and set up the font and paragraph any way you like.

Apply Styles in an Existing Word Doc

If you have a work in progress for which you have not been using styles, applying styles can be easy or tricky depending on how much formatting you’ve done.

The easy way:

  1. Select all the text. (Ctrl+a)
  2. Click the “Clear all Formatting” icon in the Home Ribbon Font command box. (See the image of the Ribbon above for its location.)
  3. With the text still selected, click “Normal” to apply the style.
  4. Deselect the text then scroll through the doc and apply Heading styles to chapter and section headings.
  5. Done.

If, however, you clear the formatting and apply Normal and your doc is a mess, that means you’ve done a lot of extraneous formatting that needs to be cleaned out. There isn’t enough room in this blog post to cover a thorough clean, but there are four common things writers do that mess up their docs. Running the following Find/Replace operations will take care of them.

1) Tabs: To get rid of tabs, go to the Home Ribbon Editing command box and click “Replace” to open the Find/Replace box.

In the Find field: ^t
In the Replace field: (leave it blank)
Replace All

2) Soft Returns (Shift+Enter): Change soft returns into hard returns.

In the Find field: ^l (that is a lower case L)
In the Replace field: ^p
Replace All

3) Extra Spaces:

In the Find field: (hit the space bar twice to create two blank spaces)
In the Replace field: (hit the space bar once to create one blank space)
Replace All and repeat until results show zero

4) Extra Hard Returns: Be cautious with this operation. If you’ve been using extra hard returns to indicate scene breaks, you need to tag them first. Use pound/hashtag signs or asterisks or even insert [SCENE BREAK]. The same thing goes for deliberate blank lines such as those between stanzas in poetry or songs. If you need deliberate blank lines, tag them so they aren’t lost. When the tagging is done:

In the Find field: ^p^p
In the Replace field: ^p
Replace All and repeat until the results show zero
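Why "repeat until the results show zero"? A single Replace All only collapses each pair once, so a run of four spaces becomes two, not one. The Python sketch below mimics the four operations on plain text to show that behavior. It is an illustration, not a replacement for doing this in Word, and it assumes soft returns show up as vertical tab characters in the extracted text. The function names are hypothetical.

```python
def replace_until_zero(text: str, find: str, replace: str) -> tuple[str, int]:
    """Repeat a Replace All until nothing is left to find; report the passes."""
    passes = 0
    while find in text:
        text = text.replace(find, replace)
        passes += 1
    return text, passes


def tidy(text: str) -> str:
    """Run the four clean-up operations in the order given above."""
    text = text.replace("\t", "")       # 1) tabs
    text = text.replace("\x0b", "\n")   # 2) soft returns (assumed to be vertical tabs)
    text, _ = replace_until_zero(text, "  ", " ")     # 3) extra spaces
    text, _ = replace_until_zero(text, "\n\n", "\n")  # 4) extra hard returns
    return text


if __name__ == "__main__":
    # Four spaces need two passes: 4 -> 2 -> 1.
    print(replace_until_zero("a    b", "  ", " "))
```

As with the Word version, tag scene breaks and deliberate blank lines before running the last step, because it removes every blank line it finds.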

There you have it, how to use styles. Increase productivity and remove the distraction of “formatting” when you’re supposed to be telling a story.

By the way, even if you don’t use MS Word, almost every word processor or writing program uses styles. In your program of choice, look under Edit or Layout, or search the Help menu. Find styles, set them up, and use them.

****************************

My goal for 2018 is to teach as many writers as possible how to efficiently and expertly use MS Word as a writing and self-publishing tool. Watch this blog-space for more tips, tricks and techniques. Or, if you’d prefer all the information in one package, including step-by-step instructions for formatting ebooks and print-on-demand editions, WORD for the Wise: Using Microsoft Office Word for Creative Writing and Self-publishing is available at Amazon as an ebook and in print.

MS Word and Print on Demand Books

Since I published the ebook, WORD for the Wise, I’ve been getting some questions. Since I know for every person who actually sends an email, there are many more with the same questions who don’t send emails, I thought I’d throw out some answers here.

  • Yes, there will be a print edition. I’m working on it now and it should be available for sale in a few weeks.
  • Yes, Word formats for ebooks are perfectly acceptable and not at all difficult to do as long as you’re aware of the limitations. There are plenty of good resources on the internet and in how-to books, including mine, with step by step instructions.

The main question I want to cover has to do with print on demand books. I know a lot of you are currently publishing print editions–I’m doing as many print editions these days as I am ebooks. Judging by the questions I’ve gotten, a lot of writers have doubts about both the quality of Word formats and how easy it is to do.

Regarding quality, if a book is formatted properly and uses good fonts, the average reader would find it very difficult to tell the difference between a book formatted in Word and one formatted in a publishing program such as InDesign. A book designer, a professional typographer, or a hardcore bibliophile who collects books as objects would see the difference. But a reader who buys a romance or mystery or science fiction to enjoy the story either won’t notice the difference or won’t care. The self-publisher with more time than money or who wants to handle production themselves because Go, DIY!, should feel perfectly confident that it is possible to format their book in MS Word so that it looks professional and can proudly take its place on readers’ bookshelves.

As for ease of use, Word can be persnickety, as those of you who use it are well aware. Because it’s an office program rather than a publishing program, it’s not set up to do some of the things that a publishing program can do. With a little patience and some practice, however, a determined do-it-yourselfer could format a novel in a few hours. The example below took me five minutes.

[Image: print-on-demand format example]

Granted, I’ve been doing this a while and I’ve been immersing myself in Word for the past few months, but even so, with a bit of practice you can do it, too.

The keys to a good print-on-demand format:

  • Use a good font. The majority of fonts pre-installed on computers are too wimpy or too “homemade” looking for commercial publishing. A few are suitable, however, and with some testing you’ll be able to find one that works.
  • Keep the design simple. The very best models are sitting on your bookshelf. Take a look at traditionally published books in your genre. That’s what readers expect to see and most of the designs are simple enough to emulate.
  • Be patient with yourself. If you get tangled up or the program starts fighting with you, be willing to start over. It’s okay. Practice makes perfect.
  • Take it step-by-step. There’s an order to doing a print format that greatly reduces frustration and creates a better product. In WORD for the Wise, I lay out the process as clearly as I know how. Even though there are a lot of steps, no one step is difficult.

I hope that answers your questions. If there are others, well, you know where to find me.

WORD for the Wise
Using Microsoft Office Word for Creative Writing and Self-publishing

 

 

 

 

I Finally Did It: WORD FOR THE WISE is now an ebook

I know, I know, I haven’t posted in ages. I’ve been very busy. Anyone want to know how to scan and restore foreign edition paperbacks and turn them into ebooks and print books without being able to understand the words? *crickets* No? Okay, on to the subject at hand.

After years of cleaning and processing MS Word docs, and posting tips and tricks and hacks for using Microsoft Office Word for writing and self-publishing, and answering a lot of emails about problems with Word, I finally compiled that hard-earned knowledge into a book.

[Image: ebook cover, WORD for the Wise]

Here’s an excerpt from the Introduction:

Word is an excellent word processor, one of the most powerful on the market. All that power comes with a price: Where the act of composing fiction or nonfiction is a simple process (in technical terms) Word is complicated. It’s right there in the name itself: Microsoft Office Word. It’s a productivity program for businesses; not a publishing program for writers of commercial fiction and nonfiction.

For writing a report or a business proposal or a policy & procedures manual, it’s one of the best programs around. For writers, though? It’s kind of like driving a Porsche Carrera to the grocery store.

Even so, just about every writer I deal with uses Word. Even Mac users. Even writers who wouldn’t touch a Microsoft program send material that has been exported as a Word doc. Word is everywhere thanks to Microsoft having installed it on all Windows PCs for decades. (They no longer give away the Microsoft Office Suite; Word must now be licensed via subscription.)

Smashwords, the largest and heartiest of the aggregators for self-publishers to distribute and sell ebooks, converts Word docs into a wide variety of ebook platforms. (A publisher can also upload an EPUB file to Smashwords.) Other sites now allow self-publishers to upload Word docs. Even Amazon allows it. The conversion processes they use are programmed to recognize and modify the HTML coding in a Word doc.

Writers are using Word to compose their work, and some use it to format ebooks, and others use it to format print-on-demand editions. Even some professional ebook and print formatters use Word. Word might not be the best word processor for writers, but it is everywhere and it’s not going away for a long, long time.

I have processed thousands of Word docs, millions and millions of words, from hundreds of clients. The majority of those writers are like me from ten years ago, using the program inefficiently and often destructively. Cleaning up those files is how I’ve become an expert.

I can help you use Word like an expert, too.

My goals with this book are:

  • Teach writers to customize Word to suit their particular needs.
  • Teach writers to use the features that actually make their writing lives easier.
  • Help writers increase their creative productivity by eliminating destructive practices.
  • Teach writers to create the various types of docs used for editorial tasks, digital submissions, ebooks and print-on-demand interior files.

Even if you don’t use Word, you might find this book useful. There are dozens of word processors and programs created specifically for creative writing. The majority use the same underlying principles as Word.

I give you my promise. There are no gotchas in this book. No traps. No need for special skills or technical knowledge. I won’t use tech-speak because I don’t know any; I’m talking to you writer to writer. You don’t even need a spectacular memory since many of the things I recommend will require your attention just once. Set it and forget it and write on.

For the time being it’s only available on Amazon. (Have to figure out how to sneak all the mentions of Amazon and Kindle past Apple–heh.) I’m working on the print edition and should have that live in a week or so.

So if you ever wanted to know what I know about using MS Word, now you can, all in one easy guide.

WORD for the Wise:
Using Microsoft Office Word for Creative Writing and Self-Publishing

Tables of Contents in Ebooks: Yes!

There’s a big brouhaha going on now with Amazon. Scammers and other crooks have flooded Kindle Unlimited. Amazon is making one of their sweeps in an attempt to root them out. As per usual, when automation is unleashed, innocents get caught up in the net–sometimes with very expensive consequences.

One of the ways publishers are being dinged has to do with the tables of contents. Crooks are manipulating them to game the Kindle Unlimited page reads, so Amazon is going after ebooks that lack a standard (in form and in placement) ToC. Amazon highly recommends that every ebook has an active (publisher generated) table of contents, and requires an internal table of contents (this is what you see when you use the Go To feature on a Kindle). For more information on Amazon’s policies, start here and don’t forget to read this.

[Image: ToC Blog 1]

The two most common arguments I get against building a Table of Contents in an ebook are (1) It’s a novel. It’s stupid to put a table of contents in a novel. And (2) A long list of chapters eats up the sample/Look Inside features at Amazon and hurts my chances at a sale.

My answer to #1 is: Novels don’t need tables of contents, but ebooks do. A reader can’t just open a book to the middle and leaf through a few pages to find Chapter 9. They have to navigate. An ebook without a ToC requires endless paging through to navigate and that’s no fun. To me as a reader, an ebook without a useful navigation guide is a broken ebook, and it’s irritating. For those who point out that the internal ToC is the navigation guide, my answer is that not every Kindle device (or other reading device) displays the internal guide. Instead the device points to the publisher-generated table of contents and if there isn’t one, the link is grayed out–useless.

The answer to #2 is not so easy. For non-fiction, it’s a no-brainer. A comprehensive table of contents IS a sell point. Readers want to see what they are getting and a solid ToC in the sample/Look Inside can often tell them everything they need to know.

[Image: ToC Blog 2]

For novels, especially with a lot of chapters, it does get trickier. I’ve read ebooks with up to ten “pages” of chapter lists. Endless Chapter 1, Chapter 2, Chapter 3… This does eat up the sample/Look Inside. It’s useful once the reader has purchased the ebook, but for tempting them into buying in the first place, it can be harmful. The temptation is strong to forgo the ToC altogether or to move it into the backmatter. Normally, I’d recommend putting the ToC in the back of the book, but with the current Amazon crackdown, I would say that for any ebook enrolled in Kindle Select/Kindle Unlimited, DO NOT DO THAT. Any perception that you are somehow gaming the system or bending the rules can cause you to run afoul of Amazon’s policies.

Let’s explore some practical options.

The Internal ToC (required by Amazon)

If you’re building your ebook from scratch, you will hand-build your internal ToC (toc.ncx). It will look something like this:

[Image: Blog ToC 4]

This produces the NCX view/Go To list, along with giving a strict order to display the sections of your book. (If you want to learn how to do this, which will allow you to create more sophisticated and better ebooks, check out The eBook Design and Development Guide, by Paul Salvette.)
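If typing the internal ToC by hand sounds tedious, the structure is regular enough to generate. The Python sketch below builds a bare-bones NCX navigation map from a list of entries. It is a sketch under assumptions, not a finished production file: the book title, identifier, and file names are hypothetical placeholders, and a real NCX typically carries more metadata in its head than the single entry shown here.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def build_ncx(title, uid, entries):
    """Build a minimal NCX document.

    entries is a list of (label, href) pairs in reading order.
    """
    ET.register_namespace("", NCX_NS)  # write NCX as the default namespace
    tag = lambda name: f"{{{NCX_NS}}}{name}"

    ncx = ET.Element(tag("ncx"), {"version": "2005-1"})
    head = ET.SubElement(ncx, tag("head"))
    ET.SubElement(head, tag("meta"), {"name": "dtb:uid", "content": uid})
    doc_title = ET.SubElement(ncx, tag("docTitle"))
    ET.SubElement(doc_title, tag("text")).text = title

    nav_map = ET.SubElement(ncx, tag("navMap"))
    # playOrder gives the strict display order of the sections.
    for order, (label, href) in enumerate(entries, start=1):
        point = ET.SubElement(
            nav_map, tag("navPoint"),
            {"id": f"nav{order}", "playOrder": str(order)},
        )
        nav_label = ET.SubElement(point, tag("navLabel"))
        ET.SubElement(nav_label, tag("text")).text = label
        ET.SubElement(point, tag("content"), {"src": href})
    return ET.tostring(ncx, encoding="unicode")


if __name__ == "__main__":
    print(build_ncx(
        "My Novel",                      # hypothetical title
        "urn:uuid:hypothetical-id",      # hypothetical identifier
        [("Title Page", "title.xhtml"),  # hypothetical file names
         ("Chapter 1", "ch01.xhtml"),
         ("About the Author", "about.xhtml")],
    ))
```

Each entry becomes one navPoint, and the numbering follows the order of the list, so reordering the book means reordering the list and nothing else.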

For those of you formatting in Word, onsite conversion will build your internal ToC. The conversion seeks out sections based on styles and/or chapter headings (it picks up “Chapter” for instance). For a full explanation, look here. The easiest way to do this is to use Word’s built-in heading styles: Heading 1, Heading 2, Heading 3, etc. Apply these to the chapter/section starts. Conversion will do the rest.

The Publisher-Generated (active) Table of Contents
(highly recommended by Amazon)

If you have up to thirty entries in your ToC it’s not going to eat up too much of the sample/Look Inside (about two “pages” worth). I would build a standard ToC and call it a day, having done due diligence. Where to put it? As long as it’s in the front matter, it’s up to you. (My personal preference is to have the title page be the start page, so I generally place the ToC before that.)

What if you have thirty+ entries? A simple solution is to put all the entries in a block.

[Image: Blog ToC 3]

The above example is for a seventy chapter ebook, and the ToC takes up one “page”. This can also work well with non-fiction that contains a large number of sub-entries:

[Image: Blog ToC 5]

In Word, if you use the built-in Heading styles, all you have to do is style your table of contents to look the way you want it, then use the automatically generated bookmarks to link to the entries.

[Image: Blog ToC 6]

Don’t forget to test all your links–no matter how you build your ebook. It’s easy to mis-link an entry, but even easier to fix it. So test, test, test.

A Word for those who are Dead Set against a Chapter List

There are those who just cannot bring themselves to include a ToC that contains a list of chapters. If you are one of them, include, at least, a truncated ToC. It can be very short. For example:

Title Page
The Story
About the Author

It won’t be very useful for your readers, but it will put you in compliance with Amazon.

 

 

MS Word, A Primer for Indie Writers: Part III: Punctuation and Special Characters

The best evidence that MS Word is not the best tool for fiction writers is in the way it handles punctuation and special characters. The program was created for office writing, and the documents it creates are meant to be printed on site in order to find homes in filing cabinets. Many features that make it terrific for an office can cause major problems for indie writer/publishers.

My Number One Recommendation: Turn It Off

Auto Correct is a boon for office drones, but it’s an annoyance (at best) and dangerous (at worst) for fiction writers. Find it under File>Options>Proofing.

[Image: Word_Styles_12]

Enable or Disable features as you see fit.

Click on the Auto Correct Options button and this comes up:

[Image: Word_Styles_13]

When I’m composing, the only auto-formatting I allow in Word is curly/smart quotes instead of straight quotes. Anything else means I’m going to end up fighting with Word and that pisses off the muse and sends her sulking into the corner. Every once in a while I need to format a Word doc for Smashwords. Then some of those auto features come in handy. See that box in the right hand image that says “Replace text as you type”? You can enable that and make it so Word inserts special characters for you. The copyright symbol, for instance, or the Euro symbol rather than a dollar sign. Be careful with this option and make sure you are using an ebook friendly font (Times New Roman, Garamond), otherwise Word could insert special characters from a subset that is not supported in ereading devices.

PUNCTUATION

When I’m prepping a document for production one of the things I do is make sure the punctuation is print standard. If you want your ebook or print on demand edition to look professional, you will do the same.

Curly/Smart Quotes versus Straight Quotes

Straight quotes/apostrophes look bad and amateurish. Period. Use curly/smart quotes. If you have straight quotes in your document, you can change them to curly quotes with Find/Replace. Enable auto correct for smart quotes, then type a double quote mark in the Find field and a double quote mark in the Replace field, click Replace All and Word will change straight to curly. Do the same for apostrophes/single quotes.
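To make the idea concrete, here is a naive Python sketch of straight-to-curly conversion on plain text. To be clear, this is not Word's actual algorithm, just a simple rule I am assuming for illustration: a quote mark at the start of a line, or after a space or opening bracket, is treated as an opening mark, and everything else as a closing mark or apostrophe. The function name is made up.

```python
import re

# A quote counts as "opening" at the start of a line or after whitespace
# or an opening bracket. Everything else is treated as closing.
OPEN_CONTEXT = r"(^|[\s(\[{])"


def curl_quotes(text: str) -> str:
    """Convert straight quotes and apostrophes to curly ones using a naive rule."""
    text = re.sub(OPEN_CONTEXT + '"', "\\1\u201c", text, flags=re.M)
    text = text.replace('"', "\u201d")  # remaining doubles are closing
    text = re.sub(OPEN_CONTEXT + "'", "\\1\u2018", text, flags=re.M)
    text = text.replace("'", "\u2019")  # remaining singles: closing/apostrophe
    return text


if __name__ == "__main__":
    print(curl_quotes('"Don\'t," she said.'))
    print(curl_quotes("'Tis late."))
```

The second example shows the limit of any simple rule: the apostrophe in 'Tis comes out as an opening single quote, because the rule cannot tell a leading contraction from the start of a quotation. That sort of thing has to be caught by searching afterward.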

Now you will run into a major headache caused by Word: Curly quotes turned in the wrong direction. To find and correct the most common offenders, here are two searches I suggest you run using the Find feature:

  • Dash/hyphen or em dash with a double quote. In the Find field search for -” or ^+”
  • Space apostrophe (insert a blank space before the apostrophe). This will find open contractions with wrong way apostrophes.
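If your text also lives outside Word (a plain-text export, say), the same two searches can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: the function name is made up, and it flags every quote mark after a dash and every apostrophe after a space, whichever way it faces, so a human still has to eyeball each hit.

```python
import re

# A hyphen or em dash (U+2014) followed by any double quote mark,
# or a blank space followed by any apostrophe/single quote mark.
SUSPECT_QUOTES = re.compile("[-\u2014][\"\u201c\u201d]| ['\u2018\u2019]")

def find_suspect_quotes(text):
    """Return (offset, snippet) pairs for quote marks worth a second look."""
    hits = []
    for match in SUSPECT_QUOTES.finditer(text):
        start = max(0, match.start() - 10)
        hits.append((match.start(), text[start:match.end() + 10]))
    return hits
```

Each hit comes back with a little surrounding text so you can tell at a glance whether the quote mark really is facing the wrong way.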

HYPHENATION

If you take only one thing away from this post, it is to NEVER use Word’s auto-hyphenation feature.

When producing an ebook, do NOT hyphenate your text. Ereading devices will render the hyphens as characters placed randomly throughout. It looks awful.

When producing a print on demand book, use Manual hyphenation. Yes, it takes time. Yes, it is tedious. Yes, it seems ridiculous to manually do something the program can do in seconds. But, Word is a slob when it comes to hyphenation and it uses weird rules. Don’t trust it.

Em and En Dashes

This isn’t a grammar guide, so you’ll have to open a style manual and study up. Em and en dashes have specific uses and are NOT interchangeable. If you want your book to look professional, use these punctuation marks correctly.

Hot keys for quick insertion:
Em dash: CTRL+ALT+ Minus (the dash/minus on the Number pad)
En dash: CTRL + Minus (the dash/minus on the Number pad)

Auto-format as you type:
Enable auto format so that a double dash (--) becomes an em dash
Enable auto format so that space-hyphen-space ( - ) becomes an en dash

Find/Replace
Compose using a double dash (for em dash) and space-hyphen-space (for en dash). When you are done and ready to format your book, do a Find/Replace All to take care of them in one shot.
Em dash: double dash (--) in the Find field, and replace with ^+ (caret plus sign)
En dash: space-hyphen-space ( - ) in the Find field, and replace with ^= (caret equals sign)
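For text that has left Word, here is a minimal Python sketch of the same two replacements. The function name is my own invention, and like the Find/Replace above it swaps the spaced hyphen for a bare en dash, dropping the spaces around it.

```python
def fix_dashes(text):
    """Swap composing-stage dash stand-ins for real dashes."""
    text = text.replace("--", "\u2014")   # double hyphen -> em dash
    text = text.replace(" - ", "\u2013")  # space-hyphen-space -> en dash
    return text
```

Run it on a copy first. A stray triple hyphen or a hyphen used as a minus sign will not come out the way you want, so check the results.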

Ellipsis

Ah, the ellipsis, much beloved by writers everywhere and so widely, horrendously misused. Get a style manual and bone up on proper usage. An ellipsis is a special character consisting of three dots. Not two, not four, not twelve–three. While you are composing in Word, three periods in a row will suffice. When it comes to production, three periods in a row will screw up your book (digital and print) by orphaning periods.

Now is the time..
. (oops, little orphan)

For a professional looking ebook or print on demand book you want to use either the ellipsis character or a spaced ellipsis.

I showed the characters with the Show feature both off and on so you can see the (invisible) non-breaking space characters.

To make an ellipsis character:

Hot key: CTRL+ALT+. (period)
Auto format: Refer to the above image showing auto format options. Enable the “Replace text as you type” option to replace three periods in a row with an ellipsis.

To make a spaced ellipsis:

Hot key: .(period) CTRL+Shift+space .(period) CTRL+Shift+space .(period)
Find/Replace: (During composition use three periods) In the Find field type three periods (...) and in the Replace field type .(period)^s.(period)^s.(period)
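The two options can be sketched in Python for text outside of Word. The function names are hypothetical; the characters themselves are the standard Unicode ellipsis (U+2026) and non-breaking space (U+00A0), which is what ^s stands for in Word.

```python
ELLIPSIS = "\u2026"  # the single ellipsis character
NBSP = "\u00a0"      # non-breaking space

def to_ellipsis_character(text):
    """Replace three periods in a row with the ellipsis character."""
    return text.replace("...", ELLIPSIS)

def to_spaced_ellipsis(text):
    """Replace three periods in a row with period, NBSP, period, NBSP, period."""
    return text.replace("...", "." + NBSP + "." + NBSP + ".")
```

Either way the three dots can no longer be split across lines, which is the whole point.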

SPECIAL CHARACTERS

If you are creating a document for your personal use, to print on your printer, this isn’t a concern. Just about everything you see on the screen will show up on the printed page. When you’re producing a book, either in print or digital, however, special characters can create big problems.

What is a special character?
Anything you can’t type directly on your computer keyboard.

In the Insert tool bar, click on Symbols.

For those of you who hire out your formatting, using obscure symbols or characters can cause big problems. It’s also a big problem when restoring text from scanned pages converted into a Word doc with OCR (Word can be very creative with interpretation). Ereading devices are selective about the characters they will render. The older the device, the fewer characters it will accept. My suggestion to you is, if you want/need obscure characters or symbols in your ebook, send a note to your formatter.

Dear Formatter: In chapter 7 I have several emoticons (smiley face and frowny face) I would like turned into symbols if possible.

Sometimes it is possible, sometimes substitutes must be made. Doing it this way is better than inserting a character that will not render and the formatter missing it and the ebook ends up displaying an “I do not know what this means” symbol (an X’d rectangle with a question mark in it).

For those of you creating ebooks with Word, stick to only those characters and symbols you find in “normal text”, Latin-A extended and Latin-B extended. Most of those are safe. To test if they will render, use the Kindle Previewer and look at the text in the DX device. If it shows up there, it’s good.
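A rough pre-check can be sketched in Python before you ever open the Previewer. It assumes the "safe" subsets map to Unicode code points U+0000 through U+024F (Basic Latin, Latin-1 Supplement, Latin Extended-A and Latin Extended-B), and it lets through a short list of typographic marks (curly quotes, en and em dashes, the ellipsis) that is purely my own assumption. It is no substitute for testing on a device.

```python
LAST_SAFE_CODE_POINT = 0x024F  # end of the Latin Extended-B block

# Assumed-safe typographic marks outside that range (my assumption).
EXTRA_SAFE = "\u2018\u2019\u201c\u201d\u2013\u2014\u2026"

def risky_characters(text):
    """Return the sorted, unique characters that fall outside the safe set."""
    risky = set()
    for ch in text:
        if ord(ch) <= LAST_SAFE_CODE_POINT or ch in EXTRA_SAFE:
            continue
        risky.add(ch)
    return sorted(risky)
```

Anything it returns is a character to test in the Previewer or mention to your formatter.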

For those of you creating print on demand books with Word, you have a slightly different problem. You must ensure that your fonts (or at least, the font characters) are embedded. Go to File>Options>Save.

Word comes loaded with dozens or hundreds of fonts. Not all of them are embeddable. When you save the file as a PDF, the receiving program will try to find substitutes for any characters it cannot reproduce in your desired font. This can be a disaster. It can also make getting your book through the Createspace review process a major pain in the patoot.

For more information from Createspace: https://www.createspace.com/en/community/docs/DOC-1791

For more information about embeddable fonts: https://www.itg.ias.edu/content/embedding-fonts-microsoft-word-documents-windows

MS Word, A Primer for Indie Writers
Part I: Styles
Part II: Scene Breaks, Page Breaks, and Sections
Upcoming: Part IV: Find/Replace and SpellCheck

MS Word, A Primer for Indie Writers: Part II: Scene Breaks, Page Breaks and Sections

From a production point of view, white space in a Word doc can be a problem. It can confuse you or your hired formatter. It can cause goofs in your ebooks, not to mention making extra work for yourself.

I have some simple solutions for you.

SCENE BREAKS

Did you mean to hit Enter twice, or is that a scene break? How much time do you spend centering or using the space bar to align asterisks? How often do you forget to add the asterisks, or sometimes use one and other times five? How hard do you make it on yourself (or others) to find scene breaks when your book is in production?

Make it easy. I use a double pound sign (hashtags, for you young’uns).

Type them in and drive on. The double pound signs are unique (it would be very rare to find them within the text) and thus, searchable. When it comes time to produce the ebook or lay out a print on demand edition, all I have to do is search for the double pound signs, do a Replace All and scene breaks are taken care of. (By the way, I turned on the Show feature in the sample so you can see the hard returns.)

If using Word to format your ebook or pod book, you can replace the ## with your scene break indicator of choice and style them all in one operation. Here is how:

Create a new style and call it Break or Scene Break. Here is a simple set up.

Open the Find/Replace box and do this:

Do a Replace All.

If you are sending your book to someone else for formatting, tell the formatter that you used ## for your scene breaks and let them know how you’d like them handled.

NOTE: The ## is arbitrary. I use it because it’s easy and unique. You can use any tag that makes sense to you, even typing in SCENE BREAK. As long as it is an easily searchable string, you’re golden.
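The same swap works on a plain-text copy of the manuscript. Here is a small Python sketch; the function name and the default "* * *" indicator are placeholders of mine, so use whatever scene break indicator you like.

```python
import re

# A scene break is a line containing nothing but the ## tag.
SCENE_BREAK = re.compile(r"^##$", re.MULTILINE)

def replace_scene_breaks(text, indicator="* * *"):
    """Replace every line that is exactly ## with the chosen indicator."""
    return SCENE_BREAK.sub(lambda match: indicator, text)
```

Because it only matches a line that is exactly ##, a stray ## in the middle of a sentence is left alone.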

PAGE BREAKS

I don’t use page breaks when I’m composing in Word. It’s unnecessary and just makes extra space I have to scroll through. I use a tag:

==

That’s two equal signs. I use it because it forms a unique search string. So the text ends up looking something like this:

Title
Author
==
Copyright Information
==
Table of Contents
Chapter One
Chapter Two
And so on
==
Chapter One

My little tag comes in handy while I’m formatting, too, since it allows me to use it as a search term to plug in page breaks and styles. If you want to print your document or you’re formatting an ebook or pod edition, there are two easy ways to insert page breaks.

Number 1: Find and Replace

If you want to retain the tag, use ^m== in the Replace field. (You can delete the tags later.) Do a Replace All and you have page breaks.

NOTE: ^m is Word’s code for Manual Page Break. You can find other codes in the Special menu you see in the Find/Replace box. Those codes can be used in either the Find or the Replace fields.
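Outside of Word, the == tag is just as handy. As a sketch (the function name is hypothetical), this Python snippet splits a plain-text manuscript into separate pages wherever a line contains only the tag:

```python
def split_pages(text):
    """Split a manuscript into pages at lines that contain only ==."""
    pages, current = [], []
    for line in text.splitlines():
        if line.strip() == "==":
            pages.append("\n".join(current))  # close the page so far
            current = []
        else:
            current.append(line)
    pages.append("\n".join(current))  # whatever follows the last tag
    return pages
```

Fed the sample layout above, it would hand back the title page, the copyright page, the table of contents and the first chapter as separate chunks.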

Number 2 is to use your Heading 1 style. Modify Heading 1 the way I showed you in Part I. In the modify paragraph box, Line and Page Breaks, check the Page Break Before box. Now Word will insert a page break before every instance of the Heading 1 style.

To insert a manual page break in Word, you can use the hot key CTRL+Enter. Or go to the Insert tool bar and click on the icon for Page Break.

SECTION BREAKS

Sections are a nice feature in Word. They allow you to treat different parts of a large document with different styles, page numbering and first page treatments (no headers or footers on the first page, for instance). For composition, most print documents, or ebooks, you don’t need sections. If you are laying out a print on demand book, sections will save you many headaches and much frustration. The Section Breaks command (with its options for Odd and Even breaks and Next page or Continuous) is found in the Page Layout tool bar.

DELIBERATE WHITE SPACE

As noted before, white space can be a problem in Word. Sometimes you want a blank line–to set off a poem or letter, for instance–but it’s not a scene break. What I do is tag the blank line with a single pound sign/hashtag. It looks like this:

Here is my story moving along.
#
The only problem with
Kittens is that
Kittens grow up to be cats!
#
And the story continues on (with apologies to Mr. Nash)…

My little tag (which is entirely arbitrary, by the way, you can use anything you like, even type in BLANK LINE if it suits you) is a search term and I also use it to indicate that a section requires special formatting. If you use my single pound sign, remember it is NOT necessarily a unique search string. I make it unique with this string in the Find field ^p#^p. That tells Word to only consider a pound sign if there is a paragraph return before and after it.
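In plain text, a paragraph return is a newline, so the ^p#^p idea carries over directly. This is a minimal sketch with a function name I made up; it turns each tagged line into a real blank line and ignores any pound sign that is not alone on its line.

```python
def blank_line_tags_to_blank_lines(text):
    """Turn a lone # between two returns into an empty line."""
    # "\n#\n" plays the role of ^p#^p: a return, the tag, a return.
    return text.replace("\n#\n", "\n\n")
```

A pound sign inside a sentence never matches, because it does not have a return on both sides.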

There you go, learn a few Word features and use my tips, and white space will never trip you up again.

Part I: Styles
Next Post: Part III: Punctuation and Special Characters

Indie Writers: Make MS Word Work for You Instead of Against You

A Quick Primer for Fiction Writers in using Microsoft Word in the Digital Age

It always saddens me a little when a writer sends me an overly formatted Word doc to turn into an ebook or print-on-demand. It’s not that I have to clean it up–I can strip and flip the messiest files in less than an hour. What bugs me is how much thought and effort the writer wasted on utterly useless manuscript styling.

Example of a Word doc that has been overstyled.


The majority of writers I work with use Word. The vast majority have no idea how to use Word for their own benefit. I understand. I was a fiction writer for over two decades and even though I have been using computers and a variety of word processing programs since the late ’80s, it wasn’t until I started learning book production that I figured out how those programs worked. Why would I? All I needed was a printed manuscript in standard format to mail to my editor. Word processors made that easy.

Now I produce books for digital and print, and those old ways of “thinking print” make the writer’s job harder. Especially indie writer/publishers who might be doing it all alone or working with contractor editors and proofreaders and formatters.

Since it would take a full book–or volumes–to explain how word processors work, I’m going to urge you all to take what I tell you in this post and play around in your word processor. I will be talking about MS Word, but much of what I show you will apply to almost any word processor.

STUFF YOU DON’T NEED AND NEED NEVER USE AGAIN

  • Tabs
  • Page breaks
  • Headers
  • Footers
  • Page Numbers
  • More than one space for any reason
  • More than two hard returns for any reason
  • Multiple fonts
  • Text boxes
  • Justification
Example of a manuscript that uses NONE of the above.


STUFF THAT MAKES WORD “WORK” FOR YOU

  • Style sheets (fiction writers can get away with using only two or three, four at the most)
  • Find/Replace
  • Save As
  • Web View
  • “Show” feature
  • Formatting tags
(Left) Basic manuscript formatting; (Right) Overly formatted manuscript.


See that backward P-looking icon I’ve circled? That’s the “show” feature. Toggle it on and you can see paragraph returns, spaces, tabs and a few other formatting features. With the basic formatting on the left, all I had to do was apply one style (Normal) to the entire manuscript, then apply heading styles to the chapters and sections, and done. To style an entire manuscript takes minutes this way. The manuscript on the right is an entirely different matter. To get it looking the way I want would take hours, if not days, manually lining everything up, trying to get it to look the way I want it. Worse, I have to remember what I’ve done so I can remain consistent throughout. When I’m done, I still have to scroll endlessly through the entire document to find whatever I might need to find.

And what about what is happening behind the scenes? MS Word uses html to control all those features. If you’re printing a document, the only true concern you have is making sure your fonts print properly. If you’re turning your work into an ebook, all that hard work (and useless effort) works against you.

The html in the basic Word doc and how it displays in Firefox.


The overly formatted file in html and how it displays in Firefox.


So let’s make Word work for you. The NUMBER ONE thing (print it out and blow it up to poster size and post it where you can see it while you work) is:

IT DOESN’T MATTER A RAT’S PATOOT WHAT YOUR WORKING DOCUMENT/SOURCE FILE LOOKS LIKE

(Seriously, if your Happy Place while composing fiction involves Comic Sans font, 22pts, with 2 inch margins, triple spaced, then go for it. The only time it matters what your document looks like is when you intend to print.)

STYLE SHEETS


Set ’em and forget ’em; the best tool in MS Word

Every version of Word has a style sheets feature. If you’re using 2010, you’ll find them in the “Home” toolbar. Word comes with a huge variety of pre-built style sheets. You can use them as-is or modify them. You can create your own style sheets. The most useful styles for the fiction writer are: Normal, Heading 1, Heading 2.

  • Normal: apply to the body of your text. Set your paragraph indents, line spacing, and font. Never worry about spacing, margins and indents again.
  • Heading 1 & 2: apply to titles, chapter heads or sections. Bonus: Word will automatically list your headings in the navigation window. No more scrolling through a long document to find a specific chapter or section. Another Bonus: Ebook conversion programs recognize heading styles. Some, like Calibre, will automatically build a table of contents for you based on headings 1 & 2.

Additional styles fiction writers might find useful:

  • Emphasis: Remember, styles apply to paragraphs. “Emphasis” is italics. If your entire paragraph is italicized, use “emphasis”.
  • Strong: “Strong” is bold.
  • Custom style–“Center”: Instead of clicking on the icon for centering, create a style sheet. Makes life easy.
  • Poetry: For poetry, quotes, lyrics, anything you want with different margins and font style.

FIND/REPLACE

This is the most useful and the most underused tool in MS Word. You can use it to not only find words, you can find special characters, styles, highlighting, and special formatting (such as italics or bold).

Click on the dropdown menus and you can look for anything that appears.


A few useful search terms:

  • ^& (caret ampersand): Stands for a string of text. Say I want to tag my italics. I would leave the Find box blank, but ask it to search for italics. In the Replace box I’d type -STARTI-^&-ENDI-, do a Replace All and Word will wrap all my italicized text in tags.
  • ^p : Hard return. You can search for them or insert them
  • ^l  (caret lower case L): Soft return (shift enter)
  • ^t : Tab. Working on a document in which you or someone else used tabs and want to kill them all? Type ^t in the Find box, leave the Replace box blank, and do a Replace all. Done.
  • * (asterisk): A string of text. Use as a ‘wild card’ when you’re restoring your special formatting. Say I want to restore my italics. In the Find box type -STARTI-*-ENDI-, click the ‘wild card’ box, and leave the Replace box blank but ask it to replace text with italics. Do a Replace All and all your tagged text is italicized. Then use Find/Replace to get rid of the tags.
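Once the text is out of Word, the wild card search has a close cousin in regular expressions. As a sketch only, and assuming the destination is an html file (where italic tags are welcome), this Python snippet turns the tagged text into html italics. The function name is hypothetical.

```python
import re

# (.*?) is the regex counterpart of Word's * wild card: it grabs the
# shortest run of text between an open tag and the next close tag.
ITALIC_TAGS = re.compile(r"-STARTI-(.*?)-ENDI-", re.DOTALL)

def tags_to_html_italics(text):
    """Replace each -STARTI-...-ENDI- pair with <i>...</i>."""
    return ITALIC_TAGS.sub(r"<i>\1</i>", text)
```

The tags disappear in the same pass, so there is no second Find/Replace to clean them out.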

SAVE AS

When I’m working on a project, I might have four, five, ten versions of a file. If I’m making major formatting changes, I NEVER EVER mess with my source file. Let’s say I want a printed version. I do a Save As to make a new version that is named Print_Docname_date. Then I apply headers/footers, page numbers, page breaks and modify my styles to make it suitable for printing. My original source file remains unchanged and ready to use. Using Save As is the best habit you can get into while you’re working. (And it’s not like you’re having to save your work to floppy disks–your computer has lots of space. Use it!)
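If you script any part of your workflow, the naming habit is easy to automate. This Python sketch builds a name in the Print_Docname_date pattern; the function name and the ISO date format are my own choices, not a rule.

```python
from datetime import date
from pathlib import Path

def versioned_name(source, purpose, when=None):
    """Build a name like Print_Docname_2014-03-01.docx beside the source file."""
    source = Path(source)
    when = when or date.today()  # default to today's date
    new_name = f"{purpose}_{source.stem}_{when.isoformat()}{source.suffix}"
    return source.with_name(new_name)
```

It only builds the new name. Copying the file to that name (and leaving the source file untouched) is a separate step.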

WEB VIEW

Forsake print view and get used to web view while you work. This view is flexible (flow text) and enables you to easily display multiple screens and compare text while you work. You can adjust the width of your screen, too, and not lose chunks of text or reduce the image size in order to see everything.

FORMATTING TAGS

Because I use a variety of programs, and I intensely dislike losing formatting such as italics or trying to remember where I want a block of offset text, I tag my formatting. Now, because Word is html-based, you do NOT want to use html tags in your text. It’s okay if you’re outputting a file to a text editor, but if you’re going to a program that is html-based such as Scrivener or InDesign, or if you intend to bring the text back (you’re ‘nuking’ it, according to Smashwords’ style guide), then those html tags are going to seriously mess things up.

My tags are arbitrary. I’ve come up with them because they are unique and easy to search for; they don’t show up in text (normally). Feel free to use mine if you want or come up with something that makes sense to you to use. IMPORTANT TO REMEMBER: Special formatting such as italics or bolding require OPEN and CLOSE tags.

  • Italics: -STARTI- (open) -ENDI- (close)
  • Bold: -STARTB- -ENDB-
  • Underline: -STARTU- -ENDU-
  • For any special formatting such as headlines, poetry, etc: -SPECIAL- (this tag is a note to myself)
  • Placing Images: -IMAGE-
  • Scenebreaks or deliberate blank lines: ##
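Since a missing close tag can italicize half a book, it is worth checking that the tags pair up. Here is a Python sketch of such a check. The function name is hypothetical, and it assumes tags are properly nested (the last one opened is the first one closed), which is my assumption rather than a rule of the tagging scheme.

```python
import re

TAG = re.compile(r"-(START|END)([IBU])-")

def unbalanced_tags(text):
    """Return a list of problems with OPEN/CLOSE tag pairing (empty if none)."""
    problems = []
    open_tags = []  # stack of (kind, offset) for tags still waiting to close
    for match in TAG.finditer(text):
        which, kind = match.group(1), match.group(2)
        if which == "START":
            open_tags.append((kind, match.start()))
        elif open_tags and open_tags[-1][0] == kind:
            open_tags.pop()  # this close tag matches the latest open tag
        else:
            problems.append(f"unexpected -END{kind}- at offset {match.start()}")
    for kind, offset in open_tags:
        problems.append(f"unclosed -START{kind}- at offset {offset}")
    return problems
```

An empty list means every open tag found its close tag.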

That’s it. Simple, no? This is MS Word in the digital age, a writing tool you can make work for you instead of against you.

 

Managing File Sizes for Ebooks

The majority of fiction writer/publishers will not run into overall file size problems. Text doesn’t create monster files. Using graphics or illustrations can add significantly to the overall file size, but I’ve yet to create an ebook that exceeds, or even comes close to, Amazon’s 50MB limit (which may be changing due to the introduction of the new Fire HD tablets). Even with illustrations and graphics, I do my best to keep the overall file size under 5MB because of Amazon’s delivery fees ($.15 per MB). Those fees are charged against the publisher and can eat up royalties quickly.
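To see how quickly those fees add up, here is the arithmetic as a tiny Python sketch. The function name is made up, and the rate is simply the $.15 per MB figure quoted above, so plug in the current rate if it has changed.

```python
def delivery_fee(size_mb, rate_per_mb=0.15):
    """Estimated delivery fee in dollars for one sale of an ebook."""
    return round(size_mb * rate_per_mb, 2)
```

At that rate a 5MB ebook costs 75 cents per sale in delivery fees, and one at the 50MB limit would cost $7.50.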

As I said, most fiction writer/publishers will not run into problems with overall file size.

Where fiction writer/publishers do run into problems is with the size of individual chapter files within the ebook. When you use <h1> or <h2> tags in html, or the Heading 1 or Heading 2 style in a word processor, you are alerting the conversion programs (such as Calibre or KindleGen) that this is a new chapter and should be split into a new file.* If you don’t use the headings or tags, the conversion programs look for certain words–Chapter, Part, Section, etc.–to determine where the file should be split. What is NOT reliable at all is using page breaks (in a word processor) or the “page-break-before” command in html/CSS. (I have absolutely no idea why those work sometimes, but sometimes they don’t–my best guess is the whims or moods of the Digital God.)

I always split html (text) files into chapters or parts, which manages the overall ebook very nicely. Even though this example is from a novel (Prophet of Paradise by J. Harris Anderson) that is almost 200,000 words long, notice the size of the individual chapters:

File Size

What happens if you don’t use tags or headings and your chapters have titles the conversion programs don’t recognize? What happens if you don’t have chapters at all and your ebook is deliberately one long tract? If it runs up against the 300KB file size limit (approximately 45,000 words), several things could happen:

  • Your file fails to convert
  • The conversion program inserts page breaks whether they are appropriate or not
  • The file converts, but some devices tell the user the ebook can’t be loaded

If your files are less than 300KB, but still largish (over 150KB) your readers could experience serious screen lag as they page through your story. This is an important consideration for genre fiction writers since the chances are your readers are Super-Readers and might have hundreds or even thousands of ebooks loaded on their devices. They will not be happy if your file sizes and their addiction cause several seconds of lag every time they “turn” the page.
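Those two thresholds are easy to check with a short script. This is a sketch only: the function names and labels are my own, and the 300KB and 150KB figures are the ones given in this post, not official numbers from any retailer.

```python
LIMIT_KB = 300  # file size limit mentioned above
LAG_KB = 150    # size above which readers may see screen lag

def size_warning(size_bytes):
    """Classify one chapter file by size."""
    size_kb = size_bytes / 1024
    if size_kb >= LIMIT_KB:
        return "too large"
    if size_kb > LAG_KB:
        return "may lag"
    return "ok"

def flag_chapters(sizes):
    """Given {filename: size in bytes}, return only the files that need attention."""
    flagged = {}
    for name, size_bytes in sizes.items():
        verdict = size_warning(size_bytes)
        if verdict != "ok":
            flagged[name] = verdict
    return flagged
```

You would feed it the sizes of the individual chapter files inside the ebook, not the size of the ebook as a whole.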

What to do?

  • If you are using a word processor to style your ebooks, use the Heading 1 and Heading 2 styles for your chapters, parts and sections. (Do NOT depend on the conversion programs to recognize your inserted page breaks!)
  • If you are styling in html, use the <h1> and <h2> tags.
  • If your project does not have natural breaks such as chapters or parts (it’s a long short story or novella) consider a minor restructure. Use the page count as your guide and try to find natural breaks around the 15,000 word mark–a scene break or time or pov shift or even an illustration that sits on its own “page”.

* If you are using Calibre to convert your ebooks, you can check the file splits in Calibre’s EPUB editor. You’ll see the list of individual text/html files and can open each one on the viewer/edit screen. If you are experiencing inappropriate page breaks, you can manage the fixes in the editor.

 

 

Why You Shouldn’t Format Your Word Docs

There’s a reason my ebooks are superior–two reasons, actually–and neither has anything to do with my technical prowess (I don’t have much) or talent (anyone can do what I’m about to tell you).

Reason Number One: Pre-production, I clean the text. As soon as a document comes up in the queue, I open it and start stripping it of everything that can mess up an ebook: extraneous paragraph returns, extra spaces, and tabs. I tidy up punctuation, tag areas that require special coding, neaten italics and check for special characters that won’t translate. As a writer and editor myself, I know most of the writer tricks and have a rather lengthy list of things to look for. By the time I’m ready to start coding, the text is so clean it squeaks.
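The mechanical part of that cleaning (tabs, extra spaces, extraneous returns) can be sketched in Python for a plain-text copy. This is not my actual process, just a minimal illustration with a made-up function name. The punctuation tidying and tagging still take a human.

```python
import re

def clean_text(text):
    """Strip the most common white-space clutter from plain text."""
    text = text.replace("\t", "")           # remove tabs
    text = re.sub(r" {2,}", " ", text)      # collapse runs of spaces to one
    text = re.sub(r" +\n", "\n", text)      # drop spaces before a return
    text = re.sub(r"\n{3,}", "\n\n", text)  # no more than two returns in a row
    return text
```

Run it on a copy, never on your source file.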

Reason Number Two: Post-production, the ebook is proofread. I don’t care who proofreads the ebook. I can do it, the writer can do it, the writer can hire the job out to someone else. I give the writer a proof copy of the ebook and a mark-up document and encourage them to be as picky as they can stand. Even if they hire me to proofread, they still get the proof copy to load on a device or their computer so they can check the formatting and layout. The point is to find mistakes before the readers do. The point is to make sure the ebook works properly.

I am shocked and appalled that every single person who produces ebooks doesn’t do the exact same thing. They don’t and I know they don’t because I read ebooks that are filled with the types of errors and hiccups that text cleaning and proofreading would have rooted out.

The trad pubs are actually worse offenders than are indies, especially when it comes to back list. I can see it with my own eyes, but it’s amusing to see a publisher admit it publicly on The Passive Voice blog:

J.A. Our experience with Kindle is that as soon as a customer complains they take down the file and send the publisher a takedown notice. It’s actually a real pain in the neck. It could be one person complained and something very minor. We get them occasionally and we fix them right away. They give the reader a credit for the download. I should add that when files are converted they generally aren’t checked page for page like a print book might normally be. We rely on the conversion house to do a good job. If we keep catching errors or getting complaints we would change vendors. We pay pretty good money for these conversions. Our books are almost all straight text so conversions aren’t generally a major issue, but books with columns or charts, or unusual layouts do cause problems and need to be checked carefully. –Steven Zacharius, CEO, Kensington Books

Emphasis mine.

Having personally cleaned up well over a million words of scanned and OCR’d text, that statement offends the shit out of me. Writers deserve better. Readers deserve better.

So what’s that got to do with formatting Word docs? Everything.

If you’re a Do-It-Yourselfer, and are formatting your own ebooks, you cannot skip these steps. (On a sidenote, my biggest gripe with Smashwords is how difficult they make it to proofread an ebook. An upload has to go through the whole publishing process before you can look at it live on a device. Depending on how fast you are at proofreading, the ebook can be live–all goofs intact–for weeks before you can fix them and go through the process again.) My suggestion for the indie formatting Word docs for Smashwords (or any other distributor who accepts Word docs) is to convert them first with a program like Calibre and proofread the results. Find and fix problems before uploading the Word doc to Smashwords.

If you’re hiring a formatter, find out first if they clean up your file pre-production. Many do not. If that’s the case, you need to do the cleaning. Some pros charge by the hour to clean up the Word doc. The more elaborately you’ve formatted your document, the longer it will take to clean it up and the more expensive it will be. (Not to mention wasting your own time on needless work.) My suggestion, if you have special requirements, arrange for a system of tags to let the formatter know what you want. I ask writers to put instructions inside square brackets, i.e. [HEADLINE, PUT IN SMALL CAPS, CENTERED, EXTRA SPACE ABOVE AND BELOW].

Find out, too, the professional’s policy on proofreading. Do you get a proof copy? Does the formatter charge extra to input changes and corrections? (I charge for actual proofreading, but I don’t charge to input changes and corrections from somebody else’s proofread.) If you are not allowed to make post-production changes to your ebook, find another service. Trust me, no matter how well edited, cleaned and formatted the file is going in, you will find something to fix while proofreading. (Gremlins!)

So, for you writers working in Word, one final suggestion: Post the following where you can see it while you work and keep repeating it until it sinks in:

What I see on the computer screen is NOT how my text will look, or act, in an ebook.