Punctuation Purgatory: The Em Dash and the Ellipsis

There are some people who smugly believe they are the bane of my existence. Sorry. My Cone of Silence is such a powerful force field, no mere human being can annoy me for long. The true bane of my existence is punctuation in ebooks. Especially the two characters most beloved by fiction writers: the em dash and the ellipsis.

On the good news front, the people who program Amazon’s Kindles have solved the em dash problem. It used to be that Kindles treated two words joined by an em dash as a unit. Hence, it could cause big, ugly spaces in sentences when the text flow jumped that “word” to the next line:

You’re innocently typing along and minding
your own business and decide, for good
or maybe not so benign
reasoning–character counts in this business,
you know–and there’s a big ugly space…

It appears now that every em dash is flanked by zero-width non-joiners. What that means is, the em dashes break when they reach the end of a line. No more big, ugly spaces in sentences.

Every silver cloud must have a spot of puce. I wouldn’t be me if I didn’t bitch about it. The rule appears to be iron-clad, even for em dashes at the end of a line of dialogue.

“Hey, stupid! Watch out for that–”

No problem–unless your dialogue runs a little long and the text wraps to the next line.

“Hey, stupid! You better watch out for that
–”

This would be an easy fix. Just slip a zero-width joiner between the word and the em dash so it’s not allowed to break at the end of the line. EXCEPT Kindles no longer recognize the zero-width joiner entity. I can put them in, but the device just ignores them.

Le sigh...

***

Ellipses never seemed to cause much of a problem on the device end–the problems were caused by writers using three periods instead of a single ellipsis character. Or worse, trying to go for the “bookish” look and spacing the periods. This caused a whole generation of orphans on the screen.

What are you saying, Jaye? My ellipses are.
.. improper?

Or something even sadder can occur. The
poor little orphaned period sitting all alone..
.

The cure for this is simple. If you are using Word, run a Find/Replace All operation with three periods in the Find box and three periods in the Replace box. Word will automatically change your three periods into ellipses that the ebook will treat as a unit. If you’re using html, do a Find/Replace to turn the three periods into the ellipsis character (or its entity, &hellip;).
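For anyone who would rather script the swap than run it in an editor, here is a minimal sketch in Python. The function name and the html_entity switch are made up for illustration; the only assumption is that the text uses three closed-up periods wherever an ellipsis is meant.

```python
ELLIPSIS = "\u2026"  # the single ellipsis character


def periods_to_ellipsis(text: str, html_entity: bool = False) -> str:
    """Replace every run of three closed-up periods with one ellipsis.

    html_entity is a hypothetical switch: when True, the named entity
    &hellip; is written instead of the literal character.
    """
    replacement = "&hellip;" if html_entity else ELLIPSIS
    return text.replace("...", replacement)


if __name__ == "__main__":
    print(periods_to_ellipsis("Le sigh..."))
    print(periods_to_ellipsis("Le sigh...", html_entity=True))
```

Spaced periods are deliberately left alone here, since they need a decision from the writer rather than a blind Replace All.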

What if you want spaced ellipses? Normally I discourage that. Spaced ellipses are just asking for trouble. They look fabulous in print, but they play havoc in ebooks. An ellipsis at the beginning of a line or even sitting by itself on a line looks a bit odd, but it’s acceptable. An orphaned period or two periods looks like a mistake. Plus, justification could warp them out of shape. That is not acceptable.

But. I have a client who really, really, really wanted spaced ellipses and was willing to risk a platoon of orphaned periods to get them.

I came up with a solution that is so simple, so elemental I feel like a dope for not thinking of it before. The no-break space.

In html the entity is & nbsp ; (but all closed up–the spaces are just to fool wordpress). So, a spaced ellipsis would look like this:

[Image: nbsp]

The first line is a regular ellipsis. The second is an ellipsis with punctuation. On the Kindle it will look like this:

. . .

. . . ?

Ta da! Spaced ellipses the Kindle treats as units.
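The no-break space trick can be sketched as a small Python helper. This is an illustration only: the function name is hypothetical, and it assumes the spaced ellipses in the html are typed as period, space, period, space, period, optionally followed by a space and one closing punctuation mark.

```python
import re

# Three periods separated by ordinary spaces, optionally followed by
# a space and one punctuation mark that should stay attached.
SPACED_ELLIPSIS = re.compile(r"\. \. \.(?: [?!.,])?")


def glue_spaced_ellipses(markup: str) -> str:
    """Swap the ordinary spaces inside each spaced ellipsis for &nbsp;."""
    def glue(match: re.Match) -> str:
        return match.group(0).replace(" ", "&nbsp;")
    return SPACED_ELLIPSIS.sub(glue, markup)


if __name__ == "__main__":
    print(glue_spaced_ellipses("My ellipses are . . . improper?"))
    print(glue_spaced_ellipses("Really . . . ?"))
```

Only the spaces inside the ellipsis are touched, so the reader is still free to break the line before or after the whole unit.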


What the BLEEP is Wrong With You, Harper-Collins?

Get your BLEEPING substandard BLEEP off my BLEEPING Kindle!

You know what I think about shitty ebooks? It makes me want to start channeling Chef Gordon Ramsay. “Come on! What the BLEEP is wrong with you?”

What set me off? What transformed me from laid-back, easy going, tolerant and generally all ’round good ol’ gal and unleashed my inner-Mad Chef with a potty mouth?

This.

[Image: halfhead]

Before I go totally off my nut, let me state, categorically, Stuart MacBride is one of my favorite authors. He’s on my recommended reads list, he’s made two of my top ten lists, (here and here), and I’ve blogged about his books and characters. AND because I know how publishing houses work, the majority of my wrath is directed at

HARPER-COLLINS

Yeah, that Harper-Collins. You know, the big publisher who curates fine fiction and offers so much value to authors and readers with their editing and covers and marketing and brand name? Yeah, that one.

HARPER-COLLINS–MORE SPECIFICALLY, HARPER-VOYAGER

When I bought my first Kindle the very first book I purchased was Shatter the Bones by Stuart MacBride. Paid a premium for it, too. Despite how little I knew then about ebook formatting, I knew that book was an utter embarrassment. I could make a better looking ebook by running a Word file through MobiPocket. Along with setting me on a journey of learning how to produce a fine-looking ebook, it also taught me the value of downloading samples. Thusly I learned how much contempt Harper-Collins has for its authors and readers. They put out some of the shittiest looking ebooks around.

So why did I buy Halfhead? It looked good and I’m an optimist. I thought, well, finally! HC realizes ebook readers deserve decently formatted ebooks. It wasn’t until I settled in for an enjoyable read that I realized

THEY DIDN’T PROOFREAD THE EBOOK!!!

So to channel my inner-Gordon: “What the BLEEP is BLEEPING wrong with you? Get the BLEEP out of my BLEEPING Kindle! You should be BLEEPING embarrassed! Come on!”

Split words, joined words, backward quote marks, mixed up homonyms, and no consistency in hyphenation. That’s proofreading 101. Halfhead is filled with mistakes a sixth grader could have spotted and fixed. It’s embarrassing.

My goal as a self-publisher is to produce a book with fewer than five typos/gaffes per 100,000 words. That’s a freakin’ high standard and damned near impossible to achieve, but it’s a standard born of respect for authors, literature and readers. The only way to even get close to meeting that standard is to proofread the ebook until my eyeballs bleed. It means loading a PROOF COPY onto my Kindle and going through the book line by line, word by word, and punctuation mark by punctuation mark.

IT MEANS GIVING A SHIT.

Having not seen a HC contract, I have no idea what kind of royalties they are paying authors. I imagine it’s around 25% net (with publisher accounting that can mean only pennies per unit sold). So figure roughly that authors–for the privilege of being published by HC with all its supposed services and benefits–are giving up anywhere from 82.5% to 94% of the cover price. My question for Mr. MacBride (and any other HC author) is WHY? Why do you let them treat your work like this? Why do you let them abuse your readers with sub-par production? Proofreading is so elemental, so necessary, and to let a book go out the door without it is completely, utterly inexcusable.

ANY ENTITY THAT ALLOWS AN EBOOK TO GO LIVE WITHOUT BEING PROPERLY PROOFREAD DOES NOT DESERVE TO CALL ITSELF A PUBLISHER

No proofreading… Are you BLEEPING kidding me?

Restore Paragraphs in an OCR Scan

Earlier, I wrote a post about DIY scanning and doing an OCR rendering and clean-up of your back list books. It doesn’t have to be expensive and it’s not difficult to do. It does require patience, because cleaning up an OCR rendering takes time.

If you used FreeOCR (as I’d recommended) one thing you’ve noticed is that it inserts a hard return at the end of every single line. The first time I saw that I freaked out a bit. I envisioned having to go through the entire file, manually deleting those extra returns and restoring every paragraph. Then I discovered the hard returns actually help in cleaning up the file because I can work line by line through the text, comparing it to the original material.

Once the text is cleaned up, the paragraphs do need to be restored. If you are using Notepad++ (a text editor that I highly recommend) you can use Find/Replace to do the job. The first step takes some time, but the actual restoration uses the power of Replace All to do the job quickly.

Before you begin work on the file, do a Save As and work on the copy. That way if you mess up, the original is intact and you can easily start over.

STEP ONE: Insert an extra line between each “true” paragraph.

In order to keep an eye on what you are doing, toggle on the Show Characters button. It’s in the menu bar and the icon looks like a blue pilcrow (paragraph symbol). It will display black boxes with [CR]–for carriage return–and [LF]–line feed–wherever there is a hard return.

Once you have an extra line between every true paragraph, you will need to insert an extra space at the end of every line. This way you won’t end up with joined words.

STEP TWO: Open the Find/Replace box and toggle on “extended”.
In the Find box type: \r
In the Replace box type: (space)\r
(don’t type out “space” just tap the space bar once)
Do a Replace All

Now you are going to tag the places where you WANT a hard return.

STEP THREE: In the Find box type: \r\n(space)
In the Replace box type: \r\n-N-
Do a Replace All

Now the step where you have to steel your nerves. Remove ALL the hard returns.

STEP FOUR: In the Find box type: \r\n
Leave the Replace box blank (no spaces either)
Do a Replace All.

Now you have one giant block of text with zero hard returns. But don’t freak out. Now you restore the proper paragraphs.

STEP FIVE: In the Find box type: -N-
In the Replace box type: \r\n
Do a Replace All.

Now your paragraphs are restored and there are no extra hard returns to be found. You will need to now get rid of those extra spaces at the end of each paragraph.

STEP SIX: In the Find box type: (space)\r
In the Replace box type: \r
Do a Replace All.

That’s it. Except for the first step where you have to insert an extra line between each real paragraph, explaining this takes longer than doing it. This method is a whole lot easier than manually deleting the unwanted hard returns.
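For the curious, the same sequence of Replace All operations can be sketched in Python. This is only an illustration of the steps above, not a replacement for doing them in Notepad++. It assumes the extra blank line between true paragraphs has already been inserted by hand, that the file uses \r\n line endings, and that -N- never appears in the book itself. The function name is made up.

```python
def restore_paragraphs(text: str) -> str:
    """Mirror the Find/Replace steps on text with \\r\\n line endings."""
    # Add a space at the end of every line so words do not get joined.
    text = text.replace("\r", " \r")
    # A return followed by a space marks a blank line, that is, a true
    # paragraph break. Tag it.
    text = text.replace("\r\n ", "\r\n-N-")
    # Remove ALL the hard returns.
    text = text.replace("\r\n", "")
    # Turn the tags back into hard returns.
    text = text.replace("-N-", "\r\n")
    # Get rid of the extra space at the end of each paragraph.
    text = text.replace(" \r", "\r")
    return text


if __name__ == "__main__":
    sample = "The quick\r\nbrown fox.\r\n\r\nA second\r\nparagraph.\r\n\r\n"
    print(repr(restore_paragraphs(sample)))
```

One thing the sketch makes visible: if the file does not end with a blank line, the last paragraph keeps its trailing space, because there is no hard return after it for the final replace to catch.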

Have fun!

Clarifying Source Files: How To Use Them

I’ve done a lot of talking about source files, and inadvertently confused some folks. I don’t mean to muddy issues–it just happens. So I made a chart! (Aren’t you glad you stopped by?)

The SOURCE FILE is just that. The source from which everything else springs. You don’t format it (beyond what is necessary for YOU to comfortably compose) because you DO NOT NEED TO. Essentially, while composing original works the more you act as if your word processor is a typewriter (except for tabs–no tabs!), the cleaner it will be and the easier it will be for you or someone else to format it for a specific use.

Once you have a Source File, you MAKE COPIES of it in order to format it for a specific purpose.

Let’s say you’re sending a manuscript to XYZ Publishing House. You need a printed document. You open the source file and do a Save As to make a copy. In that copy you will insert a cover page, header, page numbering, and adjust the margins and font according to the publisher’s guidelines. The source file remains intact, unchanged.

You want to self-publish your novel. You open the source file and do a Save As to make a copy. You can send that copy to a hired formatter and let them take it from there. You can format a .doc file in Word according to the distributor guidelines. You can hand code the copy in html. The source file remains intact.

I didn’t include every single way to format a file, but you get the picture, right? Let’s say you published your ebook. A reviewer would like a pdf file. You do a Save As, make a copy and format a pdf. You want to make an electronic submission? Do a Save As, add your address block, maybe change the font and line spacing, and you will submit a nice clean file that agents and editors can easily read on almost any computer or device.

If you look at my chart and think this is terribly complicated and I’m trying to make extra work for people, you are wrong. When it comes to digital files, there is no One-Size-Fits-All format. If you get in the habit of creating your original files in a no-frills, minimally formatted style it will save you work, save you time, and save you headaches.

I hope this clarifies things.

 

Scan, OCR and Restore BackList Books

This week I read a comment on a blog (can’t remember where–sorry) where a writer said she was putting off reissuing her backlist titles because she didn’t have accessible computer files for them and so she’d have to scan the actual books, run them through an OCR program and format them. She didn’t know how to do that.

I hear ya, sister. A few months ago I’d have nodded in agreement, and said, “Yep, too hard, too time-consuming, too expensive.” Now, however, having spent the past few months restoring nearly two dozen old paperback books from scans and turning them into ebooks, I know it’s NOT too hard, it IS time-consuming, and the cost can range from dollars per page (expensive) to FREE (DIY option).

(Another option is to retype the book, but quite frankly, folks, unless you are a super-typist with wrists of steel–which I most certainly am not–that is a daunting proposition.)

You know me. Somebody sez, “Can you do this?” and I reply, “How hard can it be?” Then I bumble and fumble around until I figure out how to do it. Then I come on here and am able to give you some tips that mean you can skip the bumbling and fumbling part. Unless you enjoy b&f. In that case, you can stop reading this post.

This is for the Do-It-Yourselfers.

SCANNING

Do a Google search for “scanning books” and the results will include thousands of services that will take your old books or manuscripts and turn them into pdf or doc files. Some services will scan the book without harming the binding, some will chop off the spine, destroying the book. Prices range from per-page costs to flat-rate. I haven’t used any of those services, so I can’t recommend any of them. You’ll have to do your own research.

You can also take your old books or manuscripts to a copy store such as Fed-Ex/Kinkos or a full-service office supply store such as Staples, and either do it yourself on their equipment or have them do it for you.

If you happen to own a scanner, you can do it at home. This is the insane option because quite frankly most home scanners are ridiculous beasts that take their sweet time (I know this because I had to try it myself just to see and so scanned a nearly 300 page manuscript–easy on the hands, tough on the buttocks. It took hours!) If you are home-scanning actual pages from a paperback, you will have to play with the settings on your scanner because most are at their best scanning photos and that resolution is far too high to get good results. Best results are achieved if you copy the pages onto good quality 20# or 24# copy paper and then scan the copies.

However you choose to have your book/manuscript scanned, my recommendation is to have the scanner turn it into a pdf file. There are services and programs that will do the OCR conversion during the scan and produce a .doc, .docx or .rtf file for you. On the surface, it looks like a bargain. I think it’s dangerous because: 1) the file you receive will be huge and bloated and junked up with tons of coding that can severely mess up your ebook; 2) it will not save you any work during clean-up and in some ways it makes clean-up more of a chore; 3) it could give you a false sense of security that your file is cleaner than it actually is and your ebook could end up like so many that are on my Kindle right now, full of formatting errors and gibberish.

Here is a file that has been scanned and converted at the same time:

Here is a file that has been DIY scanned and turned into a .doc file:

It’s a big mess, too, but there are actually fewer dangerous formatting issues you will have to address. Awful as it looks, this example is easier to clean up and turn into an ebook than the first example. So save your money (and a few headaches) and run the pdf through the OCR program yourself.

OCR

Scanned PDF files are image files. Pictures of a page. In order to clean up and format the pages they must be converted into text. That’s where OCR comes in–Optical Character Recognition.

I found a nifty little program called FreeOCR. It’s a free program you download onto your computer. It’s a powerful program with a few bells and whistles–none of which I recommend you use. This is a case where the more you automate the process, the worse your results will be. There is no good substitute for the human eye and human instincts when it comes to restoring a document file. You’re better off in the long-run by doing a basic OCR conversion. That means, open the FreeOCR program, open a pdf file, then render it page by page (depending on the size of the file and the density of the type, to do a complete book the process will take between 20 minutes and an hour).

The original scanned page is on the left, the OCR conversion is on the right. You can see what a mess it is. That’s because the OCR is very efficient. It turns not only images of text into text, it turns water stains, wrinkles, shadows, and debris embedded in the paper into text, too. If there are notes in the margin, it will try to turn those into text. A basic scan also inserts a hard paragraph return at the end of every line, gets rid of paragraph indents and destroys special formatting such as bold and italics (the first time I saw this I totally freaked out). Some things convert more cleanly than others. If you’re converting a decades-old paperback where the pages have yellowed and degraded, the conversion will be a HUGE mess.

But not a hopeless mess.

CLEAN UP

FreeOCR gives you an option of saving your rendered document as a Word file. You can do that and clean up your file in Word. There is a much easier, faster and more efficient way. Use a text editor (with a little eventual help from Word). I use Notepad++, a program you can download for free. Save your OCR rendering into the clipboard (or do a right click, Select All/Copy) and paste it into the text editor.

Whether you use Word or a text editor, this is the time-consuming part of the process. And there’s no help for it. If you want a good-looking ebook, you need to make your converted file squeaky clean. (Your other option is hiring someone to do it for you. BUT–and this is a huge but–you have to make sure the service you hire is NOT automating the process, but that there is instead an actual human being going through the book word by word and restoring the text. Those automated programs are powerful and they do a good job on some projects, but I have ebooks I have purchased on my Kindle right now that are unreadable messes due to those programs.)

I have learned a few things to make the job go faster and more efficiently.

  1. Save restoring the paragraphs for last. Take a look at the image of the OCR conversion in Word. I toggled on the Show/Hide feature so you can see how every line has a paragraph return. What you see is the layout from the printed book. That can help during clean up.
  2. Work off the actual pages. Either have the actual book in front of you or split your computer screen and have the pdf file open to the scanned pages. That way if the OCR mangled the text, you can retype a word or line from the actual copy instead of trying to guess what it is supposed to say. You can also tag special formatting such as italics as you go along.
  3. Use Find/Replace.

The text will be full of oddball characters (I call them bug shit). Things like degree symbols, floating quote marks, greater and less than characters, slashes, tildes. If something doesn’t belong in your text file–Find/Replace All gets rid of it. You can also use it to get rid of headers, footers and page numbers. Once you have the text cleaned up, you can use Find/Replace All to get rid of extra paragraph returns, restore the proper paragraphs and un-hyphenate any words that had been split in the printed version. (BONUS TIP: Before you get rid of the extra paragraph returns use Find/Replace to add an extra space at the end of each line. That keeps words from being joined and makes it easier to find hyphens you want to get rid of)
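Here is a rough Python sketch of those two Find/Replace jobs: stripping stray characters and rejoining words that were hyphenated across a line break. The debris list and the function names are assumptions for illustration; build your own list from what actually shows up in your file, and leave out any character the book legitimately uses.

```python
import re

# Hypothetical debris list: degree sign, angle brackets, slashes, tilde.
DEBRIS = "\u00b0<>/\\~"


def strip_debris(text: str, debris: str = DEBRIS) -> str:
    """Delete every character in `debris`, like a Replace All with an
    empty Replace box."""
    return text.translate({ord(ch): None for ch in debris})


def join_split_words(text: str) -> str:
    """Rejoin a word hyphenated across a line break.

    Caution: this also closes up compounds that really should keep
    their hyphen, so check the results against the printed page.
    """
    return re.sub(r"(\w)-\r?\n(\w)", r"\1\2", text)


if __name__ == "__main__":
    print(strip_debris("He said ~hello\u00b0 <there>/"))
    print(join_split_words("an impor-\r\ntant day"))
```

As with the editor version, these are blunt instruments, which is why the line-by-line check against the original pages comes first.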

So, yes, this is time-consuming, but it is not hard nor does it have to be expensive. It is definitely worthwhile to get your backlist back in circulation.

 

Oh My God, I’m a Nerd!

Long time readers of this blog have watched my progression from building ebooks in Word to the present day as I’m handcoding html. You’ve heard me whine about the quality of ebooks and the difficulty of producing a book that renders perfectly on every device, every time. I’ve used different programs, different methods–and for the longest time I utterly resisted learning html because 1) I knew nothing about it; and 2) I resented the idea that one had to be some kind of mad genius-computer nerd in order to make a decent ebook.

Well. I was wrong. It doesn’t matter that I knew nothing about html. If one is motivated, one can learn. Plus, one doesn’t need to be a mad genius (or even a slightly bent genius) in order to learn basic coding–which is really all one needs in order to make a beautiful ebook.

One does have to be, however, a bit of a nerd. I realized this the other day when I announced to the old man, “Ha! Regex isn’t so hard and toggling the extended command means I can wrap paragraphs and find extra spaces as easily in the editor as I can in Word. Bwahahaha!”

Do you understand what I just said? Don’t worry. The old man didn’t either. A month ago I wouldn’t have understood it. Suffice to say, I’m learning a whole new language and it is finally making sense to me even though that means my family now looks at me the same way the dog does when I’m talking to him (he’s waiting for me to say the magic words–“walkies” or “cookies”).

But, I haven’t done all this alone. Every time one of you guys, blog readers, makes a comment about a new way of doing something or talks about a new program, I have to check it out. And I learn things. When I’m working, I have a screen open to the W3Schools website so I can quickly get questions answered. I’m always bopping around the ‘net, seeing how others have solved problems and seeing if they’ve learned something new. I don’t always (okay fine, most of the time) understand what others are talking about. The real experts have been doing computer programming for decades and they speak “html” with casual fluency while I’m over here speaking very loudly and very slowly and adding vowels to the ends of every word in an effort to make myself understood (I said-o no comprehendo, capiche-o, amigo?).

Needless to say, when I do find a reference source that a) tells me what I need to know; b) shows me what I’m doing wrong and how to fix it; and c) is written in a way that I can actually understand, I glom onto it.

All that build-up and confession leads to sharing a new treasure: The eBook Design and Development Guide by Paul Salvette. Paul follows this blog and comments occasionally. He also has an ebook formatting service. He gave me a heads-up about the book. There were two major factors in my decision to buy it. First it was written in comprehensible English (most of these types of guides offend my writerly sensibilities) and second (this is really important!) it’s nicely formatted (it’s astonishing how many how-to-format-your-ebook guides are so wretchedly formatted as to be unreadable).

This is not a beginner’s guide. Two months ago I wouldn’t have understood much beyond “and” and “the.” With my usual la-di-dah methods of clicking madly until something works, I learned enough of the basics of html on my own to create some very nice ebooks. Armed with those basics, I’m able to understand quite a bit of what Paul is talking about. It helps that he truly cares about how ebooks look and that they work properly on ereading devices, no matter what those devices might be. It also helps that the book is readable, with an engaging style, and only occasionally lapsing into nerd-speak that leaves me smiling, nodding and waiting for him to say “walkies” and “cookies.”

I read it in one sitting, bookmarking countless passages and taking notes with my analogue word processor. I figured out some areas where I am working way too hard to accomplish simple tasks, and making some mistakes which I had to work even harder to overcome and compensate for. Of course I had to run to the computer and try some new things. I formatted two ebooks using his guidelines and had so much fun, I reformatted another book that happened to be more complicated just to see if I could. I could. I did! I understand a bit more about how ebooks work and some of the differences between the different platforms and why versions of html coding work better on some platforms than with others.

The book is easy to navigate (a most useful table of contents written in plain English) and it includes templates for xhtml address thingies and resets and style sheets. Handy-dandy and easy to use.

Paul, being a generous fellow, generously (foolishly) opened himself up to answering whatever stupid questions I might throw his way. He might be sorry about the offer, but I won’t be. One book doesn’t make me an expert and it sure doesn’t catch me up on twenty years of experience, but it does go a long way toward helping me reach my goal of producing beautiful ebooks.

Highly recommended for nerds-in-training.

-SB-, -PB-, and Italics

So I’m chatting with a friend and she asks, “What you doing?” I sez, “Nuking tabs. Bwahaha! All gone.”

I was, of course, prepping a manuscript for ebook formatting. That means going through the manuscript and getting rid of everything that will screw up the ebook.

My learning processes are always convoluted, in the beginning overly complicated and then as I figure out what’s important and what is not, I streamline and pare down to the essentials. If you are a regular follower of this blog, you’ve seen my process regarding source files for writers. I’ve gone from suggesting writers set up and use style sheets (they should) to what I’m going to suggest today.

When creating a source file with the end goal of turning it into an ebook, all the writer needs to do, formatting-wise, are three things:

  • Indicate page breaks
  • Indicate scene or section breaks
  • Italics, bolding and underlining

When it comes to page breaks, “indicate” means exactly that. Don’t actually break pages either with inserted page breaks or multiple paragraph returns. Why? Because when you’re ready to format, you or the person you hire has to take them out.

The more formatting you put into your source file, the more formatting that has to be removed. The more that has to be removed, the greater the chances of something that might be missed (screwing up the ebook) and the more it costs in time and money.

When I get a manuscript to format, it’s generally been created with a word processor. Whether I’m going to format it for Smashwords (a Word file) or for everything else (html files), the very first thing I have to do is–

  • Remove extra spaces
  • Remove extra paragraph returns
  • Remove page and section breaks
  • Remove headers, footers and page numbers
  • Tag page breaks
  • Tag scene or section breaks
  • Tag special formatting

I don’t actually have to remove tabs because those are going to disappear when I transfer the text to a text editor, but it’s easy (one Find/Replace operation) and it gives me a clearer picture of the dangerous stuff.
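A minimal Python sketch of the removal steps in that list, for illustration only. It assumes the manuscript has been saved as plain text with \n line endings and that page and scene breaks have ALREADY been tagged, because collapsing the extra paragraph returns wipes out any blank line that was standing in for a break. The function name is made up.

```python
import re


def prep_manuscript(text: str) -> str:
    """Strip the formatting debris that has to go before ebook formatting."""
    text = text.replace("\t", "")         # nuke tabs
    text = re.sub(r" {2,}", " ", text)    # runs of spaces become one space
    text = re.sub(r" +\n", "\n", text)    # spaces before a paragraph return
    text = re.sub(r"\n{2,}", "\n", text)  # extra paragraph returns
    return text


if __name__ == "__main__":
    messy = "\tChapter One\n\n\nIt was  dark. \nReally.\n"
    print(repr(prep_manuscript(messy)))
```

Headers, footers and page numbers are not handled here; those depend too much on the individual manuscript to guess at.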

Now, seriously, I’m a writer. I fully understand the NEED to make the manuscript look RIGHT. But writers, you have to understand that every effort you make to that end is going to have to be undone. Because of the nature of word processors, some of the fancy touches you include can actually corrupt the ebook.

The less you do–Honestly! Truly! I’m not lying about this!–the better the ebook will be.

So what’s a poor writer to do? Not much, actually. Use whatever font and font size you like. That’s not what will end up in the ebook, but use whatever is comfortable for you while composing. Line space however you like. It makes no difference in the end. Get out of the habit of using tabs. If you can’t stand not having indented paragraphs, set up a simple style sheet that indents the paragraphs with every hard paragraph return. Get out of the habit of two spaces between sentences. Get out of the habit of adding extra hard paragraph returns to space the text. Get out of the habit of making pages. There are no pages in ebooks.

How does one indicate a page break?

I use a code. -PB- It’s unique, the dashes keep it from melding with text, and thus it is easy to find. What my clean file looks like before I take it to the text editor is this:

Final line in chapter one.
-PB-
Chapter Two
First line in chapter two and so it goes.

Use whatever makes sense to you. If you want to make extra sure you or your formatter don’t miss it, spell it out. -PAGE BREAK- That’s it. That’s all you have to do. When you do the actual formatting, that’s when you center, bold, add graphics, extra spacing, etc.

Scene breaks are another place you should get in the habit of tagging–especially if your habit is to use extra paragraph returns to make a blank line. Those can be easy to miss. My little code is -SB-. The text looks like

Last line of scene or section.
-SB-
First line of new scene or section.

It doesn’t matter much what you use as long as you use something. Asterisks, a pound sign, plus signs, or spell it out -SCENE BREAK-. Use something so the scene break doesn’t get lost.
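To show why unique tags pay off later, here is a small Python sketch that turns a tagged plain-text file into html paragraphs. The markup it writes for page breaks and scene breaks is an assumption chosen for illustration, not a recommended standard, and the function name is made up. It assumes one paragraph per line.

```python
import html

# Hypothetical output markup. Swap in whatever your stylesheet expects.
PAGE_BREAK_HTML = '<div style="page-break-before: always;"></div>'
SCENE_BREAK_HTML = '<p class="scene-break">* * *</p>'


def tagged_text_to_html(text: str) -> str:
    """Convert -PB- and -SB- lines to markup and wrap the rest in <p>."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # stray blank lines carry no meaning in the source file
        if line == "-PB-":
            out.append(PAGE_BREAK_HTML)
        elif line == "-SB-":
            out.append(SCENE_BREAK_HTML)
        else:
            # Escape &, < and > so the reader cannot mistake them for markup.
            out.append("<p>" + html.escape(line, quote=False) + "</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(tagged_text_to_html("Last line of scene.\n-SB-\nFirst line of new scene."))
```

Because the tags sit alone on their own lines, a simple equality check is enough to find them, which is the whole point of using a code that cannot meld with the text.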

As for special formatting–italics, bolding and underlining–at some point, whether you do the job yourself or hire it out, you are going to have to tag the special formatting. I’ve gotten into the habit with my own writing to tag as I write rather than highlighting the text and italicizing (or whatever). It’s easier in the long run and I’m used to how it looks. Most writers are not going to want to do that. No biggie. If you are going to tag your special formatting, a few things I have learned–

  • If you’re going to format in html, you know to tag the special formatting with open/close codes: <i> and </i>
  • If you are going to format your ebook in Word and are tagging for the purpose of stripping extra coding out of the document, do NOT use html tags. Using <i> TEXT or Wild Card </i> in the Find box of a word processor can have… interesting results. Not the fun kind of interesting either.
  • For Word files I use -I- and -ENDI- to open and close italics. Easy to find and doesn’t give Find/Replace fits.
  • Make sure your special formatting is paragraph specific. In other words, don’t just highlight big blocks of text and toggle on italics. Highlight the necessary text within each paragraph, italicize it, then do the same in the next paragraph. Fewer chances for conversion programs to argue about what you really mean.
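As an illustration of how the -I- and -ENDI- codes can be handled later, here is a hedged Python sketch. The function names are hypothetical. It works one paragraph at a time, which matches the advice to keep special formatting paragraph specific, and it includes a simple check for paragraphs where the open and close codes do not balance.

```python
import re

ITALIC_SPAN = re.compile(r"-I-(.*?)-ENDI-")


def italic_tags_to_html(paragraph: str) -> str:
    """Turn -I-text-ENDI- into <i>text</i> within a single paragraph."""
    return ITALIC_SPAN.sub(r"<i>\1</i>", paragraph)


def unbalanced_paragraphs(paragraphs: list[str]) -> list[int]:
    """Return the indexes of paragraphs whose -I- and -ENDI- counts differ."""
    return [
        i for i, p in enumerate(paragraphs)
        if p.count("-I-") != p.count("-ENDI-")
    ]


if __name__ == "__main__":
    print(italic_tags_to_html("She -I-really-ENDI- meant it."))
    print(unbalanced_paragraphs(["-I-fine-ENDI-", "-I-oops, never closed"]))
```

Running the balance check before converting catches the case where italics were toggled on across several paragraphs and only closed at the end.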

That’s pretty much it. To get the best results in your ebook, no matter who does the formatting, copy the following, print it out, and tape it to your computer monitor as a reminder:

This is a FILE not a document. Less is more. Less is good. The less you do now, the less you have to do later.

 

 

Straight Quotes Versus Curly Quotes in Ebooks

I have mentioned my dislike for straight quotes in ebooks in earlier posts. I think they look amateurish and unfinished. Some people don’t mind them. Many people probably don’t even notice them. So this is one of those areas where personal preference should be the deciding factor.

Except…

This is from the w3schools html character reference guide:

Reserved Characters in HTML

Some characters are reserved in HTML and XHTML. For example, you cannot use the greater than or less than signs within your text because the browser could mistake them for markup.

HTML and XHTML processors must support the five special characters listed in the table below:

Character   Entity Number   Entity Name   Description
"           &#34;           &quot;        quotation mark
'           &#39;           &apos;        apostrophe
&           &#38;           &amp;         ampersand
<           &#60;           &lt;          less-than
>           &#62;           &gt;          greater-than

Keywords here: “…the browser could mistake them for markup.”

That is not a good thing. Ereading devices are essentially browsers and giving them an opportunity to mistake something is a big fat goof waiting to happen.

If your preference is for straight quotes (double and single) then go with your preference, but remember to code them either with the entity number or the entity name.
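
If your text passes through a script on its way to html, the entity coding can be automated. Here is a minimal sketch in Python, assuming the text has not been wrapped in tags yet. The function name escape_straight_quotes is made up for illustration; html.escape is part of Python's standard library. It uses the entity name for double quotes and a hexadecimal entity number for the apostrophe.

```python
import html


def escape_straight_quotes(text: str) -> str:
    """Swap the reserved characters, straight quotes included, for entities."""
    # quote=True makes html.escape convert " and ' as well as &, < and >.
    return html.escape(text, quote=True)


sample = '"Don\'t touch that," she said.'
print(escape_straight_quotes(sample))
# prints: &quot;Don&#x27;t touch that,&quot; she said.
```

Run something like this on the raw text before adding paragraph tags. Otherwise the angle brackets in your own tags get escaped too.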

If you do go with curly quotes (or smart quotes), you’ll run into the frustration of them being turned occasionally backward by the infinite wisdom of your word processor (which insists upon doing what it’s told rather than what you mean). It’s a pain when you’re being a good doobie, composing your masterpiece with the auto-correct functions turned off in order to create a clean source file, but when you run a Find/Replace to turn on the curly quotes, quotes after em dashes or ellipses that should be right double quotes are instead turned into left double quotes, to name just one example.

You can, however, find and correct your curly double and single quotes with Find/Replace. I only know how to do it manually in Word. In a text editor I can do Replace All. In either case, it shouldn’t take long and the key is the search terms you use.

In Word look for instances where you’ve used double quotes before or after an ellipsis or em dash. (I’d type the actual characters, but WordPress wants to “correct” them for me.)

  • dash dash double quote
  • double quote dash dash
  • double quote period period period
  • period period period double quote

Or, if you’ve already converted your dashes into proper em dashes, do your search for:

  • ^+ double quote
  • double quote ^+

You get the picture, right? If you find an instance where the quote mark is turned the wrong direction, you can fix it. If you’re doing this in a text editor, you can search for the actual characters and do a Replace All to make sure they are turned the right way.
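
For anyone who would rather script the text editor version, here is a rough sketch in Python. It rests on one assumption of mine: a left double quote that sits directly after a dash or ellipsis and is followed by nothing but white space, a tag, or the end of the text is really a closing quote. The names (WRONG_CLOSER, fix_closing_quotes) are made up for illustration, and the characters are written as escape codes so nothing gets "corrected" along the way.

```python
import re

EM_DASH = "\u2014"
ELLIPSIS = "\u2026"
LEFT_DQ = "\u201c"
RIGHT_DQ = "\u201d"

# A dash or ellipsis (proper character or typewriter style), then a left
# double quote with only white space, a tag, or the end of the text after it.
WRONG_CLOSER = re.compile(
    "(" + EM_DASH + "|--|" + ELLIPSIS + r"|\.\.\.)" + LEFT_DQ + r"(?=\s|<|$)"
)


def fix_closing_quotes(text: str) -> str:
    """Turn wrong-way quotes after a dash or ellipsis into right double quotes."""
    return WRONG_CLOSER.sub(lambda m: m.group(1) + RIGHT_DQ, text)


broken = LEFT_DQ + "Watch out for that" + EM_DASH + LEFT_DQ
fixed = fix_closing_quotes(broken)
print(fixed.endswith(EM_DASH + RIGHT_DQ))  # prints: True
```

A left quote that opens dialogue right after a dash is followed by a letter, so the pattern leaves it alone. Anything stranger than that still deserves a look by eye.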

I also like to check for gremlin-induced wrong-way quote marks. I know right double quotes don’t belong at the beginning of a paragraph or sentence, and left double quotes don’t belong at the ends. In a text editor I run these searches:

  • <p>right double quote
  • left double quote</p>
  • (space bar)right double quote
  • left double quote(space bar)

I can either do the corrections manually or run a Replace All.
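
The four searches above can be sketched as a Replace All in Python. This assumes the text is already wrapped in plain <p> and </p> tags with no attributes; the function name fix_boundary_quotes is made up for illustration.

```python
LEFT_DQ = "\u201c"
RIGHT_DQ = "\u201d"


def fix_boundary_quotes(html_text: str) -> str:
    """Flip double quotes that face the wrong way at obvious boundaries."""
    fixed = html_text
    # A right double quote cannot open a paragraph.
    fixed = fixed.replace("<p>" + RIGHT_DQ, "<p>" + LEFT_DQ)
    # A left double quote cannot close a paragraph.
    fixed = fixed.replace(LEFT_DQ + "</p>", RIGHT_DQ + "</p>")
    # A right double quote should not follow a space.
    fixed = fixed.replace(" " + RIGHT_DQ, " " + LEFT_DQ)
    # A left double quote should not be followed by a space.
    fixed = fixed.replace(LEFT_DQ + " ", RIGHT_DQ + " ")
    return fixed


broken = "<p>" + RIGHT_DQ + "Hello," + LEFT_DQ + " she said.</p>"
print(fix_boundary_quotes(broken) == "<p>" + LEFT_DQ + "Hello," + RIGHT_DQ + " she said.</p>")
# prints: True
```

Like any Replace All, this is blunt. If your style puts a space between a closing single quote and a closing double quote, do those corrections by hand instead.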

Single quote marks and apostrophes tend to turn the way they should. But here is a fun little gremlin that I’d never noticed until it was pointed out to me (and now I can’t unsee it!). I don’t even know what you’d call these, but I’m going with truncated contractions—words with their front ends chopped off, usually in dialogue. Examples:

  • ’em, as in “Smoke ’em if you got ’em.” (th)em
  • ’cause, as in “’Cause I said so.” (be)cause
  • ’round, as in “Ain’t seen him ’round these parts.” (a)round

These words are contractions, which call for an apostrophe, and that means a RIGHT single quote mark. Fortunately, left single quotes aren’t used all that much in most American writing. You can search for specific usages (if you remember all the truncated contractions you used) or, if you’re in a text editor, you can search for left single quotes and change them to right single quotes as necessary.
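
If you keep a list of the truncated contractions you use, the search can be scripted. Here is a small sketch in Python. The word list is only a hypothetical starter set and the function name is made up for illustration; add your own words to it.

```python
import re

LEFT_SQ = "\u2018"
RIGHT_SQ = "\u2019"

# Hypothetical starter list of words with their front ends chopped off.
TRUNCATED = ["em", "cause", "round", "til", "tis", "twas", "bout"]

# A left single quote followed by one of the listed words as a whole word.
PATTERN = re.compile(
    LEFT_SQ + "(" + "|".join(TRUNCATED) + r")\b", re.IGNORECASE
)


def fix_truncated_contractions(text: str) -> str:
    """Swap the left single quote in front of a listed word for an apostrophe."""
    return PATTERN.sub(lambda m: RIGHT_SQ + m.group(1), text)


line = LEFT_SQ + "Cause I said so."
print(fix_truncated_contractions(line) == RIGHT_SQ + "Cause I said so.")
# prints: True
```

One caution: quoted speech inside dialogue that happens to start with a listed word (a character singing something that begins with "Round", say) would get flipped too, so skim the results.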

Is this a pain? Yep. It’s worth it, though.

 

 

 

More About Ebook Formatting, Source Files and Tales of Tagging

First an apology for not answering every comment this week. On the “Source Files Update” post there were some great comments. People are coming up with solutions and solving problems. So go read the comments over there. One commenter in particular is hard at work on the subject of formatting ebooks from word processor files. I’ve been corresponding with William Ockham regarding his efforts to create a program that will make it easy to format a word processor file into a good-looking ebook. I’ve sent William some grotty files and he’s been problem solving. I’ve brought one of his comments over to this post so you can get a better idea of what he’s doing:

Wow, I’m flattered. I’ve been busy with my guest blogging stint over at http://www.thepassivevoice.com and didn’t see all these comments. Since there is some interest here, I’ll share what I can of my plans. I firmly believe that writers should use whatever tool works for them. For most people, that’s Microsoft Word. Some folks are using Scrivener and almost everyone else is using some word processor (a flavor of OpenOffice or those WordPerfect holdouts).

The first thing I’m going to release is a free document to source file converter service (to use Jaye’s terms). You save your manuscript in RTF format (pretty much every program supports RTF) and upload it to my service. My program will go through and do all the stuff that Jaye talks about. It will strip all the formatting except bold, italics, and chapter headings. You get back a nice clean source file in RTF format. You load it up into your tool and save it back as a .doc file and you have a source file suitable as the input for ebook formatting. It’s not much, but it is a nice little timesaver and your ebook formatter will thank you (even if you DIY). Did I mention it would be free?

I really appreciate all the expressions of support. I hadn’t really given much thought to a Kickstarter, but I am thinking about it now. In the meantime, there is something you could do to help. I need test cases. That is, I need real manuscripts before they’ve been given the Jaye Manus treatment. If anyone has copies of their novels (or short story collections) that they wouldn’t mind sharing with me, I would really appreciate it. I promise not to use them for anything other than perfecting my software. I will send you the cleaned-up version and destroy or return the original when I’m done.

If you can help in this way, save your gnarliest files (smart quotes, em dashes, paragraphs indented with tabs and spaces, whatever) in RTF format and email them to razoroftruth at gmail dot com.

Let me know what program (e.g., Microsoft Word) and version (like 2000 or 2007) and whether you are using Windows, Mac, or Linux (or other Unix variant).

Which brings us to another problem I’m working on with source files–tagging. One of the things keeping me so busy this week is learning HTML. Turns out it’s kind of fun and quite the challenge. I also discovered that my resulting ebook files are much smaller–why? Who knows. But that’s a plus since I love using graphics for headers and such. Anyhow, the biggest challenge has been doing an ebook in screenplay format. It’s not difficult. It requires essentially three styles: Centered, Block Quote and Hanging Text. Since it ran about 120 pages in manuscript form, the real challenge was making sure every style was properly applied. I also wanted a way to NOT have to go in and tweak every line of text.

Now me, I happen to think FIND/REPLACE is the greatest invention since the light bulb. I’ve stated before that Word’s F/R is a powerhouse. Indeed. I also made some very interesting discoveries about Word and text editors and how they interact re formatting tags.

Le sigh…

Let’s talk about the two most common special formatting tags in the writing universe. Asterisks to indicate bolded text and underscores to indicate italics. Most editors and agents understand what those marks mean. Sending an e-query with those tags in place would be perfectly acceptable. Except… Even if you turn off the auto-formatting features, Word treats them like special characters and so does a text editor. Meaning, a text editor will strip them out. So those are out. You can use them if you like–they are easy to read–but if you ever have to copy the file into a text editor, you’ll lose the tags and your special formatting.

Anyhow, I’ve been using my own little special formatting tags–ii for italics, BB for bolding, and UU for underlining. Nobody but me sees them or has to read them, so no big deal. BUT, I am in the process of creating a cheat sheet for Source Files, and need to come up with tags that One) Make sense; Two) Are easy to remember and use; Three) Don’t activate “helpfulness” in word processors; Four) Work well in FIND/REPLACE operations. Number three is a bitch. I popped around in different programs to see how they handle various tags. Turns out non-letter characters are a problem when created in strings–Word, especially, kept getting wobbly and persnickety. Plus, some can cause problems in HTML coding because it uses so many characters for commands. For instance, I tried i/TEXT/i for italics. That seems fairly straightforward, right? It didn’t make Word go all wobbly either and it translated into a text editor. Problems arose when I did F/R operations in the text editor. I needed characters that are NOT used in coding. Which leaves out almost all of them.

Ah ha, most FIND operations can be made case sensitive. And there is one non-letter character that gave me no problems at all–the lowly dash/hyphen. So here are a few of the tags I ended up with:

  • -ITAL-   -NOITAL-
  • -CTR-     -NOCTR-
  • -BQ-        -NOBQ-
  • -NBSP-

Those might seem a little “wordy” but they are pretty self-explanatory (italics, centered text, block quote, non-breaking space) and they don’t cause interpretation wars between programs. When I paste the Word file into the text editor, all I have to do is run FIND/REPLACE operations to insert the coding. (Ex: -ITAL- becomes <i> and -NOITAL- becomes </i> to make italicized text.) Most fiction doesn’t require every paragraph be tagged. So I won’t go into the nifty little shortcuts I found.
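
The same FIND/REPLACE pass can be sketched in a few lines of Python. The italics pair comes straight from the example above. Mapping -NBSP- to the &nbsp; entity is my assumption, and the centered and block quote tags are left out because the coding they turn into depends on your style sheet. The names TAG_MAP, tags_to_html and unbalanced are made up for illustration.

```python
# Source-file tags and the coding they become.
TAG_MAP = {
    "-ITAL-": "<i>",
    "-NOITAL-": "</i>",
    "-NBSP-": "&nbsp;",  # assumption: the entity for a non-breaking space
}


def tags_to_html(text: str) -> str:
    """Run one case-sensitive replace per tag, like a series of Replace Alls."""
    for tag, code in TAG_MAP.items():
        # str.replace is case sensitive, so ordinary lowercase text is safe.
        text = text.replace(tag, code)
    return text


def unbalanced(text: str, open_tag: str, close_tag: str) -> bool:
    """Report whether the open and close tags fail to pair up by count."""
    return text.count(open_tag) != text.count(close_tag)


source = "She was -ITAL-very-NOITAL- tired."
print(tags_to_html(source))  # prints: She was <i>very</i> tired.
print(unbalanced(source, "-ITAL-", "-NOITAL-"))  # prints: False
```

Checking that the opens and closes match before converting catches the tag you forgot to close, which is much easier to find in the source file than in the finished ebook.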

The really important thing I’ve discovered is that not all tagging is equal and some of the old printer’s tags will not work because the programs want to do something with them and it’s not always what the writer intends.

So how about you, folks? What nifty tricks have you come up with for tagging the special formatting in your files?

You Say Documents, I Say Source Files

A manuscript formatted for print, complete with header.

I’ve been creating manuscripts for well over twenty years. I can rattle off the formatting in my sleep. Double-spaced, one inch margins, header with page number top left corner, drop to middle of page to start a new chapter, blah blah blah. It’s a manuscript. A document to be printed and stacked and tucked in a box or an envelope and put in the mail. Who does that anymore? Oh sure, some agents and editors still insist on hard copies, but they’re in the minority and growing rarer by the day. Even though most agencies and publishers have gone digital, even though more and more writers are finding markets online and many are self-publishing either ebooks or POD, old habits die hard. Writers are still producing documents when they should be producing source files.

Whatever do you mean, Jaye?

Many writers, especially those who’ve been around a while, treat word processors like typewriters. We want to see on the screen what we want to appear on paper. Word processors are very accommodating that way. Most aren’t WYSIWYG, but pretty close. If we center text on the screen, it centers in the printout. If there are 24 lines on the screen, 24 lines print on the page. The printer doesn’t care if we indent lines with tabs, first line hanging or hit the space bar five or six times. It prints as an indent. If all you ever intend to do is create printed documents, then you can quit reading now. If you intend to submit electronically or create an ebook or a POD book or make pdf files, then listen up. It is time to break the document/manuscript habit.

You see, a clean source file can be copied indefinitely and used to create printed manuscripts, digital files for electronic submissions, ebooks, pdfs and POD books. With a clean source file an agent or editor can read your submission on a computer, smart phone, iPhone, iPad, Kindle, Nook, tablet or whatever else they might happen to have and your work will be readable. With a clean source file you can easily make a copy to create a professional looking printed document for that guy still living in 1973. And with a copy of that same file you can format an ebook that will convert cleanly for Smashwords, Amazon, Nook or whatever–or send it to a professional formatter who can turn it around in a matter of hours. Then you make another copy and format that for a slick pdf to send to reviewers. And you can snag a template off CreateSpace or Lulu and load it with your nice clean file and create a POD book. All the while that source file is sitting on your computer, nice and clean, and ready to be turned into whatever you happen to need next.

A Clean, Plain Jane, No Frills Source File, Created in MS Word

There is nothing difficult about creating source files. They are straight text files, nothing more. The difficult part is getting out of the mindset of seeing it as a printed document living on your screen. I know, I know, old habits die hard and writers, especially fiction writers, get a bit freaked out by the lack of page numbers, headers, page breaks and centered chapter heads. Trust me, get into the new habit of creating source files and it could save you from rejections (I wonder how many agents and editors have rejected submissions out of hand just because they couldn’t read the text on their iPhone or it turned into gobbledegook on their computer screen and rather than walk the writer through how to set up a file, they just said to hell with it); it can save you from the frustration of having Amazon or Smashwords reject your ebook (you followed their instructions!) or worse, getting it through the conversion process only to discover your ebook is live, but horribly corrupted; and it can save you money if you hire someone to format your ebooks or your POD book and they don’t have to charge extra to clean the junk out of your file.

To create a clean source file:

  • Turn off all Auto-Correct/Auto-format functions in your word processor (especially if you use MS Word). Turn off widow and orphan control.
  • Set up a simple style sheet to take care of the font, line-spacing, and indents. Apply it to every source file before you begin a new project and use it religiously.
  • No tabs. NO tabs! NO TABS EVER NEVER NOT EVEN ONCE!
  • No extra spaces between sentences or at the ends of paragraphs.
  • No extra paragraph returns (if you have a scene break, indicate it with the pound sign or three asterisks). Do not use paragraph returns to drop your chapter heads to the middle of the page or to create a page break.
  • No page breaks–of any kind.
  • No centering text–not chapter heads, titles, poetry, nothing (easy way to track chapter breaks, use all caps CHAPTER ONE or bolding)
  • No special characters. Use “typewriter” characters such as two dashes to indicate an em dash and a slash mark for fractions. Avoid super- and subscript characters. If your text contains foreign characters, Anglicize the spelling and track the usages so the special characters can be inserted when the file is formatted for whatever purpose.
  • Even in Word, italics, bolding and underlining don’t seem to screw up a source file. Those are safe.
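
A few items on that list can be checked by machine. Here is a minimal sketch in Python, assuming the source file has been saved as plain text with one paragraph per line. The names CHECKS and check_source are made up for illustration, and it only looks for tabs, doubled spaces, trailing spaces, page breaks (form feed characters) and empty paragraphs.

```python
import re

# Label for each problem and the pattern that spots it within a paragraph.
CHECKS = {
    "tab": re.compile(r"\t"),
    "double space": re.compile(r" {2,}"),
    "trailing space": re.compile(r" +$"),
    "page break": re.compile(r"\f"),
}


def check_source(text: str) -> list:
    """Return (paragraph number, problem) pairs for a plain-text source file."""
    lines = text.split("\n")
    # A single newline at the very end of the file is not an extra return.
    if lines and lines[-1] == "":
        lines.pop()
    problems = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append((number, "empty paragraph"))
            continue
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                problems.append((number, label))
    return problems


sample = "CHAPTER ONE\n\tIt was a dark night.  Really.\n\nThe end. \n"
for number, problem in check_source(sample):
    print(number, problem)
```

For the sample it reports a tab and a double space in paragraph 2, an empty paragraph at 3, and a trailing space in paragraph 4. It will not catch centering or auto-formatting, since those live in the word processor file and not in the plain text.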

I’ve had people tell me, “But I need page breaks or nobody will know how many pages there are!” Nobody will be able to tell anyway unless you intend to print out the file on 8.5 x 11 20# bond. And then I’ve been told the writer knows how they want the document to look, so it’s okay. Trouble is, they know how it looks on their screen and how it looks coming out of their printer. They do not know how it looks on an iPad or iPhone or Android or Nook or Kindle or an agent’s Mac (you use a PC) or vice versa. Trouble is, every bit of formatting they do adds code to their file and that code can be misinterpreted or corrupted by another device. If you hire someone to create an ebook, they will look at your wonderful page arrangements, and tack on extra charges to the estimate because the first thing they have to do is get rid of everything you’ve done.

It takes some conscious thought to break old manuscript habits. You can get used to it. Just keep repeating: Source File, Source File, Source File…