Set Off Text: Notes, Footnotes, Captions and Excerpts

Hi guys. I’ve been neglecting the blog lately. But I’ve been thinking about you while learning new ways to make ebooks look even more fabulous.

But! Before we go further, I HAVE to show off Plunderbunny’s latest cover creation. (She and I are working together on a huge project, and she created a cover I just love.)

She used an old photograph of a Lakota Sun Dance ceremony (supplied by the author) and an image of a necklace and medicine wheel (made by the author) to create a striking cover for the ebook. I think Spirituality for America turned out great.

Designing the interior was my job. Part of the challenge for this project was that it contained excerpts, quotes, footnotes, notes by the author, and images with captions. I designed the ebook with tablets in mind (especially given that it also contains lots of hyperlinks to sources on the ‘net). That means color comes into play. The big challenge was finding graceful ways to set off text. Goal: Make it stand out, look good and be readable across all devices.

IMAGE CAPTIONS

Lots of ways to caption images. One way is to marry the caption with the image itself. The big plus with this is that the image and caption never get separated. The big downside is that depending on the size of the screen the reader is reading on, the caption (because it is part of the image) can become distorted or too small to read.

Because some of the images had largish captions, I decided to make them all part of the text.*

This image had a short caption. I thought it was set off nicely by centering the text and styling it with small caps (font-variant: small-caps**). Some of the captions were a little long for centered text (I don’t like having one or two words sitting centered on a line by themselves) so in those cases I went with a block paragraph style.

A few of the captions were really long. Since small caps are better in small doses and I didn’t want the block of text mistaken for anything except a caption, I chose to reduce the text to 80% and set it like a quote. The beauty of this is (unlike a caption embedded in the image itself) if the user has trouble with the smaller font, they can increase it.

FOOTNOTES

I used the same device for footnotes within the body of the text.

It is possible to embed footnotes in ebooks so that a tap by the user causes a “pop-up” to display, but otherwise the footnote is “hidden.” (If you’re interested in reading more about it, check out Paul Salvette’s blog at bbebooks.com.) Because this is a Kindle book and that tricky trick doesn’t work on every Kindle version, I went with reducing the text to 80% font-size and setting it like a quote. One decision I struggled with was where, exactly, to place footnotes. There is no “footer” area in an ebook. One option was to link the footnote and place it at the end of the chapter. That would actually be an excellent option in many cases, especially if the text contains a LOT of footnotes. In this case, though, because there weren’t that many, I went with placing them at the end of the paragraphs which were noted.

NOTES AND EXCERPTS

This book also contained several “Author Notes” rather like sidebars in a magazine article. The author wanted them placed within the text instead of in the “notes” section at the back of the book (containing references and the bibliography). There are also excerpts from other writers and works. I could have done regular block quotes and italics. I don’t know about the rest of you, but too much italicizing in an ebook is fatiguing to read (especially on my older Kindle which doesn’t have the best fonts). So I decided to go with a “box and shadow.”

This is where I really had to behave myself. A couple of weeks ago I read David Wong’s novel, This Book Is Full Of Spiders. It is one of the most beautifully formatted ebooks in fiction that I have seen. It was so lovely, I loaded it on my Fire to read it (I usually read on my Paperwhite) just to enjoy the full visual experience. One of the devices the formatter used was to set off “book excerpts” in a box and shadow, using a colored background. It inspired me. Turns out it is not difficult at all to do.

All it requires is to border the block of text and give it a background color. (I learned this from Paul Salvette’s formatting guide; check the sidebar on this blog.) The CSS styling I used for the notes and excerpts was:

div.excerpt
{
    margin: 0;
    padding: 12px 12px 0 6px; /* top, right, bottom, left */
    background-color: #FFDEAD; /* light buckskin; the named color NavajoWhite */
    border: 2px solid black;
}

Now when I said I had to behave, I meant the color choice. (I still haven’t figured out how to get a screenshot off the Fire, so a photo of the screen will have to suffice) Tablets support a LOT of colors and when you code in html you can pick from many. I had to seriously resist neon pink or day-glo green, instead choosing a light buckskin color that is easy on the eyes, goes with the theme of the book, and best of all, adds the right touch of shading to eink displays without making the text difficult to read.

This ebook was a challenge, but it was an interesting challenge and that’s the kind I like best. I would love to hear from the rest of you how you found good ways to set off text in your ebooks. Share, people!

* Books is books, ebooks is ebooks, and they’re growing apart faster than most can keep up with. I no longer try to emulate “print” in ebooks for two reasons. Number one, trying to emulate a printed book is a frustrating exercise and the best result you can expect is to create an “ugly cousin.” Number two (and most importantly) ebooks have features and interface capabilities most ebook formatters are barely touching. As a reader I love being able to adjust the display to suit my preferences. On my tablet (a Kindle Fire) I enjoy having the ‘net instantly available with a tap of my finger. Nothing bugs me more than an ebook that has been forced into a nearly static display in a vain attempt to make it look like print.

**Small caps. I love small caps. I think they are elegant, show off text better than bolding, and just plain look good. Unfortunately, they don’t display on every device. So in cases where I think text with upper and lower case will work just as well (and the goal is purely aesthetic) I will use “font-variant: small-caps.” Sometimes, such as at the beginnings of chapters or scenes, where I definitely want the text in all caps no matter what, but do not want the display to look oversized and chunky, I do “faux-caps.” I upper-case the block of text, then set the font-size to 80%. It looks nice no matter what device it’s being read on.

Find and Replace: Do It Once, Do It Twice

Ol’ Lew has taken quite nicely to the digital age.

Out of all the small jobs that make up the big job of getting a book ready for publication, proofreading is the job nobody wants. It is NO FUN.

It’s exacting, it’s painstaking, it reduces an otherwise interesting piece of writing into boring little components that must be examined individually. If your attention wanders or if you get caught up in the story (it’s harder to proofread a rousing good story than a so-so one), you can miss errors. Ideally, any project should have at least two proofreaders. This isn’t an ideal world, however, and not everybody has the funds or the qualified (and indulgent) friends to get two reads.

When I build an ebook, I either proofread it myself or send a proof copy to the writer to proofread. Sometimes we both proofread it. All in the hopes of rooting out the boo-boos and gremlins before a paying customer does.

I have, of course, learned a few tricks (of course) along the way. One of the most valuable tools in my arsenal (second only to Webster’s 9th) is the Find/Replace function. This is especially true since I have found that most writers have a tendency to repeat mistakes. One does need to be careful, though, about global FIND/REPLACE. Or you might end up with something like this:

Barnes & Noble was briefly suspected of employing an outrageous anti-Amazon marketing strategy in May after blogger Philip Howard noticed that a version of Tolstoy’s “War and Peace” sold by the chain store had substituted “nook” for every instance of the word “kindle” throughout the text, resulting in sentences like, “It was as if a light had been Nookd in a carved and painted lantern….” The e-book turned out to have been published by a third-party company, Superior Formatting Publishing, who issued an apology (still posted on the company’s Web home page) explaining that it had accidentally applied the “find and replace” function to the entire text when reformatting the Kindle version of the book for the Nook platform.

The stuff of a proofreader’s nightmares.

Every text handling program has its own set of rules and functions. I can’t possibly cover them all here. I suggest you play with your program’s FIND/REPLACE function and figure out what it can and cannot do. The one thing that every program has in common is that it searches for a unique string of characters. That unique string can include spaces and punctuation.

There are some F/R searches I do as a matter of course. The first is for extra spaces. Extra spaces are the bane of ebooks. They all need to be rooted out. I run searches for double spaces between sentences within paragraphs, and for extra spaces at the beginnings and ends of paragraphs. I also run searches for extra paragraph returns.
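If you would rather script these spacing checks than click through a Find/Replace dialog, here is a minimal sketch in Python. It is an illustration only, not the method described above: the function name and sample text are made up, and it assumes each paragraph sits on one line with no blank lines between paragraphs (so any empty line counts as an extra paragraph return).

```python
import re

def find_space_problems(text):
    """Return (line_number, problem) pairs for common spacing gremlins.

    Assumption: one paragraph per line, so an empty line is an extra return.
    """
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line == "":
            problems.append((number, "extra paragraph return"))
            continue
        if line.startswith(" "):
            problems.append((number, "space at start of paragraph"))
        if line.endswith(" "):
            problems.append((number, "space at end of paragraph"))
        # Two or more spaces with text on both sides
        if re.search(r"\S {2,}\S", line):
            problems.append((number, "double space inside paragraph"))
    return problems

sample = "First paragraph.  Two spaces here.\n Leading space.\n\nTrailing space. "
for number, problem in find_space_problems(sample):
    print(number, problem)
```

The sketch only reports problems. Whether to fix them with a Replace All or one at a time is still a judgment call.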

The second routine search I do is for backward quote marks and apostrophes. MS Word, especially, has a bad habit of turning quote marks the wrong way, particularly when the quote marks are connected to em or en dashes or sit at the beginning of truncated words. Here the basic rules of grammar are useful. For instance, the left double quote belongs at the beginning of a quoted passage, so a right double quote in that position is facing the wrong way. I will search for a space followed by a right double quote, or a paragraph return or new line followed by a right double quote. I run the opposite search for wrong-way left double quotes by looking for left double quotes at the end of sentences.
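The same two searches can be sketched in Python with regular expressions. This is a rough illustration under stated assumptions, not a full quote checker: it only looks at curly double quotes, the pattern and function names are hypothetical, and it flags suspects for a human to review instead of changing anything.

```python
import re

# A right double quote (U+201D) at the start of the text or right after
# whitespace is sitting where a left quote belongs.
WRONG_WAY_OPENER = re.compile(r"(?<!\S)\u201d")
# A left double quote (U+201C) followed by whitespace or the end of the text
# is sitting where a right quote belongs.
WRONG_WAY_CLOSER = re.compile(r"\u201c(?!\S)")

def find_backward_quotes(text):
    """Return sorted (position, problem) pairs for suspicious double quotes."""
    hits = [(m.start(), "right quote where a left belongs")
            for m in WRONG_WAY_OPENER.finditer(text)]
    hits += [(m.start(), "left quote where a right belongs")
             for m in WRONG_WAY_CLOSER.finditer(text)]
    return sorted(hits)

# First quote is backward, and so is the one after the dash at the end.
sample = "\u201dHello,\u201d she said. \u201cWatch out for that\u2013\u201c"
for position, problem in find_backward_quotes(sample):
    print(position, problem)
```

Single quotes and apostrophes need the same treatment, but apostrophes at the start of truncated words make the rules messier, so they are left out of this sketch.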

Another routine search is for proper names and place names. When I proofread I make a list of preferred spellings. Flying fingers or attention lapses trip up writers. Sometimes the misspellings look right and are easily missed. Take my name for instance. “Jay” looks right, but I spell it “Jaye.” I’ll do a search for “Jay” and “Jay’s” to catch any instances where the “e” was dropped.

The same thing goes for preferred spellings. A word such as “judgment” is also correctly spelled as “judgement.” It doesn’t matter to me what the writer prefers–consistency is my fallback. If the writer prefers the former, I will do a search for the latter and change any instances I find.
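For names and preferred spellings, a quick count of each variant shows whether a search is worth running at all. Here is a minimal Python sketch, assuming whole-word matches are what you want; the function name and sample sentence are invented for illustration.

```python
import re

def count_variants(text, variants):
    """Count whole-word occurrences of each spelling in `variants`."""
    counts = {}
    for word in variants:
        # \b keeps "Jay" from matching inside "Jaye"
        pattern = r"\b" + re.escape(word) + r"\b"
        counts[word] = len(re.findall(pattern, text))
    return counts

sample = "Jaye wrote it. Jay edited it. In her judgment, Jay's judgement was sound."
print(count_variants(sample, ["Jaye", "Jay", "judgment", "judgement"]))
```

Any variant with a nonzero count that is not the preferred spelling is a candidate for a careful Find/Replace.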

I’ve worked on quite a few backlist books that have been scanned and run through OCR. Do enough of them and you start recognizing common OCR errors. For instance, misreading the letter “e” as a “c”. Spell check will catch the most egregious errors, but if the text is supposed to be “eat” and the OCR reads it as “cat” then spell check is useless. It doesn’t take much time to run a search for the word “cat” to make sure each usage is what the writer intended. Another common problem with scanned books is that typesetters often use hyphens and en dashes to space text on a line. Finding those is a bear, but F/R is a big help in rooting out the many permutations that end up as errors in an ebook.

I can’t possibly cover every F/R trick. If you, while you are proofreading your own work, get into the habit of assuming you have a tendency to repeat certain errors, you can use F/R to help you create a cleaner ebook. If you find a goof, run a quick search to see if you repeated it elsewhere.

Check List of Common Errors That Can Be Found with FIND/REPLACE:

  • Extra Spaces
  • Extra Paragraph Returns
  • Proper Names
  • Place Names
  • Quote Marks (single and double)
  • Hyphenated Words
  • Preferred Spellings
  • Italicized Foreign Words (yes or no, but be consistent)
  • Em and en dashes, and hyphens

Punctuation Purgatory: The Em Dash and the Ellipsis

There are some people who smugly believe they are the bane of my existence. Sorry. My Cone of Silence is such a powerful force field, no mere human being can annoy me for long. The true bane of my existence is punctuation in ebooks. Especially the two characters most beloved by fiction writers: the em dash and the ellipsis.

On the good news front, the people who program Amazon’s Kindles have solved the em dash problem. It used to be that Kindles treated two words joined by an em dash as a unit. Hence, it could cause big, ugly spaces in sentences when the text flow jumped that “word” to the next line:

You’re innocently typing along and minding
your own business and decide, for good
or maybe not so benign
reasoning–character counts in this business,
you know–and there’s a big ugly space…

It appears now that every em dash is flanked by zero-width non-joiners. What that means is, the em dashes break when they reach the end of a line. No more big, ugly spaces in sentences.

Every silver cloud must have a spot of puce. I wouldn’t be me if I didn’t bitch about it. The rule appears to be iron-clad, even for em dashes at the end of a line of dialogue.

“Hey, stupid! Watch out for that–”

No problem–unless your dialogue runs a little long and the text wraps to the next line.

“Hey, stupid! You better watch out for that
–”

This would be an easy fix. Just slip a zero-width joiner between the word and the em dash so it’s not allowed to break at the end of the line. EXCEPT Kindles no longer recognize the zero-width joiner entity. I can put them in, but the device just ignores them.

Le sigh...

***

Ellipses never seemed to cause much problem on the device end–the problems were caused by writers using three periods instead of a single ellipsis character (…). Or worse, trying to go for the “bookish” look and spacing the periods. This caused a whole generation of orphans on the screen.

What are you saying, Jaye? My ellipses are.
.. improper?

Or something even sadder can occur. The
poor little orphaned period sitting all alone..
.

The cure for this is simple. If you are using Word, run a Find/Replace All operation with three periods in the Find box and three periods in the Replace box. Word will automatically change your three periods into ellipses that the ebook will treat as a unit. If you’re using html, do a Find/Replace to turn the three periods into the single ellipsis character (… or the entity &hellip;).
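For a plain text or html file, the same swap can be sketched in Python. This is only an illustration under a few assumptions: it handles exactly three periods (closed up or separated by single spaces), it leaves longer runs such as four periods alone for a human to check, and the function name is made up.

```python
import re

ELLIPSIS = "\u2026"  # the single ellipsis character; the html entity is &hellip;

# Exactly three periods, closed up or single-spaced, not part of a longer run.
THREE_PERIODS = re.compile(r"(?<!\.)(?<!\. )\.(?: ?\.){2}(?! ?\.)")

def fix_ellipses(text):
    """Replace each run of exactly three periods with one ellipsis character."""
    return THREE_PERIODS.sub(ELLIPSIS, text)

print(fix_ellipses("Wait... what?"))
print(fix_ellipses("My ellipses are. . . improper?"))
```

Because the result is one character, the reading device has nothing to break apart at the end of a line.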

What if you want spaced ellipses? Normally I discourage that. Spaced ellipses are just asking for trouble. They look fabulous in print, but they play havoc in ebooks. An ellipsis at the beginning of a line or even sitting by itself on a line looks a bit odd, but it’s acceptable. An orphaned period or two periods looks like a mistake. Plus, justification could warp them out of shape. That is not acceptable.

But. I have a client who really, really, really wanted spaced ellipses and was willing to risk a platoon of orphaned periods to get them.

I came up with a solution that is so simple, so elemental I feel like a dope for not thinking of it before. The no-break space.

In html the entity is &nbsp;. So, a spaced ellipsis would look like this:

The first line is a regular ellipsis. The second is an ellipsis with punctuation. On the Kindle it will look like this:

. . .

. . . ?

Ta da! Spaced ellipses the Kindle treats as units.
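Here is a small Python sketch of building those spaced ellipses. It is a guess at the markup, based on the Kindle display shown above: it assumes the periods (and any trailing punctuation) are simply joined with the no-break space entity, and the function name is hypothetical.

```python
NBSP = "&nbsp;"  # html no-break space entity

def spaced_ellipsis(trailing=""):
    """Build '. . .' held together by no-break spaces, plus optional punctuation."""
    parts = [".", ".", "."]
    if trailing:
        parts.append(trailing)
    return NBSP.join(parts)

print(spaced_ellipsis())     # .&nbsp;.&nbsp;.
print(spaced_ellipsis("?"))  # .&nbsp;.&nbsp;.&nbsp;?
```

Since there is no ordinary space anywhere inside the string, the device has no place to break it across lines.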

What the BLEEP is Wrong With You, Harper-Collins?

Get your BLEEPING substandard BLEEP off my BLEEPING Kindle!

You know what I think about shitty ebooks? It makes me want to start channeling Chef Gordon Ramsay. “Come on! What the BLEEP is wrong with you?”

What set me off? What transformed me from laid-back, easy going, tolerant and generally all ’round good ol’ gal and unleashed my inner-Mad Chef with a potty mouth?

This.

Before I go totally off my nut, let me state, categorically, Stuart MacBride is one of my favorite authors. He’s on my recommended reads list, he’s made two of my top ten lists (here and here), and I’ve blogged about his books and characters. AND because I know how publishing houses work, the majority of my wrath is directed at

HARPER-COLLINS

Yeah, that Harper-Collins. You know, the big publisher who curates fine fiction and offers so much value to authors and readers with their editing and covers and marketing and brand name? Yeah, that one.

HARPER-COLLINS–MORE SPECIFICALLY, HARPER-VOYAGER

When I bought my first Kindle the very first book I purchased was Shatter the Bones by Stuart MacBride. Paid a premium for it, too. Despite how little I knew then about ebook formatting, I knew that book was an utter embarrassment. I could make a better looking ebook by running a Word file through MobiPocket. Along with setting me on a journey of learning how to produce a fine-looking ebook, it also taught me the value of downloading samples. Thusly I learned how much contempt Harper-Collins has for its authors and readers. They put out some of the shittiest looking ebooks around.

So why did I buy Halfhead? It looked good and I’m an optimist. I thought, well, finally! HC realizes ebook readers deserve decently formatted ebooks. It wasn’t until I settled in for an enjoyable read that I realized

THEY DIDN’T PROOFREAD THE EBOOK!!!

So to channel my inner-Gordon: “What the BLEEP is BLEEPING wrong with you? Get the BLEEP out of my BLEEPING Kindle! You should be BLEEPING embarrassed! Come on!”

Split words, joined words, backward quote marks, mixed up homonyms, and no consistency in hyphenation. That’s proofreading 101. Halfhead is filled with mistakes a sixth grader could have spotted and fixed. It’s embarrassing.

My goal as a self-publisher is to produce a book with fewer than five typos/gaffes per 100,000 words. That’s a freakin’ high standard and damned near impossible to achieve, but it’s a standard born of respect for authors, literature and readers. The only way to even get close to meeting that standard is to proofread the ebook until my eyeballs bleed. It means loading a PROOF COPY onto my Kindle and going through the book line by line, word by word, and punctuation mark by punctuation mark.

IT MEANS GIVING A SHIT.

Having not seen a HC contract, I have no idea what kind of royalties they are paying authors. I imagine it’s around 25% net (with publisher accounting that can mean only pennies per unit sold). So figure roughly that authors–for the privilege of being published by HC with all its supposed services and benefits–are giving up anywhere from 82.5% to 94% of the cover price. My question for Mr. MacBride (and any other HC author) is WHY? Why do you let them treat your work like this? Why do you let them abuse your readers with sub-par production? Proofreading is so elemental, so necessary, and to let a book go out the door without it is completely, utterly inexcusable.

ANY ENTITY THAT ALLOWS AN EBOOK TO GO LIVE WITHOUT BEING PROPERLY PROOFREAD DOES NOT DESERVE TO CALL ITSELF A PUBLISHER

No proofreading… Are you BLEEPING kidding me?

Restore Paragraphs in an OCR Scan

Earlier, I wrote a post about DIY scanning and doing an OCR rendering and clean-up of your back list books. It doesn’t have to be expensive and it’s not difficult to do. It does require patience, because cleaning up an OCR rendering takes time.

If you used FreeOCR (as I’d recommended) one thing you’ve noticed is that it inserts a hard return at the end of every single line. The first time I saw that I freaked out a bit. I envisioned having to go through the entire file, manually deleting those extra returns and restoring every paragraph. Then I discovered the hard returns actually help in cleaning up the file because I can work line by line through the text, comparing it to the original material.

Once the text is cleaned up, the paragraphs do need to be restored. If you are using Notepad++ (a text editor that I highly recommend) you can use Find/Replace to do the job. The first step takes some time, but the actual restoration uses the power of Replace All to do the job quickly.

Before you begin work on the file, do a Save As and work on the copy. That way if you mess up, the original is intact and you can easily start over.

STEP ONE: Insert an extra line between each “true” paragraph.

In order to keep an eye on what you are doing, toggle on the Show All Characters button. It’s in the toolbar and the icon looks like a blue pilcrow (paragraph symbol). It will display black boxes with [CR]–for carriage return–and [LF]–line feed–wherever there is a hard return.

Once you have an extra line between every true paragraph, you will need to insert an extra space at the end of every line. This way you won’t end up with joined words.

STEP TWO: Open the Find/Replace box and toggle on “extended”.
In the Find box type: \r
In the Replace box type: (space)\r
(don’t type out “space” just tap the space bar once)
Do a Replace All

Now you are going to tag the places where you WANT a hard return.

STEP THREE: In the Find box type: \r\n(space)
In the Replace box type: \r\n-N-
Do a Replace All

Now the step where you have to steel your nerves. Remove ALL the hard returns.

STEP FOUR: In the Find box type: \r\n
Leave the Replace box blank (no spaces either)
Do a Replace All.

Now you have one giant block of text with zero hard returns. But don’t freak out. Now you restore the proper paragraphs.

STEP FIVE: In the Find box type: -N-
In the Replace box type: \r\n
Do a Replace All.

Now your paragraphs are restored and there are no extra hard returns to be found. You will need to now get rid of those extra spaces at the end of each paragraph.

STEP SIX: In the Find box type: (space)\r
In the Replace box type: \r
Do a Replace All.

That’s it. Except for the first step where you have to insert an extra line between each real paragraph, explaining this takes longer than doing it. This method is a whole lot easier than manually deleting the unwanted hard returns.
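For the curious, the same chain of replacements can be sketched in a few lines of Python. This is an illustration, not a replacement for doing it in Notepad++. It assumes the manual first step is already done (one blank line between true paragraphs), that the file uses Windows-style \r\n returns, that no line begins with a space, and that the tag -N- appears nowhere in the book. The function name is made up.

```python
def restore_paragraphs(text, marker="-N-"):
    """Rejoin OCR lines into paragraphs using the same Find/Replace chain."""
    # Add a space at the end of every line so words are not joined
    text = text.replace("\r", " \r")
    # Tag the blank lines, which are the places a hard return is wanted
    text = text.replace("\r\n ", "\r\n" + marker)
    # Remove ALL the hard returns
    text = text.replace("\r\n", "")
    # Turn the tags back into hard returns
    text = text.replace(marker, "\r\n")
    # Remove the extra space at the end of each paragraph
    text = text.replace(" \r", "\r")
    return text

sample = "It was a dark\r\nand stormy night.\r\n\r\nThe rain fell\r\nin torrents.\r\n"
print(repr(restore_paragraphs(sample)))
```

As in the editor, the very last paragraph keeps its trailing space because no return follows it, so one more check at the end of the file is worthwhile.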

Have fun!

Clarifying Source Files: How To Use Them

I’ve done a lot of talking about source files, and inadvertently confused some folks. I don’t mean to muddy issues–it just happens. So I made a chart! (Aren’t you glad you stopped by?)

The SOURCE FILE is just that. The source from which everything else springs. You don’t format it (beyond what is necessary for YOU to comfortably compose) because you DO NOT NEED TO. Essentially, while composing original works the more you act as if your word processor is a typewriter (except for tabs–no tabs!), the cleaner it will be and the easier it will be for you or someone else to format it for a specific use.

Once you have a Source File, you MAKE COPIES of it in order to format it for a specific purpose.

Let’s say you’re sending a manuscript to XYZ Publishing House. You need a printed document. You open the source file and do a Save As to make a copy. In that copy you will insert a cover page, header, page numbering, and adjust the margins and font according to the publisher’s guidelines. The source file remains intact, unchanged.

You want to self-publish your novel. You open the source file and do a Save As to make a copy. You can send that copy to a hired formatter and let them take it from there. You can format a .doc file in Word according to the distributor guidelines. You can hand code the copy in html. The source file remains intact.

I didn’t include every single way to format a file, but you get the picture, right? Let’s say you published your ebook. A reviewer would like a pdf file. You do a Save As, make a copy and format a pdf. You want to make an electronic submission? Do a Save As, add your address block, maybe change the font and line spacing, and you will submit a nice clean file that agents and editors can easily read on almost any computer or device.

If you look at my chart and think this is terribly complicated and I’m trying to make extra work for people, you are wrong. When it comes to digital files, there is no One-Size-Fits-All format. If you get in the habit of creating your original files in a no-frills, minimally formatted style it will save you work, save you time, and save you headaches.

I hope this clarifies things.

Scan, OCR and Restore BackList Books

This week I read a comment on a blog (can’t remember where–sorry) where a writer said she was putting off reissuing her backlist titles because she didn’t have accessible computer files for them and so she’d have to scan the actual books, run them through an OCR program and format them. She didn’t know how to do that.

I hear ya, sister. A few months ago I’d have nodded in agreement, and said, “Yep, too hard, too time-consuming, too expensive.” Now, however, having spent the past few months restoring nearly two dozen old paperback books from scans and turning them into ebooks, I know it’s NOT too hard, it IS time-consuming, and the cost can range from dollars per page (expensive) to FREE (DIY option).

(Another option is to retype the book, but quite frankly, folks, unless you are a super-typist with wrists of steel–which I most certainly am not–that is a daunting proposition.)

You know me. Somebody sez, “Can you do this?” and I reply, “How hard can it be?” Then I bumble and fumble around until I figure out how to do it. Then I come on here and am able to give you some tips that mean you can skip the bumbling and fumbling part. Unless you enjoy b&f. In that case, you can stop reading this post.

This is for the Do-It-Yourselfers.

SCANNING

Do a Google search for “scanning books” and the results will include thousands of services that will take your old books or manuscripts and turn them into pdf or doc files. Some services will scan the book without harming the binding, some will chop off the spine, destroying the book. Prices range from per-page costs to flat-rate. I haven’t used any of those services, so I can’t recommend any of them. You’ll have to do your own research.

You can also take your old books or manuscripts to a copy store such as Fed-Ex/Kinkos or a full-service office supply store such as Staples, and either do it yourself on their equipment or have them do it for you.

If you happen to own a scanner, you can do it at home. This is the insane option because quite frankly most home scanners are ridiculous beasts that take their sweet time (I know this because I had to try it myself just to see and so scanned a nearly 300 page manuscript–easy on the hands, tough on the buttocks. It took hours!) If you are home-scanning actual pages from a paperback, you will have to play with the settings on your scanner because most are at their best scanning photos and that resolution is far too high to get good results. Best results are achieved if you copy the pages onto good quality 20# or 24# copy paper and then scan the copies.

However you choose to have your book/manuscript scanned, my recommendation is to have the scanner turn it into a pdf file. There are services and programs that will do the OCR conversion during the scan and produce a .doc, .docx or .rtf file for you. On the surface, it looks like a bargain. I think it’s dangerous because: 1) the file you receive will be huge and bloated and junked up with tons of coding that can severely mess up your ebook; 2) it will not save you any work during clean-up and in some ways it makes clean-up more of a chore; 3) it could give you a false sense of security that your file is cleaner than it actually is and your ebook could end up like so many that are on my Kindle right now, full of formatting errors and gibberish.

Here is a file that has been scanned and converted at the same time:

Here is a file that has been DIY scanned and turned into a .doc file:

It’s a big mess, too, but there are actually fewer dangerous formatting issues you will have to address. Awful as it looks, this example is easier to clean up and turn into an ebook than the first example. So save your money (and a few headaches) and run the pdf through the OCR program yourself.

OCR

Scanned PDF files are image files. Pictures of a page. In order to clean up and format the pages they must be converted into text. That’s where OCR comes in–Optical Character Recognition.

I found a nifty little program called FreeOCR. It’s a free program you download onto your computer. It’s a powerful program with a few bells and whistles–none of which I recommend you use. This is a case where the more you automate the process, the worse your results will be. There is no good substitute for the human eye and human instincts when it comes to restoring a document file. You’re better off in the long-run by doing a basic OCR conversion. That means, open the FreeOCR program, open a pdf file, then render it page by page (depending on the size of the file and the density of the type, to do a complete book the process will take between 20 minutes and an hour).

The original scanned page is on the left, the OCR conversion is on the right. You can see what a mess it is. That’s because the OCR is very efficient. It turns not only images of text into text, it turns water stains, wrinkles, shadows, and debris embedded in the paper into text, too. If there are notes in the margin, it will try to turn that into text. A basic scan also inserts a hard paragraph return at the end of every line, gets rid of paragraph indents and destroys special formatting such as bold and italics (the first time I saw this I totally freaked out). Some things convert more cleanly than others. If you’re converting a decades-old paperback where the pages have yellowed and degraded, the conversion will be a HUGE mess.

But not a hopeless mess.

CLEAN UP

FreeOCR gives you an option of saving your rendered document as a Word file. You can do that and clean up your file in Word. There is a much easier, faster and more efficient way. Use a text editor (with a little eventual help from Word). I use Notepad++, a program you can download for free. Save your OCR rendering into the clipboard (or do a right click, Select All/Copy) and paste it into the text editor.

Whether you use Word or a text editor, this is the time-consuming part of the process. And there’s no help for it. If you want a good-looking ebook, you need to make your converted file squeaky clean. (Your other option is hiring someone to do it for you. BUT–and this is a huge but–you have to make sure the service you hire is NOT automating the process, but that there is instead an actual human being going through the book word by word and restoring the text. Those automated programs are powerful and they do a good job on some projects, but I have ebooks I have purchased on my Kindle right now that are unreadable messes due to those programs.)

I have learned a few things to make the job go faster and more efficiently.

  1. Save restoring the paragraphs for last. Take a look at the image of the OCR conversion in Word. I toggled on the Show/Hide feature so you can see how every line has a paragraph return. What you see is the layout from the printed book. That can help during clean up.
  2. Work off the actual pages. Either have the actual book in front of you or split your computer screen and have the pdf file open to the scanned pages. That way if the OCR mangled the text, you can retype a word or line from the actual copy instead of trying to guess what it is supposed to say. You can also tag special formatting such as italics as you go along.
  3. Use Find/Replace.

The text will be full of oddball characters (I call them bug shit). Things like degree symbols, floating quote marks, greater and less than characters, slashes, tildes. If something doesn’t belong in your text file–Find/Replace All gets rid of it. You can also use it to get rid of headers, footers and page numbers. Once you have the text cleaned up, you can use Find/Replace All to get rid of extra paragraph returns, restore the proper paragraphs and un-hyphenate any words that had been split in the printed version. (BONUS TIP: Before you get rid of the extra paragraph returns use Find/Replace to add an extra space at the end of each line. That keeps words from being joined and makes it easier to find hyphens you want to get rid of)
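Two of those clean-up passes can be sketched in Python for anyone who prefers a script. This is a rough illustration with assumptions spelled out: the list of unwanted characters is only an example (check that your book does not use any of them on purpose), the split-word search assumes the extra space has already been added at the end of each line, and both function names are hypothetical.

```python
import re

# Example set of stray OCR characters: degree sign, angle brackets, tilde,
# forward slash, backslash. Adjust it for your own book.
BUG_CHARS = "\u00b0<>~/\\"

def strip_bug_chars(text):
    """Delete every character listed in BUG_CHARS."""
    return text.translate({ord(ch): None for ch in BUG_CHARS})

def find_split_words(text):
    """List 'word- word' pairs left behind where a printed line ended on a hyphen."""
    return re.findall(r"\w+- \w+", text)

sample = "The exam- ple was well- known~ to the\u00b0 reader."
cleaned = strip_bug_chars(sample)
print(cleaned)
print(find_split_words(cleaned))
```

The second function only lists candidates. Some of them ("well- known") should keep their hyphen and lose the space, while others ("exam- ple") should lose both, so a human still has to decide.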

So, yes, this is time-consuming, but it is not hard nor does it have to be expensive. It is definitely worthwhile to get your backlist back in circulation.

 

Oh My God, I’m a Nerd!

Long time readers of this blog have watched my progression from building ebooks in Word to the present day as I’m handcoding html. You’ve heard me whine about the quality of ebooks and the difficulty of producing a book that renders perfectly on every device, every time. I’ve used different programs, different methods–and for the longest time I utterly resisted learning html because 1) I knew nothing about it; and 2) I resented the idea that one had to be some kind of mad genius-computer nerd in order to make a decent ebook.

Well. I was wrong. It doesn’t matter that I knew nothing about html. If one is motivated, one can learn. Plus, one doesn’t need to be a mad genius (or even a slightly bent genius) in order to learn basic coding–which is really all one needs in order to make a beautiful ebook.

One does have to be, however, a bit of a nerd. I realized this the other day when I announced to the old man, “Ha! Regex isn’t so hard and toggling the extended command means I can wrap paragraphs and find extra spaces as easily in the editor as I can in Word. Bwahahaha!”

Do you understand what I just said? Don’t worry. The old man didn’t either. A month ago I wouldn’t have understood it. Suffice to say, I’m learning a whole new language and it is finally making sense to me even though that means my family now looks at me the same way the dog does when I’m talking to him (he’s waiting for me to say the magic words–”walkies” or “cookies”).

But, I haven’t done all this alone. Every time one of you guys, blog readers, makes a comment about a new way of doing something or talks about a new program, I have to check it out. And I learn things. When I’m working, I have a screen open to the W3Schools website so I can quickly get questions answered. I’m always bopping around the ‘net, seeing how others have solved problems and seeing if they’ve learned something new. I don’t always (okay fine, most of the time) understand what others are talking about. The real experts have been doing computer programming for decades and they speak “html” with casual fluency while I’m over here speaking very loudly and very slowly and adding vowels to the ends of every word in an effort to make myself understood (I said-o no comprehendo, capiche-o, amigo?).

Needless to say, when I do find a reference source that a) tells me what I need to know; b) shows me what I’m doing wrong and how to fix it; and c) is written in a way that I can actually understand, I glom onto it.

All that build-up and confession leads to sharing a new treasure: The eBook Design and Development Guide by Paul Salvette. Paul follows this blog and comments occasionally. He also has an ebook formatting service. He gave me a heads-up about the book. There were two major factors in my decision to buy it. First, it was written in comprehensible English (most of these types of guides offend my writerly sensibilities) and second (this is really important!) it’s nicely formatted (it’s astonishing how many how-to-format-your-ebook guides are so wretchedly formatted as to be unreadable).

This is not a beginner’s guide. Two months ago I wouldn’t have understood much beyond “and” and “the.” With my usual la-di-dah methods of clicking madly until something works, I learned enough of the basics of html on my own to create some very nice ebooks. Armed with those basics, I’m able to understand quite a bit of what Paul is talking about. It helps that he truly cares about how ebooks look and that they work properly on ereading devices, no matter what those devices might be. It also helps that the book is readable, with an engaging style, and only occasionally lapsing into nerd-speak that leaves me smiling, nodding and waiting for him to say “walkies” and “cookies.”

I read it in one sitting, bookmarking countless passages and taking notes with my analogue word processor. I figured out some areas where I am working way too hard to accomplish simple tasks, and making some mistakes which I had to work even harder to overcome and compensate for. Of course I had to run to the computer and try some new things. I formatted two ebooks using his guidelines and had so much fun, I reformatted another book that happened to be more complicated just to see if I could. I could. I did! I understand a bit more about how ebooks work and some of the differences between the different platforms and why versions of html coding work better on some platforms than on others.

The book is easy to navigate (a most useful table of contents written in plain English) and it includes templates for xhtml address thingies and resets and style sheets. Handy-dandy and easy to use.

Paul, being a generous fellow, generously (foolishly) opened himself up to answering whatever stupid questions I might throw his way. He might be sorry about the offer, but I won’t be. One book doesn’t make me an expert and it sure doesn’t catch me up on twenty years of experience, but it does go a long way toward helping me reach my goal of producing beautiful ebooks.

Highly recommended for nerds-in-training.

-SB-, -PB-, and Italics

So I’m chatting with a friend and she asks, “What you doing?” I sez, “Nuking tabs. Bwahaha! All gone.”

I was, of course, prepping a manuscript for ebook formatting. That means going through the manuscript and getting rid of everything that will screw up the ebook.

My learning processes are always convoluted, in the beginning overly complicated and then as I figure out what’s important and what is not, I streamline and pare down to the essentials. If you are a regular follower of this blog, you’ve seen my process regarding source files for writers. I’ve gone from suggesting writers set up and use style sheets (they should) to what I’m going to suggest today.

When creating a source file with the end goal of turning it into an ebook, all the writer needs to do, formatting-wise, is three things:

  • Indicate page breaks
  • Indicate scene or section breaks
  • Indicate italics, bolding and underlining

When it comes to page breaks, “indicate” means exactly that. Don’t actually break pages either with inserted page breaks or multiple paragraph returns. Why? Because when you’re ready to format, you or the person you hire has to take them out.

The more formatting you put into your source file, the more formatting has to be removed. The more that has to be removed, the greater the chances that something will be missed (screwing up the ebook) and the more it costs in time and money.

When I get a manuscript to format, it’s generally been created with a word processor. Whether I’m going to format it for Smashwords (a Word file) or for everything else (html files), the very first thing I have to do is–

  • Remove extra spaces
  • Remove extra paragraph returns
  • Remove page and section breaks
  • Remove headers, footers and page numbers
  • Tag page breaks
  • Tag scene or section breaks
  • Tag special formatting

I don’t actually have to remove tabs because those are going to disappear when I transfer the text to a text editor, but it’s easy (one Find/Replace operation) and it gives me a clearer picture of the dangerous stuff.
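If the manuscript is already saved as plain text, the first few removals on that list can be scripted. What follows is a rough Python sketch, not the Find/Replace routine itself. It assumes one paragraph per line, it does nothing about headers, footers or page numbers (those need a human eye), and the name strip_manuscript is hypothetical.

```python
import re

def strip_manuscript(text: str) -> str:
    """Remove tabs, extra spaces and extra paragraph returns from plain text."""
    # Nuke tabs. They become a space first so two words are never welded together.
    text = text.replace("\t", " ")
    # Extra spaces, including the two-spaces-after-a-period habit.
    text = re.sub(r" {2,}", " ", text)
    # Spaces hiding at the start or end of any line.
    text = re.sub(r"^ +| +$", "", text, flags=re.MULTILINE)
    # Extra paragraph returns: runs of returns collapse to a single one.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```

Because blank lines disappear in the last step, tag page breaks and scene breaks before running anything like this, or they will vanish along with the padding.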

Now, seriously, I’m a writer. I fully understand the NEED to make the manuscript look RIGHT. But writers, you have to understand that every effort you make to that end is going to have to be undone. Because of the nature of word processors, some of the fancy touches you include can actually corrupt the ebook.

The less you do–Honestly! Truly! I’m not lying about this!–the better the ebook will be.

So what’s a poor writer to do? Not much, actually. Use whatever font and font size you like. That’s not what will end up in the ebook, but use whatever is comfortable for you while composing. Line space however you like. It makes no difference in the end. Get out of the habit of using tabs. If you can’t stand not having indented paragraphs, set up a simple style sheet that indents the paragraphs with every hard paragraph return. Get out of the habit of two spaces between sentences. Get out of the habit of adding extra hard paragraph returns to space the text. Get out of the habit of making pages. There are no pages in ebooks.

How does one indicate a page break?

I use a code. -PB- It’s unique, the dashes keep it from melding with text, and thus it is easy to find. What my clean file looks like before I take it to the text editor is this:

Final line in chapter one.
-PB-
Chapter Two
First line in chapter two and so it goes.

Use whatever makes sense to you. If you want to make extra sure you or your formatter don’t miss it, spell it out. -PAGE BREAK- That’s it. That’s all you have to do. When you do the actual formatting, that’s when you center, bold, add graphics, extra spacing, etc.

Scene breaks are another place you should get in the habit of tagging–especially if your habit is to use extra paragraph returns to make a blank line. Those can be easy to miss. My little code is -SB-. The text looks like

Last line of scene or section.
-SB-
First line of new scene or section.

It doesn’t matter much what you use as long as you use something. Asterisks, a pound sign, plus signs, or spell it out -SCENE BREAK-. Use something so the scene break doesn’t get lost.
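Here is why unique codes pay off: once the breaks are tagged, a few lines of script can turn them into markup. This Python sketch is only an illustration. The class names pagebreak and scenebreak, the asterisks used as the scene break ornament, and the function name tagged_to_html are all hypothetical, and it assumes every line of the clean file is one paragraph.

```python
import html

PAGE_BREAK = "-PB-"
SCENE_BREAK = "-SB-"

def tagged_to_html(text: str) -> str:
    """Wrap each line in <p> tags and swap the break codes for markup."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # stray blank lines carry no meaning in a clean file
        if line == PAGE_BREAK:
            # Hypothetical class; the style sheet would force the page break.
            out.append('<div class="pagebreak"></div>')
        elif line == SCENE_BREAK:
            # Hypothetical class and ornament; style however you like.
            out.append('<p class="scenebreak">* * *</p>')
        else:
            # Escape &, < and > so the device cannot mistake them for markup.
            out.append("<p>" + html.escape(line, quote=False) + "</p>")
    return "\n".join(out)
```

Feeding it the three lines "Final line in chapter one.", "-PB-" and "Chapter Two" gives back a paragraph, the page break div, and another paragraph, in that order.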

As for special formatting–italics, bolding and underlining–at some point, whether you do the job yourself or hire it out, you are going to have to tag the special formatting. With my own writing, I’ve gotten into the habit of tagging as I write rather than highlighting the text and italicizing (or whatever). It’s easier in the long run and I’m used to how it looks. Most writers are not going to want to do that. No biggie. If you are going to tag your special formatting, here are a few things I have learned–

  • If you’re going to format in html, you know to tag the special formatting with open/close codes: <i> and </i>
  • If you are going to format your ebook in Word and are tagging for the purpose of stripping extra coding out of the document, do NOT use html tags. Using <i> TEXT or Wild Card </i> in the Find box of a word processor can have… interesting results (with wildcards turned on, Word reads < and > as start-of-word and end-of-word markers). Not the fun kind of interesting either.
  • For Word files I use -I- and -ENDI- to open and close italics. Easy to find and doesn’t give Find/Replace fits.
  • Make sure your special formatting is paragraph specific. In other words, don’t just highlight big blocks of text and toggle on italics. Highlight the necessary text within each paragraph, italicize it, then do the same in the next paragraph. Fewer chances for conversion programs to argue about what you really mean.
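When the Word-safe codes eventually need to become html, the swap can be done one paragraph at a time, which also enforces the paragraph-specific rule. A small Python sketch, assuming each paragraph is passed in as a single line and using the hypothetical function name italics_to_html:

```python
import re

# Non-greedy match so two italic runs in one paragraph stay separate.
# The dot does not match a newline, so a run can never cross paragraphs.
ITALIC_RUN = re.compile(r"-I-(.*?)-ENDI-")

def italics_to_html(paragraph: str) -> str:
    """Turn -I-...-ENDI- pairs into <i>...</i> within one paragraph."""
    converted = ITALIC_RUN.sub(r"<i>\1</i>", paragraph)
    # Any code left over means an opener or closer has lost its partner.
    if "-I-" in converted or "-ENDI-" in converted:
        raise ValueError("Unbalanced italics codes in: " + paragraph)
    return converted
```

The error on leftovers is the useful part. An italics run that opens and never closes is exactly the kind of thing that turns half a chapter slanted.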

That’s pretty much it. To get the best results in your ebook, no matter who does the formatting, copy the following, print it out, and tape it to your computer monitor as a reminder:

This is a FILE not a document. Less is more. Less is good. The less you do now, the less you have to do later.

 

 

Straight Quotes Versus Curly Quotes in Ebooks

I have mentioned my dislike for straight quotes in ebooks in earlier posts. I think they look amateurish and unfinished. Some people don’t mind them. Many people probably don’t even notice them. So this is one of those areas where personal preference should be the deciding factor.

Except…

This is from the w3schools html character reference guide:

Reserved Characters in HTML

Some characters are reserved in HTML and XHTML. For example, you cannot use the greater than or less than signs within your text because the browser could mistake them for markup.

HTML and XHTML processors must support the five special characters listed in the table below:

Character   Entity Name   Description
"           &quot;        quotation mark
'           &apos;        apostrophe
&           &amp;         ampersand
<           &lt;          less-than
>           &gt;          greater-than

(And look! This browser screwed up the copy/paste (or fixed it) and the table above isn’t complete. Follow the link to see the entire table with entity numbers.)

Keywords here: “…the browser could mistake them for markup.”

That is not a good thing. Ereading devices are essentially browsers and giving them an opportunity to mistake something is a big fat goof waiting to happen.

If your preference is for straight quotes (double and single) then go with your preference, but remember to code them either with the entity number or the entity name.
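If you would rather not hunt down every straight quote by hand, Python's standard library can do the encoding. This is a sketch of one way to go about it, not a required step, and the function name encode_reserved is hypothetical. One thing to know going in: html.escape writes the apostrophe as the hexadecimal entity number &#x27; rather than the name &apos;.

```python
import html

def encode_reserved(text: str) -> str:
    """Encode &, <, > and both kinds of straight quote as entities."""
    # quote=True tells html.escape to convert " and ' as well.
    return html.escape(text, quote=True)
```

Run it on the raw text of a paragraph before the <p> tags go on. If it is run on text that already has tags in it, the tags themselves get encoded and show up on the page as literal angle brackets.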

If you do go with curly quotes (or smart quotes), you’ll run into the frustration of them being turned occasionally backward by the infinite wisdom of your word processor (which insists upon doing what it’s told rather than what you mean). It’s a pain when you’re being a good doobie, composing your masterpiece with the auto-correct functions turned off in order to create a clean source file, but when you run a Find/Replace to turn on the curly quotes, quotes after em dashes or ellipses that should be right double quotes are instead turned into left double quotes, to name just one example.

You can however find and correct your curly double and single quotes with Find/Replace. I only know how to do it manually in Word. In a text editor I can do Replace All. In either case, it shouldn’t take long and the key is the search terms you use.

In Word, look for instances where you’ve used double quotes before or after an ellipsis or em dash. (I’d type the actual characters, but WordPress wants to “correct” them for me.)

  • dash dash double quote
  • double quote dash dash
  • double quote period period period
  • period period period double quote

Or, if you’ve already converted your dashes into proper em dashes, do your search for:

  • ^+ double quote
  • double quote ^+

You get the picture, right? If you find an instance where the quote mark is turned the wrong direction, you can fix it. If you’re doing this in a text editor, you can search for the actual characters and do a Replace All to make sure they are turned the right way.
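For the text editor crowd, the same hunt can be written as a pattern. The Python sketch below handles only the most common goof: a left double quote sitting right after an em dash or ellipsis where the dialogue is actually ending. The rule it uses to decide (the quote is followed by a space, a tag, or the end of the text) is my assumption, not a guarantee, and the function name is hypothetical.

```python
import re

EM_DASH = "\u2014"
ELLIPSIS = "\u2026"
LEFT_DOUBLE = "\u201c"
RIGHT_DOUBLE = "\u201d"

# An em dash, ellipsis character or three periods, then a LEFT double quote,
# then whitespace, the start of a tag, or the end of the text.
BACKWARD_CLOSER = re.compile(
    "(" + EM_DASH + "|" + ELLIPSIS + r"|\.\.\.)" + LEFT_DOUBLE + r"(?=\s|<|$)"
)

def fix_quotes_after_breaks(text: str) -> str:
    """Flip left double quotes that close interrupted or trailing dialogue."""
    return BACKWARD_CLOSER.sub(lambda m: m.group(1) + RIGHT_DOUBLE, text)
```

A quote that follows a dash and opens new dialogue is followed by a letter, so it is left alone. Anything the pattern cannot classify stays as it was, which is the safer failure.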

I also like to check for gremlin-induced wrong-way quote marks. I know right double quotes don’t belong at the beginning of a paragraph or sentence, and left double quotes don’t belong at the ends. In a text editor I run these searches:

  • <p>right double quote
  • left double quote</p>
  • (space bar)right double quote
  • left double quote(space bar)

I can either do the corrections manually or run a Replace All.
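Those four searches translate directly into four plain replacements. Here they are as a Python sketch, run in the same order as the list. It assumes paragraphs are wrapped in bare <p> tags with no attributes, and the function name is hypothetical.

```python
LEFT_DOUBLE = "\u201c"
RIGHT_DOUBLE = "\u201d"

# (wrong, corrected) pairs, one per search in the list above.
WRONG_WAY = [
    ("<p>" + RIGHT_DOUBLE, "<p>" + LEFT_DOUBLE),    # right quote opening a paragraph
    (LEFT_DOUBLE + "</p>", RIGHT_DOUBLE + "</p>"),  # left quote closing a paragraph
    (" " + RIGHT_DOUBLE, " " + LEFT_DOUBLE),        # right quote after a space
    (LEFT_DOUBLE + " ", RIGHT_DOUBLE + " "),        # left quote before a space
]

def fix_wrong_way_quotes(html_text: str) -> str:
    """Apply each Replace All in turn and return the corrected text."""
    for wrong, corrected in WRONG_WAY:
        html_text = html_text.replace(wrong, corrected)
    return html_text
```

A quote mark with a space on both sides cannot be sorted out by rules like these, so a quick manual search for that case is still worth doing.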

Single quote marks and apostrophes tend to turn the way they should. But here is a fun little gremlin that I’d never noticed until it was pointed out to me (and now I can’t unsee it!). I don’t even know what you’d call these, but I’m going with truncated contractions—words with their front ends chopped off, usually in dialogue. Examples:

  • ’em, as in “Smoke ’em if you got ’em.” (th)em
  • ’cause, as in “’Cause I said so.” (be)cause
  • ’round, as in “Ain’t seen him ’round these parts.” (a)round

These words are contractions, which call for an apostrophe, and that means a RIGHT single quote mark. Fortunately, left single quotes aren’t used all that much in most American writing. You can search for specific usages (if you remember all the truncated contractions you used) or, if you’re in a text editor, you can search for left single quotes and change them to right single quotes as necessary.
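If you can list the truncated contractions you tend to use, the search can be automated. This Python sketch flips a left single quote to an apostrophe only when it sits directly in front of a word on the list. The word list is a starter set I made up for illustration, and the function name is hypothetical. Add your own words.

```python
import re

LEFT_SINGLE = "\u2018"
APOSTROPHE = "\u2019"  # the right single quote doubles as the apostrophe

# Hypothetical starter list of words that lose their front ends in dialogue.
TRUNCATED = ["em", "cause", "round", "til", "tis", "twas", "bout"]

# A left single quote followed by one of the listed words as a whole word.
WRONG_APOSTROPHE = re.compile(
    LEFT_SINGLE + r"(?=(?:" + "|".join(TRUNCATED) + r")\b)",
    re.IGNORECASE,
)

def fix_truncated_contractions(text: str) -> str:
    """Replace the left single quote in front of listed words with an apostrophe."""
    return WRONG_APOSTROPHE.sub(APOSTROPHE, text)
```

The whole-word check keeps it from touching a single-quoted name like Emily. It will still be fooled by a genuine single-quoted phrase that happens to begin with a listed word, so skim the changes afterward.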

Is this a pain? Yep. It’s worth it, though.