As I write this I have around two million words' worth of back list books sitting on my desk, awaiting conversion from print into ebooks. In the past week alone I have scanned, converted and restored over 400K words to the stage where I can send the doc files to the writer for proofreading.
Tedious. Yes. Daunting, perhaps. Expensive, sometimes. Impossible and difficult, no way. Writers with back list, please, if you have gotten the rights back to your work, don’t let either expense or the thought of so much work stop you from bringing your back list back to life and reissuing it as either ebooks or print-on-demand or both.
Summertime is a fabulous time for restoring back list. Especially for the do-it-yourselfer, since you can take your laptop out on the deck and do the tedious work while working on your tan. (I like to queue up oddball indie films on Netflix and semi-watch and semi-listen to them while I’m working.) Over the next few blog posts, I’ll take you step-by-step through the process.
Understand, this process ranges from very expensive (having someone else do ALL the work for you) to no-cash-outlay at all (takes time). One way I save writers money–and time–is by doing the scanning, conversion and gross restoration (which I can do in hours) then sending them a Word doc in manuscript format so they can do the fine tuning and proofreading. It’s still tedious, but it’s not rip-your-hair-out frustrating.
A word of caution: There are some services that promise to scan, convert and turn your print book into an ebook, all for one very low price. This is the process used by many of the big publishing houses and this is why so many of their (your!) ebooks are broken, ugly, and riddled with formatting errors and typos. Research those services extensively. If there is any hint that they convert pdf files into ebooks, walk away. Run away! There is the right way to do this and there is the super-speed, el-cheapo, don’t give a shit about the quality of product way–and nothing in between.
This is the process for the RIGHT way:
- Scan the book into a pdf file
- Convert the pdf using OCR into a document file
- Gross restoration: remove headers, footers, page numbers, and bugshit produced when conversion “reads” speckles, debris, foxing, watermarks or penciled notations as characters; restore paragraphs; restore special formatting such as italics or bolded text; remove all formatting artifacts embedded by the pdf AND the word processor.
- Fine tune and proofread.
- Format the fully restored text for either digital or print-on-demand.
- Proofread the ebook and/or print-on-demand.
Skip any of the above steps and you’ll end up with a substandard product that is disrespectful to your written work AND to your readers. There is no way to skip any of those steps and turn out a great product. I can, however, share quite a few tricks and tips that will make the process easier for you.
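To give a feel for what "gross restoration" means in practice, here is a minimal Python sketch of two of its chores: dropping page numbers and running headers, and rejoining lines that the conversion broke mid-paragraph. It is an illustration, not the method used in this series. The function name `gross_clean` and the `running_headers` parameter are made up for the example, and it assumes the converted text marks paragraph breaks with a blank line and that any line holding only digits is a page number.

```python
import re

def gross_clean(ocr_text, running_headers=()):
    """Drop page numbers and running headers, then rejoin hard-wrapped lines."""
    # Assumption: running headers are matched exactly, ignoring case.
    headers = {h.strip().lower() for h in running_headers}
    paragraphs, current = [], []
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\d{1,4}", line):
            continue  # assumption: a digits-only line is a page number
        if line and line.lower() in headers:
            continue  # running header such as the book title
        if not line:
            # Assumption: a blank line ends the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)

sample = "THE BOOK TITLE\n\nIt was a dark\nand stormy night.\n\n42\n\nThe rain fell."
print(gross_clean(sample, running_headers=["The Book Title"]))
```

A sketch like this will not catch speckle debris, lost italics, or a paragraph that was split across a page break, which is why the fine tuning and proofreading steps still matter.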
STEP 1: SCAN AND CONVERT
Two ways to do this.
SOMEONE ELSE: If you do a Google search for “book scanning services” you will turn up hundreds of companies that will scan and convert your printed book into a workable document file. Or, you can run down to your local office supply store (Kinko’s or Staples) and they will do the job while you wait and give you a CD or thumbdrive containing your file to take home. Prices are all over the board. I recommend you budget $100. Chances are, the job can be done far more cheaply than that, and you can use your change to have a really nice lunch while you’re waiting for your book to be scanned.
DO-IT-YOURSELF: It is possible you have everything you need already to scan and convert your books.
- X-acto knife or paper cutter
- External storage device or cloud service
- Conversion program
“X-acto knife? Paper cutter? Jaye, what are you talking about?”
To easily scan your books, you will need to take them apart. The easiest way to do this is to run down to the office supply store and have them chop off the spines. They’ll charge you a couple of bucks and it only takes minutes. One BIG caution here. If your mass market paperback is decades old (or sometimes, only a few years old, depending on how cheap-o the original publisher was) the paper could be badly degraded to the point where any rough handling can tear it, crinkle or shred pages, or even break off chunks. The best way to cut off their spines is by hand–gently. I use a metal ruler and an X-acto knife (I buy blades in bulk, so I always have fresh blades). If you want to do this at home, a good paper cutter (available at any hobby and craft store) will do the job nicely. (This is also a good job for a bored kid–“Mom, I have noooothing to do!” “Here, darling, chop the spine off this book.”)
It takes me about ten minutes to despine a fragile old paperback by hand. Not a big deal.
What if it’s a rare hardcover and you don’t want it chopped and destroyed? That is going to cost you–even if you do it yourself. You will have to copy each page (one page to a sheet, please–doing it two-up will turn into a restoration nightmare), then scan the copies. Nice thing about this is, though, if you use a heavy weight bond copy paper (at least 20#) you can run the sheets through a high speed scanner and it’ll take minutes instead of hours.
IMPORTANT TIP: If you’re chopping the book apart yourself, make sure you remove ALL the binding glue. It can jam your scanner or copier, or even melt into the works.
What if you don’t have a scanner? Double check because you just might. Most printers sold these days are multi-purpose: print, copy, scan, fax. If you don’t have a scanner, it might be cost effective to invest in one. For less than $200 you can get a really good multi-purpose printer. (My home multi-purpose printer was on sale for under $150 and it will do double-sided scans in bulk at a pretty good clip–ain’t technology grand?)
You want to output your scans as pdf files. And those are huge. Hence, you’ll want either an external storage device (such as a flashdrive or an external hard drive) or a cloud service (such as Dropbox). It will make handling the files ever so much easier and keep your computer from having hissy fits and being draggy.
QUICK TIP: Rubber bands. Keep a good supply on hand. Cats, kids, open windows, fans, a careless hand wave, and there go all those pages you cut apart. Old paperback pages are so flimsy they’ll glide under furniture. Keep your work banded and save yourself some headaches.
IMPORTANT TIP: Always do a test run with the front or back matter before you run pages through a sheet feeder or a high-speed scanner. Fragile, flimsy, brittle paper can be eaten by the machine. Pages can twist and turn and wrinkle from the heat. Some books must be hand scanned on the bed, one sheet at a time.
Some useful things to know about scanning:
- If your scanner allows it, scan in black and white. Your output files will be smaller and more readable.
- Experiment with the resolution and go with the lowest resolution that gives you a workable scan. The higher the resolution, the bigger your files will be AND the greater the amount of speckling and debris the scan will pick up. The only time you need to scan at a high resolution is if your book has illustrations or photographs. In that case, you might want to do one run at a lower setting for the text, then do a high resolution scan of your images.
- If the pages are so flimsy there is significant bleed-thru from the opposing pages, you will need to scan them via the bed (rather than the sheet feeder). Use a sheet of black card stock as a backer and that will reduce or eliminate the bleed-thru.
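To see why resolution and color mode matter so much for file size, here is a rough back-of-the-envelope sketch in Python. It estimates the raw, uncompressed image data for a scan job. The 300-page count is an assumption for illustration, the 4 by 7 inch page is a typical mass market paperback size, and `raw_scan_megabytes` is a hypothetical helper name. Real pdf files are compressed, so actual sizes will be smaller, but the proportions hold: doubling the resolution quadruples the data, and 8-bit grayscale carries eight times the data of 1-bit black and white.

```python
def raw_scan_megabytes(pages, width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image data for a whole scan job, in megabytes (10**6 bytes)."""
    pixels_per_page = (width_in * dpi) * (height_in * dpi)
    total_bytes = pages * pixels_per_page * bits_per_pixel / 8
    return total_bytes / 1_000_000

# Assumed job: 300 pages, 4 x 7 inches each.
for dpi in (200, 300, 600):
    bw = raw_scan_megabytes(300, 4, 7, dpi, 1)    # 1 bit per pixel
    gray = raw_scan_megabytes(300, 4, 7, dpi, 8)  # 8 bits per pixel
    print(f"{dpi} dpi: black and white {bw:.0f} MB, grayscale {gray:.0f} MB")
```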
The very best program I have found is Adobe Acrobat XI. Not only will it compile all your files (if you have to hand scan the pages, you could end up with hundreds of individual files), but it will quickly and (fairly) cleanly convert the pdf into a workable Word document. It’s a bit pricy and not a program for a person doing one or two jobs. If you have an extensive back list and intend to do the restoration yourself, then it is worth the investment because it will save you tons of time. Some people use it for creating print-on-demand books, too.
There are also hundreds of programs (many as free downloads) and online services (also, many that are free) that will convert your pdf/s into a workable document. Do a Google search for “pdf conversion” and you’ll have a wide variety to choose from.
IMPORTANT TIP: Results will vary. Before you download any program or pay for a subscription or use an online service, test a few pages and see how they look. NO OCR conversion will produce perfect results, but some conversions are much, MUCH better than others and therefore much easier for you to restore the text back to its original glory. It’s worth an hour or so of your time to find the best one for you.
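If you want something more objective than eyeballing the test pages, one option is to type out (or carefully correct) a single page by hand and score each converter's output against it. The sketch below uses Python's standard `difflib` module for that. The converter names and sample strings are invented for the example, and the score is only a rough similarity measure, not a formal OCR accuracy metric.

```python
from difflib import SequenceMatcher

def ocr_similarity(reference, ocr_output):
    """Return a 0..1 similarity score between a hand-checked page and OCR text."""
    # Collapse whitespace so differences in line wrapping don't count as errors.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()

# Hypothetical test page and two made-up conversion results.
reference = "It was the best of times, it was the worst of times."
candidates = {
    "converter_a": "It was the best of times, it was the worst of times.",
    "converter_b": "lt was the hest of tirnes, it was the worst of tirnes.",
}

scores = {name: ocr_similarity(reference, text) for name, text in candidates.items()}
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

The converter with the highest score on your own pages is the one that will leave you the least restoration work.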
There you go. Your book is scanned and converted and ready for restoration. You all are lucky in that I’ve learned a lot from doing a lot and I’ll save you a LOT of fumbling around with my many tips and tricks. Watch this space for the next post: STEP 2: Gross restoration.
Nice. Like the step by step because if I ever manage to do anything like this I’ll need the detailed instructions.
I do like being useful, Juia. 😉 It makes up for the times when I’m just cranky.
You are a marvel of clarity and generosity, Jaye
Aw, thank you, Jerry.
“There is the right way to do this and there is the super-speed, el-cheapo, don’t give a shit about the quality of product way–and nothing in between.”
…This, right here? This is why I admire you. You take the time to research the right way to do something and then never settle for anything less.
Thank you for the kind words, Margaret. 😀
The OCR software crown is usually awarded to the solution I used quite a bit, years ago: ABBYY FineReader. Within the software you can do quite a lot of preprocessing. For example, you can cut out the headers before you tell it to read the text. You can scan and read page by page, correcting each page as you go (this improves the software’s recognition of unfamiliar words, such as odd names for characters and places). When you proof, stay in FineReader on a large monitor, so you see the image scanned right next to the text as it was interpreted. Replace spurious paragraph ends with the proper line breaks, fix problems with italics, and so on.
FineReader also did a much better job reading the image when it was grayscale rather than black and white; I never cut down my books, and the less-bright areas in the gutters were easier to read in shades of gray, so they did not simply fall into black.
Good info, thanks for sharing. Especially about cutting the books. It doesn’t bother me a bit to chop up paperbacks, but I always feel pain when doing it to a hardcover. For those I can’t cut (for whatever reason) yours sounds like a better solution than copying the pages in order to scan the pages.
I often wonder: does scanning really save time over retyping the text?
We used to debate this issue when it came to converting documents/manuals/help as a tech writer. I used to retype them from scratch while others went through conversion pretzels. Generally it took me less time, with fewer errors and less frustration, than those using the conversion methods. Granted, I touch-typed 120+ wpm at the time.
I don’t know about anyone else, but I don’t have the hands of steel I used to. 😀 With the equipment I have now, it’s much cheaper and faster to scan and convert than it would be for even a super-Typist (with hands of steel). For instance, with the project I am working on today, it would take a 120wpm typist nearly 34 hours to retype it. If paying someone, figure around $15 an hour, so that would be around $500…
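For anyone who wants to run the same arithmetic on their own book, here is a small Python sketch. The word count is not stated above; 244,800 words is simply what 34 hours at 120 wpm works out to, so treat it as an inferred figure. The function name `retyping_estimate` is made up for the example, and the estimate assumes nonstop typing with no breaks or corrections.

```python
def retyping_estimate(word_count, words_per_minute, hourly_rate):
    """Return (hours, cost) for retyping a book at a steady typing speed."""
    hours = word_count / words_per_minute / 60
    return hours, hours * hourly_rate

# Inferred word count, 120 wpm typist, $15 an hour.
hours, cost = retyping_estimate(244_800, 120, 15)
print(f"{hours:.1f} hours, about ${cost:.0f}")
```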
The machines will enslave soon enough. We may as well get as much work out of them as we can until then.
Suggest also eliminating as much front matter as can be cut. Nothing annoys the reader more when they download a sample to the ereader to find that the ‘sample’ consists almost entirely of front matter, leaving very little idea of what the book itself is like. Move this stuff to the back.
A couple of pro tips (use at your own risk). After you have deconstructed the book, sometimes it is better to trim the pages to remove page numbers, headers and footers, etc. It takes a steady hand and good paper cutter. Also, clean the glass of your scanner before you start. For big projects, using a high-speed scanner really helps. Some high-end units have excellent OCR built in. Ask around. If your spousal unit works for Big Corp (or even a small professional group), they probably have a decent scanner. As long as you are scanning to disk, they won’t mind. Or ask your local church, synagogue, mosque, temple, or Cthulhu cultists.
Thanks for weighing in, William. High speed scanners are marvels.
Your suggestion about trimming headers and footers reminds me of something I forgot to add. On many scanners output size can be set. For instance, a standard mass market paperback is around 4″ by 7″. If you set the scan size to match the page dimensions, it eliminates edge shadows that OCR conversion can render as characters.
This HuffPo article about errors in ebooks is almost 3 years old, but it’s still relevant.
That’s as horrifying as turning unrestored OCR conversions into ebooks.
I remember one time I got a UK version of one of my print books and as I was flipping through it, I discovered a series of 4-6 pages that had been half-printed. Huge chunks of text had just disappeared. I cried, half in anger, half in sorrow. I’ve glanced inside the books my old publisher has issued as ebooks–the sheer slovenliness of them embarrasses me. So much for all that fabulous professionalism and nurturing from the big pubs.
If you want to get your feet wet with OCR for free and have Office 2007, you can try this.
Scan a page of text and save as a GIF or JPG. Open Office OneNote. Drag the file onto the page.
With the cursor over the image (and the cursor looks like a compass direction on a map), right-click and choose “Copy Text from Picture.”
When done, open Word and Paste the text.
To clean up the formatting, select all (Ctrl+A) and then Ctrl+Spacebar. Choose a Style if you wish.
If the text is clean and sharp, you should get nearly 100% reproduction. It will certainly take less time to fix the obvious errors. Note that some will evade spell-check, such as “hut” when it should be “but.”
Thanks, Bill. Handy dandy for a short piece such as a short story or magazine article. It sounds like something worth trying.