More About Ebook Formatting, Source Files and Tales of Tagging

First, an apology for not answering every comment this week. On the “Source Files Update” post there were some great comments. People are coming up with solutions and solving problems. So go read the comments over there. One commenter in particular is hard at work on the subject of formatting ebooks from word processor files. I’ve been corresponding with William Ockham regarding his efforts to create a program that will make it easy to format a word processor file into a good-looking ebook. I’ve sent William some grotty files and he’s been problem solving. I’ve brought one of his comments over to this post so you can get a better idea of what he’s doing:

Wow, I’m flattered. I’ve been busy with my guest blogging stint over at and didn’t see all these comments. Since there is some interest here, I’ll share what I can of my plans. I firmly believe that writers should use whatever tool works for them. For most people, that’s Microsoft Word. Some folks are using Scrivener and almost everyone else is using some word processor (a flavor of OpenOffice or those WordPerfect holdouts).

The first thing I’m going to release is a free document to source file converter service (to use Jaye’s terms). You save your manuscript in RTF format (pretty much every program supports RTF) and upload it to my service. My program will go through and do all the stuff that Jaye talks about. It will strip all the formatting except bold, italics, and chapter headings. You get back a nice clean source file in RTF format. You load it up into your tool and save it back as a .doc file and you have a source file suitable as the input for ebook formatting. It’s not much, but it is a nice little timesaver and your ebook formatter will thank you (even if you DIY). Did I mention it would be free?

I really appreciate all the expressions of support. I hadn’t really given much thought to a Kickstarter, but I am thinking about it now. In the meantime, there is something you could do to help. I need test cases. That is, I need real manuscripts before they’ve been given the Jaye Manus treatment. If anyone has copies of their novels (or short story collections) that they wouldn’t mind sharing with me, I would really appreciate it. I promise not to use them for anything other than perfecting my software. I will send you the cleaned up version and destroy or return the original when I’m done.

If you can help in this way, save your gnarliest files (smart quotes, em dashes, paragraphs indented with tabs and spaces, whatever) in RTF format and email them to razoroftruth at gmail dot

Let me know what program (e.g., Microsoft Word) and version (like 2000 or 2007) you are using, and whether you are on Windows, Mac, or Linux (or another Unix variant).

Which brings us to another problem I’m working on with source files–tagging. One of the things keeping me so busy this week is learning HTML. Turns out it’s kind of fun and quite the challenge. I also discovered that my resulting ebook files are much smaller–why? Who knows. But that’s a plus since I love using graphics for headers and such. Anyhow, the biggest challenge has been doing an ebook in screenplay format. It’s not difficult. It requires essentially three styles: Centered, Block Quote and Hanging Text. Since it ran about 120 pages in manuscript form, the real challenge was making sure every style was properly applied. I also wanted a way to NOT have to go in and tweak every line of text.

Now me, I happen to think FIND/REPLACE is the greatest invention since the light bulb. I’ve stated before that Word’s F/R is a powerhouse. Indeed. I also made some very interesting discoveries about Word and text editors and how they interact re formatting tags.

Le sigh…

Let’s talk about the two most common special formatting tags in the writing universe. Asterisks to indicate bolded text and underscores to indicate italics. Most editors and agents understand what those marks mean. Sending an e-query with those tags in place would be perfectly acceptable. Except… Even if you turn off the auto-formatting features, Word treats them like special characters and so does a text editor. Meaning, a text editor will strip them out. So those are out. You can use them if you like–they are easy to read–but if you ever have to copy the file into a text editor, you’ll lose the tags and your special formatting.

Anyhow, I’ve been using my own little special formatting tags–ii for italics, BB for bolding, and UU for underlining. Nobody but me sees them or has to read them, so no big deal. BUT, I am in the process of creating a cheat sheet for Source Files, and need to come up with tags that One) Make sense; Two) Are easy to remember and use; Three) Don’t activate “helpfulness” in word processors; Four) Work well in FIND/REPLACE operations. Number three is a bitch. I popped around in different programs to see how they handle various tags. Turns out non-letter characters are a problem when created in strings–Word, especially, kept getting wobbly and persnickety. Plus, some can cause problems in HTML coding because it uses so many characters for commands. For instance, I tried i/TEXT/i for italics. That seems fairly straightforward, right? It didn’t make Word go all wobbly either and it translated into a text editor. Problems arose when I did F/R operations in the text editor. I needed characters that are NOT used in coding. Which leaves out almost all of them.

Ah ha, most FIND operations can be made case sensitive. And there is one non-letter character that gave me no problems at all–the lowly dash/hyphen. So here are a few of the tags I ended up with:

  • -ITAL-   -NOITAL-
  • -CTR-     -NOCTR-
  • -BQ-        -NOBQ-
  • -NBSP-

Those might seem a little “wordy” but they are pretty self-explanatory (italics, centered text, block quote, non-breaking space) and they don’t cause interpretation wars between programs. When I paste the Word file into the text editor, all I have to do is run FIND/REPLACE operations to insert the coding. (For example, -ITAL- becomes <i> and -NOITAL- becomes </i> to make italicized text.) Most fiction doesn’t require every paragraph be tagged, so I won’t go into the nifty little shortcuts I found.
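For anyone who would rather script those FIND/REPLACE passes than run them by hand, here is a minimal sketch in Python. It is only an illustration, not the workflow described above: the -ITAL-/-NOITAL- mapping comes from this post, but the HTML chosen for centered text and block quotes (including the "center" class name) and the function names are assumptions made up for the example. It also counts opening and closing tags, since making sure every style was properly applied was the hard part.

```python
# Tag-to-HTML table. Only the -ITAL-/-NOITAL- pair comes from the post;
# the other HTML targets are assumptions chosen for illustration.
TAG_TO_HTML = {
    "-ITAL-": "<i>",
    "-NOITAL-": "</i>",
    "-CTR-": '<div class="center">',   # assumed markup for centered text
    "-NOCTR-": "</div>",
    "-BQ-": "<blockquote>",            # assumed markup for block quotes
    "-NOBQ-": "</blockquote>",
    "-NBSP-": "&nbsp;",                # HTML entity for a non-breaking space
}

# Tags that must come in matched open/close pairs.
PAIRS = [("-ITAL-", "-NOITAL-"), ("-CTR-", "-NOCTR-"), ("-BQ-", "-NOBQ-")]


def unbalanced_tags(text):
    """Return the opening tags whose count differs from their closing tag."""
    return [opener for opener, closer in PAIRS
            if text.count(opener) != text.count(closer)]


def tags_to_html(text):
    """Replace each source-file tag with its HTML equivalent.

    str.replace is case sensitive, so lowercase text such as "-ital-"
    is left alone, which mirrors a case-sensitive FIND operation.
    """
    for tag, html in TAG_TO_HTML.items():
        text = text.replace(tag, html)
    return text


if __name__ == "__main__":
    sample = "She said -ITAL-no-NOITAL- to Mr.-NBSP-Smith."
    problems = unbalanced_tags(sample)
    if problems:
        print("Unmatched tags:", problems)
    else:
        print(tags_to_html(sample))
```

Running the balance check before converting is a design choice: a missing -NOITAL- is much easier to spot in the tagged text than after it has turned half a chapter into italics.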

The really important thing I’ve discovered is that not all tagging is equal and some of the old printer’s tags will not work because the programs want to do something with them and it’s not always what the writer intends.

So how about you, folks? What nifty tricks have you come up with for tagging the special formatting in your files?