Word to Calibre to MOBI: Part 2: The html File

You finished Part 1 of this tutorial. Now on to Part 2. If you’re not familiar with html, what happens next is going to be freaky. But trust me, if you can copy/paste, you can do this.

NOTE: If your ebook is as simple as the one I’m using as an example, with no images and limited styles, you can stop right now and directly upload your Word file to Amazon. It will convert just fine and work well.

STEP 1: Do a Save As of your styled .doc file as an html file. It will look something like this:

[Screenshot]

Now you are done with Word.

STEP 2: Open your html file in Notepad++

Holy Moley! This is what it looks like?!?

[Screenshots]

STEP 3: Turn your special formatting tags into proper html tags:

  • Italics <i> </i>
  • Bold <b> </b>
  • Underline <u> </u>

Easy to do with Find/Replace in Notepad++.

Very important: ALL tags that are open must be closed. So if you have <i> for italics, then you must have </i> to close the tag. Use Find/Replace and make sure your numbers match up (Notepad++ will tell you how many items it replaced).
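If you would rather double-check the counts with a script than trust your eyes, here is a minimal Python sketch. It is not part of my Notepad++ workflow, and the function name tag_counts is just something made up for this example. It counts opening and closing tags for one tag name so you can see whether they match.

```python
import re

def tag_counts(html, tag):
    """Return (opening tags, closing tags) for one tag name, e.g. "i"."""
    # \b stops "<i" from matching "<img" and "<b" from matching "<body>".
    opens = len(re.findall(rf"<{tag}\b[^>]*>", html, flags=re.IGNORECASE))
    closes = len(re.findall(rf"</{tag}\s*>", html, flags=re.IGNORECASE))
    return opens, closes

# Tiny made-up sample: the <b> tag is never closed.
sample = "<p>She said <i>no</i> and <b>meant it.</p>"

for tag in ("i", "b", "u"):
    opens, closes = tag_counts(sample, tag)
    status = "ok" if opens == closes else "MISMATCH"
    print(tag, opens, closes, status)
```

To use it on a real file, read the file into a string first and pass that in place of sample.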

STEP 4 (Optional): Get rid of soft returns. Word has a nasty habit of inserting soft returns at the end of lines in paragraphs. In theory, they are meaningless. If you leave them in, they won’t affect your ebook very much. I have noticed, however, that they give the justified text a wobbly quality and cause some unusual behavior in line spacing. Not enough to affect reading quality, but enough to bug hyper-sensitive readers (like me). I prefer to remove them. If they bug you, too, let me know and I’ll show you how to use Find/Replace in Notepad++ to quickly remove them.
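For anyone who prefers a script, one possible approach is sketched below in Python. This is not the Find/Replace method mentioned above. It assumes the soft returns show up as plain line breaks inside <p>...</p> blocks, and the function name join_soft_returns is invented for the example. Try it on a copy of your file first.

```python
import re

def join_soft_returns(html):
    """Collapse line breaks that fall inside <p>...</p> blocks into single spaces."""
    def collapse(match):
        # Replace each line break (plus any spaces around it) with one space.
        return re.sub(r"\s*\r?\n\s*", " ", match.group(0))
    # DOTALL lets "." cross line breaks so a whole paragraph is matched at once.
    return re.sub(r"<p\b.*?</p>", collapse, html, flags=re.DOTALL | re.IGNORECASE)

sample = (
    "<p class=MsoNormal>It was a dark\n"
    "and stormy night.</p>\n"
    "<p class=MsoNormal>Next.</p>"
)
print(join_soft_returns(sample))
```

Line breaks between paragraphs are left alone, so the file stays readable in Notepad++.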

[Screenshot]

STEP 5: Get rid of the Section junk. If you styled your document the same way I did, you will have two lines of code: one at the beginning that says something like <div class=Section1> and a closing tag at the end of the document, </div>. They are extraneous. Delete them.

[Screenshots]

(By the way, if your Notepad++ file doesn’t look the same as mine, it’s because I have turned off word wrap and eliminated the extra soft returns.)
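Deleting the two Section lines by hand is quick, but here is a hedged Python sketch of the same step. It assumes the opening tag looks like <div class=Section1> (your version of Word may name the class differently) and that the last </div> in the file is its partner. The name strip_section_div is made up for this example.

```python
import re

def strip_section_div(html):
    """Remove the first <div class=SectionN> and the last </div> in the file."""
    pattern = r'''<div\s+class=["']?Section\d+["']?[^>]*>\s*'''
    html, removed = re.subn(pattern, "", html, count=1, flags=re.IGNORECASE)
    if removed:
        # Assumption: the final </div> is the one that closed the Section div.
        idx = html.lower().rfind("</div>")
        if idx != -1:
            html = html[:idx] + html[idx + len("</div>"):]
    return html

sample = "<body>\n<div class=Section1>\n<p>Hi</p>\n</div>\n</body>"
print(strip_section_div(sample))
```

If no Section div is found, the text comes back unchanged, so nothing else gets deleted by accident.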

STEP 6: Extract your styles. In my example there are three: MsoNormal, Center, and h1. Select them, copy them and paste them into a new text file.

This is what they look like. Comments in parentheses are mine.

h1
{mso-style-next:Normal; (Word junk, delete)
margin-top:48.0pt; (We are going to change this)
margin-right:0in;
margin-bottom:48.0pt;
margin-left:0in;
text-align:center;
page-break-before:always;
mso-pagination:none; (Word junk, delete)
mso-outline-level:1; (Word junk, Delete)
font-size:14.0pt; (We are going to change this)
mso-bidi-font-size:16.0pt; (Word junk, delete)
font-family:"Times New Roman"; (Delete)
mso-bidi-font-family:Arial; (Delete)
mso-font-kerning:0pt;} (Delete)

p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:""; (Junk, Delete)
margin:0in; (Delete)
margin-bottom:.0001pt; (Delete)
text-indent:.3in; (Change)
mso-pagination:none; (Delete)
font-size:12.0pt; (Delete)
font-family:"Times New Roman"; (Delete)
mso-fareast-font-family:"Times New Roman";} (Delete)

p.Center, li.Center, div.Center
{mso-style-name:Center; (Delete)
margin-top:6.0pt; (Change)
margin-right:0in;
margin-bottom:6.0pt;
margin-left:0in;
text-align:center;
mso-pagination:none; (Delete)
font-size:12.0pt; (Delete)
font-family:"Times New Roman"; (Delete)
mso-fareast-font-family:"Times New Roman";} (Delete)
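Every line marked as Word junk above starts with mso-, so that part of the cleanup can be scripted. Here is a minimal Python sketch, assuming the styles have been pasted into a string without my parenthetical comments. The name strip_mso is invented for the example. It only removes the mso- declarations; deleting the font lines and changing the units is still a judgment call you make by hand.

```python
import re

def strip_mso(css):
    """Drop every declaration whose property name starts with 'mso-'."""
    # Matches e.g. "mso-pagination:none;" plus the whitespace in front of it.
    return re.sub(r"\s*mso-[\w-]+\s*:[^;{}]*;?", "", css)

sample = """h1
{mso-style-next:Normal;
margin-top:48.0pt;
text-align:center;
mso-pagination:none;
font-size:14.0pt;
mso-font-kerning:0pt;}"""

print(strip_mso(sample))
```

The closing brace survives even when the last declaration in a block is an mso- one, so the style block stays intact.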

STEP 7: Modify the styles. The coding in an ebook is actually quite simple. The major bits for your css stylesheet are as follows and most are self-explanatory:

  • margin /This is the margin for each paragraph block. This controls the top, bottom, right and left
  • text-indent /This is for paragraph indents
  • font-size /Kindle books render in either “ems” or percentages. Converters do their best to recognize points (pts) and inches, but results are iffy. That is why we’re going to change them.
  • font-style /For italics
  • font-weight /For bold

We are going to keep this very, very simple. Because there will be some coding for the body text, you don’t need much in these paragraph styles. Basically, we will whittle and adjust so they look like this (feel free to copy/paste these):

p.MsoNormal
{text-indent: 1.4em;}

h1
{margin: 2em 0;
text-indent: 0;
text-align:center;
page-break-before:always;
font-size: 1.4em;
font-weight: bold;}

p.Center
{margin: 0.5em 0;
text-indent: 0;
text-align:center;}

If you want to play with the styling, go to the w3schools website. To know what works in a Kindle book, you can look at Amazon’s “approved” list (which often seems to change on a whim).

STEP 8: Replace the header. Copy the text that follows:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<title>BOOK TITLE</title>
<style>
/*===Reset===*/
html, body, div, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, acronym, address, code, del, dfn, img, ins, kbd, s, samp, small, strike, strong, sub, sup, tt, var, center, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, article, aside, canvas, details, embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby, section, summary, time, mark, audio, video
{margin: 0; padding: 0; border: 0; font-size: 100%; vertical-align: baseline;}
body {text-align: justify; line-height: 120%;}

/* Insert your paragraph styles here */
</style>
</head>
<body>

Paste it in your file as follows:

[Screenshot]

New header and styles pasted in:

[Screenshot]

STEP 9: In the menu bar in Notepad++, find Encoding and click it. In the drop-down menu, choose Convert to UTF-8 without BOM.
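The Notepad++ menu is all you need here, but for the curious, this is roughly what that conversion does, sketched in Python. It assumes Word saved the file in the Windows-1252 encoding (cp1252), which is a guess on my part and may not match your file. The name to_utf8_no_bom is made up for the example.

```python
import codecs

def to_utf8_no_bom(raw, source_encoding="cp1252"):
    """Decode bytes from source_encoding and re-encode as UTF-8 with no BOM."""
    text = raw.decode(source_encoding)
    # If the source was a Unicode encoding that carried a BOM, drop it.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")

# "It's a cafe" with a curly apostrophe and an accented e, as Word might save it.
word_bytes = "It\u2019s a caf\u00e9".encode("cp1252")
converted = to_utf8_no_bom(word_bytes)

print(converted.startswith(codecs.BOM_UTF8))
```

Curly quotes and accented letters are exactly the characters that turn into gibberish when the encoding is wrong, which is why this step matters.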

To see your styling live, find “Run” in the menu bar. Click it and in the drop-down menu choose “Launch in (whatever browser you use).” Here is mine in Firefox:

[Screenshot]

See, that wasn’t so hard, was it? Now you have a serviceable html file you can convert into an ebook. BUT, your job isn’t done quite yet. In Part 3 I’ll show you how to convert your file into a MOBI file that works.

_________________________________________

Styling ebooks isn’t difficult. Armed with only a few lines of code, you can create beautiful ebooks and some very interesting text effects. If anyone is having trouble getting their styles just right, feel free to email me (jayewmanus at gmail dot com) and I can probably come up with just the paragraph style you need.

 

 


5 thoughts on “Word to Calibre to MOBI: Part 2: The html File”

  1. Jaye: Great stuff, as always. But, I must point out the document length limitation. I don’t know if MOBI files have this, but I must assume it does. Many people, producing short e-books, won’t have a problem. But longer works really should be comprised of multiple .html files. Note that this is something handled nicely in Sigil, which only creates ePub files. The ePub file is easily converted to MOBI, though. Still, proving that Word CAN be used to create a functional MOBI file is a great achievement.

    • Hidey ho, Jon. Yes, indeedy, you are correct. But read on to part 3. Calibre has recently done something quite amazing. I think you will be impressed. 🙂

      • Hi Jaye: You’re right — I am impressed. What does Calibre do? Break the big document into little bits based on the h1 tag? Pretty slick… if one uses h1 tags, that is. 😉 And are you selling this collection of short stories? The titles… intrigue me. Merry Christmas!!!

  2. That’s exactly what it does, Jon. And it creates the stylesheet based on the style declarations. (Amazon conversion does the same thing with h1 and styles, and I suspect they are pretty good about translating Word junk into ebook code–but I do not recommend direct upload of any Word file with images in it to Amazon, only straight text.) It took me a while to figure out that the Calibre line squish problem comes from the page_styles.css declaration. I have no idea why, but that’s what is happening. I wish the Kindle Previewer was as friendly as Calibre, but it won’t accept a simple html file and there’s no place to upload a cover and it loses images. One needs the total package (as in an EPUB file) to effectively use the Previewer.

    Somebody else is going to have to fine tune this operation. 😀 I’m done, the obsession is over, and come January 1 I have about a zillion projects on my plate.

    And no, the stories aren’t for sale. Sorry. 😉
