Thursday, February 18, 2010

Page breaks in eBooks

EPUB – the format in which Atlantis saves eBooks – is a reflowable format. This means that all EPUB eBooks are formatted as a continuous flow of text without predefined page boundaries, so that EPUB files display properly on all devices regardless of screen size, from the tiniest mobile phones to the biggest desktop monitors.

But in practice most eReaders "paginate" the EPUB files before displaying them: they divide the book contents into separate "pages" according to their own native display size and settings. Consequently, eBook authors do not have much control over the way their books get paginated in a particular eBook reader. Before actually opening the eBook in an eReader, it is impossible to predict how many pages a particular eBook will have on that eReader and on which page a particular paragraph of the source document will appear.

This said, there are still a couple of things relating to eBook pagination that you should keep in mind when you design an eBook.

Normally eReaders paginate books with "optimal fill": they create a new "page" only when there is no more space for text on the previous page. Even then, you might sometimes notice an unintended break in the middle of a page. To explain this, we need to take a deeper look at the inner workings of EPUB files.

As a matter of fact, all eBooks in the EPUB format are actually ZIP files with .epub as a substitute file extension. If you change the extension of an .epub file to .zip, you can open it with any ZIP software or directly with Windows Explorer, if you have Windows XP or higher.
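
The same inspection can be done without renaming the file. The following is only a minimal sketch in Python, using the standard zipfile module; the function name list_epub_entries and the sample archive built in memory are illustrative assumptions, not something produced by Atlantis.

```python
import io
import zipfile


def list_epub_entries(epub_file):
    """Return the names of all entries stored in an EPUB (ZIP) archive."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory (hypothetical contents, not a valid eBook).
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("Ops/book.html", "<html><body><p>Hello</p></body></html>")
sample.seek(0)

for name in list_epub_entries(sample):
    print(name)
```

To look inside a real eBook, pass the path of the .epub file to list_epub_entries instead of the in-memory sample.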

Now all EPUB files generated by Atlantis contain a "Meta-inf" folder and an "Ops" folder, plus a "mimetype" file:


The actual contents of the eBook – text and images – are stored under the "Ops" folder. You will always find at least one file with the .html extension within that folder, but sometimes there will be multiple .html files:


Each of these .html files represents a fragment of the eBook. If you extract them from the EPUB-ZIP file, you can view them in any Web browser. eReaders naturally display the contents of each .html file as a continuous flow of text. But at the same time, most eReaders will never put the contents of multiple .html files on the same "page". In other words, eReaders display these neighboring .html files as if there were a page break between them.

So the more HTML files are included in an EPUB file, the more unintended page breaks there will be in the eReader display. This is why Atlantis always tries to divide the document contents into as few .html files as possible when it generates an EPUB file. In many cases, Atlantis creates a single .html file for the entire book contents so that your EPUB file won’t get displayed with unintended page breaks. But there are a few cases when Atlantis has to save the eBook contents to multiple .html files:
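
To get a rough idea of how many such breaks a given EPUB will show, you can count its content files. This is only a sketch in Python: the function name count_forced_breaks is hypothetical, and it assumes that every boundary between two content files produces exactly one break, which is how most eReaders behave according to the description above.

```python
import zipfile

# File extensions treated as book content; this list is an assumption of the sketch.
CONTENT_EXTENSIONS = (".html", ".htm", ".xhtml")


def count_forced_breaks(epub_file):
    """Estimate the page breaks forced by boundaries between content files."""
    with zipfile.ZipFile(epub_file) as archive:
        content_files = [
            name for name in archive.namelist()
            if name.lower().endswith(CONTENT_EXTENSIONS)
        ]
    # N content files have N - 1 boundaries between them; an empty book has none.
    return max(len(content_files) - 1, 0)
```

A book saved as a single .html file gives 0, and a book split into three files gives 2.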

1) Atlantis creates a separate .html file for any paragraph from the source document containing nothing but a "big" portrait-oriented picture (a picture whose width is at least 300 pixels and height at least 400 pixels, or whose height:width ratio is in the range 1.1 to 1.7). The first such picture of the source document is also treated by Atlantis as the book cover image. Let’s suppose for instance that one of the paragraphs in a source document contains nothing but a 300x400 or 400x500 pixel picture. Atlantis will save this paragraph (and of course the included picture) to the EPUB file as a separate .html file. This is done so that big pictures display properly on devices with small screens.

2) Some eReaders are unable to load HTML files larger than 300 KB. Atlantis takes this into account when the source document cannot be saved to an EPUB file as one single HTML file smaller than 300 KB. In such cases, Atlantis splits the book contents into several .html files, each smaller than 300 KB.
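
The idea of size-based splitting can be illustrated with a short Python sketch. This is not Atlantis's actual algorithm, which is not documented here: the function name split_paragraphs is hypothetical, the limit is assumed to mean 300 × 1024 bytes, and the sketch measures plain text only, ignoring HTML markup.

```python
MAX_BYTES = 300 * 1024  # assumed meaning of the "300 KB" limit


def split_paragraphs(paragraphs, max_bytes=MAX_BYTES):
    """Greedily group paragraphs into chunks whose UTF-8 size stays below max_bytes."""
    chunks, current, current_size = [], [], 0
    for paragraph in paragraphs:
        size = len(paragraph.encode("utf-8"))
        # Start a new chunk when adding this paragraph would reach the limit.
        if current and current_size + size >= max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be written to its own .html file. Note that in this sketch a single paragraph larger than the limit still ends up in a chunk of its own, over the limit.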

So now you know when and why Atlantis might save a source document to an EPUB file including multiple .html files. As a result, that EPUB file will display with corresponding page breaks in the current eReaders.

You might ask "what if I want to insert page breaks in my eBooks myself?"

It is done very simply. Just create manual page breaks in the source document, and Atlantis will automatically save matching page breaks to the EPUB file.

As you know, there are 2 ways to create manual page breaks in source documents in Atlantis. You can press the Ctrl+Enter hot key and insert a "page break" symbol in the document: any text that follows a "page break" symbol will automatically appear on the next page. But there is another and preferable way to create a manual page break in a document: it is by enabling the "Page break before" attribute in the format of a particular paragraph. When you want a paragraph to be always displayed at the top of a page, choose the "Format | Paragraph..." menu command, and check the "Page break before" box on the "Line & Page Breaks" tab:


Quite often this attribute is enabled by default for the "Heading 1" style (this can be done through the "Format | Style... > Heading 1 > Modify... > Paragraph..." dialog). In this way, all the book chapters preceded by a "Heading 1" paragraph (the chapter "heading") start on a new page.

10 comments:

jeff said...

the other thing to consider with paging that is often overlooked by epub creators (at least at this early stage of the game) is that most epub readers, whether devices or software, have "Next Section" and "Previous Section" buttons or keystrokes which allow the person to skip to the next (or previous) section. This is highly useful especially with long chapters or with slow e-ink devices.

The key to remember is where these e-readers get their cue for where the next section is. You might think they would get it from the h1 or h2 tags. But they normally don't. Instead, these e-readers get their cue for what the next section is, from one of two places:

1) the next section is assumed to be the next .html file contained in the .epub

since most authors tend to break up their manuscript into chapters and put 1 chapter into each html file, this means effectively that for the user to hit the "Next Section" button, they will go to the next chapter.

however, you could break up your document into smaller html files, at A-heads (chapter sub sections) or even B-heads (sub sub sections), and then the e-readers' "Next Section" buttons would take the person to the next section or sub-section, instead of to the whole next chapter.

2) the next section is assumed to be either the next .html file, OR a page-break marker -- whichever comes first.

what does a page-break look like, though, in the epub's internal html? it looks like either of these:

div style="page-break-after: always;"

or

div style="page-break-before: always;"

( i had to remove the tag characters for blogger)

So, theoretically, if you wanted a person to skip to the "Next Section" with their ereader's button, but you didn't want to break up your html files, you could use this marker.

However, there's currently a problem:

Method 1 appears to work in nearly 100% of ereaders.

Method 2 does not currently work in some ereaders (like Calibre and the firefox plug-in, EPUBReader), but does in others (like Adobe Digital Editions and Mobipocket reader). The e-readers that *don't* recognize the page break as a next section will simply pass you on to the next .html file (usually a chapter), skipping over all the rest of the sections in your current .html file.

I'm not sure actually how this is "officially" supposed to be, and how it will play out -- whether future epub viewers will eventually all recognize the page break div/style. (Maybe if you're reading this years later, Method 2 is now widely recognized, and you can use it easily). Using method 2 would certainly be nice, because then we wouldn't have to use method 1 (breaking up the html files) just for the sake of the "Next Section" button.

However, what i do know is that, whether for better or for worse, as of the date of this writing (June 10, 2010), Method 2 does not work in as many places as Method 1. Yet.

Anyway, just thoughts to keep in mind, because paginating has more consequences than just creating an unintended blank area at the bottom of the viewport. It is also used for skipping around by the reader -- a convenience your reader will appreciate.

Atlantis Word Processor Team said...

Jeff, thanks for your comments.

Right, the "page-break-before: always" attribute does not work in all EPUB readers. That is why the next release of Atlantis Word Processor will not use this attribute in EPUBs anymore. You can find details in another post to our blog.

Anonymous said...

In order to prevent unintended page breaks, is it possible to "force" Atlantis to make epub contents (aside from cover) in a single html file regardless of their extension?

Atlantis Word Processor Team said...

Atlantis creates "unintended page breaks" in EPUBs only when the contents of the source document are "too big" to be put into one HTML file. As you might know, some eReaders cannot display EPUBs containing HTML files larger than 300 KB.

Anonymous said...

Yes, I know. But sometimes you purposely create level-2 or level-3 headings so that they are displayed on the same page as the preceding paragraphs, and not on a new page. Yet Atlantis makes a new html file for each of them, so they are displayed on new pages. Is there any way to fix that?

Atlantis Word Processor Team said...

Atlantis makes a new HTML file when it encounters a manual page break in the source document. It also makes a new HTML file for a cover image. So please make sure that your level-2 and level-3 headings are not preceded by manual page breaks.

"Unintended" page breaks are created in eBooks by Atlantis only when your book contents are too long to be put into one HTML file.

Please also make sure that you are using the latest version of Atlantis: install the latest beta version, which will soon be released as a public version.

But if your source document is not "too long" to be put into one HTML, and there are no manual page breaks, and you still believe that Atlantis creates unintended page breaks in your eBook, please email your source document to support@AtlantisWordProcessor.com.

Anonymous said...

Dear All,

I'm a newbie with Atlantis and I have the following problem:

When I save a word document (which contains several manual page breaks) as an epub file the page breaks are not included in the ebook file. I checked the html file with TWEAK EPUB and there is nothing like "page-break before" or similar in it. Since the ebook is a haiku-book the page breaks are quite important.

My other question is: is it possible to create an ebook which is a 100% copy of a printed book visually and can it be done in a way that ebook will be displayed the same way on different tools (ebook reader, laptop etc.)

Thank you for your help in advance.

Daniel
Hungary

Atlantis Word Processor Team said...

The latest beta version of Atlantis (1.6.5.2) does not use the "page-break-before: always" CSS attribute in EPUBs anymore. Please click here for details.

Please upgrade your Atlantis Word Processor to the latest beta version 1.6.5.2. It will soon be released as a public version. Just go to the Betatesting page of our site, download the setup file of the latest beta version of Atlantis, run it, and follow the onscreen instructions to upgrade.

If you have a document with manual page breaks which the latest version of Atlantis does not convert to EPUB correctly, please email it to support@AtlantisWordProcessor.com.

>My other question is: is it possible to create an ebook which is a 100% copy of a printed book visually and can it be done in a way that ebook will be displayed the same way on different tools (ebook reader, laptop etc.)

I am sorry, but eBooks in the EPUB format are "reflowable". They are meant to be displayed differently by different eReaders with different screen sizes. If you need a "100% copy of a printed book", you should use the PDF format instead.

Anonymous said...

I have a question to ask: is it possible to have in "save as... ePub" options the choice of size for each html file? I mean it would be convenient for my Opus to read html files 100KB in size to increase the speed of loading. Currently the default size is 300KB, apart from the trick of page breaks.
Thanks in advance for any reply

Atlantis Word Processor Team said...

Sorry, but we could not add such device-specific and "too technical" settings. It is preferable to divide longer documents into chapters by inserting chapter headings formatted with the "Heading" styles. In this way you would have much smaller XHTMLs within EPUBs.