Saturday, November 15, 2008

The Format Wars

We're having the Miami Book Fair this weekend (report next time), so here's something that affects the future of books, and why you should care how your documents are stored ...

A struggle for world domination has been going on in recent years that will affect your ability to read and write digital documents, whether they are your own or the property of others. 

The most prominent example is the battle between Microsoft's ubiquitous Office formats (consider the chutzpah it takes to trademark the word "office") and the alternative Open Document standard as proposed by a consortium of other companies, individuals, and organizations, and certified by the ISO. There is an old gag among computer hackers that says, "a standard is a good thing, so it's a good thing there are so many of them." The joke, of course, is that with enough different standards there is no standard at all.

That particular struggle is still unresolved at this point. Microsoft has managed to have their own proposal for an "open" format adopted as a second standard by the ISO, but neglected to fully support it even in their own software, leaving us with a "standard" that no one can use except theoretically. (Joseph Heller must be rolling over in his grave at this example of a Catch-22.)

Meanwhile the opposition -- represented by OpenOffice.org, Sun's StarOffice, and IBM's Lotus Symphony (all free and compatible) -- forges ahead by supporting both Open Documents and the proprietary Microsoft Office formats that people actually use, thus building a bridge across the troubled waters for anyone who wants to take it. In case you can't tell, I count myself in that category.

A footnote to this is that OpenOffice.org and its brethren also support saving documents as Adobe Acrobat PDF (Portable Document Format). If you wonder why Microsoft until recently required third-party software to support this widely used feature, you only have to look as far as Microsoft's own answer to PDF -- the XML Paper Specification, or XPS, which they would prefer people to use instead of Acrobat, which is owned by a competitor. However, at the moment XPS requires Microsoft Internet Explorer and Microsoft Windows, which kind of defeats the goal of making an electronic document universally readable.

By now you can see where this is going. Governments and big corporations are not the only ones who should be concerned about how their digital documents are archived. The issue of whether and for how long those documents will be readable is something we all have an interest in, whether as small businesses, students, writers, citizens, or anyone who has anything to save on a computer -- which pretty soon will include every last one of us. If your data format dies because your chosen software is driven out of existence by market forces, it is definitely a problem.

We know the nightmare can happen, because it has happened already. Long ago many law offices standardized on WordPerfect because at the time it was the only product that gave them the features they needed for legal documents. But now WordPerfect is struggling to survive in the face of the Microsoft juggernaut on one side and free competitors on the other. If and when it dies, an unknown number of critical legal documents may become unreadable, or garbled by imperfect (no pun intended) conversion programs.

As a writer I have had to migrate my work from the old DOS-based PC-Write to Lotus Ami-Pro, then through two semi-compatible versions of WordPro, and finally two versions of OpenOffice. (The first one was before the new standard.) And this is within a period of 25 years -- an average of just five years per format.

Even those who have exclusively used Microsoft's Word (there's that chutzpah again) will find that they can no longer read their oldest documents unless they have converted them as they went along and new versions came out. Coming up with new formats has proved to be an excellent marketing tool, since it forces all users to buy new software so they can read documents created by others who have bought it already. But clearly what works for the vendor does not work for the consumer.

And Now eBooks

Meanwhile a new front has opened in the arena of electronic publishing. Due to the understandable desire of publishers and authors to protect their work from piracy, an assortment of new standards for Digital Rights Management (or DRM) have sprung up, along with a separate assortment of file formats that work with one or another version of DRM. Unfortunately the result is that hacker joke all over again. (Here's a comparison between the various formats so you can see how bewildering the assortment is.)

Just as with music, when you acquire an ebook you have to choose one of the available formats, and this choice will determine what computers and other devices you will be able to read it with. After that there are many ways to lose access to what you thought you owned. Your format of choice may become obsolete so that it is no longer supported by newer devices. Your computer may die or be stolen, forcing you to plead for a replacement license for a new machine. Or the format that you like on your computer may not work with your phone or PDA or the new reader from Amazon or [insert vendor of your choice].

About the best you can do is to avail yourself of DRM-free documents in a form that is as widely supported as possible. There are many sources for works that are either in the public domain or have been made freely distributable by their authors while remaining under copyright.

In the latter case you are free to read and share them, but not to sell them without arrangement with the author. Cory Doctorow and Bruce Sterling are notable examples of writers who have chosen to "give away" at least some of their work in this way, while continuing to sell traditional books in print form. It is arguably a way to ensure wide readership, which may in the long run enhance sales.

Free At Last

Alas, even with freely distributable texts there are problems. Since I acquired my Sony Reader last Christmas I've become all too familiar with them.

Project Gutenberg, the first attempt at creating a free library of public domain books, made an early choice to use "plain text" as their format in order to make their books as widely accessible as possible. (Now they are also using HTML and PDF for some books, and even audio -- a whole other can of worms.) While this avoids the pitfalls of those word processors of yore that are no longer with us, it leaves their online books as "plain vanilla" with no ability to use bold or italics or different fonts, and no diagrams or illustrations.

Worse still, their texts have embedded line breaks that make them difficult to adjust for different display sizes. Removing those line breaks can be difficult or impossible depending on what software you use and what your level of expertise is -- and yes, it is maddening that something so simple should be so hard.

In practice this means that when I read a Gutenberg document on my
Sony with its 5x7 page, there
is no combination of font size or orientation that matches where the line
breaks are. It doesn't mean I can't
read it anyway, but it's esthetically unpleasant. (Like this paragraph.)
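For anyone comfortable with a little scripting, here is a minimal Python sketch of one way to strip those embedded line breaks. It rests on an assumption that holds for many plain-text books but not all of them: that paragraphs are separated by at least one blank line, so any single line break inside a paragraph can be treated as a space. The function name unwrap_paragraphs and the sample text are my own, made up for illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so each paragraph becomes a single line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalize Windows and old Mac line endings to plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split wherever a blank (or whitespace-only) line separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace,
    # including the embedded line breaks, into a single space.
    joined = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(joined)


# Hypothetical sample: one hard-wrapped paragraph followed by a second one.
sample = (
    "In practice this means that when I read a Gutenberg document on my\n"
    "Sony with its 5x7 page, there\n"
    "is no combination of font size or orientation that matches where the line\n"
    "breaks are.\n"
    "\n"
    "PDF files can suffer similar problems."
)
print(unwrap_paragraphs(sample))
```

Be warned that this is a blunt instrument. Anything that depends on its line breaks, such as poetry, tables, or a table of contents, will be run together, so check the result before trusting it.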

PDF files can suffer similar problems, because they were not originally meant to have text that would reflow if the page size was changed.

One example of a new solution is the EPUB format, created by the International Digital Publishing Forum and supported by Sony and an assortment of other vendors and online distributors like Feedbooks. It is able to handle fonts, page reflow, and even graphics, which makes EPUB documents a pleasure to read. However, at this point it is yet another new standard -- and the same old joke applies. Come to think of it, EPUB files work fine on my Sony Reader, but I didn't have any software that would recognize them on my PC until I downloaded and installed Adobe's Digital Editions.

Hey, wait a minute -- can you still read this? If so, it's because of the universal use of HTML (hypertext markup language) for text on web pages, which is a perfect demonstration of what a true standard can offer us. Let's hope we can someday agree on another one for books and plain old documents.
