Tuesday, July 17, 2018

Why Content Should Be Published in HTML and Not PDF

Neil Williams (via Alistair Duggin):

The default should be to create all content in HTML. If you can’t avoid publishing a PDF, ideally it should be in addition to an HTML version and the PDF must meet accessibility standards and archiving standards. We hope this post will help publishers explain the problems with PDFs to their colleagues and support moving towards an HTML-first culture.


PDFs may seem to be the fastest option because they can be easily created from popular applications that people are already using to author and share documents.

Converting content into HTML takes a bit of time. However, as explained earlier, creating a fully usable and accessible PDF from a source document requires specialist knowledge and can actually take longer than creating the content in HTML.

Unfortunately, there is no standard way to download an HTML document and save it in a self-contained format. Also, the tools for reading, searching, and marking up PDF documents are better.

4 Comments

> Also, the tools for reading, searching, and marking up PDF documents are better

That's a very important point for cognitive productivity. It's amazing that after all these years, major web browsers do not have annotation capabilities that match Skim (http://skim-app.sourceforge.net). Skim itself lacks tagging, but to my knowledge it is the only PDF reader on which a tagging workflow can be emulated.

Ricky Morse

Although there is no standard way to download web pages, if you want one, you can often create a self-contained web page by using data URIs for images. (A data URI embeds a base64 encoding of the image directly in the `img` tag's `src` attribute. The main downside is that the .html file can get large, because base64 makes the data about a third bigger.)
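Here is a minimal Python sketch of the data-URI approach. It assumes the caller supplies a `load` function that returns an image's bytes for a given URL (for example, by reading local files); `to_data_uri`, `inline_images`, and `load` are hypothetical names for illustration, not an existing tool. The regex only rewrites `<img src>` attributes, so CSS backgrounds, `srcset`, and fonts would stay external.

```python
import base64
import mimetypes
import re
from typing import Callable

# Matches src="..." or src='...' inside an <img ...> tag. A regex is a
# simplification; a robust tool would use a real HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html: str, load: Callable[[str], bytes]) -> str:
    """Replace each <img src> URL with a data URI built from load(url)."""

    def replace(match: re.Match) -> str:
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        if url.startswith("data:"):
            return match.group(0)  # already inlined, leave untouched
        # Guess the MIME type from the file extension; fall back to generic bytes.
        mime_type = mimetypes.guess_type(url)[0] or "application/octet-stream"
        return f"{prefix}{quote}{to_data_uri(load(url), mime_type)}{quote}"

    return IMG_SRC.sub(replace, html)


if __name__ == "__main__":
    page = '<p>Logo:</p><img alt="logo" src="logo.png">'
    fake_files = {"logo.png": b"\x89PNG\r\n\x1a\n"}  # hypothetical image bytes
    print(inline_images(page, fake_files.__getitem__))
```

Base64 turns every 3 input bytes into 4 output characters, which is where the roughly one-third size increase mentioned above comes from.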

Fred Dulles

Doesn't EPUB essentially do this? It's HTML-based, if the Wikipedia page is to be believed.

@Fred Sort of. You can’t just download a Web page as EPUB.
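To illustrate why an EPUB is more than a saved web page, here is a minimal Python sketch of an EPUB 3 container wrapping one page of content. It is written from the general structure of the EPUB 3 format (an uncompressed `mimetype` entry first, `META-INF/container.xml`, a package document, a navigation document) and has not been checked with a validator such as EPUBCheck. The `OEBPS/` folder, the file names, `write_epub`, and the hard-coded language `en` are assumptions for illustration. Note that the page body must already be well-formed XHTML, and any images or stylesheets would also have to be added and listed in the manifest, which is part of why you can't just download arbitrary HTML as EPUB.

```python
import zipfile
from datetime import datetime, timezone
from typing import BinaryIO, Union
from xml.sax.saxutils import escape

# Points the reader at the package document (path is an assumption).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_opf(title: str, book_id: str) -> str:
    """Package document: metadata, manifest of every file, reading order."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="page" href="page.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="page"/>
  </spine>
</package>
"""


def xhtml(title: str, body: str) -> str:
    """Wrap an already well-formed XHTML fragment in a complete document."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{escape(title)}</title></head>
<body>{body}</body>
</html>
"""


def write_epub(target: Union[str, BinaryIO], title: str, body_xhtml: str, book_id: str) -> None:
    """Write a one-page EPUB to a path or binary file object."""
    nav_body = ('<nav epub:type="toc"><ol><li><a href="page.xhtml">'
                + escape(title) + "</a></li></ol></nav>")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", package_opf(title, book_id))
        zf.writestr("OEBPS/nav.xhtml", xhtml(title, nav_body))
        zf.writestr("OEBPS/page.xhtml", xhtml(title, body_xhtml))
```

For example, `write_epub("post.epub", "Why HTML", "<p>Hello</p>", "urn:uuid:…")` would produce a one-chapter book, with the identifier left for the caller to supply.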
