Tuesday, January 30, 2024

Portable EPUBs

Will Crichton (via Hacker News):

Despite decades of advances in document rendering technology, most of the world’s documents are stuck in the 1990s due to the limitations of PDF. Yet, modern document formats like HTML have yet to provide a competitive alternative to PDF. This post explores what prevents HTML documents from being portable, and I propose a way forward based on the EPUB format.


The act of working with PDFs is relatively fluid. I can download a PDF, quickly open it in a PDF reading system like Preview, and keep or discard the PDF as needed. But EPUB reading systems feel comparatively clunky. Loading an EPUB into Apple Books or Calibre will import the EPUB into the application’s library, which both copies and potentially decompresses the file. Loading an EPUB on a Kindle requires waiting several minutes for the Send to Kindle service to complete.

Worse, EPUB reading systems often don’t give you appropriate control over rendering an EPUB. For example, to emulate the experience of reading a book, most reading systems will chunk an EPUB into pages. A reader cannot scroll the document but rather turn the page, meaning textually-adjacent content can be split up between pages. Whether a document is paginated or scrolled should be a reader’s choice, but 3/4 reading systems I tested would only permit pagination (Calibre being the exception).

Therefore I decided to build a lighter EPUB reading system, Bene. You’re using it right now. This document is an EPUB — you can download it by clicking the button in the top-right corner. The styling and icons are mostly borrowed from pdf.js. Bene is implemented in Tauri, so it can work as both a desktop app and a browser app.

On the other hand, it doesn’t feel like a normal Web page, rendering in a frame unless you view the main HTML file directly.


6 Comments

PDF is bad, because it's inaccessible. Attempts to make it accessible have all failed because they are all fundamentally reliant on the addition of semantics by the document producer, which can and often will be destroyed through various processes, and which, if not present, give no visual indication of their absence. So you'll get no grief from me for taking on the Frankenstein's Monster that is PDF. Certainly, it is time.

Nice demo when you referenced the main HTML file directly. All the reader's added extras are gone:

- no table of contents
- no page numbers, scrolling through pages
- no linking to paragraphs (§)
- clicking on a definition works, not as a tooltip but by scrolling to the definition

I like the idea.

> The app is much quicker to open on my Macbook (<1sec) than other desktop apps.

Has there been some article or whatnot that explains why native Mac apps are so sluggish to open? I’ve noticed that a fast website (mjtsai.com qualifies) added through Safari’s “Add to Dock” is faster to open and display up-to-date content than NetNewsWire opening and showing cached content on my 2020 Intel MacBook Air, whether from a fresh boot or not.

macOS 10.15 (booted from an external SSD) seems to be much faster.

@Alexandre Maybe something in the security layer?

HTML and PDF were both released in 1993. But HTML is just a variant of SGML which was released in 1986. So "HTML" is not a "modern document" format in comparison to PDF.

Thanks for the pointers, Michael! Alas, disabling SIP and Wi-Fi brings very little improvement.
