Preview in Big Sur Destroying PDFs Again

Manuel Grabowski (Hacker News):

The lower half shows the result after modifying that same PDF in Preview (removing a blank page) and saving it.

Hard to believe, but that’s not the first time Apple messed this up. Sure, even Apple can’t account for all use cases when changing complex stuff like internal PDF handling. But:

  • The iX500 is an insanely popular and common scanner
  • I don’t know any OCR software that is more popular than ABBYY FineReader
  • macOS used to be the absolute best in class OS for dealing with PDFs by a long shot

As with the macOS 10.12 bug—was it not added to a regression test suite?—this doesn’t affect all PDFs with text layers, apparently just those created by ABBYY FineReader.

As Grabowski says, PDF support on macOS used to be great, but I don’t think it’s yet recovered from the rewrite five or so years ago. I’m still seeing slow progressive rendering, intermittent glitches where pages go blank, and buggy scrolling. Big Sur did fix a Catalina regression that broke clickable links by truncating the URL.


Update (2021-01-04): Jonas M. Ribe:

I wouldn’t put this on Big Sur. The bug has existed in some form since the PDFKit rewrite (Sierra). Fewer people run into it after Preview started doing incremental saves for some PDF operations (High Sierra?). Deleting a page in Preview still does a full save and can break text.

Meanwhile there has been no progress in Windows land. I guess that explains the lack of care/interest.

I dimly recall a Steve Jobs keynote in the early days of OS X (maybe the first one?) where he made a big deal about OS X being the best platform for working with PDFs and having the best native support for them.

That was a long time ago.

I can't confirm a general problem with OCR layers in PDF documents on Big Sur. My test setup: a PDF document scanned with an HP Envy 5010, OCR done with Creaceed's Prizmo app, and the result saved in Prizmo as a PDF document with an OCR layer. Opening this document in Preview (and saving it again) caused no problems; I was able to copy the complete text from the saved document into a text editor app.

@Kurt: It seems to only affect ABBYY this time, yeah. Just to be sure, though: did you also *modify* the PDF in Preview before saving it? Remove or rotate one of the pages, save the PDF in Preview, and then (important step!) close Preview completely and reopen the document. Only after all that does it break for me. My guess is that while Preview does break the PDF when saving, it still has the correct version in memory, so you don't notice anything until you reopen it. And it only happens after a real save: when you don't change anything and just press Cmd-S, it appears to save, but apparently it's smart enough not to touch the file when nothing in the document has changed.

Quartz uses (basically) the PDF imaging model, so as an OS, it is great at PDF. That doesn't necessarily mean every application is great at every task.

> As with the macOS 10.12 bug—was it not added to a regression test suite?

I thought Apple was infamous for using barely any automated testing.

Glad to see that Craig Federighi is doing such a great job as senior Vice President of software engineering. /s

