Thursday, August 25, 2016

Dropbox Document Scanning Tech

Ying Xiong:

A few weeks ago, Dropbox launched a set of new productivity tools including document scanning on iOS. This new feature allows users to scan documents with their smartphone camera and store those scans directly in their Dropbox. The feature automatically detects the document in the frame, extracts it from the background, fits it to a rectangular shape, removes shadows and adjusts the contrast, and finally saves it to a PDF file. For Dropbox Business users, we also run Optical Character Recognition (OCR) to recognize the text in the document for search and copy-pasting.

[…]

We decided to develop a customized computer vision algorithm that relies on a series of well-studied fundamental components, rather than the “black box” of machine learning algorithms such as DNNs. The advantages of this approach are that it is easier to understand and debug, needs much less labeled training data, runs very fast and uses less memory at run time. It is also more accurate than Apple’s SDK for the kinds of usage scenarios we care about; in an A/B test evaluation, the detections found by our algorithm are 60% less likely to be manually corrected by users than those found by Apple’s API.

Jongmin Baek:

Once we have a rectangular rendering of the document, the next step is to give it a clean and crisp scanned appearance. We can explicitly formulate this as an optimization problem; that is, we solve for the final output image J(x,y) as a function of the input image I(x, y) that satisfies the two aforementioned requirements to the greatest extent possible:
  • The background of the document is mostly a uniform white, with even illumination.
  • The foreground text and figures are crisp and visible with high contrast.

[…]

In our experiments, the output colors using this simple algorithm would look faded, even though the RGB values were exactly the same as the input! The reason is that the human visual system is based on relative brightness, not absolute ones; this makes colors “pop” more relative to the dull gray of the input, but not relative to the bright white background of the enhanced image.

1 Comment RSS · Twitter

[…] it and found it OK. The OCR works pretty well, and the capturing interface is streamlined. As with Dropbox, it creates much larger PDF files than my ScanSnap. It supports iOS share extensions, but before […]

Leave a Comment