Wednesday, March 7, 2018

Constructing Human-grade Parsers

Joe Groff (tweet):

Parsing is one of the most thoroughly explored topics in computer science, but building parsers that give high-quality diagnostics and user feedback is still largely folk art. Here are some observations on how parsers can be constructed in a way that makes it easier to recover from parse errors, produce multiple diagnostics in one pass, and provide partial results for further analysis even in the face of errors, providing a better experience for user-driven command line tools and interactive environments.

[…]

Thinking about it a different way, we want parsing to always succeed at producing some kind of structured result. The result can contain error nodes inside it, but the error nodes don’t have to replace the entire result. How do we make a parser that always succeeds, and how exactly do we recover when we find a parse error? We can look at both problems from the perspective of designing the grammar. Effectively, we want to take a grammar and extend it to make it total, so that every string matches a rule, by adding rules for erroneous inputs.

[…]

If you’re designing a grammar from scratch, it’s also good to think about how your grammar can be parsed in a recoverable way, by considering what kinds of errors or incomplete edits users may make, and what kinds of synchronization points you can design into the grammar so that a parser can recover from malformed input.

Joe Groff:

Yeah, even though whitespace isn’t formally significant most people well-indent their code in practice. I think recent GCC uses indentation as a hint to match up imbalanced { } pairs; Clang and Swift should do the same

Andy Gocke:

My first rule: don’t use a generated parser. The effort in making a hand-written recursive descent parser will pay itself off many times over in maintenance.

Parser combinators are awesome for getting something working, but tend to produce a lot of allocations. For a production compiler, I think the amortized cost of rolling your own is so low I wouldn’t look for a library to help.

2 Comments RSS · Twitter

Parser combinators don't have to produce many allocations. In languages with linear typing such as Rust, and maybe in the future Swift if the ownership manifesto is implemented, it's possible to create safe, zero-copy parser-combinator libraries.

Nom for Rust is one such example

Pratt parsers FTW.

Leave a Comment