{"id":20822,"date":"2018-03-07T15:38:32","date_gmt":"2018-03-07T20:38:32","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=20822"},"modified":"2018-03-07T15:38:32","modified_gmt":"2018-03-07T20:38:32","slug":"constructing-human-grade-parsers","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2018\/03\/07\/constructing-human-grade-parsers\/","title":{"rendered":"Constructing Human-grade Parsers"},"content":{"rendered":"<p><a href=\"http:\/\/duriansoftware.com\/joe\/Constructing-human-grade-parsers.html\">Joe Groff<\/a> (<a href=\"https:\/\/twitter.com\/jckarter\/status\/970394414899019776\">tweet<\/a>):<\/p>\n<blockquote cite=\"http:\/\/duriansoftware.com\/joe\/Constructing-human-grade-parsers.html\"><p>Parsing is one of the most thoroughly explored topics in computer science, but building parsers that give high-quality diagnostics and user feedback is still largely folk art. Here are some observations on how parsers can be constructed in a way that makes it easier to recover from parse errors, produce multiple diagnostics in one pass, and provide partial results for further analysis even in the face of errors, providing a better experience for user-driven command line tools and interactive environments.<\/p>\n<p>[&#8230;]<\/p>\n<p>Thinking about it a different way, we want parsing to always succeed at producing some kind of structured result. The result can contain error nodes inside it, but the error nodes don&rsquo;t have to replace the entire result. How do we make a parser that always succeeds, and how exactly do we recover when we find a parse error? We can look at both problems from the perspective of designing the grammar. Effectively, we want to take a grammar and extend it to make it total, so that every string matches a rule, by adding rules for erroneous inputs.<\/p>\n<p>[&#8230;]<\/p>\n<p>If you&rsquo;re designing a grammar from scratch, it&rsquo;s also good to think about how your grammar can be parsed in a recoverable way, by considering what kinds of errors or incomplete edits users may make, and what kinds of synchronization points you can design into the grammar so that a parser can recover from malformed input.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/jckarter\/status\/970420018545438720\">Joe Groff<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/jckarter\/status\/970420018545438720\">\n<p>Yeah, even though whitespace isn&rsquo;t formally significant most people well-indent their code in practice. I think recent GCC uses indentation as a hint to match up imbalanced { } pairs; Clang and Swift should do the same<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/andygocke\/status\/970435363998113792\">Andy Gocke<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/andygocke\/status\/970435363998113792\">\n<p>My first rule: don&rsquo;t use a generated parser.  The effort in making a hand-written recursive descent parser will pay itself off many times over in maintenance.<\/p>\n<\/blockquote>\n<blockquote cite=\"https:\/\/twitter.com\/andygocke\/status\/970439882630864899\"><p>Parser combinators are awesome for getting something working, but tend to produce a lot of allocations. For a production compiler, I think the amortized cost of rolling your own is so low I wouldn&rsquo;t look for a library to help.<\/p><\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Joe Groff (tweet): Parsing is one of the most thoroughly explored topics in computer science, but building parsers that give high-quality diagnostics and user feedback is still largely folk art. Here are some observations on how parsers can be constructed in a way that makes it easier to recover from parse errors, produce multiple diagnostics [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"","apple_news_api_id":"","apple_news_api_modified_at":"","apple_news_api_revision":"","apple_news_api_share_url":"","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[],"tags":[255,285,270,71],"class_list":["post-20822","post","type-post","status-publish","format-standard","hentry","tag-compiler","tag-gcc","tag-parser","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/20822","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=20822"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/20822\/revisions"}],"predecessor-version":[{"id":20823,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/20822\/revisions\/20823"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=20822"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=20822"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=20822"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}