{"id":43061,"date":"2024-05-02T16:28:09","date_gmt":"2024-05-02T20:28:09","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=43061"},"modified":"2024-05-16T12:14:26","modified_gmt":"2024-05-16T16:14:26","slug":"eaglefiler-1-9-14","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2024\/05\/02\/eaglefiler-1-9-14\/","title":{"rendered":"EagleFiler 1.9.14"},"content":{"rendered":"<p><a href=\"https:\/\/c-command.com\/blog\/2024\/05\/02\/eaglefiler-1-9-14\/\">EagleFiler 1.9.14<\/a> is a maintenance release for my Mac information organizer app.<\/p>\n\n<p>Some interesting bugs were:<\/p>\n<ul>\n<li><p>EagleFiler uses <code>NSXMLParser<\/code> to read <a href=\"https:\/\/c-command.com\/eaglefiler\/help\/import-enex-file\">Evernote&rsquo;s ENEX files<\/a>, which it then converts to standard RTF\/RTFD files that any app can read. Some customers had files that were causing it to fail with <code>NSXMLParserUndeclaredEntityError<\/code> even though the referenced entities did not appear anywhere in the XML that I was giving the parser. It turns out that sometimes Evernote notes contain images in SVG format (which is based on XML). If Evernote isn&rsquo;t able to download the image it will in its place save an <em>HTML error page<\/em> and say that it&rsquo;s <code>image\/svg+xml<\/code>. This HTML may contain entities that are not valid in XML. (And, of course, it&rsquo;s not valid SVG.)<\/p>\n<p>But why would this matter when I wasn&rsquo;t asking it to parse that &ldquo;XML&rdquo;? Well, I was putting the bad SVG files into an <code>NSAttributedString<\/code> and saving it as RTFD. I thought it just wrote out the data for the attachments without looking at it. But, apparently, Cocoa creates its <em>own<\/em> <code>NSXMLParser<\/code> to read the SVG. When it encounters the parse error, it doesn&rsquo;t report it back via the RTF-generation method that I had called. Instead, it clobbers the internal state of <em>my<\/em> <code>NSXMLParser<\/code>. 
The <a href=\"https:\/\/developer.apple.com\/documentation\/foundation\/nsxmlparser\">documentation says<\/a>:<\/p>\n<blockquote cite=\"https:\/\/developer.apple.com\/documentation\/foundation\/nsxmlparser\"><p>Unless used in a callback, the <code>NSXMLParser<\/code> is a thread-safe class as long as any given instance is only used in one thread.<\/p><\/blockquote>\n<p>which led me to think that I would be safe confining my own parser object to a single thread. However, I&rsquo;m not entirely sure what &ldquo;Unless used in a callback&rdquo; means here. Callback for what? I was using <code>NSAttributedString<\/code> in a callback for my <code>NSXMLParser<\/code>. It appears that <code>NSXMLParser<\/code> is not re-entrant even across <em>different<\/em> parser instances. If that&rsquo;s the problem, I don&rsquo;t know how you&rsquo;re supposed to know which APIs might create their own <code>NSXMLParser<\/code>, so you can&rsquo;t really do <em>anything<\/em> in a parser callback. The whole reason I&rsquo;m using a SAX parser is that people want to import multi-GB files without reading them into memory. So I don&rsquo;t want to just store the data from the callback in memory and process it later, nor do I want to temporarily write chunks of partially processed data to disk for later processing. I ended up working around the problem by forcing <code>NSAttributedString<\/code> to create its parser in a different thread.<\/p>\n<\/li>\n\n<li><p>Another <code>NSXMLParser<\/code> issue. I knew that <code>-parser:foundCharacters:<\/code> could be called multiple times for a single block of text, but I thought this only occurred for large blocks. It turns out that, if there are non-ASCII characters, it can be called multiple times even for very short strings, and there were a few cases where this wasn&rsquo;t properly handled. This one is purely my fault.<\/p><\/li>\n\n<li><p>Another <code>NSAttributedString<\/code> issue. 
Files attached to Evernote notes can have very long names, and I was truncating them to fit the maximum supported filename length. <code>NSAttributedString<\/code> can handle multiple attachments with the same name; it will add a numeric prefix to make the names unique. However, if the name is long, <code>NSAttributedString<\/code> does not check that the filename is still valid after adding the prefix. So EagleFiler needs to pre-process the filenames so that they are still valid after <code>NSAttributedString<\/code> mucks with them.<\/p><\/li>\n\n<li><p>A recent version of Safari broke the <code>document<\/code> AppleScript object, which I think had been supported for over 20 years. The properties that I wanted to access (to <a href=\"https:\/\/c-command.com\/eaglefiler\/help\/importing-from-safari\">import from Safari<\/a>) do seem to work on the <code>tab<\/code> object, so I rewrote my script to use that. It only does this if the old way fails because it&rsquo;s not practical for me to test all the old versions of Safari to make sure the new way works with them. It looks like Apple may have fixed the <code>document<\/code> bug in the macOS 14.5 beta.<\/p><\/li>\n\n<li><p>Text from different fields needed actual token separators so that Search Kit would not find <a href=\"https:\/\/c-command.com\/eaglefiler\/help\/search-query-syntax\">phrase matches<\/a> across field boundaries.<\/p><\/li>\n<\/ul>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2024\/03\/07\/eaglefiler-1-9-13\/\">EagleFiler 1.9.13<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>EagleFiler 1.9.14 is a maintenance release for my Mac information organizer app. Some interesting bugs were: EagleFiler uses NSXMLParser to read Evernote&rsquo;s ENEX files, which it then converts to standard RTF\/RTFD files that any app can read. 
Some customers had files that were causing it to fail with NSXMLParserUndeclaredEntityError even though the referenced entities did [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2024-05-02T20:28:13Z","apple_news_api_id":"679fc428-b4ae-41a0-9ddd-3556553ee911","apple_news_api_modified_at":"2024-05-16T16:14:31Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAA==","apple_news_api_share_url":"https:\/\/apple.news\/AZ5_EKLSuQaCd3TVWVT7pEQ","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[159,131,69,800,595,1448,30,32,2385,71,820,103,586,866],"class_list":["post-43061","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-applescript","tag-bug","tag-cocoa","tag-concurrency","tag-eaglefiler","tag-evernote","tag-mac","tag-macapp","tag-macos-14-sonoma","tag-programming","tag-rich-text-format-rtf","tag-safari","tag-svg","tag-xml"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/43061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=43061"}],"version-history":[{"count":1,"href":"https:\/\
/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/43061\/revisions"}],"predecessor-version":[{"id":43062,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/43061\/revisions\/43062"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=43061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=43061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=43061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}