{"id":1282,"date":"2006-07-07T14:15:55","date_gmt":"2006-07-07T18:15:55","guid":{"rendered":"http:\/\/mjtsai.com\/blog\/?p=1282"},"modified":"2006-07-07T15:41:40","modified_gmt":"2006-07-07T19:41:40","slug":"character-level-diff-in-bbedit","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2006\/07\/07\/character-level-diff-in-bbedit\/","title":{"rendered":"Character-Level Diff in BBEdit"},"content":{"rendered":"<p>\r\n<a href=\"http:\/\/www.barebones.com\/products\/bbedit\/\">BBEdit<\/a> has a great Find Differences feature that lets you compare files line-by-line, but sometimes I want to be able to see the differences within the lines. This is especially useful when editing paragraphs of text rather than lines of code. These two scripts let you compare two files at the character level and view the differences in your browser. Unlike with FileMerge, the files can be Unicode and the additions, deletions, and changes are color-coded.\r\n<\/p>\r\n\r\n<p>\r\nFirst is a script that uses Python&rsquo;s <code>difflib<\/code> to generate an HTML file with the differences and open it in the browser. It requires \r\n<a href=\"http:\/\/pythonmac.org\/packages\/py24-fat\/index.html\">Python 2.4<\/a>. Save it in a file called <tt>pyopendiff<\/tt> that has execute permissions. The input files can be Unicode, and they are assumed to be in UTF-8 (change the <code>defaultEncoding<\/code> to <code>macroman<\/code> if you want) unless there&rsquo;s a BOM. \r\n<\/p>\r\n<p>\r\nPython&rsquo;s <code>HtmlDiff<\/code> class wasn&rsquo;t designed to handle Unicode, so we HTML-escape the strings before diffing them and undo the (now-redundant) HTML-escaping that <code>HtmlDiff<\/code> would normally do. This approach is buggy in that it will not properly display escaped characters that have been changed (rather than added or deleted). 
This could be mostly fixed by modifying <code>HtmlDiff<\/code> to use Unicode internally, but since Python uses UTF-16 there would likely still be problems with surrogate pairs.<\/p>\r\n\r\n<pre>\r\n#!\/usr\/local\/bin\/python2.4\r\n\r\ndef main(oldPath, newPath):\r\n    import os, subprocess, tempfile\r\n    differ = MyHTMLDiff()\r\n    output = differ.make_file(htmlStringFromPath(oldPath).splitlines(), \r\n                              htmlStringFromPath(newPath).splitlines(),\r\n                              htmlNameFromPath(oldPath), \r\n                              htmlNameFromPath(newPath))\r\n    outPath = os.path.join(tempfile.mkdtemp(), \"diff.html\")\r\n    writeStringToPath(output, outPath)\r\n    subprocess.call([\"\/usr\/bin\/open\", outPath])\r\n    \r\ndef htmlStringFromPath(path):\r\n    return htmlFromUnicode(unicodeFromPath(path))\r\n\r\ndef htmlNameFromPath(path):\r\n    import os\r\n    return htmlFromUnicode(unicode(os.path.basename(path), \"utf-8\"))\r\n\r\ndef htmlFromUnicode(u):\r\n    escaped = u.replace(\"&amp;\",\"&amp;amp;\").replace(\"&gt;\",\"&amp;gt;\").replace(\"&lt;\",\"&amp;lt;\")\r\n    return escaped.encode(\"ascii\", \"xmlcharrefreplace\")\r\n\r\nfrom difflib import HtmlDiff\r\nclass MyHTMLDiff(HtmlDiff):\r\n    _styles = HtmlDiff._styles.replace(\"Courier\", \"ProFont, Monaco\")\r\n    def _format_line(self, *args):\r\n        return unescapeHTML(super(MyHTMLDiff, self)._format_line(*args))\r\n\r\ndef unescapeHTML(s):\r\n    return s.replace(\"&amp;gt;\", \"&gt;\").replace(\"&amp;lt;\", \"&lt;\").replace(\"&amp;amp;\", \"&amp;\")\r\n\r\ndef unicodeFromPath(path):\r\n    return unicodeFromString(stringFromPath(path))\r\n\r\ndef stringFromPath(path):\r\n    file = open(path, \"r\")\r\n    result = file.read()\r\n    file.close()\r\n    return result\r\n\r\ndef unicodeFromString(data, defaultEncoding=\"utf-8\"):\r\n    import codecs\r\n    bomToEncoding = {\r\n        codecs.BOM_UTF8: \"utf-8\",\r\n        
codecs.BOM_UTF16_BE: \"utf-16-be\",\r\n        codecs.BOM_UTF16_LE: \"utf-16-le\",\r\n    }\r\n    for bom, encoding in bomToEncoding.items():\r\n        if data.startswith(bom):\r\n            data = data[len(bom):]\r\n            break\r\n    else:\r\n        encoding = defaultEncoding\r\n    return unicode(data, encoding)\r\n\r\ndef writeStringToPath(string, path):\r\n    file = open(path, \"w\")\r\n    file.write(string)\r\n    file.close()\r\n\r\nimport sys\r\nmain(sys.argv[1], sys.argv[2])\r\n<\/pre>\r\n\r\n<p>\r\nThis AppleScript works like the Compare Two Front Documents command (except that the files must be saved to disk). Put it in <tt>~\/Library\/Application Support\/BBEdit\/Scripts<\/tt> and assign it a keyboard shortcut.<\/p>\r\n\r\n<pre>\r\ntell application \"BBEdit\"\r\n    set p1 to quoted form of POSIX path of ((file of window 1) as alias)\r\n    set p2 to quoted form of POSIX path of ((file of window 2) as alias)\r\n    do shell script \"path\/to\/pyopendiff \" &amp; p1 &amp; \" \" &amp; p2\r\nend tell\r\n<\/pre>\r\n","protected":false},"excerpt":{"rendered":"<p>BBEdit has a great Find Differences feature that lets you compare files line-by-line, but sometimes I want to be able to see the differences within the lines. This is especially useful when editing paragraphs of text rather than lines of code. 
These two scripts let you compare two files at the character level and view [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"","apple_news_api_id":"","apple_news_api_modified_at":"","apple_news_api_revision":"","apple_news_api_share_url":"","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[],"class_list":["post-1282","post","type-post","status-publish","format-standard","hentry","category-programming-category"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/1282","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=1282"}],"version-history":[{"count":0,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/1282\/revisions"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=1282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=1282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=1282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}
]}}