Character-Level Diff in BBEdit
BBEdit has a great Find Differences feature that lets you compare files line-by-line, but sometimes I want to be able to see the differences within the lines. This is especially useful when editing paragraphs of text rather than lines of code. These two scripts let you compare two files at the character level and view the differences in your browser. Unlike with FileMerge, the files can be Unicode and the additions, deletions, and changes are color-coded.
First is a script that uses Python’s difflib to generate an HTML file with the differences and open it in the browser. It requires Python 2.4. Save it in a file called pyopendiff that has execute permissions. The input files can be Unicode; they are assumed to be in UTF-8 unless there’s a BOM (change the defaultEncoding to macroman if you want).
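The encoding rule described above (honor a BOM if there is one, otherwise fall back to a default) can be sketched in a few lines. This is only an illustration, written for Python 3 rather than the Python 2.4 that the script targets; the function name decode_with_bom and its parameter names are hypothetical and do not appear in the script.

```python
import codecs

def decode_with_bom(data, default_encoding="utf-8"):
    """Decode raw bytes, preferring the encoding announced by a leading BOM."""
    boms = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    )
    for bom, encoding in boms:
        if data.startswith(bom):
            # Strip the BOM so it does not show up as a character in the text.
            return data[len(bom):].decode(encoding)
    # No BOM found: use the caller's default (UTF-8 unless overridden).
    return data.decode(default_encoding)

with_bom = decode_with_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))
without_bom = decode_with_bom(b"plain ascii")
```

Passing "mac_roman" as the second argument would correspond to changing defaultEncoding to macroman in the script.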
Python’s HtmlDiff class wasn’t designed to handle Unicode, so we HTML-escape the strings before diffing them and undo the (now-redundant) HTML-escaping that HtmlDiff would normally do. This approach is buggy: an escaped character that has been changed (rather than added or deleted) will not display properly, because the diff sees only part of its escape sequence as different. This could be mostly fixed by modifying HtmlDiff to use Unicode internally, although since Python uses UTF-16 there would likely still be problems with surrogate pairs.
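To see why changed characters are the problem case, it helps to look at what the diff actually compares once the text has been escaped. The sketch below assumes Python 3 and uses a hypothetical helper, escape_for_diff, that mimics the escaping step; it uses difflib.SequenceMatcher directly rather than HtmlDiff to show which characters are reported as changed.

```python
import difflib

def escape_for_diff(text):
    """Escape markup characters, then turn non-ASCII into numeric references."""
    escaped = text.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

old = escape_for_diff("caf\u00e9")  # 'caf&#233;' (e with acute accent)
new = escape_for_diff("caf\u00e8")  # 'caf&#232;' (e with grave accent)

# Collect every non-equal region of the character-level comparison.
matcher = difflib.SequenceMatcher(None, old, new)
changed = [
    (old[i1:i2], new[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]
print(changed)  # [('3', '2')]
```

Only one digit inside the character reference differs, so the highlighting markup ends up in the middle of the reference and the browser can no longer interpret it as a single character. An added or deleted character does not have this problem because its whole reference is inserted or removed as a unit.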
#!/usr/local/bin/python2.4

def main(oldPath, newPath):
    import os, subprocess, tempfile
    differ = MyHTMLDiff()
    output = differ.make_file(htmlStringFromPath(oldPath).splitlines(),
                              htmlStringFromPath(newPath).splitlines(),
                              htmlNameFromPath(oldPath),
                              htmlNameFromPath(newPath))
    # Write the report to a fresh temporary directory and open it in the browser.
    outPath = os.path.join(tempfile.mkdtemp(), "diff.html")
    writeStringToPath(output, outPath)
    subprocess.call(["/usr/bin/open", outPath])

def htmlStringFromPath(path):
    return htmlFromUnicode(unicodeFromPath(path))

def htmlNameFromPath(path):
    import os
    return htmlFromUnicode(unicode(os.path.basename(path), "utf-8"))

def htmlFromUnicode(u):
    # Escape markup characters, then turn non-ASCII characters into
    # numeric character references so that HtmlDiff only sees ASCII.
    escaped = u.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")
    return escaped.encode("ascii", "xmlcharrefreplace")

from difflib import HtmlDiff

class MyHTMLDiff(HtmlDiff):
    _styles = HtmlDiff._styles.replace("Courier", "ProFont, Monaco")

    def _format_line(self, *args):
        # The input is already escaped, so undo HtmlDiff's own escaping.
        return unescapeHTML(super(MyHTMLDiff, self)._format_line(*args))

def unescapeHTML(s):
    # "&amp;" must be replaced last so that "&amp;gt;" does not become ">".
    return s.replace("&gt;", ">").replace("&lt;", "<").replace("&amp;", "&")

def unicodeFromPath(path):
    return unicodeFromString(stringFromPath(path))

def stringFromPath(path):
    file = open(path, "r")
    result = file.read()
    file.close()
    return result

def unicodeFromString(data, defaultEncoding="utf-8"):
    import codecs
    bomToEncoding = {
        codecs.BOM_UTF8: "utf-8",
        codecs.BOM_UTF16_BE: "utf-16-be",
        codecs.BOM_UTF16_LE: "utf-16-le",
    }
    # Use the encoding named by the BOM, if any, and strip the BOM itself.
    for bom, encoding in bomToEncoding.items():
        if data.startswith(bom):
            data = data[len(bom):]
            break
    else:
        encoding = defaultEncoding
    return unicode(data, encoding)

def writeStringToPath(string, path):
    file = open(path, "w")
    file.write(string)
    file.close()

import sys
main(sys.argv[1], sys.argv[2])
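For comparison, here is a rough sketch of the core step on Python 3, where strings are Unicode and HtmlDiff can be given decoded text directly, so the escape and unescape workaround should not be needed. This is an illustration under those assumptions, not a port of the script: the function name diff_html is hypothetical, and it leaves out the file reading, BOM handling, font tweak, and the call to /usr/bin/open.

```python
import difflib

def diff_html(old_text, new_text, old_name="old", new_name="new"):
    """Return a complete HTML page comparing two already-decoded strings."""
    differ = difflib.HtmlDiff()
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc=old_name,
        todesc=new_name,
    )

page = diff_html(
    "Caf\u00e9 opens at nine\n",
    "Caf\u00e9 opens at ten\n",
    "old.txt",
    "new.txt",
)
```

The returned page marks the differing characters within the line with span elements whose classes start with diff_, which is what produces the color-coding in the browser.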
This AppleScript works like the Compare Two Front Documents command (except that the files must be saved to disk). Put it in ~/Library/Application Support/BBEdit/Scripts and assign it a keyboard shortcut.
tell application "BBEdit"
    set p1 to quoted form of POSIX path of ((file of window 1) as alias)
    set p2 to quoted form of POSIX path of ((file of window 2) as alias)
    do shell script "path/to/pyopendiff " & p1 & " " & p2
end tell
2 Comments
Hey Michael,
Do you know if the Python devs plan to make difflib/HtmlDiff use Unicode any time soon? Have bug reports been filed on this? I wonder what kind of prodding it would take to get such things solved for future versions of Python.
-Jacob