Friday, July 7, 2006

Character-Level Diff in BBEdit

BBEdit has a great Find Differences feature that lets you compare files line-by-line, but sometimes I want to be able to see the differences within the lines. This is especially useful when editing paragraphs of text rather than lines of code. These two scripts let you compare two files at the character level and view the differences in your browser. Unlike with FileMerge, the files can be Unicode and the additions, deletions, and changes are color-coded.

First is a script that uses Python’s difflib to generate an HTML file with the differences and open it in the browser. It requires Python 2.4. Save it in a file called pyopendiff that has execute permissions. The input files can be Unicode. They are assumed to be in UTF-8 unless there’s a BOM (change defaultEncoding to "macroman" if you want a different fallback).

Python’s HtmlDiff class wasn’t designed to handle Unicode, so we HTML-escape the strings before diffing them and undo the (now-redundant) HTML-escaping that HtmlDiff would normally do. This approach is buggy in that it will not properly display escaped characters that have been changed (rather than added or deleted), because the diff can land inside the character reference instead of around it. This could be mostly fixed by modifying HtmlDiff to use Unicode internally, although, since Python uses UTF-16, there would likely still be problems with surrogate pairs.
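To make the changed-character bug concrete, here is a minimal sketch. It is written for a modern Python 3 rather than the 2.4 the script targets, and html_from_text is a hypothetical stand-in for the script's htmlFromUnicode. It escapes two words that differ by one accented letter and compares them with difflib.SequenceMatcher, the character matcher that sits underneath HtmlDiff's within-line highlighting.

```python
import difflib

def html_from_text(text):
    # Hypothetical Python 3 equivalent of htmlFromUnicode in the script:
    # escape markup characters, then turn non-ASCII into numeric references.
    escaped = text.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

old = html_from_text("caf\u00e9")  # e with acute accent
new = html_from_text("caf\u00e8")  # e with grave accent

# Collect only the spans the matcher considers different.
matcher = difflib.SequenceMatcher(None, old, new)
changed = [
    (tag, old[i1:i2], new[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]

print(old, new)
print(changed)
```

The only reported difference is a single digit inside the numeric character reference, so wrapping that digit in highlighting markup would split the reference and the browser could no longer render it as one character.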


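def main(oldPath, newPath):
    import os, subprocess, tempfile
    differ = MyHTMLDiff()
    # Diff the escaped lines; the escaped file names label the two columns.
    output = differ.make_file(htmlStringFromPath(oldPath).splitlines(),
                              htmlStringFromPath(newPath).splitlines(),
                              htmlNameFromPath(oldPath),
                              htmlNameFromPath(newPath))
    outPath = os.path.join(tempfile.mkdtemp(), "diff.html")
    writeStringToPath(output, outPath)
    subprocess.call(["/usr/bin/open", outPath])
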
def htmlStringFromPath(path):
    return htmlFromUnicode(unicodeFromPath(path))

def htmlNameFromPath(path):
    import os
    return htmlFromUnicode(unicode(os.path.basename(path), "utf-8"))

def htmlFromUnicode(u):
    escaped = u.replace("&","&amp;").replace(">","&gt;").replace("<","&lt;")
    return escaped.encode("ascii", "xmlcharrefreplace")

from difflib import HtmlDiff
class MyHTMLDiff(HtmlDiff):
    _styles = HtmlDiff._styles.replace("Courier", "ProFont, Monaco")
    def _format_line(self, *args):
        return unescapeHTML(super(MyHTMLDiff, self)._format_line(*args))

def unescapeHTML(s):
    return s.replace("&gt;", ">").replace("&lt;", "<").replace("&amp;", "&")

def unicodeFromPath(path):
    return unicodeFromString(stringFromPath(path))

def stringFromPath(path):
    file = open(path, "r")
    result = file.read()
    file.close()
    return result

def unicodeFromString(data, defaultEncoding="utf-8"):
    import codecs
    bomToEncoding = {
        codecs.BOM_UTF8: "utf-8",
        codecs.BOM_UTF16_BE: "utf-16-be",
        codecs.BOM_UTF16_LE: "utf-16-le",
    }
    for bom, encoding in bomToEncoding.items():
        if data.startswith(bom):
            # Strip the BOM and use the encoding it implies.
            data = data[len(bom):]
            break
    else:
        # No BOM found: fall back to the default encoding.
        encoding = defaultEncoding
    return unicode(data, encoding)

def writeStringToPath(string, path):
    file = open(path, "w")
    file.write(string)
    file.close()

import sys
main(sys.argv[1], sys.argv[2])
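
For comparison, here is a rough sketch of the same idea on a modern Python 3, where strings are Unicode and HtmlDiff can be given decoded text directly, so the escape-and-unescape workaround should not be needed. This is an assumption-laden illustration, not a drop-in replacement for the script: text_from_bytes and html_diff are hypothetical names, and the sample data is made up.

```python
import codecs
import difflib

def text_from_bytes(data, default_encoding="utf-8"):
    # Hypothetical helper: honor a leading BOM, otherwise use the default.
    bom_to_encoding = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    )
    for bom, encoding in bom_to_encoding:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default_encoding)

def html_diff(old_bytes, new_bytes, old_name="old", new_name="new"):
    # Hypothetical helper: decode both inputs and build a full HTML page.
    old_lines = text_from_bytes(old_bytes).splitlines()
    new_lines = text_from_bytes(new_bytes).splitlines()
    return difflib.HtmlDiff().make_file(old_lines, new_lines, old_name, new_name)

# Made-up sample data: one input is UTF-16 with a BOM, the other plain UTF-8.
sample_old = codecs.BOM_UTF16_LE + "na\u00efve caf\u00e9\n".encode("utf-16-le")
sample_new = "na\u00efve caf\u00e8\n".encode("utf-8")
page = html_diff(sample_old, sample_new, "old.txt", "new.txt")
```

The resulting page string can be written to a temporary file and opened in the browser the same way main does above.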

This AppleScript works like the Compare Two Front Documents command (except that the files must be saved to disk). Put it in ~/Library/Application Support/BBEdit/Scripts and assign it a keyboard shortcut.

tell application "BBEdit"
    set p1 to quoted form of POSIX path of ((file of window 1) as alias)
    set p2 to quoted form of POSIX path of ((file of window 2) as alias)
    do shell script "path/to/pyopendiff " & p1 & " " & p2
end tell


Hey Michael,

Do you know if the Python devs plan to make difflib/HtmlDiff use Unicode any time soon? Have bug reports been filed on this? I wonder what kind of prodding it would take to get such things solved in future versions of Python.


I don't know, but it certainly seems worth filing a bug.
