Parsing My Apache Logs
I stopped using Google Analytics a couple of months ago. […] But I am still curious about which pages are being read and where the readers are coming from, so I wrote a little script to parse my site’s Apache log file and return the top five pages and referrers for a given day. Along the way, I learned more about Python’s collections library and the groupdict method for regular expression matches.
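The quoted script isn't reproduced here. What follows is a minimal sketch of the same idea, not the author's code. It assumes Apache's "combined" log format, a day given in the log's own `10/Oct/2024` style, and that only successful (status 200) requests count. The names `LOG_PATTERN`, `top_pages_and_referrers`, and `own_site`, and the sample lines, are hypothetical. Named groups in the regex let `groupdict()` return each field by name, and `collections.Counter.most_common()` produces the top-five lists.

```python
import re
from collections import Counter

# Hypothetical pattern for Apache's combined log format; each named group
# becomes a key in the dict returned by match.groupdict().
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_pages_and_referrers(lines, day, n=5, own_site="www.example.com"):
    """Return the n most-requested pages and n most common outside referrers.

    `day` must match the log's date format, e.g. "10/Oct/2024".
    `own_site` (hypothetical) filters out internal referrers.
    """
    pages = Counter()
    referrers = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that aren't in combined format
        hit = m.groupdict()
        if hit["day"] != day or hit["status"] != "200":
            continue  # wrong day, or not a successful request
        pages[hit["page"]] += 1
        ref = hit["referrer"]
        if ref != "-" and own_site not in ref:
            referrers[ref] += 1  # count only external referrers
    return pages.most_common(n), referrers.most_common(n)


# Made-up sample lines for illustration.
SAMPLE = [
    '1.2.3.4 - - [10/Oct/2024:13:55:36 -0700] "GET /about/ HTTP/1.1" 200 2326 "https://other.example.net/" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Oct/2024:14:01:02 -0700] "GET /about/ HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [10/Oct/2024:15:00:00 -0700] "GET /archive/ HTTP/1.1" 200 512 "https://www.example.com/about/" "Mozilla/5.0"',
    '1.2.3.4 - - [11/Oct/2024:09:00:00 -0700] "GET /about/ HTTP/1.1" 200 2326 "https://other.example.net/" "Mozilla/5.0"',
]

pages, referrers = top_pages_and_referrers(SAMPLE, "10/Oct/2024")
print(pages)      # [('/about/', 2), ('/archive/', 1)]
print(referrers)  # [('https://other.example.net/', 1)]
```

To run it against a real log, pass an open file object (for example, `open("access.log")`) in place of `SAMPLE`, since iterating over a file yields one line at a time.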
I’ve been thinking of doing something similar. Google Analytics is annoying to check because it requires logging in. My different sites are under different accounts, and it takes a lot of clicking around to get to the information that I want. Mint doesn’t filter referrers reliably and seems to increase the load on my server. By writing my own script that accesses the logs directly, I’ll be able to track non-JavaScript requests (e.g. .dmg downloads) and also calculate some custom analytics that wouldn’t be possible with off-the-shelf software.
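JavaScript-based analytics never see a direct file download, but the server log does. A minimal sketch of counting `.dmg` downloads straight from the log, assuming the standard quoted request line. The pattern and the `count_dmg_downloads` name are hypothetical. It counts only complete (status 200) responses, so resumed or partial downloads (HTTP 206) are ignored, and paths with query strings won't match.

```python
import re
from collections import Counter

# Hypothetical pattern: a GET for a path ending in .dmg, followed by the status code.
DMG_REQUEST = re.compile(r'"GET (?P<path>\S+\.dmg) HTTP/[\d.]+" (?P<status>\d{3}) ')


def count_dmg_downloads(lines):
    """Count complete (status 200) downloads of each .dmg path."""
    downloads = Counter()
    for line in lines:
        m = DMG_REQUEST.search(line)
        if m and m.group("status") == "200":
            downloads[m.group("path")] += 1
    return downloads


# Made-up sample lines: one full download, one partial (206), one non-.dmg request.
LOG = [
    '1.2.3.4 - - [10/Oct/2024:13:55:36 -0700] "GET /downloads/App-1.0.dmg HTTP/1.1" 200 1048576 "-" "curl/8.0"',
    '5.6.7.8 - - [10/Oct/2024:13:56:00 -0700] "GET /downloads/App-1.0.dmg HTTP/1.1" 206 4096 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Oct/2024:13:57:00 -0700] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(count_dmg_downloads(LOG))  # Counter({'/downloads/App-1.0.dmg': 1})
```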