I was doing a bit of evaluation of log analyzers, and was a bit disappointed by what I found. It seems like the only serious open source log analyzers are Analog, AWStats, and Webalizer. They all seem okay, but not great. They don't seem any different than they were five years ago.
Am I missing something? This doesn't seem like a hard problem. It's also something that would be useful to solve well. All it should take is an egoist who is obsessed with their logs. Isn't there a glut of such people out there, with the necessary programming skill? That described me at one point, though I never really carried it to completion. Why hasn't something better come along? Or has something better come along, and whoever made it forgot to tell everyone?
Sometimes it's easy to understand why some holes in open source exist, but this one doesn't make any sense to me.
* Most of the interesting things to do in log analysis are of more interest to "marketing people" than to "technical people".
* The "okay" analyzers are obviously "good enough" for "technical people". :)
1. What are you looking for but not finding?
2. Would you *stop* working on exactly what I'm working on every day? ;) I just picked Analog from the same candidates on Saturday.# Robert Brewer
AWStats can be a pain to install, but I think it's more useful than the others. There's quite a bit of useful information, but what you define as useful depends on what information you're looking for!
Anyone with a website who is interested in how that website is read should be interested in log analysis. There's a reason most blog software has referrer tracking built in, and many packages have other kinds of tracking as well. Log analysis provides everything the blog software can do, but applies more generally to any website. So I really don't think it's a lack of interest by technical people.
Maybe I'll write a list of features I think should exist, but don't, or don't work as well as they should.# Ian Bicking
Phillip was dead on when he said log analysis is only of interest to marketing people. Open source is bad at writing tools for marketing people when the problems involved aren't hard or interesting to solve. The person who can scratch isn't the one who itches.
I won't say this is a completely uninteresting problem; my bread and butter is web analytics (written in Python). It is sometimes interesting: using standard x86 boxes that each handle hundreds of thousands of hits a day, and reporting on the information in real time, poses some challenges. But I certainly wouldn't be doing it as a hobby.
Additionally, picking the right metrics and presenting them in an understandable way needs at least one statistics guy and one UI guy. Open source apps have trouble with consistency here: you either end up with a mountain of uselessly specific reports or a few generic ones.# Jack Diederich
I would expect something general enough to make defining reports easy, and to display those reports reasonably well. The actual detailed reports are something that takes time and thought to figure out, and they aren't very general, so I wouldn't expect them to come out of open source quickly; but we're great at making tools. For instance, Yet Another Advanced Logfile Analyser is an effort in that direction, defining a query language and a generic report based on it.# Ian Bicking
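The "tools rather than reports" idea can be sketched in a few lines of Python. This is a minimal illustration, not how Yet Another Advanced Logfile Analyser or any of the analyzers named above actually work: it assumes Apache's combined log format, and the names `COMBINED_RE`, `parse_line`, `REPORTS`, and `run_reports` are hypothetical. Each report is just a name plus a function that maps a parsed hit to a grouping key, so adding a report means adding one line.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

# Regex for the Apache "combined" log format (assumed input format):
# host ident authuser [date] "method path protocol" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one combined-format line, or None if it doesn't match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


# Hypothetical report definitions: each maps a parsed hit to a grouping key,
# or to None when the hit should not count toward that report.
REPORTS: Dict[str, Callable[[Dict[str, str]], Optional[str]]] = {
    "pages": lambda hit: hit["path"] if hit["status"] == "200" else None,
    "status": lambda hit: hit["status"],
    "referers": lambda hit: hit["referer"] if hit["referer"] not in ("", "-") else None,
}


def run_reports(lines: Iterable[str], reports=REPORTS) -> Dict[str, Counter]:
    """Make one pass over the log, counting keys for every report at once."""
    results = {name: Counter() for name in reports}
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue  # skip malformed lines instead of aborting the whole run
        for name, key_fn in reports.items():
            key = key_fn(hit)
            if key is not None:
                results[name][key] += 1
    return results
```

Letting non-programmers define reports would mean replacing the lambdas with some kind of query language; the grouping-and-counting core could stay the same.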
I tried my hand at log analysis, but finally figured that's not where it's at for the more advanced stuff.
Ian's got a point. Look at the feature set of commercial log analysis tools and you see how they do customer tracking: why shopping carts are abandoned, where the traffic is coming from, what search words were used, how much time people spend on a page before moving on. Intranets could benefit from measuring how effective certain pages are, and which resources go unused because nobody notices them. Log analysis could answer these questions.
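Two of those questions, where the traffic is coming from and what search words were used, can be answered from the referer field alone. A minimal Python sketch, assuming the referers have already been pulled out of the log: `SEARCH_PARAMS` is a hypothetical, illustrative table of search-engine hosts and the query parameter each one used, and would need checking against real traffic.

```python
from collections import Counter
from typing import Dict, Iterable, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping of search-engine host substrings to the query-string
# parameter carrying the search phrase. Illustrative only; engines differ
# and change over time.
SEARCH_PARAMS: Dict[str, str] = {
    "google.": "q",
    "search.yahoo.": "p",
}


def search_terms(referer: str) -> Optional[str]:
    """Return the search phrase from a search-engine referer URL, else None."""
    parts = urlsplit(referer)
    host = parts.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            # parse_qs decodes '+' and %xx escapes back into plain text
            values = parse_qs(parts.query).get(param)
            if values:
                return values[0].strip().lower()
    return None


def referring_hosts_and_terms(referers: Iterable[str]):
    """Tally referring hosts and the search phrases that led visitors in."""
    hosts, terms = Counter(), Counter()
    for ref in referers:
        if ref in ("", "-"):
            continue  # direct hits or stripped referers carry no source info
        hosts[urlsplit(ref).netloc.lower()] += 1
        phrase = search_terms(ref)
        if phrase:
            terms[phrase] += 1
    return hosts, terms
```

Questions like abandoned shopping carts or time spent per page need per-visitor sequences rather than simple tallies, which is where the reliability problems with HTTP logs start to bite.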
Part of the reason is that once you have a sufficient understanding of HTTP, you realise that virtually all the things you wanted log analysis for can't reliably be measured from the logs. Things like caching make it practically impossible to get the number of visitors etc. There are tricks you can do (e.g. cookies), but there are always things that make the tricks unreliable (e.g. people refusing cookies).
HTTP log analysis is good for server optimisation, and not a lot else.
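To make the caching and identification problem concrete, here is a minimal Python sketch of the usual cookie-free heuristic for counting visits: treat each (IP address, user agent) pair as one visitor, and start a new visit after a period of inactivity. The 30-minute gap and the name `estimate_visits` are assumptions for illustration, not anything HTTP or a particular analyzer defines. The docstring lists the ways the estimate goes wrong.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Assumed inactivity gap that ends a "visit"; 30 minutes is a common
# convention, not something HTTP defines.
SESSION_GAP = timedelta(minutes=30)


def estimate_visits(hits: Iterable[Tuple[str, str, datetime]],
                    gap: timedelta = SESSION_GAP) -> int:
    """Estimate visits from (ip, user_agent, timestamp) tuples.

    Hits are assumed to be sorted by timestamp. This is only a heuristic:
    users behind one proxy collapse into one "visitor", a user whose IP
    changes is split into several, and pages served from a cache never
    reach the log at all.
    """
    last_seen: Dict[Tuple[str, str], datetime] = {}
    visits = 0
    for ip, agent, ts in hits:
        key = (ip, agent)
        prev = last_seen.get(key)
        if prev is None or ts - prev > gap:
            visits += 1  # first hit from this key, or it went quiet long enough
        last_seen[key] = ts
    return visits
```

Every one of those failure modes is invisible in the log itself, so the number that comes out is an estimate with unknown error, not a count.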