Our secure and easy-to-use client portal lets you access hundreds of on-demand training sessions, download our proprietary tools and manage your account and billing information. Not client? Take a look around to see what you can expect when you partner with Adswerve.
Subscribe to our monthly newsletter to get the latest updates in your inbox
Editor's note: Google ceased new development for the Urchin Software product line and ended purchases or upgrades on March 28th, 2012. Read more about Urchin’s retirement here.
"60,000 out of 73,000 file fownloads were from bots"
Web analytics based on web server log file data using
Urchin 6 web analytics software provides some very useful points of information that Google Analytics and any other tag-based web analytics solution simply can't deliver. However, one of the biggest challenges with logfile based analytics is pollution of the data by "bots" - search engines robots, crawlers, spiders, scapers, etc... I've found that the data generated by non-human activity can easily account for 60% of the hits in your logfiles, and if you don't exclude it, your resulting reports built on your server logs can be off by that margin. Typical installations of popular logfile analysis tools like Webtrends, AW Stats, Webalizer, and even most Urchin installations
won't exclude robotically-generated data by default.
Old vs. New Filtering Options
I was working on a project for a client recently using
Urchin 6 in our cloud-based hosting environment and needed to process some old logfile data to cross-analyze and validate Google Analytics data. Out of 73,000 hits for download files in the log data, 60,000 were from bots. That's a problem! So, I thought "I have to exclude all those bots". In previous versions of Urchin (prior to 6.6) there has always been a "robots report", but no easy way to exclude robots. Well, I took a look at filtering options in our hosted version of Urchin 6.6.02 and found a convenient filtering field for "robot_agent". This field contains the user-agent for hits that were generated by a bot. Nice!
Creating the anti-bot filter
So, I created a simple filter: exclude all hits where "robot_agent" equals ".*" (i.e. any value). After applying the filter and re-processing the data (yeap - re-processing, you can't do that with Google Analytics! That's one reason I love
backing up my Google Analytics data to our analytics data warehouse) the reports were not completely void of any bot-generated data.