Google Analytics: The Implications of an Easy Setup

On: September 26, 2008
About Andrea Fiore
I am an amateur software designer and wannabe Internet researcher. I am generally interested in non-conventional approaches to software, in both its production and its usage practices. As the graduation project for my MA in Media Design at the Piet Zwart Institute (Rotterdam), I have been investigating web audience analytics in relation to content personalization, behavioral marketing and web-based advertising. I am an expert in web cookies and I try to devise tools and methods to better understand the economy behind them. For the past few months I have been contributing as a programmer to the Digital Methods Initiative, a research group directed by Richard Rogers and based at the University of Amsterdam.

Website
http://blog.cookiecensus.org    

Here at Masters of Media it is time for a redesign. As this year's new students, we are encouraged not only to contribute fresh content to the blog, but also to rethink its shape, both in its information architecture and in its visual appearance. While different student groups are working out proposals, many of us have observed that up-to-date knowledge about how our audience accesses and interacts with the blog would be a crucial aid in identifying problems and possible directions for improvement. As many of us have already pointed out, the web-based service Google Analytics is arguably an easy, effective and ultimately convenient way to meet these needs. Although the former contributors to Masters of Media were aware of this solution, a year ago they decided to avoid using it.

This post is an attempt to present some of the main features of Google Analytics and to explain some of the most important rationales behind its earlier non-adoption. Its ultimate aim is to re-open a critical and informed discussion about audience analysis, both within the context of Masters of Media and on the web in general.

Web analytics: the main data collection methods

In April 2005, Google acquired Urchin Software Corp, a Californian IT company operating in the field of web statistics. Urchin, the company’s main product, was a proprietary solution for web traffic analysis. The tool was basically a fast and reliable web server access-log analyzer. Although similar in many respects to open source products such as AWStats and Webalizer, Urchin had a couple of features that strongly differentiated it from its competition. First, according to Brian Clifton, the former head of Web Analytics for Europe, the Middle East and Africa, it was the “first web analytics vendor being able to import and integrate PPC cost/click data from Adwords and Overture”. Second, it was capable of operating with both of the most common web-traffic data collection methods: log analysis and page tagging.

Clifton provides a concise and effective explanation of both methods in a recent whitepaper published by his company, Omega Digital Media.

Logfiles refer to data collected by your web server, which is independent of a visitors’ browser. By default, all requests to a web server (pages, images, pdf’s etc) are logged to a file – usually in plain text. This type of technique is known as server-side data collection[…]

Page Tags refer to data collected by a visitors’ web browser, achieved by placing code on each page of your site. Often it is simply a single snippet (tag) of code referencing a separate javascript file – hence the name. Some vendors also add multiple custom tags to set/collect further data. This type of technique is known as client-side data collection
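To make the server-side method concrete: the plain-text files that tools like AWStats, Webalizer and Urchin parse contain one line per HTTP request. A typical entry in Apache's combined log format looks roughly like the following (host, URL and user agent are invented here):

    93.184.216.34 - - [26/Sep/2008:14:03:12 +0200] "GET /blog/index.html HTTP/1.1" 200 5432 "http://www.google.com/search?q=masters+of+media" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/2008070208 Firefox/3.0.1"

A log analyzer aggregates millions of such lines into page-view counts, referrer rankings and so on; the page-tagging method gathers roughly the same information, but from inside the visitor's browser.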

Besides explaining these methods, the whitepaper also compares their respective total cost of ownership and discusses their advantages and disadvantages. It is worth noticing here that the log-analysis method, while ensuring full control over the data (a decentralized system), presents the following downsides:

  • No event tracking (JavaScript, Flash, AJAX, Web 2.0)
  • Program updates performed by your own team
  • Data storage and archiving performed by your own team.

Consequently, managing your audience data on your own is more expensive, more time consuming and ultimately less accurate than having Google do it for you.

The one-paste set-up and its implications

The claim Google uses to advertise its product, “Spend on marketing, not on web analytics”, is perhaps true: compared to most of its competitors, GA provides a completely free-of-charge and easy-to-implement web analytics solution. It thus allows companies to focus on strategy and decision making without spending time and resources on gathering and processing audience information. Once logged in with a Google account, the user registers her websites on the service in a few clicks. To have a site's web traffic tracked and analyzed, she only has to tag its HTML with a tiny snippet of Google's tracking code.

The code snippet, generated by GA once the site registration procedure is complete, consists of a few lines of JavaScript:
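For reference, the page tag Google was distributing around this time (the ga.js version) looked more or less like the following; UA-XXXXXX-X is a placeholder for the account number assigned to the site:

    <script type="text/javascript">
    // load the ga.js tracking library from Google's servers,
    // over SSL if the page itself was requested over https
    var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
    document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
    </script>
    <script type="text/javascript">
    // create a tracker bound to the site's account ID and record a page view
    var pageTracker = _gat._getTracker("UA-XXXXXX-X");
    pageTracker._trackPageview();
    </script>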

Figure: the Firefox add-on Firebug showing an HTTP GET request sent to google-analytics.com through a web beacon that the analytics script embeds in nimmagazine.it

Whenever a browser loads a page located on the site, it executes this code and appends an invisible Google web beacon to the page. Through this 1×1 pixel image (more specifically, through the list of parameters that the script appends to the HTTP GET request for it), the software running on google-analytics.com can easily identify and sort the audience of all its affiliated sites. The data flow of such a process is ultimately centralized: thanks to the web-beacon expedient, each page view on any affiliated site triggers the creation of a new record in the Google Analytics databases. A simplified example of such a request is sketched below.
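The beacon is the __utm.gif image visible in the Firebug screenshot above. Reduced to its essentials, and with invented values, the request looks something like this (the parameter names follow the __utm.gif conventions: utmhn for the host name, utmp for the viewed page, utmr for the referrer, utmcc for the first-party cookie values):

    GET http://www.google-analytics.com/__utm.gif
        ?utmn=1234567890                      // random number, defeats caching
        &utmhn=www.nimmagazine.it             // host name of the tagged site
        &utmp=%2Fsome-article%2F              // path of the page being viewed
        &utmr=http%3A%2F%2Fwww.google.com%2Fsearch%3Fq%3Dnim   // referrer
        &utmcc=__utma%3D...%3B__utmz%3D...    // visitor and campaign cookies

Crucially, these requests go to Google's servers, not to the tagged site's own web server: that is what makes the collection centralized.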

A note for the geek reader: in order to avoid being blocked by privacy-aware browser settings and similar anti-spyware measures, Google Analytics does not rely, as most ad servers do, on third-party cookies. It instead uses JavaScript to have its tracking cookies generated directly by the first-party affiliated site (further technicalities are available at Google Code).
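A minimal sketch of the trick, not Google's actual code (the real __utma cookie packs a domain hash and several timestamps into its value, and ga.js does considerably more): the script simply writes the cookie through document.cookie, so the browser files it under the visited site's own domain rather than under google-analytics.com.

    <script type="text/javascript">
    // Simplified first-party cookie setting, in the spirit of what ga.js does.
    // A visitor identifier is generated in the page's own JavaScript context...
    var visitorId = Math.floor(Math.random() * 2147483647);
    var expires = new Date();
    expires.setFullYear(expires.getFullYear() + 2);
    // ...and stored in a cookie scoped to the tagged site's own domain,
    // so settings that block third-party cookies do not apply to it.
    document.cookie = "__utma=" + visitorId + "; expires=" + expires.toGMTString() + "; path=/";
    // On each page view the value is read back from document.cookie and shipped
    // to google-analytics.com inside the utmcc parameter of the beacon request.
    </script>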

Main service features:

The basic functionality that GA provides to the average site owner can perhaps best be explained through the well-known six-question checklist of old-school journalism, also known as the Five Ws. What GA stats provide are not proper answers, but valuable insights for working out yourself the who, what, where, when, how and ultimately the why of the web traffic generated by your website.

GA's core functionalities include:

  • User reports: analyzable by geographical location, browser language, browser/network capabilities, visitor loyalty, etc.
  • Traffic source reports: divided into direct traffic, referring sites and search engines. Search engine traffic can also be analyzed by keyword.
  • Content reports: site resources can be ranked by number of views, time on page, bounce rate and other metrics. These reports also make it possible to identify and rank landing and exit pages. Finally, the access stats can be visually overlaid directly on the website layout, contextualizing the traffic data within the topological structure of the site.

Although the data is presented visually through an intuitive dashboard populated with charts and configurable widgets, the language spoken by the GA interface is still very much that of marketers and media planners. The work of reading through the different visualizations and turning them into strategy and publishing decisions is still supposed to be done by a qualified human. Using the more advanced features, such as setting up goals and campaigns and thus quantifying their achievement in monetary terms in real time, requires a certain confidence and proficiency with both web-marketing jargon and the Google AdWords platform (which is the core of Google's business model).

Data-sharing and industry benchmarking

On 5 March 2007, Google announced on the Analytics blog the implementation of a new feature meant to give customers “more choice and control” over their data. This functionality, dubbed benchmarking and accessible through the visitors section, is opt-in only. Users who allow Google to share part of their data with other GA users are rewarded with the possibility of comparing their own audience stats with anonymous aggregate data from sites of the same size, belonging to the same or to another industry sector.

One could easily read this new feature as a strategic move to turn a problematic aspect of the service (its centralized architecture and its implications in terms of privacy and trust) into added value for the users. Such a move may also be due to the fact that several free-of-charge web analytics services (such as Compete and Quantcast) already make part of their users' traffic statistics public in order to rank the web. Although the data sharing is presented as an option, it may still be read as a rhetorical expedient, a trick useful for giving users a feeling of control and privacy.

Conclusions:

Google undoubtedly occupies a position of prominence and centrality in today's web economy. Within a continuously growing, highly pervasive medium such as the web, a single organization is at the same time the privileged entry point, the information-sorting device, the just-in-time advertising agency for all businesses and pockets… and ultimately, the most reliable analyst of information-consumption behaviors.

The Analytics service is yet another step in the direction of information monopoly. Once capable of tracking users not only at the entry point (its own search service), but also within the overall web space owned by affiliated sites, Google is probably on its way to massively extending its already valuable database of intentions.

From the perspective of a profit-driven organization, adopting Google Analytics is, if not a must, undeniably convenient. Nevertheless, from the perspective of a public institution or a university project such as our blog, the explicit choice of non-adoption can be an opportunity to make a political statement and raise public awareness about the risks of data concentration and new information monopolies. Rather than feeding the database of intentions with yet more audience data, it is perhaps more interesting, and at the same time more ethical, to open the black box of web analytics and do the dirty data-warehousing work ourselves.
