Google Analytics: The Implications of an Easy Setup

On: September 26, 2008
About Andrea Fiore
I am an amateur software designer and wannabe Internet researcher. I am generally interested in non-conventional approaches to software, in both its production and its usage practices. For my MA graduation project in Media Design at the Piet Zwart Institute (Rotterdam), I have been investigating web audience analytics in relation to content personalization, behavioral marketing and web-based advertising. I am an expert in web cookies and I try to devise tools and methods to better understand the economy behind them. For the past few months I have been contributing as a programmer to the Digital Methods Initiative, a research group directed by Richard Rogers and based at the University of Amsterdam.

Website
http://blog.cookiecensus.org    

Here at Masters of Media it is time for a redesign. As students of the new year we are encouraged not only to contribute fresh new content to the blog, but also to re-think its shape, both in its information architecture and its visual appearance. While different student groups are working out several proposals, many of us have observed that up-to-date knowledge about the ways our audience accesses and interacts with the blog would be a crucial aid in identifying problems and possible directions for improvement. As many of us have already pointed out, the web-based service Google Analytics is arguably an easy, effective and ultimately convenient way to accommodate these needs. Although the former contributors of Masters of Media were aware of this solution, one year ago they decided not to use it.

This post is an attempt to present some of the main features of Google Analytics and to explain some of the most important rationales behind its former non-adoption. Its ultimate aim is to re-open, both within the context of Masters of Media and on the web in general, a critical and informed discussion about audience analysis.

Web analytics: main data collection methods

In April 2005, Google acquired Urchin Software Corp, a Californian IT company with a business in the field of web statistics. Urchin, the company’s main product, was a proprietary solution for web traffic analysis. The tool was basically a fast and reliable web server access-log analyzer. Although similar in many respects to open source products such as AWStats and Webalizer, Urchin had a couple of features that strongly differentiated it from its competition. It was, according to Brian Clifton, Google’s former Head of Web Analytics for Europe, the Middle East and Africa, the “first web analytics vendor being able to import and integrate PPC cost/click data from Adwords and Overture”. Second, it was capable of operating with the two most common web-traffic data collection methods: log analysis and page tagging.

Clifton provides a concise and effective explanation of both methods in a recent whitepaper published by his company, Omega Digital Media.

“Logfiles refer to data collected by your web server, which is independent of a visitors’ browser. By default, all requests to a web server (pages, images, pdf’s etc) are logged to a file – usually in plain text. This type of technique is known as server-side data collection[…]

Page Tags refer to data collected by a visitors’ web browser, achieved by placing code on each page of your site. Often it is simply a single snippet (tag) of code referencing a separate javascript file – hence the name. Some vendors also add multiple custom tags to set/collect further data. This type of technique is known as client-side data collection”
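
To make the first method concrete, a single request recorded in a typical Apache “combined” format access log looks roughly like this (the IP address, path, referrer and user agent below are made up):

    66.249.72.14 - - [26/Sep/2008:10:15:32 +0200] "GET /index.html HTTP/1.1" 200 2326 "http://www.google.com/search?q=masters+of+media" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"

Each line records, among other things, the requesting host, the time, the requested resource, the response status and size, the referring page and the browser’s user-agent string.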

Besides explaining these methods, the whitepaper also provides a comparison of their respective Total Cost of Ownership and discusses their advantages and disadvantages. It is worth noting here that the log-analysis method, while ensuring full control over the data (a decentralized system), presents the following downsides:

  • No event tracking (JavaScript, Flash, AJAX, Web 2.0)
  • Program updates performed by your own team
  • Data storage and archiving performed by your own team.

Consequently, managing your audience data on your own is more expensive, more time consuming and ultimately less accurate than having Google do it for you.

The one-paste set-up and its implications

The claim used by Google to advertise its product, “Spend on marketing, not on web analytics”, is perhaps true: compared to most of its competitors, GA provides a completely free-of-charge and easy-to-implement web analytics solution. Thus, it allows companies to focus on strategy and decision making without spending time and resources on gathering and processing audience information. Once logged in with a Google account, the user registers her websites on the service in a few clicks. In order to have her web traffic tracked and analyzed, she only has to tag her website’s HTML with a tiny snippet of Google’s tracking code.

The code snippet, generated by GA once the site registration procedure is complete, consists of a few lines of JavaScript:
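
At the time of writing, the standard ga.js tracking snippet looked roughly like the following sketch (the account identifier UA-XXXXXXX-X is a placeholder, and the exact code served by Google may differ slightly):

    <script type="text/javascript">
    // Load ga.js from Google's servers, switching to the SSL host when the page itself is served over HTTPS
    var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
    document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
    </script>
    <script type="text/javascript">
    // Create a tracker bound to the site's account ID and record a pageview
    var pageTracker = _gat._getTracker("UA-XXXXXXX-X");
    pageTracker._trackPageview();
    </script>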

[Image: the Firefox add-on Firebug showing an HTTP GET request sent to google-analytics.com from a web bug (web beacon) included in nimmagazine.it and generated by the analytics script]

Whenever a browser loads a page located on the site, it executes this code and appends an invisible Google web beacon to the page. Through this 1px by 1px image (more specifically, through the list of parameters that the script appends to the HTTP GET request), the software running on google-analytics.com can easily identify and sort the audience of all its affiliated sites. The data flow of such a process is ultimately centralized since, through the web-beacon expedient, each page view on any affiliated site triggers the creation of a new record in the Google Analytics databases.
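
For illustration, such a beacon request might look roughly like the following (a simplified, made-up example; real requests carry more parameters, and the utmcc payload is truncated here):

    http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1158377649&utmhn=nimmagazine.it&utmp=%2Findex.html&utmr=http%3A%2F%2Fwww.google.it%2Fsearch%3Fq%3Dnim&utmac=UA-XXXXXXX-X&utmcc=__utma%3D...

Here utmhn is the hostname of the affiliated site, utmp the page being viewed, utmr the referrer, utmac the site’s Analytics account identifier, and utmcc the values of the Google Analytics cookies.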

A note for the geek reader: in order to avoid being blocked by privacy-aware browser settings and similar anti-spyware measures, Google Analytics does not rely on third-party cookies, as most ad servers do. Instead, it uses JavaScript to have its tracking cookies set directly by the first-party affiliated site (further technicalities are available at Google Code).
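
A minimal sketch of the mechanism (hypothetical code, not Google’s actual script): because the tracking script runs in the context of the page that embeds it, document.cookie writes a first-party cookie on the affiliated site’s own domain, which privacy settings tend to treat far more leniently than cookies set by the google-analytics.com domain itself.

    // Hypothetical example: a script executing in the page's own context sets a
    // first-party cookie on the embedding site's domain, not on the tracker's domain.
    var expires = new Date(new Date().getTime() + 2 * 365 * 24 * 60 * 60 * 1000); // roughly two years
    document.cookie = "__visitor_id=" + Math.floor(Math.random() * 1000000000) +
                      "; path=/; expires=" + expires.toUTCString();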

Main service features:

The basic functionality that GA provides to the average site owner can perhaps best be explained through the well-known six-question checklist of old-school journalism, also known as the Five Ws. What GA statistics provide are indeed not proper answers, but valuable insights for answering for yourself the who, what, where, when, how and ultimately the why of the web traffic generated by your website.

GA core functionalities include:

  • User reports: analyzable by geographical location, browser language, browser/network capabilities, visitor loyalty, etc.
  • Traffic sources reports: divided into direct traffic, referring sites and search engines. Search engine traffic can also be analyzed by keyword.
  • Content reports: site resources can be ranked by number of views, time on page, bounce rate and other metrics. These reports also make it possible to identify and rank landing pages and exit pages. Finally, it is possible to visually overlay the access stats directly on the website layout, thus contextualizing the traffic data within the topological structure of the website.

Although the data is presented to the user visually, through an intuitive dashboard populated by charts and configurable widgets, the language spoken by the GA interface is still very much that of marketers and media planners. The work of reading through the different visualizations and turning them into strategy and publishing decisions is still supposed to be done by a qualified human. Using more advanced features, such as the possibility of setting up goals and campaigns and thus quantifying their achievement monetarily in real time, indeed requires a certain confidence and proficiency both with web-marketing jargon and with the Google AdWords platform (which is the core of Google’s business model).

Data-sharing and industry benchmarking

On 5 March 2007, Google announced on the Analytics blog the implementation of a new feature meant to give customers “more choice and control” over their data. This functionality, dubbed benchmarking and accessible through the visitors section, is an opt-in only feature. Users who allow Google to share part of their data with other GA users are rewarded with the possibility to compare their own audience stats with anonymous aggregate data from sites of a similar size belonging to the same or to another industry sector. One could easily read this new feature as a strategic move to turn a problematic aspect of the service (its centralized architecture and its implications in terms of privacy and trust) into an added value for users. Such a move may also be due to the fact that several free-of-charge web analytics services (such as Compete and Quantcast) already make part of their users’ traffic statistics public in order to rank the web. Although the data sharing is presented as an option, it may still be read as a rhetorical expedient, a trick useful for giving users a feeling of control and privacy.

Conclusions:

Google undoubtedly occupies a position of prominence and centrality in today’s web economy. Within a continuously growing, highly pervasive medium such as the web, a single organization is at the same time the privileged entry point, the information-sorting device, the just-in-time advertising agency for all businesses and pockets… and ultimately, the most reliable analyst of information-consumption behaviors.
The Analytics service is yet another step in the direction of an information monopoly. Once capable of tracking users not only at the entry point (its own search service), but also within the overall web space owned by the affiliated sites, Google is probably on its way to massively extending its already valuable database of intentions.

From the perspective of a profit-driven organization, the adoption of Google Analytics is, if not a must-do choice, undeniably convenient. Nevertheless, from the perspective of a public institution or a university project such as our blog, the explicit choice of non-adoption can be an opportunity to make a political statement and raise public awareness about the risks of data concentration and new information monopolies. Rather than feeding the database of intentions with yet more audience data, it is perhaps more interesting, and at the same time more ethical, to open the black box of web analytics and do the dirty data-warehousing work ourselves.

2 Responses to “Google Analytics: The Implications of an Easy Setup”
  • September 27, 2008 at 6:20 am

    Andrea, I read your Twitter message about this post, but I didn’t have the time to respond to it immediately. The analysis that you’re making of GA and the downsides that you point out are of course accurate. Though on a personal level I’m still not very concerned about the issues that you’re raising, and therefore I’m not against the experiment of using GA on our blog (next to using the open source alternative).

    The reason that I’m not that concerned about adding our blog contents and statistics to the information ‘monopoly’ of Google mainly has to do with the purpose of our blog, which of course is science and the creation and distribution of knowledge to as many people as possible.

    In your post you point out some of the threats of a centralized information system like that of Google. All data combined and processed according to surveillance principles could create the so-called ‘database of intentions’. But what are actually the dangers of this kind of surveillance to us as writers of the content and to the visitors of our blog? Besides that, I don’t regard a commercial business like Google as the equivalent of some kind of dangerous poison. And in a way our goal has a commercial nature to it as well when we’re going to use the visitor statistics to make our site as good as it can get, and eventually to try to be as interesting for advertisers as possible.

    Firstly, our content isn’t really of a commercial nature, at least not in the same way as it would be for a real business where the content would be the core product. So why should we bother about Google making some money out of it? They’re already doing so in several ways anyway, for example by listing the blog in their search database, or through the Google Docs that we’re using for our class projects.

    Secondly, a centralized information database would make it easier for Google or governments to censor specific kinds of information. But this doesn’t really pose a threat to the things that we write about. The objects that we review and the ideas that we write about don’t pose a huge risk to Google or governments, and therefore the content of the blog isn’t very susceptible to censorship. In relation to this you could argue that the privacy issue concerning the statistics of our visitors isn’t that big a deal either in terms of threats and possible surveillance. And if some of our visitors are that upset about GA, they can probably quite easily block the JavaScript from executing, or block the google-analytics.com domain, and visit our blog without submitting their statistics to GA.

    Thirdly, if we use a decentralized system to gather statistics, then we’ll need to maintain it ourselves. This means, for instance, that we’ll have to update the software immediately when a security update comes out. Also, the security on Google’s servers will probably be much stronger than on our own server. Therefore I reckon that a decentralized system is more susceptible to an attack by a cracker, or any other possible danger whereby the security and privacy of our visitors’ data might be compromised.

    In short, I don’t think that our readers should be too concerned about the accumulation of statistics by GA. Why should they fear it when they’re probably using some of Google’s services (or similar ones) themselves, or come across many other sites that use GA for their visitor statistics? If we as students or academic researchers want to make statements about the real dangers of these centralized information databases, then we’ll probably reach a much larger audience by writing about these kinds of things. If the only thing we do is reject the use of GA on our blog, then most probably far fewer people will notice it than if we write about the real dangers and make these reports available to the general public, and thus locatable with the help of search engines and high rankings. And as mentioned above, the alternatives have their pros and cons as well.

  • September 27, 2008 at 11:26 am

    @Stephan
    first, thanks for reading and providing feedback!

    In the post I have advocated a boycott of the Analytics service as a possible way of raising awareness about Google’s role in the contemporary information ecosystem of the web.

    As you rightly say, GA does not constitute any more of a privacy threat to us and our visitors than many other services we use daily. I am indeed not that interested in framing the question in terms of privacy. I believe that this term leads us to focus only on the individual implications of surveillance systems, while often preventing us from considering the collective, economic and social ones.

    So when we think of a term such as “database of intentions”, it is probably also interesting to ask ourselves:
    * What is the political potential of such knowledge in terms of decision making, economic planning and the governance of media systems?
    * What happens to the web economy when a single player controls, at the same time, the search/indexing/ranking of information and the largest part of the advertisement market, and is eventually also the best analyst of information-consumption behaviors?
