Behind the SERP: Google’s Imperfect Algorithm
Google is undoubtedly the leader in the search engine market. Its algorithm, based on crowdsourcing, was a major breakthrough for the entire web search industry. Users consulting Google’s search results pages have shown a strong bias towards the first results on the page. This article explores how Google works and whether this trust in its ability to provide the best results is misplaced.
Google is by far the most used search engine: it handles 67% of queries in the US and around 90% in Europe. These figures come as no surprise, since many consider the Mountain View company “The Search Engine” and people are, in general, very satisfied with its results. The reach of the phenomenon is illustrated by its influence on our language: it is increasingly common to hear “Google it” instead of “search for it on the Internet”.
Google has clearly done a good job of building trust among its audience. A study by Pan et al. (2007) demonstrated that users tend to trust the search engine’s ability to place the best and most relevant results in the highest positions of the SERP (Search Engine Results Page).
Given that people trust Google’s algorithm so much, is it truly reliable? How does it work? The search engine most likely ranks pages using a couple of hundred different factors, which can be divided into two main categories: on-page and off-page. On-page factors are those that can be extracted from the webpage itself, such as the title, the content, the text-to-code ratio, etc. Off-page factors are based on the concept of crowdsourcing: the assumption is that a page with more links pointing to it, and in particular more authoritative links, should be given priority in the SERP. Links are the main off-page ranking factor but not the only one: other data, such as the number of social shares or the server location, are probably used as ranking factors by Google’s algorithm as well.
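Google does not publish how its ranking works today, but the link-based idea described above goes back to PageRank, the algorithm Google’s founders published in the late 1990s. The sketch below is a minimal, simplified version of that idea, not Google’s actual implementation: each page splits its score among the pages it links to, so a page linked from many well-linked pages ends up with a high score. The function name pagerank, the damping value of 0.85 and the toy link graph are assumptions chosen for illustration.

```python
# Minimal PageRank sketch (power iteration). Illustrative only: Google's
# real ranking combines this kind of link analysis with many other signals.


def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Return a dict of page -> score; scores sum to 1.

    links: dict mapping each page to the list of pages it links to.
    damping: probability that a "random surfer" follows a link instead of
             jumping to a random page (0.85 is an assumed, commonly used value).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            # A page with no outlinks spreads its score over all pages.
            receivers = targets if targets else pages
            share = damping * rank[page] / len(receivers)
            for t in receivers:
                new_rank[t] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once scores have stabilised
            break
    return rank


# Toy link graph (hypothetical): which page links to which.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "spam": ["home"],  # links out, but nothing links to it
}

scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

In this toy graph nobody links to “spam”, so it keeps only the baseline share every page receives from random jumps, while “home”, which every other page links to, comes out on top.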
This is the main characteristic of the algorithm: it is based on the popularity of a webpage among other Internet users. It works very well, and the company keeps adjusting the weights of the various factors in order to provide even better results. The problem arises when people start thinking that Google is infallible and fail to use this tool with a critical mindset. For example, Google results are constantly being manipulated by online businesses that try to improve their rankings in order to increase traffic to their websites and, consequently, their profits. This practice is called SEO (Search Engine Optimization) and, in my opinion, merely acknowledging its existence puts search results in a different light.
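A toy model can illustrate why weighted factors leave room for this kind of manipulation. In the hypothetical sketch below, a page’s score is a weighted sum of one on-page signal and one off-page signal; the signal names, values and weights are all invented for illustration, since Google’s real factors and weights are not public. Inflating only the link signal, for instance through an aggressive link-building campaign, is enough to push a thin page above a more helpful one.

```python
# Hypothetical ranking model: score = weighted sum of signals in [0, 1].
# Signal names, values and weights are invented for illustration only.

WEIGHTS = {"content_relevance": 0.6, "link_authority": 0.4}


def score(signals, weights=WEIGHTS):
    """Combine a page's signals into a single ranking score."""
    return sum(weights[name] * signals[name] for name in weights)


def rank(pages, weights=WEIGHTS):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name], weights), reverse=True)


pages = {
    "helpful-guide": {"content_relevance": 0.9, "link_authority": 0.3},
    "thin-shop-page": {"content_relevance": 0.5, "link_authority": 0.5},
}
before = rank(pages)

# Simulate an SEO campaign that acquires backlinks for the shop page
# without improving its content at all.
boosted = {name: dict(signals) for name, signals in pages.items()}
boosted["thin-shop-page"]["link_authority"] = 1.0
after = rank(boosted)

print(before)  # ['helpful-guide', 'thin-shop-page']
print(after)   # ['thin-shop-page', 'helpful-guide']
```

Nothing about the shop page’s content changed in this example; only its off-page signal did, yet it now appears first.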
Others have already highlighted the potential dangers of this phenomenon: “[There is] an increased probability of misinformation, particularly in circumstances of topic naiveté” (Pan et al., 2007) and “the potential for misguided trust to exacerbate what others already fear regarding the non-egalitarian distribution of information, whether as a result of economic resources, indexing policies, or algorithms” (Hindman et al., 2003; Introna & Nissenbaum, 2000).
Google is a very widespread tool, incredibly useful and with untapped potential. But, as good as it is, it is still imperfect, and users should have a better understanding of how it works in order to use search engines effectively. In conclusion, my hope is that knowledge about search engines will spread, mitigating the possible negative consequences of a limited understanding of, and excessive trust in, Google and other search engines.
Hindman, M., Tsioutsiouliklis, K., & Johnson, J. A. (2003). “Googlearchy”: How a few heavily-linked sites dominate the Web. Retrieved September 6, 2013, from http://www.cs.princeton.edu/~kt/mpsa03.pdf
Introna, L. D., & Nissenbaum, H. (2000). Shaping the Web: Why the politics of search engines matters. The Information Society, 16(3), 169–185.
Pan, B., et al. (2007). In Google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823.
Author: Andrea Fiorentini