Search Engine Comparison for Three Controversial People: Greta Thunberg, Donald Trump and Boris Johnson
Alicja Zak, Elise Olthof, Vânia Ferreira and Zdzisław Heydel
Introduction
This project takes up the perspective, articulated by Lisa Gitelman, that “raw data is an oxymoron”, and examines how query results differ between three search engines: Google, Bing, and DuckDuckGo. Google was chosen first as it is the most commonly used search engine worldwide (Rieder and Sire 196), whereas Bing, owned by Microsoft, is considerably less popular. DuckDuckGo is another alternative to Google, founded in 2008 with the goal of protecting users’ privacy and circumventing activity tracking. Although perhaps still unknown to many, DuckDuckGo is gaining prominence as a competitor to Google. David Pogue, a technology writer and TV science presenter, recommended DuckDuckGo in an October 2019 New York Times article, saying that “its search results often aren’t as useful as Google’s, but it’s advertised not to track you or your searches.”
What can we say about search engine specific ranking cultures in the presentation of search query results when comparing three different online search engines? We decided to use search queries on figures who were recently controversial in the context of current politics and online debates. Our initial idea began with Greta Thunberg, the Swedish teen climate activist who has been making headlines worldwide for her outspoken speeches both at the United Nations and at the climate marches she inspired. We then expanded the search to include two more controversial figures with influence in political discourse and public opinion: Boris Johnson, the prime minister of the United Kingdom (UK), and Donald Trump, the current president of the United States of America (USA).
Algorithms have been gaining increasing scholarly attention because of their ability to influence decision making (Rieder et al. 51). Algorithms are increasingly part of our everyday lives. Therefore, they play an important role in our information selection and decision processes (Gillespie 167), which is exactly what these search engines do. Big tech companies rely on algorithms, and have been making it harder to uncover the logic behind them as that would jeopardize their market position.
We tried to audit the three search engines’ algorithmic preferences in two ways: by collecting the top 20 results for each of the three aforementioned controversial figures over five days (October 9 – 13, 2019), and by conducting a sentiment analysis on the first 100 query results.
The first page usually displayed fewer than 20 results which, considering that most people are unlikely to go past the first page, limits their options even more. The order of the results can therefore be read as an example of the nudge effect, which refers to placing objects in a certain order to encourage a certain type of behavior (Yeung 118). Karen Yeung uses the example of placing the salad in front of the lasagna in order to encourage healthy eating (118). This effect can be applied on a larger scale to internet search results and to the algorithms that are at work when news appears in one’s Facebook feed. Yeung argues that big data, given its size and power, is an example of a ‘hypernudge’: a nudge on a macro scale that impacts a wide audience (122).
It is important to note that the order of these search results can change based on the user, and can change over time. If one is signed into their Google account while engaging in a Google search, the user’s current and previous behavioral data gets stored and the search engine may show personalized results based on predictive and recommending algorithms. This is an example of ‘algorithmic identity’, a concept introduced by John Cheney-Lippold, demonstrating how users’ behaviors and interactions with other users are tracked, logged and monitored to construct an online identity for that user, making it easier to predict their behaviors and to display personalized results based on predictive algorithms (167).
Tarleton Gillespie, in his text “The Relevance of Algorithms”, speaks of algorithmic objectivity and the responsibility that search engines have to be transparent with their users. However, given Morozov’s description of “systems that have grown so complex that no Google engineer fully understands them” (quoted in Gillespie 181), one could argue that transparency is impossible at the moment. Along these lines, Rieder et al. show in their analysis of YouTube ranking cultures how the platform’s search results are heavily influenced by both platform and issue vernaculars (63).
By researching search query results on different search engines we hope to unveil important cues employed by each one’s algorithms, which may help in understanding how their ranking cultures work and how these might differ per search engine. An important question to ask is how these search engines determine what is worthy of being listed first. In conducting this project we hope to discern whether there are any notable similarities or differences among the three search engines.
Methodology
For five days, from 9 to 13 October 2019, the search query data of Google, Bing and DuckDuckGo were manually collected for Greta Thunberg, Donald Trump and Boris Johnson in two different ways, in order to determine whether any patterns would arise in how the engines present results in terms of time, source, and language sentiment. First, the top twenty results of each search engine when querying each person separately were collected. The results collected consisted of the date, source, and source URL. In an attempt to remain impartial, the browser was set to private (incognito), the search engine preferences were set to US English, and the region set to the USA. The results were collected in the Google Creative Collaborative Suite and analyzed with the use of its spreadsheets, pivot tables, and charts.
Secondly, a sentiment analysis with the Python library VADER was conducted on a daily basis on the top 100 results of each search engine per search query. The results were “scraped” using the Search Engine Scraper tool of the Digital Methods Initiative (DMI), a tool for the collection of search engine query data. The results were collected each day between 10 and 11 o’clock in the morning for consistency. According to VADER’s creators C.J. Hutto and Eric Gilbert, sentiment analysis “is an active area of study in the field of natural language processing that analyzes people’s opinions, sentiments, evaluations, attitudes, and emotions via the computational treatment of subjectivity in text” (Hutto and Gilbert 216). After scraping, the results were imported into a Jupyter Notebook running Python 3 in order to generate the sentiment analysis. VADER provides a polarity score ranging from -1 (most negative) to +1 (most positive). The outcomes were subsequently ordered into a table and imported into a visualization tool in order to create a graph that shows the development of sentiment per search engine over the five-day period. Because the analysis was carried out in a Jupyter Notebook, the data remains available and accessible for others and for future analyses.
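To make the scoring step more concrete, the following is a minimal sketch of how a lexicon-based polarity score can be computed per result and averaged per day and search engine. It is not the notebook used in this project: the tiny lexicon and its valence values are invented for illustration, and the function names compound_score and mean_sentiment are hypothetical. The squashing step follows the normalization found in VADER’s open-source implementation (the summed valence divided by the square root of its square plus a constant of 15), while VADER’s rules for negation, intensifiers, punctuation and capitalization are left out.

```python
import math
from collections import defaultdict
from statistics import mean

# Invented toy lexicon for illustration only; VADER's own lexicon is far
# larger and its valence values differ.
TOY_LEXICON = {"crisis": -2.0, "fight": -1.5, "hope": 2.0, "win": 2.5}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15):
    """Sum the valences of known words and squash the sum into (-1, 1)."""
    words = (word.strip(".,!?;:'\"").lower() for word in text.split())
    total = sum(lexicon.get(word, 0.0) for word in words)
    return total / math.sqrt(total * total + alpha)


def mean_sentiment(rows, scorer=compound_score):
    """Average the score per (date, engine); rows are (date, engine, text)."""
    grouped = defaultdict(list)
    for day, engine, text in rows:
        grouped[(day, engine)].append(scorer(text))
    return {key: mean(scores) for key, scores in grouped.items()}


# Placeholder rows, not data from this study.
rows = [
    ("2019-10-09", "Google", "A fight over the climate crisis"),
    ("2019-10-09", "Google", "A message of hope"),
    ("2019-10-09", "Bing", "Climate crisis deepens"),
]
for (day, engine), score in sorted(mean_sentiment(rows).items()):
    print(day, engine, round(score, 3))
```

With the vaderSentiment package installed, the toy scorer could presumably be swapped for a function that returns SentimentIntensityAnalyzer().polarity_scores(text)["compound"], leaving the averaging step unchanged.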
Analysis
In the following section we analyze the temporal dimension, frequency, and sentiment for the collected results. The graphs below show a visualization of the temporal dimensions of the collected sources:
Figure 1. Results shown based on their temporality for query Boris Johnson between 09-10-2019 – 13-10-2019.
Figure 2. Results shown based on their temporality for query Greta Thunberg between 09-10-2019 – 13-10-2019.
Figure 3. Results shown based on their temporality for query Donald Trump between 09-10-2019 – 13-10-2019.
As the above graphs show, the most prominent results date from the days of the data collection, with special emphasis on the dates between the 9th and the 11th, while DuckDuckGo also showcases many results dating back to October 1st. Indeed, DuckDuckGo was the search engine that presented the oldest results, dating back to 31 May 2019 for Boris Johnson and to October 2018 for Greta Thunberg. For Donald Trump, however, the results remain recent, with the oldest source only going back to September 2019. This shows how different engines maintain different time frames, with Google in particular seeming to weigh recency more heavily than the others.
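The temporal comparison can be sketched as a small calculation of how old each result is on the day of collection. This is a hedged illustration rather than the spreadsheet workflow used in the project: the function name age_summary and the row layout are assumptions, and apart from the 31 May 2019 date taken from the text, the rows are placeholders.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def age_summary(rows, collected):
    """Return {engine: (oldest_age, median_age)} in days.

    rows is an iterable of (engine, published) pairs, where published is a
    datetime.date; collected is the date on which the results were gathered.
    """
    ages = defaultdict(list)
    for engine, published in rows:
        ages[engine].append((collected - published).days)
    return {engine: (max(days), median(days)) for engine, days in ages.items()}


# The 31 May 2019 date is the oldest Boris Johnson result mentioned in the
# text; the remaining rows are placeholders.
rows = [
    ("DuckDuckGo", date(2019, 5, 31)),
    ("DuckDuckGo", date(2019, 10, 9)),
    ("Google", date(2019, 10, 8)),
]
print(age_summary(rows, collected=date(2019, 10, 9)))
```

A higher median age for one engine than for another would be one way to express that it weighs recency less heavily.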
Greta Thunberg
Throughout the experiment, each search engine delivered substantially different results. In the case of Google, the most frequent results were The Guardian (15%) and Twitter (11%). The Guardian is rated as a reliable source (mediabiasfactcheck.com), which helps Google provide valuable facts and information. Every day The Guardian was high in the results lists, with links to two general pages: the first about Greta Thunberg in general, and the second about environmental issues more broadly. Twitter was also frequently present in the query outcomes, in part because Google provides a separate section for Tweets.
Looking at the statistics of Bing’s most frequent results, Bing Videos leads with 21% of all the outcomes, although admittedly its own video platform showcases videos from other providers. Bing provides a separate section for videos, so that every search returns a number of different videos regarding the topic. Bing displays results from Greta Thunberg’s previous speeches, and it is worth noting that while she was traveling throughout the US, the videos would change, with the newest ones replacing the older ones.
In the case of DuckDuckGo, The Guardian was the most frequent result (11%), followed by YouTube. DuckDuckGo does not provide a separate section for videos; however, there were usually two separate links to YouTube videos within the first 20 results.
Donald Trump
In the searches for Boris Johnson and Donald Trump, the top result on both Google and DuckDuckGo was Twitter. In the case of Greta Thunberg, Twitter was prominent as well, being the second most frequent site. In the search for Trump, his official website and his luxury real estate page came up in the top results, followed by news outlets such as ABC News, CNN, and The Guardian. By contrast, Boris Johnson and Greta Thunberg do not have personal or company websites in the top results. The search results for Trump do not go farther than two months back, as the earliest entry was from September 8th. This is most likely because he is an active political figure, and the top results are updated constantly. The oldest result came from DuckDuckGo, which was the search engine with the oldest results for all three searches.
This suggests that the engines define relevance differently: Google appears to connect ‘relevance’ with recency, whereas DuckDuckGo may match results to the search terms, displaying articles that are very ‘pertinent’ but not necessarily the most recent.
Boris Johnson
In the aggregated results for all three search engines, Twitter and The Guardian stand out as the most frequent results for the query, with 12% and 10% respectively, followed by the Financial Times with 8% of the total results. Boris Johnson has been handling the Brexit negotiations since he became Prime Minister of the United Kingdom in July 2019, and has therefore also been dealing with their financial implications, especially given London’s position as one of the most important financial hubs worldwide. This seems to explain the presence of the Financial Times among the top 20 results for the query.
On the other hand, Twitter being the most frequent result, and Facebook coming up on all search engines, is in line with what other researchers have previously concluded: a tendency to make use of social media platforms as a source for news (Shearer and Grieco). Bing was the search engine with the most varied results, with 35 different sources, ranging from Twitter (most frequent) to the more ‘esoteric’ Sun Signs. In this regard, DuckDuckGo also showed more ‘gossip’ results, with Famous People, Fame Chain and CelebsMoney making up 12% of its total results. When aggregated, these surpass any of the other results, for instance the 10% for Twitter, one of the most prominent.
Bing gives considerable prominence to American sources (led by Fox News), but it also shows some diversity, showcasing British, Irish, and Canadian sources. Google showed no tendency towards American sources, privileging British news sources instead, and had the least varied pool of sources, with only 21 distinct sources (DuckDuckGo offered 28 out of a possible 100). Its results were concentrated in Twitter and The Guardian, with 20% and 17% respectively, which contributed the most to the aggregated results of the three search engines mentioned above. Ten percent of the results belonged to YouTube, which is owned by Google, and here we would like to point to the conflicts of interest of a “multi-sided market” (Rieder and Sire).
From the data we can also extract a tendency towards biographical contextualization, in this case with profiling from Wikipedia, with more relevant expression on Google (7%), complemented by the official page of Boris Johnson on Parliament UK. DuckDuckGo also features Boris’ page on Parliament UK (5%), whereas Bing prioritizes Encyclopaedia Britannica (5%).
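The source percentages and the counts of distinct sources reported in this section were produced with spreadsheet pivot tables. As a rough equivalent, the sketch below shows how the same two figures could be derived from a list of result URLs. The helper names (source_from_url, source_shares, distinct_sources) are hypothetical, reducing a URL to its host name is only one possible way of defining a ‘source’, and the URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def source_from_url(url):
    """Reduce a result URL to its host name, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def source_shares(sources):
    """Return {source: percentage of all results}, most frequent first."""
    counts = Counter(sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}


def distinct_sources(sources):
    """Count how many different sources appear in a result list."""
    return len(set(sources))


# Placeholder URLs, not the collected data.
urls = [
    "https://twitter.com/example",
    "https://www.theguardian.com/example",
    "https://twitter.com/another-example",
    "https://www.youtube.com/watch?v=example",
]
sources = [source_from_url(u) for u in urls]
print(source_shares(sources))
print(distinct_sources(sources))
```

Run per search engine over the five days of results, the first function would yield the kind of shares quoted above and the second the number of different sources per engine.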
Search engines layout comparison
Figure 4. Screenshot of the query results for Boris Johnson in Google (14-10-2019)
Google search results are divided into two main sections: the main feed, which shows the most recent events along with the actual search results, and a smaller section on the right-hand side that displays the most basic information about the query, especially when browsing for information about an individual. The ‘box’ on the right-hand side provides sample pictures from Google Images, simple biographical information, references to a Wikipedia page, as well as to social media platforms and recommendations. There is also a “people also search for” section that points to similar queries and encourages people to browse through Google even more.
The section displayed on the left side of the page begins with the “Top Stories” section, presenting the most recent news regarding the search topic. Depending on the day, the news is usually a couple of hours to a number of days old. Below that, before the actual list of search results, there are two more sections: videos (mainly referencing Google-owned YouTube), and Twitter’s most recent posts regarding the subject. From that point down there is a list of pages that Google’s search engine algorithm treated as the most relevant to the topic.
Figure 5. Screenshot of the query results for Boris Johnson in Bing (14-10-2019)
Bing’s layout is similar to Google’s in that it is also divided into a section with metadata about the search, including a box with a simple biography, and a section with news, Tweets and the query results. Bing tends to find and present more news than Google does: the news feed is composed of more elements, with the most recent one displayed as a bigger headline. Interestingly enough, in the case of Boris Johnson, Bing’s display was slightly different from that of the other queries: the news feed and current tweets appeared in a bigger size above all the other results. Additionally, Bing has its own video platform, and when it shows videos related to the subject, the content is embedded within Bing’s platform, no matter who the author was. The Bing Videos widget is usually positioned high in the hierarchy of the search results.
Figure 6. Screenshot of the query results for Boris Johnson in DuckDuckGo (14-10-2019)
DuckDuckGo has a simpler layout with minimalistic widgets. It consists of two sections: an information box about the subject on the right-hand side, and a feed of information that begins with the most recent news. The search results usually start with Wikipedia, followed by social media platforms, and then lead to articles and other websites. In order to see the first twenty results in DuckDuckGo, the second page of results has to be opened.
Sentiment Analysis
The sentiment analysis is based on the scraped results for each subject separately. The results are ordered by date and search engine. For each subject, a graph shows the outcomes of the sentiment analysis over time (Fig. 7, 8 and 9).
Figure 7. Search query sentiment for Boris Johnson from 09-10-2019 – 13-10-2019.
Figure 8. Search query sentiment for Greta Thunberg from 09-10-2019 – 13-10-2019.
Figure 9. Search query sentiment for Donald Trump from 09-10-2019 – 13-10-2019.
Looking at the above graphs, a few things stand out. First, some engines show results that are fairly constant in their sentiment about a subject. For example, the results on Greta Thunberg show consistency on Bing, whose results are seemingly more negative throughout the period shown, while the other engines show more fluctuating results (see fig. 8). In Donald Trump’s results, Google is consistently negative while DuckDuckGo and Bing are mainly positive (see fig. 9). While the time frame remains too short for drawing hard conclusions, it does give us some clues about the algorithms and search engine specific ranking cultures. As the above figures show, a search engine sometimes seems to stick with a certain sentiment concerning a subject, and that sentiment is specific to the search engine.
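One way to make the notion of ‘consistency’ more explicit is to summarize each engine’s daily mean scores by their average, their spread, and whether they keep the same sign across the period. The sketch below is an illustration under assumptions: the function name sentiment_consistency is hypothetical, each engine is assumed to have at least one daily score, and the numbers in the example are placeholders rather than values read from the figures.

```python
from statistics import mean, pstdev


def sentiment_consistency(daily_scores):
    """Summarize each engine's daily mean scores.

    daily_scores maps an engine name to a non-empty list of daily mean
    compound scores. Returns {engine: (mean, spread, same_sign)}, where
    spread is the population standard deviation and same_sign is True when
    every daily score is strictly positive or every one strictly negative.
    """
    summary = {}
    for engine, scores in daily_scores.items():
        same_sign = all(s > 0 for s in scores) or all(s < 0 for s in scores)
        summary[engine] = (mean(scores), pstdev(scores), same_sign)
    return summary


# Placeholder values, not the scores behind Figures 7 to 9.
example = {
    "Engine A": [-0.2, -0.2, -0.2],
    "Engine B": [-0.5, 0.5],
}
for engine, stats in sentiment_consistency(example).items():
    print(engine, stats)
```

A small spread combined with an unchanged sign would correspond to an engine that ‘sticks’ with one sentiment, although five daily values are too few to treat such a summary as more than a descriptive aid.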
Conclusion
During our research we encountered some limitations. Our data was limited to a five-day timespan in which we extracted information at different times of the day. In the future, we would conduct this study over a longer period of time, with more standardized practices. Moreover, personalized results were not completely ruled out even though we took measures to prevent this type of bias.
As with any technical tool, we must add the human element to the interpretation of the data, which inherently carries biases. For example, Greta is associated with the climate ‘crisis’, and the word crisis already has a negative connotation. An article featuring the words ‘crisis’ and ‘fight’ might be deemed negative by the tool, even if it would be deemed ‘positive’ by a human reader. We should be aware of and study these biases in search engines to better understand the media we are using, scrutinizing the information that has been presented by thinking critically about what lies behind it.
More research is still needed to produce coherent, large and long-term datasets for analysis. However, our research can be seen as an early step in that direction and could be used as a starting point for further inquiry. Given the rapid developments on the internet, we are still unaware of all of the ways that companies are using our data, and their algorithms remain unclear to us, since many companies are increasingly choosing to keep their data private. As this research suggests, search engines order data in ways that influence how the user is informed. Different search engines apply different ranking logics that differ in sentiment, time and source type, and it remains necessary to be wary of these differences when using these applications.
References
Cheney-Lippold, John. “A New Algorithmic Identity: Soft Biopolitics and the Modulation of Control.” Theory, Culture & Society, 28.6 (2011): 164-81. doi:10.1177/0263276411424420.
Gillespie, Tarleton. “The Relevance of Algorithms.” Media Technologies, ed. by Tarleton Gillespie et al., Cambridge – Massachusetts & London – England: The MIT Press, 2014. 167-194. doi:10.7551/mitpress/9780262525374.003.0009.
Gitelman, Lisa (ed.). “Raw Data” is an Oxymoron, Cambridge – Massachusetts & London – England: The MIT Press, 2013. 1-14.
Hutto, C. J., and Eric Gilbert. “VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text.” Eighth International AAAI Conference on Weblogs and Social Media. 2014. www.aaai.org. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8109.
Media Bias Factcheck. https://mediabiasfactcheck.com/the-guardian/. Accessed 24 October 2019.
Pogue, David. “10 Tips to Avoid Leaving Tracks Around the Internet.” New York Times. 4 October 2019. https://www.nytimes.com/2019/10/04/smarter-living/10-tips-internet-privacy-crowdwise.html. Accessed 24 October 2019.
Rieder, Bernhard, Ariadna Matamoros-Fernández, and Oscar Coromina. “From Ranking Algorithms to ‘Ranking Cultures’: Investigating the Modulation of Visibility in YouTube Search Results.” Convergence, 24.1 (2018): 50-68. doi:10.1177/1354856517736982.
Rieder, Bernhard, and Guillaume Sire. “Conflicts of Interest and Incentives to Bias: A Microeconomic Critique of Google’s Tangled Position on the Web.” New Media & Society, 16.2 (2014): 195-211. doi:10.1177/1461444813481195.
Shearer, Elisa, and Elizabeth Grieco. “Americans Are Wary of the Role Social Media Sites Play in Delivering the News.” Journalism.org. Pew Research Center. 2 October 2019. https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news/. Accessed 23 October 2019.
Weltevrede, Esther, Anne Helmond, and Carolin Gerlitz. “The Politics of Real-time: A Device Perspective on Social Media Platforms and Search Engines.” Theory, Culture & Society, 31.6 (2014): 125-150. doi:10.1177/0263276414537318.
Yeung, Karen. “‘Hypernudge’: Big Data as a Mode of Regulation by Design.” Information, Communication & Society, 20.1 (2017): 118-136. doi:10.1080/1369118X.2016.1186713.