How Does “Trump COVID-19” Produce Different Search Results on Google and Baidu?
With the rapid development of the internet, companies have built a wide range of search engines, such as Baidu, Google, Yahoo and DuckDuckGo. Van Dijck states that in contemporary society, “search engines have become indispensable tools in the construction of scholarly knowledge” (574), and they also help people navigate “massive databases of information, or the entire web” (Gillespie 167). However, as Chinese students studying abroad, we found that the media environment in China is quite different from that of other countries. The same keyword returns different results on Baidu and Google, especially when it comes to political news, so users receive different answers depending on the search engine they use. This project focuses on such differences between Google and Baidu. For this purpose, “trump covid-19” and “特朗普 新冠” (the Chinese version) serve as the example keywords. We compare and analyse the titles, sources and content orientations of the returned news to identify the similarities and differences between the two search engines.
If search bias does exist, what should we do?
Search engines appear to return results automatically according to what was queried, so people often assume that search results are neutral and unbiased. According to Purcell, Brenner and Rainie, 66% of search engine users think search engines are a fair and unbiased source of information. However, after comparing the query results of Google and Baidu, Jiang found that the overlap rate between the two search engines was only 6.8%, and concluded that different search engines present different results and different social realities (212). Grimmelmann and Goldman likewise argue that “every search engine design choice necessarily and unavoidably reflects normative values. Thus, the term ‘search neutrality’ implies a Platonic ideal of a search engine that cannot be achieved” (Goldman 107).
Goldman accepts that such bias is unavoidable, but he suggests that search engine bias is “the beneficial consequence of search engines optimising content for their users” (121). Take the Hummingbird algorithm as an example: Google upgraded its search algorithm to Hummingbird in 2013, and the name is meant to suggest speed and precision. “Hummingbird is all about delivering search results that reflect internet users’ experiences on websites” (Weekly Marketing News 2013), which indicates that different users may see different results even if they enter the same keyword. “This essentially creates a much smarter search engine which can decipher intent to a greater degree and return results which are more in line with user intent” (Lin and Yazdanifard 52).
However, Jean-Noël Jeanneney warned against Google’s claim to “organise the world’s information”. He believes that “how a search engine selects, organises, and presents the information can destroy or invisibly distort the context” (ix). Algorithms on search engines “can lead to a distortion of our perception” (Merkel). Search bias appears when search engines deliberately rank their own content higher (Jiang 1091), which leads to unfair competition and information monopoly. Both Baidu and Google have faced multiple allegations of unfair competition.
It may seem that everyone now has an equal right to obtain information from the internet, but this is an illusion of knowledge democratisation created by search engines. The control of knowledge by political and commercial interests is becoming more subtle and more difficult to detect. Few netizens compare the results of different search engines to verify the authenticity and neutrality of the information, even though multiple search engines are available to searchers and there are few barriers to switching between them (Goldman 197). Google is the search market leader in most countries, but it partially exited China because of the Chinese government’s censorship system and moved its servers from mainland China to Hong Kong in 2010 (Jiang 213). As a result, Baidu became the largest search engine in China. One is the leading global search engine; the other is the largest Chinese search engine. They are therefore two typical representatives for comparison. We aim to display the differences and biases of Google and Baidu through empirical study. When users are aware that search bias exists, they will have the “capacity to scrutinize and think critically” (Lovink), which may release them from information manipulation and inequality.
How can we research Google and Baidu?
This research project aimed to find the differences and similarities between Google and Baidu, so the comparative method is central to it. The comparative method refers to “the systematic analysis of a small number of cases” (Collier 107). In this research, different aspects of Baidu and Google were compared, such as the news titles: by checking whether the same titles appear on both search engines, we can measure the overlap in news. To keep the data as valid as possible, we controlled several variables in addition to the keywords “trump covid-19”: the URL (we searched only on https://www.google.co.uk/), the location (we set the UK as the location), and the time (October 2nd to October 4th, because Trump’s infection was announced on October 2nd and search frequency is typically high during these days). We also limited the search results: only the first five pages of results on Baidu and Google were recorded. Since we searched both the English and Chinese versions of “trump covid-19” on Google and Baidu, four groups of data were collected. We recorded 144 English titles on Google and 115 English titles on Baidu, and compiled 158 Chinese titles on Google and 150 Chinese titles on Baidu (the two engines display different numbers of results per page, so the data sets differ in size). We then compared the data sets in the same language with each other.
Additionally, content analysis is the second method used to collect qualitative data in this research. It can be understood as a tool “of analysing written, verbal or visual communication messages” (Elo and Kyngäs 107). Data can be collected from books, speeches, films, and even web content (Luo). Our research used the titles of search results and analysed particular words in them. In the initial stage of the study, we used a tag cloud to explore the Chinese and English search results on the two search engines over the three days. The results showed that the high-frequency words in the titles mainly concerned the “US presidential election”, “US stocks, market”, “International Impact”, and “Therapeutic Plans”. In terms of web content, both search engines also returned a considerable number of newsflashes, that is, articles that state the fact that Trump is sick without further critical analysis or in-depth discussion of other fields. Based on this initial finding, we divided the relevant topics into six major categories for the content analysis: “American presidential election”, “Other countries”, “Economy”, “COVID-19”, “Newsflash”, and “Others”. In this way we can identify the orientation of the content, for instance, which news is more political and which tends to be more entertaining.
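The coding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists, the rule that each title receives exactly one category (the first that matches), and the sample titles are all hypothetical, and they are not the coding scheme actually applied to the data.

```python
from collections import Counter

# Hypothetical keyword lists for five of the six categories; titles that
# match none of them fall into "Others". These lists are illustrative
# assumptions, not the coding scheme used in the study.
CATEGORY_KEYWORDS = {
    "American presidential election": ["election", "campaign", "biden", "debate"],
    "Other countries": ["china", "world leaders", "international"],
    "Economy": ["stocks", "market", "economy"],
    "COVID-19": ["treatment", "symptoms", "vaccine", "hospital"],
    "Newsflash": ["tests positive", "breaking"],
}


def categorise(title: str) -> str:
    """Return the first category whose keyword occurs in the title."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Others"


def category_shares(titles):
    """Return each category's share of the titles, in per cent."""
    if not titles:
        return {}
    counts = Counter(categorise(title) for title in titles)
    return {cat: round(100 * n / len(titles), 2) for cat, n in counts.items()}


if __name__ == "__main__":
    # Made-up titles, used only to exercise the functions.
    sample = [
        "Trump tests positive for coronavirus",
        "US stocks fall after Trump diagnosis",
        "How the diagnosis reshapes the election campaign",
        "A look at the White House garden",
    ]
    for category, share in category_shares(sample).items():
        print(f"{category}: {share}%")
```

A keyword rule like this is much cruder than human coding; it is best read as a way of making the six categories and their percentage shares concrete.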
We should also consider the ethical issues in this project. No participants take part in this research; therefore, harm to participants and informed consent are not at issue. However, some news items or articles are published under their authors’ names (rather than just the name of a website), so confidentiality is a concern in this study. Although all the information can be found on the internet, we will not present the full titles in our data in order to protect the privacy of the publishers and reporters. In this way we can keep them anonymous.
What is the Overlap?
Firstly, we compared the overlap between the search results of Baidu and Google for the same keywords. For the keyword “Trump Covid-19”, out of a total of 115 pairs of search results on Baidu and Google, three pairs of titles (including contents) are the same, so the overlap between the two search engines is 2.6%. For the keyword “特朗普 新冠” (the Chinese version), out of a total of 150 pairs of search results on Baidu and Google, 11 pairs of titles (including contents) are the same, which gives an overlap of 7.3%. Overall, the overlap between the two search engines is low in both Chinese and English. This implies that different search engines yield different search results, and that the information people obtain can vary in significant ways from one search engine to another. “Declining overlap between search engines is an ongoing trend among English and Chinese language search engines. Various reasons may have contributed to it, not the least of which may be an enlarged Web, variant methods of crawling, indexing, ranking and political filtering” (Jiang 224). The coverage of search engines, the relevance algorithm applied to search results, and the frequency of index updates can also lead to different results, which causes the low overlap rate among search engines (Wang and Liu 380). Google is known for its PageRank algorithm; with the upgrade to Hummingbird, however, its focus moved from indexing and crawling towards relevance. Baidu, for its part, displays its own products’ content with higher priority in the ranking of search results in order to combine information with commercial advertising. Together, these factors produce low overlap and little ranking similarity between Google’s and Baidu’s results.
Besides, we also observe that the overlap rate of first-page search results between Baidu and Google is zero. As Spink and Jansen point out, “web search engine’s first page results are primarily unique, meaning the other engines did not return the same result on the first result page for a given query” (1388). This shows that the two engines rank search results differently. Overlap does appear deeper in the results: for instance, a news item from the BBC may be displayed on the first page of Google but on the third page of Baidu. However, because the sample in this study is small and the ranking of search results sometimes changes, we cannot draw a firm conclusion from this finding.
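The overlap figures above can be reproduced with a short calculation. The sketch below assumes that two results count as “the same” when their titles match after lower-casing and whitespace normalisation, and that the shared titles are divided by the size of the smaller result list, which is consistent with the denominators of 115 and 150 quoted above. The function names are illustrative, not part of any tool used in the study.

```python
def normalise(title: str) -> str:
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def percentage(shared: int, total: int) -> float:
    """Return shared / total in per cent, rounded to one decimal place."""
    return round(100 * shared / total, 1) if total else 0.0


def overlap_rate(results_a, results_b):
    """Return (number of shared titles, overlap in per cent).

    Assumption: the denominator is the size of the smaller result list.
    """
    shared = {normalise(t) for t in results_a} & {normalise(t) for t in results_b}
    return len(shared), percentage(len(shared), min(len(results_a), len(results_b)))


if __name__ == "__main__":
    # Counts reported in the text: 3 shared titles out of 115 (English)
    # and 11 shared titles out of 150 (Chinese).
    print(percentage(3, 115))
    print(percentage(11, 150))
```

Restricting `results_a` and `results_b` to the titles on the first result page would give the first-page overlap in the same way.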
On the other hand, we also examined the overlap within the search results of Baidu and Google separately. For the keyword “Trump Covid-19”, among Baidu’s 115 search results, seven pairs of titles (including contents) are the same, which yields an overlap of 6.1%. Out of a total of 144 search results on Google, four pairs of titles (including contents) are the same, an overlap rate of 2.8%. For the keyword “特朗普 新冠” (the Chinese version), out of a total of 150 search results on Baidu, 19 pairs of search results, or 12.7%, overlap. Among Google’s 158 search results, 18 pairs of titles are the same, so the overlap within the first five result pages is 11.4%. These data show that the overlap within Baidu’s results is higher than within Google’s, and that the overlap in English is lower than in Chinese. As Jiang argues, “although search results can vary for a variety of reasons, regional differences are notable. […] This is not only because of politics but also because of linguistic and cultural differences among other things” (142). Google dominates the global search market, whereas Baidu serves only the Chinese market. In this project, because the keyword “Trump Covid-19” concerns a current event in the United States, Google provides more relevant information and news than Baidu, which means that the overlap within Google’s results is lower than within Baidu’s.
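The within-engine percentages can be checked in the same spirit. The sketch below assumes that a “pair” means one repeated occurrence of a title within a single engine’s result list, and that the rate is the number of repeats divided by the number of results; both are assumptions made for illustration, since the counting rule is not spelled out above.

```python
from collections import Counter


def duplicate_rate(titles):
    """Return (number of repeated titles, repeat rate in per cent).

    Assumption: a title that appears n times contributes n - 1 repeats.
    """
    if not titles:
        return 0, 0.0
    counts = Counter(" ".join(t.lower().split()) for t in titles)
    repeated = sum(n - 1 for n in counts.values() if n > 1)
    return repeated, round(100 * repeated / len(titles), 1)


# (repeated pairs, total results) as reported in the text.
REPORTED = {
    "Baidu, English": (7, 115),
    "Google, English": (4, 144),
    "Baidu, Chinese": (19, 150),
    "Google, Chinese": (18, 158),
}

if __name__ == "__main__":
    for label, (pairs, total) in REPORTED.items():
        print(f"{label}: {round(100 * pairs / total, 1)}%")
```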
According to Table 4, the most significant differences between Google and Baidu lie in topics related to American politics (39.58% vs. 27.85%, 28.95% vs. 16.00%) and in COVID-19 relevance (40.97% vs. 27.85%, 28.07% vs. 24.67%). To some extent, Google showed a greater propensity than Baidu for surfacing websites about the upcoming US presidential election, exemplified by several reports on the multidimensional impact of Trump’s diagnosis on his campaign and on the pandemic. One reason is that most of Google’s sources were major international news outlets such as the BBC and CNN, which were invisible on Baidu. Wang states that China has relatively strict internet control and censorship, as the national government blocks some political, religious and other sensitive content online to maintain social stability and national security (16). This circumstance is an outcome of China’s self-imposed internet border, which isolates its netizens from diverse transnational information sources and in turn narrows citizens’ global horizons.
On the other hand, “Newsflash” had a salient presence on Baidu, at 24.56% and 12.28%. Combined with the above, this supports Goldman’s argument about the varying normative values behind different search engines’ choices (107): the query results leaned towards comprehensive, cosmopolitan political and medical knowledge on Google and towards streamlined national news on Baidu. Fang argues that the content rendering of search engines is a mechanism of technological and social interaction (35), which implies the influence of personalised recommendation mechanisms based on individuals’ usage habits. To illustrate, as of March 2020, 72.4 per cent of the country’s 904 million internet users earned less than ¥5,000 (CNNIC 27), and Google is currently blocked in mainland China. Hence, given Baidu’s near-monopoly position in China’s search market and the fact that grassroots users make up the vast majority of China’s internet user base, Baidu’s recommended results to some degree represent the general query demands of Chinese netizens and cater to their preference for brief reports, a preference shaped by fast-paced information consumption. More specifically, the penetration of TikTok and Sina Weibo among grassroots users indicates a tendency to reduce text-heavy information to its most intuitive and concise form. Therefore, in order to suit mass users in China, Baidu’s query results showed a lower proportion of detailed news and more newsflashes than Google’s. Besides, of the 32 Baidu results on the topics “other countries” and “economy”, 25 focused solely on China, elaborating on the event’s multifaceted meanings for China. This outcome goes some way to confirming the parochialism of Baidu’s results, as it has a lower tendency to direct Chinese users to content beyond national borders (Jiang 1107).
To some extent, this may derive from the fact that the average literacy level of Baidu’s users is lower than that of Google’s (considering the grassroots user base), so that they have a narrower focus on local affairs.
Furthermore, in Jiang’s definition, search bias refers to “search engine practices that favour their content at the expense of competitive services” (1091), which can be seen in the visibility of Baidu’s own products in the two data sets. Table 5 illustrates that, regardless of language, Baidu users are unavoidably exposed to Baidu’s own products, such as Baijiahao, Baidu Baike (encyclopedia) and Baidu Tieba (forum), which account for 6.14% and 13.33% of the English and Chinese results respectively. By contrast, these products are absent from Google: in both its English and Chinese results, the share of its competitor’s products is 0%. Two kinds of bias are therefore indicated here: Google’s shielding of rivals’ content and Baidu’s priority push of its own products.
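A share of this kind can be computed from the result URLs. The following sketch assumes that own-product results are recognised by their hostnames; the host list and the sample URLs are illustrative assumptions rather than the procedure used to build Table 5.

```python
from urllib.parse import urlparse

# Assumed hostnames for the three Baidu products named in the text.
OWN_PRODUCT_HOSTS = ("baijiahao.baidu.com", "baike.baidu.com", "tieba.baidu.com")


def is_own_product(url: str, hosts=OWN_PRODUCT_HOSTS) -> bool:
    """Return True if the URL's host is one of the hosts or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


def own_product_share(urls, hosts=OWN_PRODUCT_HOSTS) -> float:
    """Return the share of URLs that point to own-product hosts, in per cent."""
    if not urls:
        return 0.0
    own = sum(1 for url in urls if is_own_product(url, hosts))
    return round(100 * own / len(urls), 2)


if __name__ == "__main__":
    # Made-up URLs, used only to exercise the function.
    sample = [
        "https://baike.baidu.com/item/example",
        "https://www.bbc.com/news/example",
        "https://tieba.baidu.com/p/example",
        "https://www.cnn.com/example",
    ]
    print(own_product_share(sample))
```

Running the same function on Google’s result URLs with Baidu’s hosts gives the competitor share that is reported above as 0%.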
Rethinking search engines
To sum up, search engines are necessary in our daily life since they let us find answers to our questions more quickly and access a variety of information more conveniently. However, based on our group’s results, the overlap of search results between Baidu and Google is low no matter which language we use. This means that different search engines provide users with different search results even when the query is the same. According to the data, we also found more repeated results on Baidu than on Google, which indicates that Baidu is more likely to return repetitive search results.
Moreover, our data also show that Baidu gave netizens less opportunity to access international news published by foreign news outlets. At the same time, Baidu users are more likely to see a newsflash that summarises an event rather than more detailed information. Such results suggest that Chinese netizens obtain less useful information, drawn from a narrower range of perspectives, than netizens who can access Google, and this could cause further problems for the development of Chinese society.
- Luo, Amy. “Content Analysis | A Step-By-Step Guide With Examples”. Scribbr. 2020. 19 October 2020. <https://www.scribbr.com/methodology/content-analysis/>.
- CNNIC, I. “The 45th China Statistical Report on Internet Development.” China Internet Network Information Center (CNNIC), China (2020).
- Collier, David. “The comparative method.” Political Science: The State of Discipline II, Ada W. Finifter, ed., American Political Science Association (1993): 105-119.
- Connolly, Kate. “Angela Merkel: internet search engines are ‘distorting perception’.” The Guardian. 2016. 17 October 2020. <https://www.theguardian.com/world/2016/oct/27/angela-merkel-internet-search-engines-are-distorting-our-perception>.
- Elo, Satu, and Helvi Kyngäs. “The qualitative content analysis process.” Journal of advanced nursing 62.1 (2008): 107-115.
- Gillespie, Tarleton. “The relevance of algorithms.” Media Technologies: Essays on Communication, Materiality, and Society (2014): 167.
- Goldman, Eric. “Revisiting Search Engine Bias.” William Mitchell law review 38.1 (2011): 96–110.
- Internet Marketing Company Cyberset Adapts to Google’s New Algorithm, Hummingbird. NewsRX LLC, 2013.
- Jeanneney, Jean-Noël, and Teresa Lavender Fagan. Google and the Myth of Universal Knowledge: A View from Europe. University of Chicago Press, 2007.
- Jiang, Min. “Search concentration, bias, and parochialism: A comparative study of Google, Baidu, and Jike’s search results from China.” Journal of communication 64.6 (2014): 1088-1110.
- Jiang, Min. “The Business and Politics of Search Engines: A Comparative Study of Baidu and Google’s Search Results of Internet Events in China.” New Media & Society 16.2 (2014): 212–233.
- König, René, and Miriam Rasch. Society of the Query Reader: Reflections on Web Search. Inst. of Network Cultures, 2014.
- Lin, Chauntelle Ong Yi, and Rashad Yazdanifard. “How Google’s new algorithm, Hummingbird, promotes content and inbound marketing.” American journal of industrial and business management 2014 (2014): 51-57.
- Lovink, G. “The society of the query and the Googlization of our lives. A tribute to Joseph Weizenbaum.” Eurozine (2008): 1-7.
- Purcell, Kristen, Lee Rainie, and Joanna Brenner. “Search engine use 2012.” Pew Internet (2012): 1-42.
- Spink, Amanda, et al. “A Study of Results Overlap and Uniqueness among Major Web Search Engines.” Information Processing & Management 42.5 (2006): 1379–1391.
- van Dijck, José. “Search Engines and the Production of Academic Knowledge.” International Journal of Cultural Studies 13.6 (2010): 574–592. 17 October 2020. <doi:10.1177/1367877910376582>.
- Wang, Yiming, and Fei Liu. “Study of Results Overlap and Uniqueness Among Major Chinese Web Search Engines.” Journal of the China Society for Scientific and Technical Information 28.3 (2009): 374–381.
- Wang, Nan. Control of Internet Search Engines in China: A Study on Google and Baidu. Master’s thesis. Unitec Institute of Technology, New Zealand, 2008.