Big data behind China’s “Health Code” system

On: October 3, 2021
About Kristen Zheng


According to the 47th Statistical Report on the Development Status of the Internet in China, released by the China Internet Network Information Centre, China’s “Health Code”, launched as part of COVID-19 epidemic prevention and control, has been used by nearly 900 million people, with more than 40 billion uses in total. As a common application of digital epidemic prevention, the Health Code system has played an important role in epidemic prevention and control and has become one of the highlights of China’s digital response to the COVID-19 outbreak. Governments in some Western countries have started to launch similar initiatives to fight the pandemic, such as the CoronaCheck app in the Netherlands.

However, behind the seemingly simple “Health Code”, big data plays an important role at every step. According to Boyd and Crawford (2012), big data can be understood as the capacity to search, aggregate and cross-reference large datasets. In the Health Code system, big data is used to search and cross-reference personal data, such as health data and location information, to identify each person’s movements within the previous 14 days and so make epidemic prevention and control accurate and traceable. While big data has done a remarkable job in controlling the COVID-19 outbreak in China, it has also given rise to both utopian and dystopian discussions. This blog will therefore focus on the application of big data during the COVID-19 pandemic, discussing its prominence in the fight against the pandemic and some of the issues that may arise.

How big data can help beat the COVID-19 pandemic

After its development in early February 2020, China’s Health Code system spread rapidly, within about a month, and is now used widely across the country. During the information collection phase, the Health Code system requires applicants to fill in personal information, including their health status, travel history, place of residence, and whether they have been exposed to suspected or confirmed COVID-19 patients. In addition to the information declared by the applicant, the Health Code system also accesses transportation data from civil aviation, railways and buses. By comparing and updating these data in real time and analysing them together, the Health Code system generates a red, yellow or green QR code for each person, which makes it possible to trace citizens’ movements and identify high-risk groups. Thus, in the COVID-19 outbreak, big data provided governments and epidemiologists with the ability to “collect and analyse data with unprecedented breadth, depth and scale” (Lazer et al., 2009).
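The collection and cross-referencing steps described above can be sketched roughly in code. This is only a minimal illustration and not the actual Health Code logic: the field names, the `HIGH_RISK_AREAS` set and the colour rules are all assumptions made for this example. Only the three colours and the 14-day look-back window come from the description in this post.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LOOKBACK_DAYS = 14            # look-back window mentioned in this post
HIGH_RISK_AREAS = {"Area A"}  # hypothetical placeholder, not real data


@dataclass
class Declaration:
    """Self-declared information (hypothetical fields)."""
    has_symptoms: bool
    contact_with_case: bool


@dataclass
class TripRecord:
    """One transport record, e.g. from aviation, rail or bus data (hypothetical fields)."""
    destination: str
    travel_date: date


def assign_colour(declaration, trips, today):
    """Return 'red', 'yellow' or 'green' under purely illustrative rules."""
    window_start = today - timedelta(days=LOOKBACK_DAYS)
    # Cross-reference transport records against the look-back window.
    recent_destinations = {
        t.destination for t in trips if window_start <= t.travel_date <= today
    }
    visited_high_risk = bool(recent_destinations & HIGH_RISK_AREAS)

    # Assumed rule: declared contact with a case gives a red code.
    if declaration.contact_with_case:
        return "red"
    # Assumed rule: symptoms or a recent visit to a high-risk area gives yellow.
    if declaration.has_symptoms or visited_high_risk:
        return "yellow"
    return "green"
```

The point of the sketch is that the colour depends on both the self-declared data and the independently collected transport data, so a declaration alone does not determine the result.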

In addition, big data plays an important role in risk assessment. By collecting user data, analysing crowd travel data and comparing data such as crowd-gathering heat maps, the China CDC can draw on the analytical and predictive capabilities of big data to model the epidemic and predict risks, such as those associated with travel.
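As a rough idea of what analysing crowd travel data could look like in its simplest form, the sketch below counts trips from high-risk origins into each destination. It is not the China CDC’s model: the function name, the `(origin, destination)` record format and the notion of a fixed set of high-risk origins are all hypothetical assumptions for illustration.

```python
from collections import Counter


def inbound_risk_counts(trips, high_risk_origins):
    """Count trips into each destination that started in a high-risk origin.

    `trips` is an iterable of (origin, destination) pairs (assumed format).
    Returns (destination, count) pairs, highest count first.
    """
    counts = Counter()
    for origin, destination in trips:
        # Only count movement out of a high-risk area into another area.
        if origin in high_risk_origins and destination not in high_risk_origins:
            counts[destination] += 1
    return counts.most_common()
```

A real risk model would need far more than raw counts, but even this simple aggregation shows why consolidated travel data is valuable for ranking where attention is needed.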

China’s Health Code system can therefore be seen as a prime example of the application of big data. It overcomes the fragmented and scattered nature of traditional data collection by consolidating information collected from applicants, merging and analysing data from various platforms, and generating three-coloured health codes. This has greatly helped the Chinese government to implement epidemic prevention and control policies more efficiently and accurately.

However, in the context of the normalisation of COVID-19 outbreak prevention and control, the Health Code may become an electronic health credential that accompanies individuals in the long term. There are several problems with the Health Code system that cannot be ignored and even urgently need to be addressed.

The problems of the Health Code system

The collection of big data often carries the risk of user privacy leaks and the associated misuse of data. This may be a common problem on the Internet, but it is more strongly reflected in the Health Code system. To obtain a Health Code, applicants give up some of their data rights by filling in a large amount of personal information. Moreover, most health codes are developed by local governments in collaboration with internet companies. Users choose to trust the government, which in turn delegates some of its rights to companies, and these companies “have the tools and access” to each applicant’s information (Boyd and Crawford, 2012). It therefore becomes crucial that these companies firmly protect user data from disclosure and malicious misuse, both in storage and in use.

At the same time, as the operation of the Health Code is largely undertaken by companies, there is the potential for the strongest to get stronger when these relevant internet companies have a large amount of data. The COVID-19 epidemic has strengthened the big data hierarchy, with giant internet companies working with governments to once again get hold of large amounts of data, creating a kind of Matthew effect that makes it more difficult for smaller companies to catch up with the big players.

The Health Code has also given rise to discussions of excessive government regulation. In addition to the information declared by the user, the Health Code system accesses transport data from civil aviation, railways, roads and buses. With these data, the government can not only find out how long and how often a person has entered and stayed in an infected area, but also pinpoint the specific towns or streets involved. It is necessary for prevention and control purposes to locate at-risk populations precisely during a COVID-19 outbreak. However, when epidemic prevention is likely to become a regular feature of life, it becomes an open question whether data surveillance will become a part of everyday life as well. With the shift from targeted surveillance in the early days of COVID-19 to universal surveillance now (Andrejevic & Gates, 2014), the balance between government data surveillance and the protection of individual privacy has become an issue that any future Health Code system must consider.

In short, the power and potential of this little ‘health code’ are extraordinary and should not be underestimated. Not only does it give us an intuitive sense of the powerful analytical, integrative and predictive capabilities of big data and its enormous role in supporting COVID-19 prevention; it also shows us the problems that can arise from big data, such as the misuse of data, the complications that can result from the uneven distribution of data, and questions about data and government surveillance. As the normalisation of the ‘health code’ becomes more likely, it becomes increasingly important to discuss and reflect on it in depth.


Andrejevic, M. and Gates, K., 2014. Big data surveillance: Introduction. Surveillance & Society, 12(2), pp.185-196.

Boyd, D. and Crawford, K., 2012. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), pp.662-679.

2021. 47th Statistical Report on the Development of the Internet in China released by the China Internet Network Information Centre. CNNIC发布第47次《中国互联网络发展状况统计报告》-中共中央网络安全和信息化委员会办公室. [online] Available at: <> [Accessed 25 September 2021].

Fang, X. and Yan, F., 2020. A study of the digital social governance challenges behind the “Health Code” system. China National Knowledge Infrastructure. 方兴东 and 严峰. “健康码”背后的数字社会治理挑战研究. 中国知网, 2020.

Haleem, A., Javaid, M., Khan, I.H. and Vaishya, R., 2020. Significant applications of big data in COVID-19 pandemic. Indian journal of orthopaedics, 54(4), pp.526-528.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M. and Jebara, T., 2009. Social science. Computational social science. Science (New York, NY), 323(5915), pp.721-723.

2021. Going back to work, what the Hainan Qr colour health codes mean for you – [online] Available at: <> [Accessed 26 September 2021].
