Reflections on Genealogy Data in the Era of Dataveillance

On: October 4, 2021


In a highly digitally mediated world, it is crucial to think critically about the issues surrounding Big Data. The rapid, ongoing process of datafication, in which much of social reality is transformed into quantifiable pieces of information, has given rise to academic debates about how this data can be collected and used. In their article “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon”, danah boyd and Kate Crawford note that Big Data can be seen as a “troubling manifestation of Big Brother”, one “enabling invasions of privacy, decreased civil freedoms, and increased state and corporate control” (boyd and Crawford, 2012, p. 198). Indeed, a number of institutions seem keen to extract value from these large data sets. This gives rise to issues of surveillance, or more accurately dataveillance, in which citizens are watched through online communication technologies by means of constant (meta)data usage (Raley, 2013). Dataveillance involves complex power relationships between corporate platforms and their users (van Dijck, 2014). The inequality of these relationships raises questions about the actual incentives behind corporate and governmental uses of data. In this light, it is important to adopt boyd and Crawford’s critical approach and analyse how data is being used and who gets access to it, keeping in mind the ethical and privacy issues at stake.

In “‘Raw Data’ Is an Oxymoron”, Lisa Gitelman points out that “if data are somehow subject to us, we are also subject to data” (p. 2). She further argues that data never come “raw” but are always already “cooked” in some form: data gain their value through interpretation and through the personal and historical contexts in which they are produced (Gitelman, 2013).

The use of genealogy databases and personal DNA to trace one’s ancestors is not a new trend; it has existed since the late 1990s. Some of the best-known online genealogy platforms include, for example, Ancestry and MyHeritage, the latter of which hosts 48 million family trees (Contreras et al., 2020). On its platform, AncestryDNA claims that it “gives you much more than just the places you are from”. The platform thus presents itself as having good intentions: helping its users learn more about their ethnicity and family origins, and even find distant relatives. However, these platforms hold highly sensitive information, and this gives rise to a number of issues, since accessing, controlling, and storing this type of personal information can severely affect individuals’ lives.

Online genealogy databases work according to a ‘network effects’ principle. This means that any biological relative of a person whose DNA has been tested could potentially be identified in the system. Hence, information about one person automatically affects others, even without their awareness (O’Leary, 2018). Moreover, users can find possible relatives by means of the platform’s assessment of genetic distance: two individuals whose genetic data are closely related could receive information about each other (O’Leary, 2018). The companies understand the high value of such data, and most of it is nominally protected by their online Terms of Use.
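The network effect described above can be illustrated with a small sketch. It is a toy model under stated assumptions, not how any real platform works: the family graph, the names, the function `exposed_relatives`, and the `max_degree` cut-off are all hypothetical, and real services estimate relatedness from shared DNA rather than from a known family tree. The point is only that one person's test makes untested relatives reachable.

```python
from collections import deque

def exposed_relatives(family, tested, max_degree=2):
    """Toy model: return untested people within `max_degree` family links
    of any tested person, mapped to their distance from the nearest one.

    `family` maps each person to their direct relatives (hypothetical data).
    """
    exposed = {}
    seen = set(tested)
    queue = deque((person, 0) for person in tested)
    # Breadth-first search outward from every tested person at once.
    while queue:
        person, distance = queue.popleft()
        if distance == max_degree:
            continue  # do not look beyond the chosen cut-off
        for relative in family.get(person, ()):
            if relative not in seen:
                seen.add(relative)
                exposed[relative] = distance + 1
                queue.append((relative, distance + 1))
    return exposed

# Hypothetical family: only Alice has taken a DNA test.
family = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}
exposed = exposed_relatives(family, tested={"alice"}, max_degree=2)
print(exposed)
```

In this toy family, Bob, Carol, and Dave become reachable through Alice's single test even though none of them consented to anything, which is the sense in which one person's data "affects" others.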

There have been a number of cases in which personal information was revealed even though the companies had assured their users of privacy. One of the most notable issues is the use of genealogy websites and their consumer genetic data in criminal investigations, where law enforcement gains access to users’ personal data. The well-known ‘Golden State Killer’ (GSK) case in 2018, in which the killer was found through the publicly accessible database GEDmatch by means of network effects, sparked public debate about privacy and the use of DNA data. Even though the GSK had not taken a DNA test himself, it was still possible to trace him through his relatives’ DNA in a public database (Schwab et al., 2018). In another case, reported by John Bohannon, anonymous DNA donors to scientific studies were identified with the help of genealogy databases (Bohannon, 2013). These cases show how such systems can pose a serious surveillance threat. Lastly, Ancestry’s DNA data was acquired by the American investment management company Blackstone; even though the company claims that it will not access private DNA profiles, one cannot know how this will play out in the future (Lazarus, 2021).

Our personal DNA is arguably the most valuable piece of data about ourselves that we could share; it is part of our identity. Users must therefore keep in mind that a simple search for a relative through a public commercial genealogy database can have significant consequences for them in the future. With the rise of datafication and the growing tendency towards dataveillance, it is important to keep this data protected and private. There is a digital divide (boyd and Crawford, 2012) in which commercial genealogy databases hold clear power over large sets of personal data. Yet only the future will show how this data might be employed by governments and in what specific historical contexts it will be “cooked”. In western China, for example, the police are already tracking the Muslim Uyghur minority by means of a national DNA database, as well as facial scanners and video cameras (Moreau, 2019). I believe that the use and interpretation of this kind of data can lead to complete surveillance, or dataveillance, by the state, violating personal boundaries entirely. It is therefore more important now than ever to protect personal data from external use and from unforeseen interpretations that may change society at large.


“Ancestry® | Genealogy, Family Trees & Family History Records.” Ancestry. Accessed 25 Sept. 2021.

Bohannon, J. “Genealogy Databases Enable Naming of Anonymous DNA Donors.” Science, vol. 339, no. 6117, 2013, p. 262. Crossref, doi:10.1126/science.339.6117.262.

Boyd, Danah, and Kate Crawford. “CRITICAL QUESTIONS FOR BIG DATA.” Information, Communication & Society, vol. 15, no. 5, 2012, pp. 662–79. Crossref, doi:10.1080/1369118x.2012.678878.

Contreras, Jorge L., et al. “Legal Terms of Use and Public Genealogy Websites.” Journal of Law and the Biosciences, vol. 7, no. 1, 2020. Crossref, doi:10.1093/jlb/lsaa063.

Dijck, Jose van. “Datafication, Dataism and Dataveillance: Big Data between Scientific Paradigm and Ideology.” Surveillance & Society, vol. 12, no. 2, 2014, pp. 197–208. Crossref, doi:10.24908/ss.v12i2.4776.

Lazarus, David. “Why Buy Ancestry’s DNA Data If You Don’t Plan to Use It?” Los Angeles Times, 13 Apr. 2021.

Moreau, Yves. “Crack down on Genomic Surveillance.” Nature, vol. 576, no. 7785, 2019, pp. 36–38. Crossref, doi:10.1038/d41586-019-03687-x.

O’Leary, Daniel E. “DNA Mining and Genealogical Information Systems: Not Just for Finding Family Ethnicity.” Intelligent Systems in Accounting, Finance and Management, vol. 25, no. 4, 2018, pp. 190–96. Crossref, doi:10.1002/isaf.1439.

Raley, Rita. “Dataveillance and Countervailance.” “Raw Data” Is an Oxymoron, edited by Lisa Gitelman, The MIT Press, 2013, pp. 121–46.

Gitelman, Lisa, editor. “Raw Data” Is an Oxymoron. The MIT Press, 2013.

Schwab, Abraham P., et al. “Genomic Privacy.” Clinical Chemistry, vol. 64, no. 12, 2018, pp. 1696–703. Crossref, doi:10.1373/clinchem.2018.289512.
