Open Data
There is an open data movement afoot, now, around the world. (Berners-Lee, 2010)
Tim Berners-Lee is optimistic in his 2010 TED talk “The year open data went worldwide”. Berners-Lee is one of the leading advocates for open data; he is trying to get governments, companies and communities to put their data sets online. When all this data is accessible, online communities and individuals can do new things with it. The data can be re-used and recombined, and may reveal unexpected connections. As an example, Berners-Lee shows a map created in 2008 by a lawyer. The map showed a neighborhood of Zanesville, Ohio. The lawyer investigated the correlation between water service and race. By combining different sources he found that most houses without access to public water services were occupied by non-white people. There was a lawsuit, and a federal jury awarded the residents of the neighborhood nearly $11 million.
The mash-up map that Berners-Lee showed made me think of the cholera map made by John Snow in 1854. Snow was a doctor who was convinced that cholera was spread through contaminated food or water. At that time, most people opposed his theory. When cholera struck England again in 1854, Snow investigated the epidemic. Soho was one of the areas that was hit hard: in three days there were 127 deaths. Snow plotted the locations of the deaths on a map. The map showed that most deaths occurred near a water pump on Broad Street. With this visual evidence Snow convinced the local authority to remove the handle of the pump. Within days the number of fatalities dropped significantly.
Both these examples show the social significance of recombining data. Snow’s map was conceived in an unsanitary time when no one had water in their own house; people depended on wells that everyone used. The map of Zanesville was created in the 2000s, approximately 150 years later, and shows an area where some houses still have no access to public water service. But even though the maps look alike, there is a great difference in how they came into being. Snow created his map through investigative field work: he interviewed the victims’ families and examined a water sample from the contaminated pump. The Zanesville map was created with the help of technology: information from databases was recombined and digitally mapped. While the Zanesville map is a child of the open data movement, Snow’s map is clearly not.
The database often plays an important part in open data. We can see this, for example, with the Afghan War Diary. In the summer of 2010 Wikileaks disclosed an extraordinary number of reports covering the war in Afghanistan from 2004 to 2010. Several weeks earlier the information had been obtained by the New York Times, London’s Guardian and Der Spiegel. These three mainstream news organizations were able to mine the data for news and analysis. David Leigh, one of the investigative reporters of the Guardian, explains that the research done on the project was mainly facilitated by the Internet and the database:
The extraordinary thing about this investigation was that something couldn’t have happened before the Internet age. Because, first of all the leak itself could not have happened. This was a database of 92,000 odd entries in a database that is then leaked over the Internet system […] we couldn’t have investigated this stuff on paper because it is just too much of it, 92,000 files. Because we were able to build a database and interrogate it by using keywords of free text searches, we could actually make sense of this massive material. So the whole thing is like a product of the Internet world. (Media Talk, 29-07-2010)
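To make the idea of “interrogating” a database with keywords a little more concrete, here is a minimal sketch in Python. It is not the tooling the Guardian used, which the podcast does not describe. The table layout, the function names and the three sample reports are all invented placeholders for illustration, and the search is a plain case-insensitive substring match rather than a real full-text index.

```python
import sqlite3

# Invented placeholder reports, NOT real entries from any leaked data set.
SAMPLE_REPORTS = [
    (1, "Convoy delayed by road closure near checkpoint"),
    (2, "Routine patrol, nothing to report"),
    (3, "Checkpoint reports road damage after heavy rain"),
]


def build_database(reports):
    """Load (id, text) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO reports (id, body) VALUES (?, ?)", reports)
    conn.commit()
    return conn


def _like_pattern(keyword):
    """Wrap a keyword in % wildcards, escaping LIKE's special characters."""
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return f"%{escaped}%"


def search(conn, keywords):
    """Return ids of reports whose text contains every keyword.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    if not keywords:
        return []
    where = " AND ".join("body LIKE ? ESCAPE '\\'" for _ in keywords)
    query = f"SELECT id FROM reports WHERE {where} ORDER BY id"
    params = [_like_pattern(k) for k in keywords]
    return [row[0] for row in conn.execute(query, params)]


if __name__ == "__main__":
    conn = build_database(SAMPLE_REPORTS)
    print(search(conn, ["road", "checkpoint"]))
```

With three rows this is trivial, but the same query works unchanged on tens of thousands of rows, which is the point Leigh is making: a question that would take weeks on paper becomes a single line.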
Leigh has a point here. Without a database and computing power there would be no way to make sense of such an enormous set of data. Tim O’Reilly also discusses the database in his article “What is Web 2.0”:
Database management is a core competency of Web 2.0 companies, so much so that we have sometimes referred to these applications as “infoware” rather than merely software. (O’Reilly, 2005)
O’Reilly also asks the loaded question, “Who owns the data?” Matthew Dames remarks that this question addresses a major policy issue: data management is also about the ownership of data and the privacy issues that arise from this ownership. Copyright is an important issue in contemporary society, and especially in the debate over open data. In most cases copyright is the archenemy of open data. But in some cases copyright can ensure that data stays in the domain of the commons. This kind of copyright is often termed copyleft. The programmer Richard Stallman described it as follows:
Copyleft uses copyright law, but flips it over to serve the opposite of its usual purpose: instead of a means of privatizing software, it becomes a means of keeping software free. (Stallman, 1999)
Copyleft, as you can gather from the quote above, has its roots in the open source movement (although Stallman would say ‘free software movement’). The open source community and the open data community share the same attitude. Both are focused on free information, and with the help of copyleft, open information stays open. In recent years, special licenses have been created for databases, which is great because from a legal perspective this area can be incomprehensible.
Personally, I love the fact that data is becoming more open. Initiatives like the Guardian’s datablog, Pachube and data.gov.uk are really exciting for programmers, journalists, information visualizers and citizens. Open data can bring transparency and help uncover unknown ‘errors’ in society. It can also create lively communities that become more involved with governmental issues. It is probably in the best interest of institutions and governments to unlock their databases themselves because, as we have seen with the Afghan War Diary, hackers will otherwise pry the lock open. And, in most cases, they won’t be gentle.
References
Berners-Lee, T. (2010). “The year open data went worldwide.” TED talk. <http://www.ted.com/talks/lang/eng/tim_berners_lee_the_year_open_data_went_worldwide.html>
Dames, K.M. (2009). “Data is the New Oil,” in Information Today, September 2009, volume 26, issue 8, 14-15.
Davie, T. (2010). <http://www.guardian.co.uk/media/audio/2010/jul/29/media-talk-podcast-wikilieaks-hbo-channel-five>
Stallman, R. (1999). “The GNU Operating System and the Free Software Movement,” in Chris DiBona, Sam Ockman and Mark Stone, eds. Open Sources: Voices from the Open Source Revolution. Sebastopol: O’Reilly, 53-70.