Towards a methodology for Web-based investigative reporting

On: April 3, 2010
About Jelle Kamsma
I am an MA student of New Media at the University of Amsterdam. I have a bachelor's degree in Media and Culture. After three years of mostly focusing on film and visual culture, I made the switch to new media, mostly because I'm interested in journalism and how it has to adapt to new media. I just finished an internship at ANP Video, a Dutch press agency that makes short video items for different news sites. I'm curious to find out how new media theories will apply to my experiences in the practical field.


The news industry is in the midst of a ‘perfect storm’: the economic downturn causes an increasing decline in advertising revenues, there is a structural and irreversible shift of advertising to the Internet, and ‘fragmented audiences [are] moving increasingly to non-linear consumption which is less susceptible to commercial impact and therefore less valuable to advertisers.’ (Barnett, 2009: 1) As consumers and advertisers move to the web it becomes apparent that the commercial model that underpins the gathering, production and distribution of news faces significant economic pressures. Media enterprises are desperately seeking a new model to make money off the Web. The basic challenge is two-fold: the consumers expect news content to be free, whilst advertisers are expecting much lower rates for advertising. The digital revolution forces publishers to distribute the news into ‘bite-size, multi-media friendly packages that can attract enough click to sustain the interest of advertisers.’ (Currah, 2009: 1) This is a worrying development since newsrooms are being sucked into the ‘clickstream’ and determine their editorial agenda by what proves successful on the Web. The underlying logic seems to be that news has to be fast, cheap and easy to comprehend. Journalist Loretta Tofani explains: ‘The dominant message, amid buyouts and pink slips, is produce, produce, produce! The result is that reporters tend to produce more good and mediocre stories at the expense of the great and vital stories, which are still out there.’ (Tofani, 2001: 64)

This might also explain why investigative reporting is the main victim of the digital revolution in the news industry. Besides being inextricably linked to the general crisis in journalism, investigative reporting has some unique vulnerabilities of its own. As Edward Wasserman explains:

It’s expensive, offers uncertain payback, ties up resources that could be used in more conventionally productive ways, fans staff jealousies, offends powerful constituencies (including touchy readers), invites litigation, and usually comes from the most endangered class in the newsroom, the senior reporters whose ranks are being thinned aggressively through forced retirement. (Wasserman, 2008: 7)

However, the aim of this article is not to discuss the negative influences that the Internet is supposed to have on journalism in general and on investigative reporting in particular. What I suggest instead is to look at new ways in which the Web can be used and appropriated for investigative reporting. That means moving away from the more common approach of looking at the Web solely as a place for distributing news content. Instead, my interest lies in examining how Web-based content can be used in the gathering of information and the production of investigative news stories. The overall purpose of this essay is to describe the contours of a methodology for investigative reporting that incorporates the concepts, strategies and techniques linked to the Web. Methodology here refers broadly to the logic of analysis and the methods that are in use within a given field. By building upon the well-established methodologies of social science and Digital Methods, I hope to make a beginning with the methodology of what I call Web-based investigative reporting. In the end it is up to the practitioners of the field, the investigative journalists, to determine the requirements, procedures, and rules of analysis in conducting research on the Web. The intention of this essay is not to prescribe what is right and wrong but to start a debate on what should constitute a shared methodology for Web-based investigative reporting (WBIR).


The changing nature of investigative reporting

It is often argued that freedom of the press is essential for a flourishing society. It is an argument that extends back to the position set forth by John Milton in the mid-1600s that freedom of the press is an acknowledgement of the ‘people’s right to know’ and the need for a ‘marketplace of ideas’. (Altschull, 1990) Investigative journalism can be seen as a logical consequence of these political theories. A strong press can investigate and unveil the actions, policies and performances of those who are in power. Dennis and Ismach defined investigative reporting by stating that it should focus on ‘all sectors of society that require examination, explanation or airing, but that are hidden from the public view.’ (Dennis & Ismach, 1981: 81) Owing to the almost impenetrable complexity of contemporary institutions, it has been argued that the task of investigative reporters is now more difficult than ever.

It is difficult to analyze and understand the incredible range of agencies and bureaucratic systems that have been devised for managing the problems with which society must cope. Therefore, the potential for inefficient, irresponsible, unethical, or even outrageously illegal behavior on the part of those we trust has never been higher. Correspondingly, the importance of the press as the eyes and ears of the public in monitoring governmental activities has never been greater and its task has never been more difficult. (DeFleur, 1997: 9)

With the introduction of computers and, later, the Internet, this complexity has only increased. New skills are required to access and understand all the information available. This complexity became manifest the very first time a computer was used in the production of news. The introduction of computers into the newsroom took place in the early days of television. One of the earliest examples of computer-assisted news took place during the presidential election of 1952 between Dwight D. Eisenhower and Adlai E. Stevenson. A computer was programmed to predict the outcome of the election on the basis of the early returns. With only seven percent of the votes counted it predicted a landslide victory for Eisenhower, which no one believed since it was supposed to be a very close contest. In the end the final count turned out to be remarkably close to the early predictions made by the computer. (Cox, 2000) Around the 1980s newsrooms also adopted online databases. When these developments came together, reporters began to have access to ‘resources that would change the nature of news reporting considerably.’ (DeFleur, 1997: 37) Besides increasing the amount of information available to develop a news story, this also made reporters more familiar with software for spreadsheet analysis, database construction and statistical manipulation. These skills would become incredibly important, since governments and other institutions also began to use computers for record keeping.

This paved the way for computer-assisted investigative reporting, a form of journalism that applies computers to the practice of investigative reporting. For example, computers were used to analyze electronic records from governments in order to find newsworthy situations. One of the pioneers in this field, David Burnham, developed a conceptual framework to use when investigating the records of a government or public agency. When conducting a data analysis Burnham asked two key questions: (a) What is the stated purpose or mission of the agency? (b) What problems or procedures prevent the agency from achieving its stated purpose or goal? (Walker, 1990) Philip Meyer, another pioneer in the field of database journalism, took this concept a step further by stating that many stories derived from databases can be developed by comparing groups or subgroups as a simple first step. (Meyer, 1991) With these methods journalists were able to test popular theories. For example, following the civil riots in Detroit in 1967, survey data from African Americans who lived in the riot area were analyzed by computer. Popular belief was that those who participated in the riots did so because they were at the bottom of the economic ladder and poorly educated. The analysis, however, showed that people who attended college were just as likely to participate in the riots as those who failed to finish high school. (DeFleur, 1997) The number of stories uncovered through database analysis has only increased since, and database analysis is to this day an important part of investigative journalism.

The introduction of the Internet marked another major development in the journalistic profession, especially in investigative reporting. While at first the Internet was mainly used among journalists to communicate with each other, it quickly became more common to use the Internet for conducting research. (Cox, 2000) Barry Sussman called the Web the ‘investigative reporter’s tool.’ As he explains:

What the Web does incomparably well is to provide information —instantly— on just about anything. Want to know about where there have been cases of bird flu? Or what can go wrong with voting machines? […] Googling not only provides answers, but it connects reporters and anyone else with possible places and sources to go to find out more. (Sussman, 2008: 45)

As the pressure on journalists to produce only increases, the Web can prove to be an incredibly useful tool. Naturally, Sussman understands that information on the Web can be unreliable; he points out, for example, that a number of edits on Wikipedia have been traced back to the CIA. Yet he argues that it is up to the reporter to determine how trustworthy the information is. ‘There are plenty of reliable, dedicated groups and individuals responsibly sharing important information through the Web.’ (Sussman, 2008: 45) Therefore Sussman sees using information on the Web for journalistic purposes as one of the few ways in which news producers can continue and maintain their essential watchdog role. However, he also points out that few investigative assignments should be completed online, and he underlines the importance of working with actual sources: people who have stories to tell and documents to back up what they know.

I agree with Sussman that investigative stories still need to be about real people and real people’s lives. Yet my approach to the Web in respect to investigative reporting is a bit more ambitious. Instead of seeing the Web as basically a large database of information on society, I would like to argue that the Web in fact is part of society. This sheds a whole new light on the ways in which the Web can be used in investigative reporting.


Seeing society through the Web

Within the field of new media, new approaches and methodologies are being developed for the study of culture and science. Chris Anderson, editor-in-chief of Wired magazine, argued in 2008 for a methodological turn in science. In the article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” he claims that since it is possible to analyze information at petabyte scale, scientific models have become obsolete.

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. (Anderson, 2008)

New media scholar Lev Manovich uses a similar approach in his Cultural Analytics program, but instead of science he focuses on culture. By applying new computational techniques to the large quantities of data provided by the Internet, he hopes to usher in a revolution in the study of culture. Manovich has announced Cultural Analytics as a way to analyze the huge quantities of user-generated content on the Internet as well as other cultural artifacts. He claims that these huge quantities of data will turn ‘culture into data’ and make possible the study of ‘culture as data.’ (Manovich, 2007: 4) Instead of a qualitative content analysis, Manovich’s focus lies on finding patterns. By visualizing the results it becomes possible to see ‘real-time cultural flows around the world’ (Manovich, 2009: 3).

Another approach to studying culture through computational means is Digital Methods, a research program started by Richard Rogers. He proposes a research practice which ‘grounds claims about cultural change and societal conditions in online dynamics.’ (Rogers, 2009: 5) He does not consider the Web a virtual space separated from the ‘real world’. Communications scholar Steve Jones was one of the first to propose moving beyond the perspective of the Internet as a realm apart. (Jones, 1999) The Virtual Society? research program (1997-2002) further critiqued the digital divide model with a number of empirical studies. Its researchers argued, in relation to the real/virtual dichotomy, that virtual interactions do not substitute for but rather supplement the ‘real’. Identities are grounded in both the offline and the online. (Woolgar, 2002) Building upon these ideas Richard Rogers proposed Digital Methods:

It concerns a shift in the kinds of questions put to the study of the Internet. The Internet is employed as a site of research for far more than just online culture. The issue no longer is how much of society and culture is online, but rather how to diagnose cultural change and societal conditions using the Internet. The conceptual point of departure for the research program is the recognition that the Internet is not only an object of study, but also a source. (Rogers, 2009: 8)

The goal of Digital Methods is therefore to reconceptualize the relation between online and offline culture. It introduces the term online groundedness as the effort to make claims about cultural and societal change grounded in the online.

Both Digital Methods and Cultural Analytics deal with large quantities of information. This is an aspect of both methodologies that has to be critically examined because, contrary to what Anderson claims, numbers do not always speak for themselves. As is argued in a response to Anderson’s bold claims, it is very easy to find false correlations when dealing with petabytes of data. ‘In a huge amount of data, you expect to find tons of patterns. Many of those patterns will be statistical coincidences. And it can be very difficult to identify the ones that are.’ (Chu-Carroll, 2008) This critique especially concerns Cultural Analytics, since datasets are approached in a very open way without formulating a research question or hypothesis. It is all about the patterns. Digital Methods is different in this respect because it follows not the quantity but the methods of the Web. Digital Methods focuses on how ‘information, knowledge and sociality [on the Web] are organized by recommender systems – algorithms and scripts that prepare and serve up orders of URLs, media files, friends, etc.’ (Rogers, 2009: 12) This requires examining how objects like the permalink, IP address, URL, etc. are handled on the Web.
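The point about statistical coincidences can be made concrete with a small experiment. The following is a minimal sketch, not taken from any of the cited authors: it generates series of purely random numbers and counts how many pairs nevertheless look correlated. The function names, the number of series and the threshold of 0.5 are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious_pairs(n_series=200, n_points=20, threshold=0.5, seed=1):
    """Count pairs of unrelated random series whose |r| exceeds the threshold.

    All values are random noise, so every "strong" pair is a coincidence.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    strong = 0
    total = 0
    for a, b in combinations(series, 2):
        total += 1
        if abs(pearson(a, b)) > threshold:
            strong += 1
    return strong, total


strong, total = count_spurious_pairs()
print(f"{strong} of {total} pairs of random series exceed |r| = 0.5")
```

Because nothing in the data is related to anything else, every pair the script reports is a false pattern. The more series are compared, the more such pairs appear, which is one reason to fix a question or hypothesis before looking at the data.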

In this sense Digital Methods is more useful in developing a methodology for Web-based investigative reporting. Computer-generated data do not stand by themselves. They reveal facts that could not easily have come to light by other means, but they do require thorough interpretation and reflection. Moreover, they provide leads that must be followed by more conventional journalistic means. One example of how Digital Methods can manifest itself in investigative reporting is an article in the Dutch newspaper NRC Handelsblad on the rise of extremist language in the Netherlands. (Dohmen, 2007) Instead of analyzing pamphlets or interviewing extremists, the reporters analyzed 150 right-wing extremist sites over the course of a decade. They used the Wayback Machine of the Internet Archive (Internet Archive, 2010) to find and compare different versions of the sites. They found that right-wing sites were employing increasingly extremist language. By looking at the online, through the analysis of particular sets of archived sites, they were able to make statements about cultural changes. Their claims were grounded in the online, as the online became the baseline against which society was judged. However, their conclusions were followed up by more traditional means such as interviews with key players like the director of the discrimination hotline. It is important to realize that the Internet Archive is a tool that can be used with great effectiveness but must be supplemented by a more traditional approach.


Towards a methodology

An adequate definition of Web-based investigative reporting should begin by stating that it is not a replacement of traditional investigative journalism but an extension. The characteristics of investigative reporting, like creative and analytical thinking, are still of vital importance for the success of the method. Traditional investigative reporting is based on two principles: (a) investigative reporting concerns matters that are important to the public, and (b) these matters are not easily discovered. (Ullman & Colbert, 1991) These two principles are also the foundation for WBIR. Broadly speaking, WBIR can be defined as the application of computers to collect facts about society that are grounded in the online. WBIR is not solely the analysis of the content of the Web but in fact goes much deeper by also critically examining the object and its methods. For example, the analysis of archived websites as done by NRC Handelsblad requires a critical reflection on how these sites are archived and organized. Web archiving scholar Niels Brügger argued that the Internet, unlike other media, does not exist in a form that can be easily archived. He suggests we should always examine who does the archiving and for what purpose. (Brügger, 2005) In this sense WBIR is related to the basic methodology of social science research. Both are basically quantitative modes of doing research that also make use of qualitative information. In both fields the research is guided by questions that are to be answered by specific datasets, rather than by a random tour through numbers. Although social science and WBIR have some features in common, there are also distinct differences. The primary goal of social science research is to develop or verify a certain theory of cause and effect in order to explain the consequences of human behavior. The goal of WBIR, on the other hand, is simply to find news; it is less interested in the explanation of scientific theories.

Before describing a methodology for WBIR it might be a good idea to describe what is actually meant by the term methodology. In the philosophy of science a methodology is defined as a set of constructs about how research in a particular field should be conducted. It refers to the logic of analysis used in a given field. Philosopher Maurice Natanson explained this as follows:

By “methodology,” I understand the underlying conceptual framework in terms of which concrete studies in history, sociology, economics and the like are carried out, and in terms of which they receive a general rationale. (Natanson, 1963: 271)

The aim of this essay is to set out this rationale by describing a methodology designed specifically for WBIR. By describing, explaining and justifying the steps of analysis, I hope to make a beginning with a formal methodology for WBIR.

Gain a clear understanding of the object and its methods – The first step in any Web-based research should be to gain a clear understanding of the object of study and its methods. The way information, knowledge and sociality are organized on the Web can be enormously complex. So before anything else reporters must ask themselves questions such as why certain data are privileged over others. This requires knowledge of how certain web objects organize their content through recommender systems. For example, a growing trend in news articles is to cite Google search results in support of certain statements. However, before these search results can be interpreted, one should realize that these results are organized by PageRank. Google’s algorithm should be examined before the search results are.
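To give a feel for what examining the algorithm involves, here is a simplified textbook version of PageRank computed by power iteration. It is a sketch only: Google’s production ranking is not public and uses many more signals, and the function name, the damping value of 0.85, the iteration count and the toy link graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up on top, and the page it links to comes second even though only one page points to it. A high position in such a ranking therefore says something about link structure, not necessarily about the truth or representativeness of the content.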

Determine your goals – After the nature of the object has been examined, the reporter can begin determining the goals of the investigation. This can be in the form of either a research question or a hypothesis. In the case of the article on right-wing sites the question could have been how extremist language has evolved over the course of a decade. The reason why it is important to formulate a clear research question or hypothesis is that when dealing with huge quantities of data it is fairly easy to find false correlations. Patterns can pop up anywhere in the analysis of large datasets and it is very difficult to distinguish between coincidental and meaningful ones.

Strategically examine the data – After the goals are determined, the reporter can begin with collecting and analyzing the data. To achieve the goals a carefully devised plan should be made. This can be done in a great number of ways of which some will be discussed here. Bear in mind that the following list is by no means complete.

  • The strategy used in the article in NRC Handelsblad is to examine a trend over a certain period of time. If the results show a clear upward or downward trend, a newsworthy issue may have been discovered. This issue can be further explored by experts familiar with the phenomenon. The Internet Archive is an especially good source for this strategy, since the Wayback Machine privileges single-site histories.
  • Another approach could be to examine unusual situations or phenomena. An anomaly in the dataset could be the foundation for an unusual news story. For example, the high ranking of an anti-Semitic website in the Google search results for the query “Jew” (Grimmelmann, 2009) could be examined. Another great example of this strategy is the use of the Wikiscanner software to reveal who is behind certain Wikipedia edits. It turned out that the Dutch Royal family was behind some censoring edits to the Wikipedia entry on Princess Mabel’s relationship with the Dutch drug lord Klaas Bruinsma.
  • A reporter could make a comparison between before and after an event. There are special collections of sites around certain events available on the Web. Sites concerning the attacks on 9/11 have been archived and may contain all sorts of cues on how the world dealt with 9/11 and its aftermath.
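The first strategy above, examining a trend over time, can be sketched in a few lines of code. The example assumes that the text of archived pages has already been collected and labeled by year; it does not show how the NRC Handelsblad reporters actually worked. The function names, the list of watched terms and the sample texts are hypothetical placeholders.

```python
import re


def term_rate(text, terms):
    """Occurrences of the watched (single-word) terms per 1,000 words."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    watched = {term.lower() for term in terms}
    hits = sum(1 for word in words if word in watched)
    return 1000.0 * hits / len(words)


def yearly_rates(snapshots, terms):
    """Map each year to the term rate over all snapshot texts of that year.

    `snapshots` is a list of (year, text) pairs.
    """
    by_year = {}
    for year, text in snapshots:
        by_year.setdefault(year, []).append(text)
    return {
        year: term_rate(" ".join(texts), terms)
        for year, texts in sorted(by_year.items())
    }


def trend_slope(rates):
    """Least-squares slope of the rate against the year (change per year)."""
    years = list(rates)
    values = [rates[year] for year in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den if den else 0.0


# Placeholder texts; "flagged" stands in for whatever vocabulary is studied.
snapshots = [
    (2000, "alpha beta gamma delta"),
    (2001, "alpha flagged gamma delta"),
    (2002, "flagged flagged gamma delta"),
]
rates = yearly_rates(snapshots, ["flagged"])
print(rates, trend_slope(rates))
```

A positive slope only signals that the watched words became more frequent in the collected pages. Whether that reflects a real change depends on which sites were archived, how often, and who chose the word list, which is why the result is a lead to follow up rather than a finding in itself.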

Supplement your findings with traditional means of investigative reporting – Once the datasets are examined, the results should be interpreted. This requires creative and analytical thinking, not unlike traditional investigative reporting. Results should be supplemented with reactions from key figures and made comprehensible for the public. The research should be explained in such a way that the public can readily understand it without prior knowledge. In order to present your conclusions successfully, your results could be visualized. Putting the big numbers into graphs and charts can help people to grasp the large quantities. It can also help in showing correlations and patterns.

Before the results of a WBIR study can be presented to the public, the investigative reporter must take a number of responsibilities into consideration. Some of these derive from the general responsibilities of any investigative reporter. Reporters are responsible for getting the facts straight, and the conclusions reached from those facts must represent a fair and balanced picture of reality. These very basic responsibilities apply to all forms of investigative reporting, or journalism for that matter. However, WBIR implies a set of new concerns and responsibilities. As said before, the Web is complex and it is not difficult to draw false conclusions from it. The steps described above should be of assistance in conducting a responsible WBIR investigation, but it is also important to make these responsibilities explicit. First of all, the reporter should always verify the data used in the analysis. The Web is infested with disinformation. Barry Sussman describes in his article on digital journalism an entire disinformation industry ‘consisting of corporate funded think tanks, phony grassroots groups, co-opted citizens organizations, experts for hire, fake news and fake reporters, government officials on all levels who lie and promote disinformation.’ (Sussman, 2008: 46) He notes, however, that the Internet also provides journalists with reliable sources to help sort out what is real and solid from what is fake and disingenuous. It is up to the reporter to make responsible decisions in this respect. Another unique responsibility in WBIR stems from the fact that information and its organization are always changing on the Web. It is therefore important to keep records of the analyzed data to ensure that facts can always be (re)checked. One last responsibility in WBIR has also been expressed in the first step of my methodology but bears repeating. Understanding the nature of the studied object as well as the tools used for analysis is of vital importance and should never be neglected.
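Keeping records of the analyzed data can be as simple as storing, for every page used, where it came from, when it was retrieved and a fingerprint of its content. The sketch below is one possible way to do this with Python’s standard library; the field names, the log format and the example URL are assumptions, not an established newsroom practice.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_record(url, content, retrieved_at=None):
    """Describe a retrieved page: source URL, retrieval time, SHA-256 digest.

    `content` is the raw bytes of the page as it was downloaded.
    """
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "retrieved_at": retrieved_at.isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def matches_record(record, content):
    """True if the content is byte for byte what was originally recorded."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


def append_to_log(record, path):
    """Append the record as one JSON line to a log file."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


# Hypothetical page content; in practice this would be the downloaded bytes.
page = b"<html><body>archived statement</body></html>"
record = make_record("https://example.org/statement", page)
print(record["sha256"], matches_record(record, page))
```

The digest does not preserve the page itself, so the downloaded file should be stored alongside the log. What the record adds is a way to show later that the stored copy is the same content that was analyzed.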



As was explained in the introduction, the profession of journalism is under enormous pressure from the market. News organizations have to deal with budget cuts, which lead to journalists losing their jobs. The same amount of news has to be made with fewer contributors. Investigative reporting especially suffers from this tendency. It is expensive, ties up resources that could be used in more productive ways and has an uncertain outcome. For these reasons, investigative reporters are often the first to go. New media and especially the Internet are regularly blamed for the demise of journalism. My claim, however, is that the Internet makes possible new, promising and cost-efficient ways of doing journalism and especially investigative reporting. By providing a historical perspective on the use of new technology in the field of investigative reporting and by examining some of the ways in which Internet research is conducted, I sketched the contours of a methodology for what I call Web-based investigative reporting. This article is not meant to be seen as a definitive way of doing Web-based investigative reporting. It is merely the start of a debate on how such reporting should be done. In the end it is up to the practitioners in the field to decide which methods are acceptable. Important to remember in this respect is the fact that WBIR is an extension of the traditional means of investigative reporting. Old values and responsibilities will and must remain intact. Examples of WBIR are still scarce but already show what is possible when using the Web as an object of study. Although I am aware of the limitations of this study, and both the concepts and the methodology should be examined further, I do believe that this essay clears the path for exciting new ways of doing journalism.



Altschull, Herman. From Milton to McLuhan: The Ideas Behind American Journalism. White Plains: Longman, 1990.

Anderson, Chris. “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Wired Magazine 16.07 (2008). (Accessed January 14, 2010)

Barnett, Steven. “Journalism, Democracy and the Public Interest: Rethinking Media Pluralism for the Digital Age.” RISJ Working Paper, 2009. (Accessed January 12, 2010)

Brügger, N. Archiving Websites: General Considerations and Strategies. Centre for Internet Research, Aarhus, 2005.

Chu-Carroll, Mark. “Petabyte Scale Data-Analysis and the Scientific Method.” Weblog. July 4, 2008. (Accessed January 15, 2010)

Cox, Melisma. “The development of computer-assisted reporting.” Unpublished essay, 2000. (Accessed January 13, 2010)

Currah, Andrew. “Navigating the Crisis in Local and Regional News: A Critical Review of Solutions.” RISJ Working Paper, 2009. (Accessed January 12, 2010)

DeFleur, Margaret. Computer-assisted investigative reporting: development and methodology. Mahwah: Lawrence Erlbaum Associates, 1997.

Dennis, Everette, and Arnold Ismach. Reporting Processes and Practices. Belmont: Wadsworth Publishing, 1981.

Dohmen, Joep. “Opkomst en ondergang van extreemrechtse sites.” NRC Handelsblad. August 25, 2007. (Accessed January 15, 2010)

Grimmelmann, J. “The Google Dilemma.” New York Law School Law Review 53 (2009): 939-950.

Internet Archive. (Accessed January 15, 2010)

Jones, S. “Studying the Net: Intricacies and Issues.” In: S. Jones (ed.), Doing Internet Research: Critical Issues and Methods for Examining the Net. London: Sage, 1999: 1-28.

Manovich, Lev. “Cultural Analytics: Visualizing Cultural Patterns in the Era of More Media.” DOMUS (2009). (Accessed January 14, 2010)

Manovich, Lev. “White paper: Cultural Analytics: Analysis and Visualizations of Large Cultural Data Sets.” 2007. (Accessed January 14, 2010)

Meyer, Philip. “Reporting in the 21st Century.” Presentation at AEJMC in Montreal, 1992. In: DeFleur, Margaret. Computer-assisted investigative reporting: development and methodology. Mahwah: Lawrence Erlbaum Associates, 1997.

Natanson, Maurice. Philosophy of the Social Sciences. New York: Random House, 1963.

Rogers, R. The end of the virtual – Digital Methods. Amsterdam: Amsterdam University Press, 2009.

Sussman, Barry. “Digital Journalism: Will It Work for Investigative Journalism?” Nieman Reports 62.1 (2008): 45-47.

Tofani, Loretta. “Investigative Journalism Can Still Thrive at Newspapers.” Nieman Reports 55.2 (2001): 64.

Ullman, John, and Jan Colbert. The Reporter’s Handbook: An Investigator’s Guide to Documents and Techniques. New York: St. Martin’s Press, 1991.

Walker, Ruth. “Computer Databases Can Be Valuable Sources.” Christian Science Monitor, September 25 (1990): 14.

Wasserman, Edward. “Investigative Reporting: Strategies for Its Survival.” Nieman Reports 62.3 (2008): 7-10.

Woolgar, S. “Five Rules of Virtuality.” In: S. Woolgar (ed.), Virtual Society? Technology, Cyberbole, Reality. Oxford: Oxford University Press, 2002: 1-22.
