Wikipedia’s Datasource Google

On: September 29, 2008


About Marijn de Vries Hoogerwerff
Marijn de Vries Hoogerwerff is a New Media theorist, Web researcher and Internet entrepreneur. In 1999 he started working as an IT professional at the broadband Internet Service Provider @home (a franchise of the ISP and search engine company Excite@Home). After working there for over eight years he decided to pursue a study in New Media at the University of Amsterdam. During this study he was an active member of the Digital Methods Initiative (DMI) research group, working in a strong team of designers, programmers and theorists to develop new Web-specific methods and tools for doing online research. He has written in depth about Internet censorship research, code consciousness and cyber-cosmopolitanism. Next to several stand-alone projects he also started up CYBERLIFE, focusing on building Web applications, sites and tools, Web hosting and doing Web research. After receiving his Master’s degree in New Media he continued his contributions to the DMI, helped organize the Society of the Query conference for the Institute of Network Cultures and was a thesis supervisor at the University of Applied Sciences (HvA) for Interactive Media. His current company, nochii BV, focusses on utilizing theoretical knowledge and practical experience to help companies get a better understanding of the Web, their network, the space they occupy and its relation to the offline. He holds the strong belief that the Web, both as infrastructure and as concept, can aid in dealing with the increasing complexity of the world (both online and offline) and the related problematics.

Website
http://nochii.nl

Today Erinc Salor gave an excellent presentation called Redefining Encyclopaedia’s, about his research on Britannica, the history of encyclopaedias and their relation to Wikipedia. He feels there is too much speculation going on at the moment: most critical posts and articles are based on assumption rather than actual research or experience. He presented a very extensive and relevant history of the encyclopaedia, from Aristotle, who introduced categories on papyrus, to the Speculum Maius in the Middle Ages (the divine book of nature), to the first alphabetical Cyclopaedia in the 17th century and its French translation, the Encyclopédie. This last one is what later became Britannica.

The question he posed is whether Wikipedia can be seen as the next evolution or redefinition of this line of encyclopaedias. His own PhD research covers a broad field, but in the context of this presentation he limited it to basic but very relevant questions. He explained that within Wikipedia there is an inclusion/exclusion debate going on about which topics and articles should be accepted, and in what detail. For Salor, however, the outcome of this discussion is of no particular relevance; what matters more is the fact that the discussion can take place at all.

Inspired by the presentation, I started taking mental notes and felt excited to continue my personal saga of the first Wikipedia entry. Just the previous night I had added an article about Geert Lovink to the English Wikipedia. Stressed by limited time and the realistic fear that my article would be removed within the first hours, I decided to edit Geert Lovink’s personal biography and use that as my starting point. It took exactly one minute before my article was tagged for deletion on the grounds of copyright infringement.

Although I was under the impression that his blog was public domain, my first thought was that the Wikipedia bot was referring to the resemblance between the article and that website. This was not the case, however: the CorenSearchBot message stated that ‘it appears to include a substantial copy of http://creativecapital.nl/speakers_geertlovink.php’. That page had itself copied the biography from Geert Lovink’s website, so I made a desperate attempt to convince the person behind the bot to stop the deletion (funnily enough, the Creative Capital conference website is all about issues surrounding the public domain). The next morning I was glad to see my article still online, only to discover it had been removed an hour later by a much grumpier bot stating ‘G12: Blatant copyright infringement’.

Analysing entries that had not been removed made me realize that a very small article, correctly formatted to comply with Wikipedia’s rules and policies, would probably work much better. That would leave room for others to contribute, or for me to edit it myself once the article is off the hotlist of the main wikibots. The speed of the edit and the (faulty) copyright reference of the bots did, however, make me think about their own source of reference. The only source that could provide that much speed and return the Creative Capital conference website would be a search engine. In the past we have already seen that bots and moderators use Google as a source of truth, as in the case of the Spinplant, where an entry was deleted because no reference could be found in Google. There has been considerable debate about this issue already, but the thing that strikes me is the enormous dependence of Wikipedia on Google. Although people keep stressing the user-generated content aspect of Wikipedia, isn’t it much more plausible that most of the content is actually a structuring of the hypertextual data of the Internet, made accessible by search engines like Google? Is it a grand project to structure the objective knowledge of Google? The morphing of the Google directory into Wikipedia?

Needless to say, this calls for some digging. I don’t want to make the mistake of making too many assumptions, as Salor has warned against. The whole Wikipedia story has, however, given me some nice things to think about and some motivation to write a Wikipedia entry that will not be removed.

Tags: bots, britannica, google, presentation, wikipedia
