Dan Cohen

Wikipedia vs. Encyclopaedia Britannica for Digital Research
Posted to Google and the World of Search on 30 January 2006, 12:07 PM EST

In a prior post I argued that recent coverage of Wikipedia has focused too much on one aspect of the online reference source's openness (the ability of anyone to edit any article) and not enough on another: the ability of anyone to download or copy the entire contents of its database and use it in virtually any way they want (with some commercial exceptions). Drawing on my data-mining work with H-Bot, which uses Wikipedia in its algorithms, I speculated that an open and free resource such as this could be very important for future digital research, e.g., finding all of the documents about the first President Bush in a giant, untagged corpus on the American presidency.

For a piece I'm writing for D-Lib Magazine, I decided to test this theory by pulling significant keywords and phrases out of the matching articles on George H. W. Bush in Wikipedia and the Encyclopaedia Britannica, to see whether one was better than the other for this purpose.

Which resource is better? Here are the unedited term lists, derived by running plain text versions of each article through Yahoo's Term Extraction web service. Vote on which one you think is a better profile, and I'll reveal which list belongs to which reference work later this week.

Article #1
president bush
saddam hussein
fall of the berlin wall
tiananmen square
thanksgiving day
american troops
manuel noriega
halabja
invasion of panama
gulf war
help
saudi arabia
united nations
berlin wall

Article #2
president george bush
george bush
mikhail gorbachev
soviet union
collapse
reunification of germany
thurgood marshall
union
clarence thomas
joint chiefs of staff
cold war
manuel antonio noriega
iraq
george
nonaggression pact
david h souter
antonio noriega
president george
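
To make "better profile" a little more concrete, here is a rough sketch of how either list might be used to pick out relevant documents from an untagged corpus. It is not how H-Bot works and not the test I ran for the article. The function names (score_document, rank_corpus), the whole-phrase matching rule, and the sample sentence are all assumptions made up for illustration; only the term lists come from this post.

```python
import re

# Term lists copied verbatim from this post.
ARTICLE_1 = [
    "president bush", "saddam hussein", "fall of the berlin wall",
    "tiananmen square", "thanksgiving day", "american troops",
    "manuel noriega", "halabja", "invasion of panama", "gulf war",
    "help", "saudi arabia", "united nations", "berlin wall",
]
ARTICLE_2 = [
    "president george bush", "george bush", "mikhail gorbachev",
    "soviet union", "collapse", "reunification of germany",
    "thurgood marshall", "union", "clarence thomas",
    "joint chiefs of staff", "cold war", "manuel antonio noriega",
    "iraq", "george", "nonaggression pact", "david h souter",
    "antonio noriega", "president george",
]


def score_document(text, profile):
    """Return (fraction of profile terms found in text, matched terms).

    Assumption: a term counts only if it appears as a whole phrase,
    case-insensitively, so "union" does not match "reunion".
    """
    lowered = text.lower()
    hits = [
        term for term in profile
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
    return len(hits) / len(profile), hits


def rank_corpus(corpus, profile):
    """Sort (doc_id, text) pairs into (score, doc_id) pairs, best first."""
    scored = [
        (score_document(text, profile)[0], doc_id)
        for doc_id, text in corpus
    ]
    return sorted(scored, reverse=True)


# Hypothetical one-sentence "document", invented for this example.
sample = "President Bush ordered the invasion of Panama to capture Manuel Noriega."
print(score_document(sample, ARTICLE_1))
print(score_document(sample, ARTICLE_2))
```

A list that contains generic words such as "help", "union", or "collapse" will match many documents that have nothing to do with the first President Bush, which is one thing to keep in mind when voting.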




Comments or questions? Contact me. [Editor's note: This blog post was written before August 2007, when I converted this blog from my own blogging software to WordPress and added commenting to the end of posts.]
