The Single Box Humanities Search
Posted to Google and the World of Search on 17 April 2006, 11:42 AM EDT
First, a quick test of Google Scholar and Windows Live Academic. Can either one produce the source of the famous "frontier thesis," probably the best-known thesis in American historiography?


Clearly, the usefulness of these search results is dubious, especially those from Windows Live Academic ("The Political Economy of Land Conflict in the Eastern Brazilian Amazon" as the top result?). Why can't these giant companies do better than this for humanities searches?
Obviously, the people designing and building these "academic" search engines are from a distinct subset of academia: computer science and mathematical fields such as physics. So naturally they focus on their own fields first. Both Google Scholar and Windows Live Academic work fairly well if you would like to know about black holes or encryption. Moreover, "scholarship" in these fields generally means articles, not books. Google Scholar and Windows Live Academic are dominated by journal-based publications, though both sometimes show books in their search results. But when Google Scholar does so, these books seem to appear because articles that match the search terms cite these works, not because of the relevance of the text of the books themselves.
In addition, humanities articles aren't as easy as scientific papers to subject to bibliometrics, that is, methods such as citation analysis that reveal the most important or influential articles in a field. Science papers tend to cite many more articles (and fewer books), which makes them well suited to extensive recursive analysis: a citation counts for more when the citing paper is itself heavily cited. Thus a search on "search" on Google Scholar aptly points a researcher to Sergey Brin and Larry Page's seminal paper outlining how Google would work, because hundreds of other articles on search technology dutifully refer to that paper in their opening paragraphs or footnotes.
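For readers who want to see what citation analysis looks like in practice, here is a minimal sketch in Python. It is not how Google Scholar or Windows Live Academic actually rank results (neither company has published that); it only illustrates the general idea with a plain citation count and a PageRank-style recursive score. The paper names and the tiny citation graph are hypothetical, as are the function names and the damping value.

```python
from collections import defaultdict


def citation_counts(cites):
    """Count how many papers in the graph cite each paper."""
    counts = defaultdict(int)
    for paper, refs in cites.items():
        counts[paper] += 0  # make sure uncited papers still appear
        for ref in refs:
            counts[ref] += 1
    return dict(counts)


def recursive_scores(cites, damping=0.85, iterations=50):
    """PageRank-style scores: a citation from an influential paper counts for more.

    `damping` and `iterations` are illustrative defaults, not tuned values.
    """
    papers = set(cites) | {r for refs in cites.values() for r in refs}
    n = len(papers)
    scores = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            refs = cites.get(p, [])
            if refs:
                # Split this paper's score among the works it cites.
                share = damping * scores[p] / len(refs)
                for r in refs:
                    new[r] += share
            else:
                # A paper that cites nothing spreads its score evenly.
                share = damping * scores[p] / n
                for q in papers:
                    new[q] += share
        scores = new
    return scores


# Hypothetical toy graph: each key cites the papers in its list.
cites = {
    "seminal": [],
    "survey": ["seminal"],
    "followup_a": ["seminal", "survey"],
    "followup_b": ["seminal"],
}

counts = citation_counts(cites)
scores = recursive_scores(cites)
top_paper = max(scores, key=scores.get)
```

With this toy graph, the paper that everyone cites ends up with both the highest count and the highest recursive score. The sketch also shows why the method suits fields built on journal articles: it needs a dense web of article-to-article citations, which is exactly what book-centered humanities scholarship tends not to provide.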
Most important, however, is the question of open access. Outlets for scientific articles are more open and indexable by search engines than humanities journals. In addition to many major natural and social science journals, CiteSeer (sponsored by Microsoft) and ArXiv.org make hundreds of thousands of articles on computer science, physics, and mathematics freely available. This disparity in openness compared to humanities scholarship is slowly starting to change—the American Historical Review, for instance, recently made all new articles freely available online—but without a concerted effort to open more gates, finding humanities papers through a single search box will remain difficult to achieve. Microsoft claims in its FAQ for Windows Live Academic that it will get around to including better results for subjects like history, but like Google they are going to have a hard time doing that well without open historical resources.
UPDATE [18 April 2006]: Microsoft has contacted me about this post; they are interested in learning more about what humanities scholars expect from a specialized academic search engine.
UPDATE [21 April 2006]: Bill Turkel makes the great point that Google's main search does a much better job than Google Scholar at finding the original article and author of the frontier thesis.

Comments or questions? Contact me. [Editor's note: This blog post was written before August 2007, when I converted this blog from my own blogging software to WordPress and added commenting to the end of posts.]



