Dan Cohen

Archive for the ‘Research’ Category

The Promise of Digital History

Thursday, September 11th, 2008

Back in January of this year I mentioned in this space that I was participating in an online discussion on digital history for the Journal of American History. That discussion has just been published in the September 2008 issue under the title “The Promise of Digital History.” The discussion ended up being extremely wide-ranging, including research possibilities in the digital age, the future of scholarly communication, training, and teaching. I’m obviously biased since I’m one of the interlocutors, but I believe the article is the perfect introduction to digital history for those who are new to the subject, and it also includes some important debates about where the field is headed. The article is available online at the History Cooperative, which is, alas, gated. Open access is another topic discussed in the article; I hope the JAH will make the article freely available soon.

Many thanks to the seven other digital historians—Bill Turkel, Will Thomas, Amy Murrell Taylor, Patrick Gallagher, Michael Frisch, Kristen Sword, and Steven Mintz—who participated in such a lively exchange!

The Pirate Problem

Tuesday, April 22nd, 2008

Jolly Roger FlagLast summer, a few blocks from my house, a new pub opened. Normally this would not be worth noting, except for the fact that this bar is staffed completely by pirates, with eye patches, swords, and even the occasional bird on the shoulder. These are not real pirates, of course, but modern men and women dressed up as pirates. But they wear the pirate garb with no hint of irony or thespian affect whatsoever; these are dedicated, earnest pirates.

At this point I should note that I do not live in Orlando, Florida, or any other place devoted to make-believe, but in a sleepy suburb of Washington, D.C., that is filled with Very Serious Professionals. When the pirate pub opened, the neighborhood VSPs (myself very much included) concluded that it was strange and silly and that it was an incontrovertible fact that no one would patronize the place. Or if they did, it would be as a lark.

We clung to this belief for approximately 24 hours, until, upon a casual stroll by the storefront, we witnessed six pirate-garbed pubgoers outside. Singing sea chanteys. Without sheet music. The tavern has been filled ever since.

Such an experience usefully reminds oneself that there are ways of acting and thinking that we can’t understand or anticipate. Who knew that there was a highly developed pirate subculture, and that it thrived among the throngs of politicos and think-tankers and professors of Washington? Who are these people?

My thoughts turned to pirates during my experience at a workshop at the University of North Carolina at Chapel Hill a week ago, which was devoted to the digitization of the unparalleled Southern Historical Collection, and—in a less obvious way—to thinking about the past and future of humanities scholarship. Dozens of historians came to the workshop to discuss the way in which the SHC, the source of so many books and articles about the South and the home of 16 million archival documents, should be put on the web.

I gave the keynote, which I devoted to prodding the attendees into recognizing that the future of archives and research might not be like the past, and I showed several examples from my work and the work of CHNM that used different ways of searching and analyzing documents that are in digital, rather than analog, forms. Longtime readers of this blog will remember some of the examples, including an updated riff on what a future historian might learn about the state of religion in turn-of-the-century America by data mining our September 11 Digital Archive.

The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like “a crab being lowered into the warm water of the pot.” Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.

This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC:

With few exceptions, interviewees believed that the structure of the collection in the virtual space should replicate, not obscure, the arrangement of the physical collection. Thus, navigating a manuscript collection online would mimic the experience of navigating the physical collection, and the virtual document containers—e.g., folders—and digital facsimiles would map clearly back to the physical containers and documents they represent. [Laura Clark Brown and David Silkenat, "Extending the Reach of Southern Sources," p. 10]

In other words, in the age of Google and advanced search tools and techniques, most historians just want to do their research they way they’ve always done it, by taking one letter out of the box at a time. One historian told of a critical moment in her archival work, when she noticed a single word in a letter that touched off the thought that became her first book.

So in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars. We must remember that these scholars have had—for generations and still in today’s graduate schools—a very clear path for how they do their work, publish, and get rewarded. Visit archive; do careful reading; find examples in documents; conceptualize and analyze; write monograph; get tenure.

We threaten all of this. For every time we focus on text mining and pattern recognition, traditionalists can point to the successes of close reading—on the power of a single word. We propose new methods of research when the old ones don’t seem broken. The humanities have an order, and we, mateys, threaten to take that calm ship into unknown waters.

[Image credit: &y.]

Enhancing Historical Research With Text-Mining and Analysis Tools

Monday, February 4th, 2008

Open BookI’m delighted to announce that beginning this summer the Center for History and New Media will undertake a major two-year study of the potential of text-mining tools for historical (and by extension, humanities) scholarship. The project, entitled “Scholarship in the Age of Abundance: Enhancing Historical Research With Text-Mining and Analysis Tools,” has just received generous funding from the National Endowment for the Humanities.

In the last decade the library community and other providers of digital collections have created an incredibly rich digital archive of historical and cultural materials. Yet most scholars have not yet figured out ways to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers who now find themselves overwhelmed by the sheer quantity of available material. Meanwhile, some of the most profound insights lurking in these digital corpora remain locked up.

For some time computer scientists have been pursuing text mining as a solution to the problem of abundance, and there have even been a few attempts at bringing text-mining tools to the humanities (such as the MONK project). Yet there is not as much research as one might hope on what non-technically savvy scholars (especially historians) might actually want and use in their research, and how we might integrate sophisticated text analysis into the workflow of these scholars.

We will first conduct a survey of historians to examine closely their use of digital resources and prospect for particularly helpful uses of digital technology. We will then explore three main areas where text mining might help in the research process: locating documents of interest in the sea of texts online; extracting and synthesizing information from these texts; and analyzing large-scale patterns across these texts. A focus group of historians will be used to assess the efficacy of different methods of text mining and analysis in real-world research situations in order to offer recommendations, and even some tools, for the most promising approaches.

In addition to other forms of dissemination, I will of course provide project updates in this space.

[Image credit: Matt Wright]

Two Misconceptions about the Zotero-IA Alliance

Friday, December 14th, 2007

Thanks to everyone for their helpful (and thankfully, mostly positive) feedback on the new Zotero-IA alliance. I wanted to try to clear up a couple of things that the press coverage and my own writing failed to communicate. (Note to self: finally get around to going to one of those media training courses so I can learn how to communicate all of the elements of a complex project well in three minutes, rather than lapsing into my natural academic long-windedness.)

1. Zotero + IA is not simply the Zotero Commons

Again, this is probably my fault for not communicating the breadth of the project better. The press has focused on items #1 and 2 in my original post—they are the easiest to explain—but while the project does indeed try to aggregate scholarly resources, it is also trying to solve another major problem with contemporary scholarship: scholars are increasingly using and citing web resources but have no easy way to point to stable URLs and cached web pages. In particular, I encourage everyone to read item #3 in my original post again, since I consider it extremely important to the project.

Items #4 and 5 also note that we are going to leverage IA for better collaboration, discovery, and recommendation systems. So yes, the Commons, but much more too.

2. Zotero + IA is not intended to put institutional repositories out of business, nor are they excluded from participation

There has been some hand-wringing in the library blogosphere this week (see, e.g., Library 2.0) that this project makes an end-run around institutional repositories. These worries were probably exacerbated by the initial press coverage that spoke of “bypassing” the libraries. However, I want to emphasize that this project does not make IA the exclusive back end for contributions. Indeed, I am aware of several libraries that are already experimenting with using Zotero as an input device for institutional repositories. There is already an API for the Zotero client that libraries can extract data and files from, and the server will have an even more powerful API so that libraries can (with their users’ permission, of course) save materials into an archive of their own.

Zotero and the Internet Archive Join Forces

Wednesday, December 12th, 2007

IA LogoZotero LogoI’m pleased to announce a major alliance between the Zotero project at the Center for History and New Media and the Internet Archive. It’s really a match made in heaven—a project to provide free and open source software and services for scholars joining together with the leading open library. The vision and support of the Andrew W. Mellon Foundation has made this possible, as they have made possible the major expansion of the Zotero project over the last year.

You will hear much more about this alliance in the coming months on this blog, but I wanted to outline five key elements of the project.

1. Exposing and Sharing the “Hidden Archive”

The Zotero-IA alliance will create a “Zotero Commons” into which scholarly materials can be added simply via the Zotero client. Almost every scholar and researcher has documents that they have scanned (some of which are in the public domain), finding aids they have created, or bibliographies on topics of interest. Currently there is no easy way to share these; giving them a central home at the Internet Archive will archive them permanently (before they are lost on personal hard drives) and make them broadly available to others.

We understand that not everyone will be willing to share everything (some may not be willing to share anything, even though almost every university commencement reminds graduates that they are joining a “community of scholars”), but we believe that the Commons will provide a good place for shareable materials to reside. The architectural historian with hundreds of photographs of buildings, the researcher who has scanned in old newspapers, and scholars who wish to publish materials in an open access environment will find this a helpful addition to Zotero and the Internet Archive. Some researchers may of course deposit materials only after finishing, say, a book project; what I have called “secondary scholarly materials” (e.g., bibliographies) will perhaps be more readily shared.

But we hope the second part of the project will further entice scholars to contribute important research materials to the Commons.

2. Searching the Personal Library

Most scholars have not yet figured out how to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers, who now find themselves overwhelmed by the sheer quantity of available material. Moreover, the major advantage of digital research—the ability to scan large masses of text quickly—is often unavailable to scholars who have done their own scanning or copying of texts.

A critical second part to this alliance of IA and Zotero is to bring robust and seamless Optical Character Recognition (OCR) to the vast majority of scholars who lack the means or do not know how to convert their scans into searchable text. In addition, this process will let others search through such newly digitized texts. After a submission to the Commons, the Internet Archive will subsequently return an OCRed version of each donated document to enable searchability. This text will be incorporated into the donor’s local index (on the Zotero client) and thus made searchable in Zotero’s powerful quick search and advanced search panes. In short, this process will provide a tremendous incentive for scholars to donate to the Commons, since it will help them with their own research.

3. Enabling Networked References and Annotations

One of the pillars of scholarship is the ability for distributed scholars to be sure they are referencing the same text or evidence. As noted in #1, one of the great advantages of the Zotero Commons at IA will be the transport of scholarly materials currently residing on personal hard drives to a public space with stable, rather than local, addresses. These addresses will become critical as scholars begin to use, refer to, and cite items in the Commons.

Yet the IA/Zotero partnership has another benefit: as scholars begin to use not only traditional primary sources that have been digitized but also “born digital” materials on the web (blogs, online essays, documents transcribed into HTML), the possibility arises for Zotero users to leverage the resources of IA to ensure a more reliable form of scholarly communication. One of the Internet Archive’s great strengths is that it has not only archived the web but also given each page a permanent URI that includes a time and date stamp in addition to the URL.

Currently when a scholar using Zotero wishes to save a web page for their research they simply store a local copy. For some, perhaps many, purposes this is fine. But for web documents that a scholar believes will be important to share, cite, or collaboratively annotate (e.g., among a group of coauthors of an article or book) we will provide a second option in the Zotero web save function to grab a permanent copy and URI from IA’s web archive. A scholar who shares this item in their library can then be sure that all others who choose to use it will be referring to the exact same document.

Moreover, unlike most research software the sophisticated annotation tools built into Zotero—the ability to highlight passages, add virtual Post-It notes, as well as regular notes on the overall document—maintain these annotations separately from the underlying document. This presents the exciting possibility for collaborative scholarly annotation of web pages.

4. Simplifying Collaborative Sharing

Groups of scholars also have the need to create more private “commons,” e.g., for documents that they would like to share in a limited way. In addition to the fully open Zotero Commons we will establish a mechanism for such restricted sharing. Via the Zotero Server, a user will be able to create a special collection with a distinct icon that shows up in the client interface (left column) for every member of the group.

Files added to these collections will be stored on the Internet Archive but will have restricted access. We believe that having these files reside on the IA server will encourage the donation of documents at the end of a collaborative project. The administrator of a shared collection will be able to move its contents into the fully open Zotero Commons via a single click in the administrative interface on the Zotero Server.

5. Facilitating Scholarly Discovery

The multiple libraries of content created by Zotero users and the multi-petabyte digital collections of the Internet Archive are resources that can potentially be of great use to the scholarly community. We believe that neither has experienced the level of exploration and usage we believe is possible through further development and collaboration.

The combined digital collections present opportunities for scholars to find primary research materials, to discover one another’s work, to identify materials that are already available in digital form and therefore do not need to be located and scanned, to find other scholars with similar interests and to share their own insights broadly. We plan to leverage the combined strengths of the Zotero project and the Internet Archive to work on better discovery tools.

Steven Johnson at the Italian Embassy

Thursday, October 4th, 2007

Well, they didn’t have my favorite wine (Villa Cafaggio Chianti Classico Reserva, if you must know), but I had a nice evening at the Italian Embassy in Washington. The occasion was the start of a conference, “Using New Technologies to Explore Cultural Heritage,” jointly sponsored by the National Endowment for the Humanities and the Consiglio Nazionale delle Ricerche (National Research Council) of Italy. The setting was the embassy’s postmodern take on the Florentine palazzo (see below); the speaker was bestselling author and digerati Steven Johnson (Everything Bad is Good for You: How Today’s Popular Culture Is Actually Making Us Smarter; Outside.in).

Italian Embassy

Steven Johnson

Johnson’s talk was entitled “The Open Book: The Future of Text in the Digital Age.” (I present his thoughts here without criticism; it’s late.) Johnson argued that despite all of the hand-wringing and dire predictions, the book was not in decline. Indeed, he thought that because of new media books have new channels to expand into. While some believed ten years ago that we were entering an age of image and video, the rise of web instead led to the continued dominance of text, online and off. He noted that more hardcover books were sold in 2006 than 2005; and more in 2005 than in 2004. Newspapers have huge online audiences that dwarf their paper readership, thus strengthening their importance to culture.

Johnson pointed to four important innovations in online writing:

1) Collaborative writing is in a golden age because of the Internet. One need only look at Wikipedia, especially the social process of its underlying discussion pages (in addition to the surface article pages).

2) Fan fiction is also in its heyday. There are almost 300,000 (!) fan-written, unauthorized sequels to Harry Potter on fanfiction.net. There are even countless reviews of this fan fiction.

3) Blogging has become an important force, and great for authors. Blogs often provide unpolished comments about books by readers that are just as helpful as professional reviews.

4) Discovery of relevant materials and passages has been made much easier by new media–just think about the difference between research for a book now and roaming through the stacks in a library. Software like DEVONthink has made scholarship easier by connecting hidden dots and sorting through masses of text.

Finally, Johnson argued that despite the allure of the web, physical books are still the best way for an author to get inside someone’s head and convince them about something important. The book still has much greater weight and impact than even the most important blog post.

Social and Semantic Computing for Historical Scholarship

Monday, May 14th, 2007

Under the assumption that many readers of this blog don’t receive the American Historical Association’s magazine Perspectives, you might be interested in this article I wrote for the May 2007 issue. In the piece I discuss the Zotero project’s connection to several recent trends in computing, and think ahead to what the Zotero server might mean for academic fields like history.

Second Chicago Colloquium on Digital Humanities and Computer Science

Thursday, April 26th, 2007

I went to the first of these last November and it’s well worth attending. This year’s theme is “exploring the scholarly query potential of high quality text and image archives in a collaborative environment.” The colloquium will take place on October 21-22, 2007, with proposals due July 31, 2007.

First Issue of Digital Humanities Quarterly

Monday, April 9th, 2007

I had started to worry that this wouldn’t actually launch, so I’m glad to see that the inaugural issue of Digital Humanities Quarterly is online. Seems like a good mix of theory and practice, and well designed.

2007 Vectors Summer Fellowships

Wednesday, March 14th, 2007

Vectors: Journal of Culture and Technology in a Dynamic Vernacular has announced its fourth annual summer fellowship program to take place in June 2007 at USC. They are seeking proposals for projects related to “reading” and “noise.” About Vectors: “Vectors publishes work which need necessarily exist online, ranging from archival to experimental projects.”