The Vision of ORE

One form of serious intellectual work that could use much more respect and appreciation within the humanities is the often unglamorous (but occasionally revolutionary) work of creating technical standards. At their best, such standards transcend the code itself to envision new forms of human interaction or knowledge creation that would not be possible without a lingua franca. We need only think of the web; look at what the modest HTML 1.0 spec has wrought.

The Object Reuse and Exchange (ORE) specification that was unveiled today at Johns Hopkins University has, beyond all of the minute technical details, a very clear and powerful vision of scholarly research and communication in a digital age. It is thus worth following the specification as it moves toward a final version in the fall of 2008, and beginning to think about how we might use it in the humanities (even though it will undoubtedly be adopted faster in the sciences).

The vision put forth by Carl Lagoze, Herbert Van de Sompel, and others in the ORE working group for the first time tries to map the true nature of contemporary scholarship onto the web. The ORE community realized in 2006 that neither basic web pages nor advanced digital repositories truly capture today’s scholarship.

This scholarship cannot be contained by web pages or PDFs put into an institutional repository, but rather consists of what the ORE team has termed “aggregates,” or constellations of digital objects that often span many different web servers and repositories. For instance, a contemporary astronomy article might consist of a final published PDF, its metadata (author, title, publication info, etc.), some internal images, and then—here’s the important part—datasets, telescope imagery, charts, several publicly available drafts, and other matter (often held by third parties) that does not end up in the PDF. Similarly, an article in art history might consist of the historian’s text, paintings that were consulted in a museum, low-resolution copies of those paintings that are available online (perhaps a set of photos on Flickr of the referenced paintings), citations to other works, and perhaps an associated slide show.

How can one reliably reference and take full advantage of such scholarly constellations given the current state of the web? As Herbert Van de Sompel put it, ORE tries to identify in a commonsensical way “identified, bounded aggregations of related objects that form a logical whole.” In other words, ORE attempts to shift the focus from repositories for scholarship to the complex products of scholarship themselves.

By forging semantic links between the pieces entailed in a work of scholarship, ORE keeps those links active and dynamic and allows humans, as well as machines that wish to make connections, to find these related objects easily. It also allows for a much better preservation path for digital scholarship, because repositories can use ORE to get the entirety of a work and its associated constellation rather than grabbing just a single published instantiation of the work.

The implementation of ORE is perhaps less commonsensical for those who do not wish to dive into lots of semantic web terms and markup languages, but put simply, the approach the ORE group has taken is to provide a permanent locator (i.e., a URI, like a web address) that links to what they call a “resource map,” which in turn describes an aggregation. Think of a constellation in the night sky. We have Orion, which consists of certain stars; a star map specifies which stars make up Orion and where to find each of them. The creators of ORE have chosen to use widely adopted formats like RDF and Atom to “serialize” (or make available in a machine-readable and easily exchangeable text format) their resource maps. [Geeks can read the full specification in their user guide.]
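For readers who think better in code, the star-map analogy can be roughed out in a few lines of Python. This is only a sketch of the idea, not the ORE data model or any of its real serializations: the class names, the "describes" and "aggregates" labels, and every example.org address are placeholders I made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregation:
    """A bounded set of related resources that form one logical whole."""
    uri: str
    resources: list[str] = field(default_factory=list)


@dataclass
class ResourceMap:
    """A document, itself addressable by URI, that describes one aggregation."""
    uri: str
    describes: Aggregation

    def to_triples(self) -> list[tuple[str, str, str]]:
        """Flatten the map into (subject, predicate, object) statements."""
        agg = self.describes
        # The predicate labels here are simplified stand-ins, not real vocabulary terms.
        triples = [(self.uri, "describes", agg.uri)]
        triples += [(agg.uri, "aggregates", r) for r in agg.resources]
        return triples


# Hypothetical example: Orion is the aggregation, the star map is the resource map.
orion = Aggregation(
    uri="http://example.org/constellations/orion",
    resources=[
        "http://example.org/stars/betelgeuse",
        "http://example.org/stars/rigel",
        "http://example.org/stars/bellatrix",
    ],
)
star_map = ResourceMap(uri="http://example.org/maps/orion", describes=orion)

for triple in star_map.to_triples():
    print(triple)
```

The point of the flattening step is that a list of simple statements like these is the kind of thing a format such as RDF or Atom can carry from one machine to another.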

In the afternoon today several compelling examples of ORE in action were presented. Ray Plante of the NCSA and National Virtual Observatory showed how astronomers could use ORE and a wiki to create aggregates and updates about unusual events like supernovas, as different observatories add links to images and findings about each event (again, think of Van de Sompel’s “logical whole”). Several presenters mentioned our Zotero project as an ideal use case for ORE, since it already downloads associated objects as part of a single parent item (e.g., it stores metadata, a link to the page it got an item from, and perhaps a PDF or web snapshot). Zotero is already ORE Lite, in a way, and it will be good to try out a full Zotero translator for ORE resource maps that would permit Zotero users to grab aggregates for their research and subsequently publish aggregates back onto the web—object reuse and exchange in action.

Obviously it’s still very early and the true impact of ORE remains to be seen. But it would be a shame if humanities scholars failed to participate in the creation of scholarly standards like ORE, or to help envision their uses in research, communication, and collaboration.

There has been much talk recently of the social graph, the network of human connections that sites like Facebook bring to light and take advantage of. If widely adopted, ORE could help create the scholarly graph, the networked relations of scholars, publications, and resources.

Zotero and the Internet Archive Join Forces

I’m pleased to announce a major alliance between the Zotero project at the Center for History and New Media and the Internet Archive. It’s really a match made in heaven: a project to provide free and open source software and services for scholars joining together with the leading open library. The vision and support of the Andrew W. Mellon Foundation have made this possible, as they have made possible the major expansion of the Zotero project over the last year.

You will hear much more about this alliance in the coming months on this blog, but I wanted to outline five key elements of the project.

1. Exposing and Sharing the “Hidden Archive”

The Zotero-IA alliance will create a “Zotero Commons” into which scholarly materials can be added simply via the Zotero client. Almost every scholar and researcher has documents that they have scanned (some of which are in the public domain), finding aids they have created, or bibliographies on topics of interest. Currently there is no easy way to share these; giving them a central home at the Internet Archive will archive them permanently (before they are lost on personal hard drives) and make them broadly available to others.

We understand that not everyone will be willing to share everything (some may not be willing to share anything, even though almost every university commencement reminds graduates that they are joining a “community of scholars”), but we believe that the Commons will provide a good place for shareable materials to reside. The architectural historian with hundreds of photographs of buildings, the researcher who has scanned in old newspapers, and scholars who wish to publish materials in an open access environment will find this a helpful addition to Zotero and the Internet Archive. Some researchers may of course deposit materials only after finishing, say, a book project; what I have called “secondary scholarly materials” (e.g., bibliographies) will perhaps be more readily shared.

But we hope the second part of the project will further entice scholars to contribute important research materials to the Commons.

2. Searching the Personal Library

Most scholars have not yet figured out how to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers, who now find themselves overwhelmed by the sheer quantity of available material. Moreover, the major advantage of digital research—the ability to scan large masses of text quickly—is often unavailable to scholars who have done their own scanning or copying of texts.

A critical second part to this alliance of IA and Zotero is to bring robust and seamless Optical Character Recognition (OCR) to the vast majority of scholars who lack the means or do not know how to convert their scans into searchable text. In addition, this process will let others search through such newly digitized texts. After a submission to the Commons, the Internet Archive will subsequently return an OCRed version of each donated document to enable searchability. This text will be incorporated into the donor’s local index (on the Zotero client) and thus made searchable in Zotero’s powerful quick search and advanced search panes. In short, this process will provide a tremendous incentive for scholars to donate to the Commons, since it will help them with their own research.

3. Enabling Networked References and Annotations

One of the pillars of scholarship is the ability of distributed scholars to be sure they are referencing the same text or evidence. As noted in #1, one of the great advantages of the Zotero Commons at IA will be the transport of scholarly materials currently residing on personal hard drives to a public space with stable, rather than local, addresses. These addresses will become critical as scholars begin to use, refer to, and cite items in the Commons.

Yet the IA/Zotero partnership has another benefit: as scholars begin to use not only traditional primary sources that have been digitized but also “born digital” materials on the web (blogs, online essays, documents transcribed into HTML), the possibility arises for Zotero users to leverage the resources of IA to ensure a more reliable form of scholarly communication. One of the Internet Archive’s great strengths is that it has not only archived the web but also given each page a permanent URI that includes a time and date stamp in addition to the URL.
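To make the idea of a time-stamped address concrete, here is a small Python sketch. It assumes the address pattern used by the Internet Archive's Wayback Machine (a fixed prefix, a 14-digit UTC timestamp, then the original URL); the function name and the example page are my own inventions, and nothing here describes how Zotero itself will do this.

```python
from datetime import datetime, timezone

# Assumed pattern: <prefix>/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


def archived_uri(url: str, captured_at: datetime) -> str:
    """Build a time-stamped address for one capture of a web page.

    `captured_at` should be timezone-aware; it is converted to UTC.
    """
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


# Hypothetical page and capture time, for illustration only.
when = datetime(2007, 12, 12, 15, 30, 0, tzinfo=timezone.utc)
print(archived_uri("http://example.org/essay.html", when))
# prints https://web.archive.org/web/20071212153000/http://example.org/essay.html
```

Because the date and time are part of the address, two scholars who cite the same address are pointing at the same capture, even if the live page has since changed.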

Currently when a scholar using Zotero wishes to save a web page for their research they simply store a local copy. For some, perhaps many, purposes this is fine. But for web documents that a scholar believes will be important to share, cite, or collaboratively annotate (e.g., among a group of coauthors of an article or book) we will provide a second option in the Zotero web save function to grab a permanent copy and URI from IA’s web archive. A scholar who shares this item in their library can then be sure that all others who choose to use it will be referring to the exact same document.

Moreover, unlike most research software, the sophisticated annotation tools built into Zotero (the ability to highlight passages and to add virtual Post-It notes as well as regular notes on the overall document) maintain these annotations separately from the underlying document. This presents the exciting possibility for collaborative scholarly annotation of web pages.
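Keeping annotations apart from the document they describe is sometimes called "standoff" annotation, and a rough Python sketch may help show why it suits collaboration. This is not Zotero's actual storage format; the class, its fields, the example address, and the sample notes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A note that points into a document by address and character offsets."""
    document_uri: str  # stable address of the shared, unmodified document
    start: int         # offset of the first highlighted character
    end: int           # offset just past the last highlighted character
    author: str
    note: str


def annotated_passages(text: str, annotations: list[Annotation]) -> list[tuple[str, str, str]]:
    """Pair each annotation with the passage it highlights."""
    return [(a.author, text[a.start:a.end], a.note) for a in annotations]


# Hypothetical shared document and two collaborators' notes.
uri = "http://example.org/archived/essay"
page_text = "War and Peace is about far more than Russia."
notes = [
    Annotation(uri, 0, 13, "alice", "Title of the novel"),
    Annotation(uri, 37, 43, "bob", "The punch line"),
]

for author, passage, note in annotated_passages(page_text, notes):
    print(f"{author}: '{passage}' ({note})")
```

Since the notes never alter the text, any number of people can layer their own annotations over the same stable copy, which is why a permanent address for that copy matters.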

4. Simplifying Collaborative Sharing

Groups of scholars also have the need to create more private “commons,” e.g., for documents that they would like to share in a limited way. In addition to the fully open Zotero Commons we will establish a mechanism for such restricted sharing. Via the Zotero Server, a user will be able to create a special collection with a distinct icon that shows up in the client interface (left column) for every member of the group.

Files added to these collections will be stored on the Internet Archive but will have restricted access. We believe that having these files reside on the IA server will encourage the donation of documents at the end of a collaborative project. The administrator of a shared collection will be able to move its contents into the fully open Zotero Commons via a single click in the administrative interface on the Zotero Server.

5. Facilitating Scholarly Discovery

The multiple libraries of content created by Zotero users and the multi-petabyte digital collections of the Internet Archive are resources that can potentially be of great use to the scholarly community. We believe that neither has yet seen the level of exploration and usage that further development and collaboration could make possible.

The combined digital collections present opportunities for scholars to find primary research materials, to discover one another’s work, to identify materials that are already available in digital form and therefore do not need to be located and scanned, to find other scholars with similar interests and to share their own insights broadly. We plan to leverage the combined strengths of the Zotero project and the Internet Archive to work on better discovery tools.

Symposium on the Future of Scholarly Communication

For those who missed it, between October 12 and 27, 2007, there was a very thoughtful and insightful online discussion of how the publication of scholarship is changing (or trying to change) in the digital age. Participating in the discussion were Ed Felten, David Robinson, Paul DiMaggio, and Andrew Appel from Princeton University (the symposium was hosted by the Center for Information Technology Policy at Princeton), Ira Fuchs of the Mellon Foundation, Peter Suber of the indispensable Open Access News blog (and philosophy professor at Earlham College), Stan Katz, the President Emeritus of the American Council of Learned Societies, and Laura Brown of Ithaka (and formerly the President of Oxford University Press USA).

The symposium is really worth reading from start to finish. (Alas, one of the drawbacks of hosting a symposium on a blog is that it keeps everything in reverse chronological order; it would be great if CITP could flip the posts now that the discussion has ended.) But for those of us in the humanities the most relevant point is that we are going to have a much harder transition to an online model of scholarship than our colleagues in the sciences. The main reason for this is that for us the highest form of scholarship is the book, whereas in the sciences it is the article, which is far more easily put online, posted in various forms (including as pre- and e-prints), and networked to other articles (through, e.g., citation analysis). In addition, we’re simply not as technologically savvy. As Paul DiMaggio points out, “every computer scientist who received his or her Ph.D. in computer science after 1980 or so has a website” (on which they can post their scholarly production), whereas the number is about 40% for political scientists and I’m sure far less for historians and literature professors.

I’m planning a long post in this space on the possible ways for humanities professors to move from print to open online scholarship; this discussion is great food for thought.

Tony Grafton on Digital Texts and Reading

Anthony Grafton was the first person to turn me on to intellectual history. His seminar on ideas in the Renaissance was one of the most fascinating courses I took at Princeton, and I still remember well Tony rocking in his seat, looking a bit like a young Karl Marx, making brilliant connections among a broad array of sources.

So it’s not unexpected given his wide-ranging interests but still terrific to see a scholar who has spent so much time with early books thinking deeply about “digitization and its discontents” in his article “Future Reading” in the latest issue of The New Yorker. And it’s even more gratifying to see Tony note in his online companion piece to “Future Reading,” “Adventures in Wonderland,” that “One of the best ways to get a handle on the sprawling world of digital sources is through George Mason University’s Center for History and New Media.”

Steven Johnson at the Italian Embassy

Well, they didn’t have my favorite wine (Villa Cafaggio Chianti Classico Riserva, if you must know), but I had a nice evening at the Italian Embassy in Washington. The occasion was the start of a conference, “Using New Technologies to Explore Cultural Heritage,” jointly sponsored by the National Endowment for the Humanities and the Consiglio Nazionale delle Ricerche (National Research Council) of Italy. The setting was the embassy’s postmodern take on the Florentine palazzo (see below); the speaker was bestselling author and digerato Steven Johnson (Everything Bad is Good for You: How Today’s Popular Culture Is Actually Making Us Smarter).

[Image: Italian Embassy]

[Image: Steven Johnson]

Johnson’s talk was entitled “The Open Book: The Future of Text in the Digital Age.” (I present his thoughts here without criticism; it’s late.) Johnson argued that despite all of the hand-wringing and dire predictions, the book was not in decline. Indeed, he thought that because of new media books have new channels to expand into. While some believed ten years ago that we were entering an age of image and video, the rise of the web instead led to the continued dominance of text, online and off. He noted that more hardcover books were sold in 2006 than in 2005, and more in 2005 than in 2004. Newspapers have huge online audiences that dwarf their paper readership, thus strengthening their importance to culture.

Johnson pointed to four important innovations in online writing:

1) Collaborative writing is in a golden age because of the Internet. One need only look at Wikipedia, especially the social process of its underlying discussion pages (in addition to the surface article pages).

2) Fan fiction is also in its heyday. There are almost 300,000 (!) fan-written, unauthorized sequels to Harry Potter online. There are even countless reviews of this fan fiction.

3) Blogging has become an important force, and great for authors. Blogs often provide unpolished comments about books by readers that are just as helpful as professional reviews.

4) Discovery of relevant materials and passages has been made much easier by new media: just think about the difference between research for a book now and roaming through the stacks in a library. Software like DEVONthink has made scholarship easier by connecting hidden dots and sorting through masses of text.

Finally, Johnson argued that despite the allure of the web, physical books are still the best way for an author to get inside someone’s head and convince them about something important. The book still has much greater weight and impact than even the most important blog post.

Shakespeare’s Hard Drive

Congrats to Matt Kirschenbaum on his thought-provoking article in the Chronicle of Higher Education, “Hamlet.doc? Literature in a Digital Age.” Matt makes two excellent points. First, “born digital” literature presents incredible new opportunities for research, because manuscripts written on computers retain significant metadata and draft tracking that allow for major insights into an author’s thought and writing process. Second, scholars who wish to study such literature in the future need to be proactive in pushing for writing environments, digital standards, and archival storage that will provide accessibility and persistence for these advantages.

2007 Vectors Summer Fellowships

Vectors: Journal of Culture and Technology in a Dynamic Vernacular has announced its fourth annual summer fellowship program to take place in June 2007 at USC. They are seeking proposals for projects related to “reading” and “noise.” About Vectors: “Vectors publishes work which need necessarily exist online, ranging from archival to experimental projects.”

It’s About Russia

One of my favorite Woody Allen quips from his tragically short period as a stand-up comic is the punch line to his hyperbolic story about taking a speed-reading course and then digesting all of War and Peace in twenty minutes. The audience begins to giggle at the silliness of reading Tolstoy’s massive tome in a brief sitting. Allen then kills them with his summary of the book: “It’s about Russia.” The joke came to mind recently as I read the self-congratulatory blog post by IBM’s Many Eyes visualization project, applauding their first month on the web. (And I’m feeling a little embarrassed by my post on the one-year anniversary of this blog.) The Many Eyes researchers point to successes such as this groundbreaking visualization of the New Testament:

News flash: Jesus is a big deal in the New Testament. Even exploring the “network” of figures who are “mentioned together” (ostensibly the point of this visualization) doesn’t provide the kind of insight that even a first-year student in theology could provide over coffee. I have been slow to appreciate the power of textual visualization—in large part because I’ve seen far too many visualizations like this one, that merely use computational methods to reveal the obvious in fancy ways.

I’ve been doing some research on visualizations of texts recently for my next book (on digital scholarship), and trying to get over this aversion to visualizations. But when I see visualizations like this one, the lesson is clear: Make sure your visualizations expose something new, hidden, non-obvious.

Because War and Peace isn’t about Russia.