Category Archives: Standards

The Vision of ORE

One form of serious intellectual work that could use much more respect and appreciation within the humanities is the often unglamorous—but occasionally revolutionary—work of creating technical standards. At their best, such standards transcend the code itself to envision new forms of human interaction or knowledge creation that would not be possible without a lingua franca. We need only think of the web; look at what the modest HTML 1.0 spec has wrought.

The Object Reuse and Exchange (ORE) specification that was unveiled today at Johns Hopkins University has, beyond all of the minute technical details, a very clear and powerful vision of scholarly research and communication in a digital age. It is thus worth following the specification as it moves toward a final version in the fall of 2008, and worth beginning to think about how we might use it in the humanities (even though it will undoubtedly be adopted faster in the sciences).

The vision put forth by Carl Lagoze, Herbert Van de Sompel, and others in the ORE working group tries, for the first time, to map the true nature of contemporary scholarship onto the web. The ORE community realized in 2006 that neither basic web pages nor advanced digital repositories truly capture today’s scholarship.

This scholarship cannot be contained by web pages or PDFs put into an institutional repository, but rather consists of what the ORE team has termed “aggregates,” or constellations of digital objects that often span many different web servers and repositories. For instance, a contemporary astronomy article might consist of a final published PDF, its metadata (author, title, publication info, etc.), some internal images, and then—here’s the important part—datasets, telescope imagery, charts, several publicly available drafts, and other matter (often held by third parties) that does not end up in the PDF. Similarly, an article in art history might consist of the historian’s text, paintings that were consulted in a museum, low-resolution copies of those paintings that are available online (perhaps a set of photos on Flickr of the referenced paintings), citations to other works, and perhaps an associated slide show.

How can one reliably reference and take full advantage of such scholarly constellations given the current state of the web? As Herbert Van de Sompel put it, ORE tries to identify in a commonsensical way “identified, bounded aggregations of related objects that form a logical whole.” In other words, ORE attempts to shift the focus from repositories for scholarship to the complex products of scholarship themselves.

By forging semantic links among the pieces that make up a work of scholarship, ORE keeps those links active and dynamic, allowing humans, as well as machines that wish to make connections, to easily find these related objects. It also provides a much better preservation path for digital scholarship, because repositories can use ORE to capture the entirety of a work and its associated constellation rather than grabbing just a single published instantiation of the work.

The implementation of ORE is perhaps less commonsensical for those who do not wish to dive into lots of semantic web terms and markup languages, but put simply, the approach the ORE group has taken is to provide a permanent locator (i.e., a URI, like a web address) that links to what they call a “resource map,” which in turn describes an aggregation. Think of a constellation in the night sky. We have Orion, which consists of certain stars; a star map specifies which stars comprise Orion and where to find each of them. The creators of ORE have chosen to use widely adopted formats like RDF and Atom to “serialize” (or make available in a machine-readable and easily exchangeable text format) their resource maps. [Geeks can read the full specification in their user guide.]
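To make the star-map metaphor concrete, here is a rough sketch in Python of what an Atom-serialized resource map might look like. The Atom namespace and the ore:aggregates link relation come from the specification; everything else (the URIs, the article, the particular parts) is invented purely for illustration, and a real resource map would carry considerably more metadata than this.

```python
# Sketch of an ORE resource map serialized as an Atom entry.
# Each aggregated resource -- the PDF, a dataset, a chart -- becomes
# one <link> element pointing from the map to that part of the whole.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

entry = ET.Element("{%s}entry" % ATOM)
ET.SubElement(entry, "{%s}id" % ATOM).text = "http://example.org/rem/article-42"
ET.SubElement(entry, "{%s}title" % ATOM).text = "Resource map for a hypothetical astronomy article"

# The constellation: hypothetical URIs for the objects that form the logical whole.
parts = (
    "http://example.org/article-42.pdf",
    "http://example.org/data/telescope-run-7.fits",
    "http://example.org/charts/lightcurve.png",
)
for part in parts:
    ET.SubElement(entry, "{%s}link" % ATOM, rel=ORE_AGGREGATES, href=part)

xml = ET.tostring(entry, encoding="unicode")
print(xml)
```

A harvester or repository that understands ORE could fetch this one document and know exactly which three objects, scattered across whatever servers, constitute the work.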

In the afternoon today several compelling examples of ORE in action were presented. Ray Plante of the NCSA and National Virtual Observatory showed how astronomers could use ORE and a wiki to create aggregates and updates about unusual events like supernovas, as different observatories add links to images and findings about each event (again, think of Van de Sompel’s “logical whole”). Several presenters mentioned our Zotero project as an ideal use case for ORE, since it already downloads associated objects as part of a single parent item (e.g., it stores metadata, a link to the page it got an item from, and perhaps a PDF or web snapshot). Zotero is already ORE Lite, in a way, and it will be good to try out a full Zotero translator for ORE resource maps that would permit Zotero users to grab aggregates for their research and subsequently publish aggregates back onto the web—object reuse and exchange in action.

Obviously it’s still very early and the true impact of ORE remains to be seen. But it would be a shame if humanities scholars fail to participate in the creation of scholarly standards like ORE, or to help envision their uses in research, communication, and collaboration.

There has been much talk recently of the social graph, the network of human connections that sites like Facebook bring to light and take advantage of. If widely adopted, ORE could help create the scholarly graph, the networked relations of scholars, publications, and resources.

Shakespeare’s Hard Drive

Congrats to Matt Kirschenbaum on his thought-provoking article in the Chronicle of Higher Education, “Hamlet.doc? Literature in a Digital Age.” Matt makes two excellent points. First, “born digital” literature presents incredible new opportunities for research, because manuscripts written on computers retain significant metadata and draft tracking that allows for major insights into an author’s thought and writing process. Second, scholars who wish to study such literature in the future need to be proactive in pushing for writing environments, digital standards, and archival storage that will provide accessibility and persistence for these advantages.

Creating a Blog from Scratch, Part 5: What is XHTML, and Why Should I Care?

In prior posts in this series (1, 2, 3, and 4), I described with some glee my rash abandonment of common blogging software in favor of writing my own. For my purposes there seemed to be some key disadvantages to these popular packages, including an overemphasis on the calendar (I just saw the definition of a blog at the South by Southwest Interactive Festival—”a page with dated entries”—which, to paraphrase Woody Allen, is like calling War and Peace “a book about Russia”), a sameness to their designs, and comments that are rarely helpful and often filled with spam. But one of the greatest advantages of recent blog software packages is that they generally write standards-compliant code. More specifically, blog software like WordPress automatically produces XHTML. Some of you might be asking, what is XHTML, and who cares? And why would I want to spend a great deal of effort ensuring that this blog complied strictly with this language?

The large digital library contingent that reads this blog could probably enumerate many reasons why XHTML compliance is important, but I had two reasons in mind when I started this blog. (Actually, I had a third, more secretive reason that I’ll mention first: Roy Rosenzweig and I argue in our book Digital History that XHTML will likely be critical for digital humanists to adhere to in the future—don’t want to be accused of being a hypocrite.) For those for whom web acronyms are Greek, XHTML is a sibling of XML, a more rigorously structured and flexible language than the HTML that underlies most of the web. XHTML is better prepared than HTML to be platform-independent; because it separates formatting from content, XHTML (like XML) can be reconfigured easily for very different environments (using, e.g., different style sheets). HTML, with formatting and content inextricably combined, for the most part assumes that you are using a computer screen and a web browser. Theoretically XHTML can be dynamically and instantaneously recast to work on many different devices (including a personal computer). This flexibility is becoming an increasingly important feature as people view websites on a variety of platforms (not just a normal computer screen, e.g., but cell phones or audio browsers for the blind). Indeed, according to the server logs for this blog, 1.6% of visitors are using a smart phone, PDA, or other means to read this blog, a number that will surely grow. In short, XHTML seems better prepared than regular HTML to withstand the technological changes of the coming years, and theoretically should be more easily preserved than older methods of displaying information on the web. For these and other reasons a 2001 report the Smithsonian commissioned recommended the institution move to XHTML from HTML.

Of course, with standards compliance comes extra work. (And extra cost. Just ask webmasters at government agencies trying to make their websites comply with Section 508, the mandatory accessibility rules for federal information resources.) Aside from a brief flirtation with the what-you-see-is-what-you-get, write-the-HTML-for-you program Dreamweaver in the late 1990s, I’ve been composing web pages using a text editor (the superb BBEdit) for over ten years, so my hands are used to typing certain codes in HTML, in the same way you get used to a QWERTY keyboard. XHTML is not that dissimilar from HTML, but it still has enough differences to make life difficult for those used to HTML. You have to remember to close every tag; some attributes related to formatting are in strange new locations. One small example of the minor infractions I frequently trip up on writing XHTML: the oft-used break tag to add a line to a web page must “close itself” by adding a slash before the end bracket (not <br>, but <br />). But I figured doing this blog would give me a good incentive to start writing everything in strict XHTML.

Yeah, right. I clearly haven’t been paying enough attention to detail. The page you’re reading likely still has dozens of little coding errors that make it fail strict compliance with the World Wide Web Consortium’s XHTML standard. (If you would like a humbling experience that brings to mind receiving a pop quiz back from your third-grade teacher with lots of red ink on it, try the W3C’s XHTML Validator.) I haven’t had enough time to go back and correct all of those little missing slashes and quotation marks. WordPress users out there can now begin their snickering; their blog software does such mundane things for them, and many proudly (and annoyingly) display little “XHTML 1.0 compliant” badges on their sites. Go ahead, rub it in.

After I realized that it would take serious effort to bring my code up to code, so to speak, I sat back and did the only thing I could do: rationalize. I didn’t really need strict XHTML compliance because through some design sleight of hand I had already been able to make this blog load well on a wide range of devices. I learned from other blog software that if you put the navigation on the right rather than the more common left you see on most websites, the body of each post shows up first on a PDA or smart phone. It also means that blind visitors don’t have to suffer through a long list of your other posts before getting to the article they want to read.

As far as XHTML is concerned, I’ll be brushing up on that this summer. Unless I move this blog to WordPress by then.

Part 6: One Year Later