Web Services – Dan Cohen

Eliminating the Power Cord

[My live talk at the Shape of Things to Come conference at the University of Virginia, March 27, 2010. It is a riff on a paper that will come out in the proceedings of the conference.]

As I noted in my paper for this conference, what I find interesting about this panel is that we got a chance to compare two projects by Ken Price: the Walt Whitman Archive and Civil War Washington. How their plans and designs differ tell us something about all digital humanities projects. I want to spend my brief time spinning out further what I said in the paper about control, flexibility, creativity, and reuse. It’s a tale of the tension between content creators and content users.

But before I get to Ken’s work, I’d like to start with another technological humanist, Jef Raskin, one of the first employees of Apple Computer and the designer, with Steve Jobs, of the first Macintosh. Just read the principles Raskin lays out in 1979 in “Design Considerations for an Anthropophilic Computer”:

This is an outline for a computer designed for the Person In The Street (or, to abbreviate: the PITS); one that will be truly pleasant to use, that will require the user to do nothing that will threaten his or her perverse delight in being able to say: “I don’t know the first thing about computers.”

You might think that any number of computers have been designed with these criteria in mind, but not so. Any system which requires a user to ever see the interior, for any reason, does not meet these specifications. There must not be additional ROMS, RAMS, boards or accessories except those that can be understood by the PITS as a separate appliance. As a rule of thumb, if an item does not stand on a table by itself, and if it does not have its own case, or if it does not look like a complete consumer item in [and] of itself, then it is taboo.

If the computer must be opened for any reason other than repair (for which our prospective user must be assumed incompetent) even at the dealer’s, then it does not meet our requirements.

Seeing the guts is taboo. Things in sockets is taboo. Billions of keys on the keyboard is taboo. Computerese is taboo. Large manuals, or many of them is taboo.

There must not be a plethora of configurations. It is better to manufacture versions in Early American, Contemporary, and Louis XIV than to have any external wires beyond a power cord.

And you get ten points if you can eliminate the power cord.

Many digital humanities projects implicitly believe strongly in Raskin’s design principle. They take care of what to the content creators and designers seems like hard and annoying work for the end users, freeing those users “to do what they do best.” These editorial projects bring together at once primary sources, middleware, user interfaces, and even tools.

Like the Macintosh, this can be a very good thing. I mostly agree with what Ken has just said, that in the case of Whitman, we probably cannot rely on a loose network of sites to provide canonical texts. Moreover, students new to Walt Whitman can clearly use the contextualization and criticism Ken and his colleagues provide on the Walt Whitman site. Similarly, scholars dipping for the first time into ethnomusicology will appreciate the total research environment provided by EVIA. As Matt Kirschenbaum noted in the last session, good user interfaces can enable new interpretations. I doubt that many scholars would be able to do Hypercities-grade geographical scholarship without a centralized Hypercities site.

But at the same time, like Raskin, sometimes these projects strive too hard to eliminate the power cord.

Raskin thought that the perfect computer would enable creativity at the very surface of the appliance. Access to the guts would not be permitted because to allow so would hinder the capacity of the user to be creative. The computer designers would take care of all of the creativity from the base of the hardware to the interface. But as Bethany Nowviskie discussed this morning, design decisions and user interface embody an argument. And so they also imply control. It’s worth thinking about the level of control the creators assume in each digital humanities project.

I would like to advance this principle: Scholars have uses for edited collections that the editors cannot anticipate. One of the joys of server logs is that we can actually see that principle in action (whereas print editorial projects have no idea how their volumes are being used, except in footnotes many years later). In the September 11 Digital Archive we assumed as historians that all uses of the archive would be related to social history. But we discovered later that many linguists were using the archive to study teen slang at the turn of the century, because it was a large open database that held many stories by teens. Anyone creating resources to serve scholars and scholarship needs to account for these unanticipated uses.

When we think through the principle of unanticipated uses, we begin to realize that there is a push and pull between the scholar and the editor. It is perhaps not a zero sum game, but surely there is a tension between the amount of intellectual work each party gets to do. Editors that put a major intellectual stamp on their collection through data massaging and design and user tools restrict the ability of the scholar to do flexible work on it. Alan Burdette of EVIA was thinking of this when he spoke about his fear of control vs. dynamism this morning.

Are digital humanities projects prepared to separate their interfaces from their primary content? What if Hypercities was just a set of KML files like Phil Ethington’s KML files of LA geography? What about the Grub Street Project? Or Ken’s Civil War Washington? This is a hard question for digital projects—freeing their content for reuse.

I believe Ken’s two projects, one a more traditional editorial project and one a labor of love, struggle with how much intellectual work to cede to the end user. Both projects have rather restrictive terms of use pages and admonishments about U.S. copyright law. Maybe I’m reading something into the terms of use page for Civil War Washington site, but it seems more half-hearted. You can tell that here is a project that isn’t a holding place for fixed perfected primary resources like Whitman’s, but an evolving scholarly discussion that could easily involve others.

Why not then allow for the download of all the data on the site? I don’t think it would detract from Civil War Washington; indeed, it would probably increase the profile of the site. The site would not only have its own interpretations, but allow for other interpretations—off of the site. Why not let others have access to the guts that Raskin wished to cloak? This is the way networked scholarship works. And this is, I believe, what Roger Bagnall was getting at yesterday when he said “we need to think about the death of the [centralized website] project” as the greater success of digital humanities.

Jim Chandler and I have been formulating a rule of thumb for these editorial projects: the more a discipline is secure in its existence, its modes of interpretation, and its methods of creating scholarship, the more likely it is to produce stripped-down, exchangeable data sets. Thus scholars in papyrology just want to get at the raw sources; they would be annoyed by a Mac-like interface or silo. They have achieved what David Weinberger, in summarizing the optimal form of the web, called “small pieces, loosely joined.”

On the other hand, the newer and less confident disciplines, such as the digital geographic history of Civil War Washington, Hypercities, and Grub Street feel that they need to have a Raskin-like environment—it’s part of the process of justifying their existence. They feel pressure to be judge, jury and executioner. If the Cohen-Chandler law holds true, we will see in the future fewer fancy interfaces and more direct, portable access to humanities materials.

Of course, as I note in my paper, the level of curation apparent in a digital project is related to the question of credit. The Whitman archive feels like a traditional editorial project and thus worthy of credit. If Ken instead produced KML files and raw newspaper scans, he would likely get less credit than a robust, comprehensive site like Civil War Washington.

The irony about the long-suffering debate about credit is that every day humanities scholars deal with complexity, parsing complicated texts, finding meaning in the opaque. And yet somehow when it comes to self-assessment, we are remarkably simple-minded. If we can understand Whitman’s Leaves of Grass, surely we can tease out questions of credit and the intellectual work that goes into, say, complex KML files.

To help spur this transition along, Christine Madsen has made this weekend the important point that the separation of interface and data makes sustainability models easier to imagine (and suggests a new role for libraries). If art is long and life is short, data is longish and user interfaces are fleeting. Just look at how many digital humanities projects that rely on Flash are about to become useless on millions of iPads.

Finally, on sustainability, I made a comparison in my paper between the well-funded Whitman archive and the Civil War Washington site, which was produced through sweat equity. I believe that Ken has a trump card with the latter. Being a labor of love is worth thinking about, because it’s often the way that great scholarship happens. Scholars in the humanities are afflicted with an obsession that makes them wake up in the morning and research and write about topics that drive them and constantly occupy their thoughts. Scholars naturally want to spend their time doing things like Civil War Washington. Being a labor of love is often the best sustainability model.

March 28, 2010 12 Comments

Project Bamboo Launches

Project Bamboo Logo If you’re interested in the present and future of the digital humanities, you’ll be hearing a lot about Project Bamboo over the next two years, including in this space. I was lucky enough to read and comment upon the Bamboo proposal a few months ago and was excited by its promise to begin to understand how technology—especially technology connected by web services—might be able to transform scholarship and academia. Bamboo is somewhat (and intentionally) amorphous right now—this doesn’t do it justice, but you can think of its initial phase as a listening tour—but I expect big things from the project in the not-so-distant future. From the brief description on the project website:

Bamboo is a multi-institutional, interdisciplinary, and inter-organizational effort that brings together researchers in arts and humanities, computer scientists, information scientists, librarians, and campus information technologists to tackle the question:

How can we advance arts and humanities research through the development of shared technology services?

A good question, and the right time to ask it. And the overall goal?

If we move toward a shared services model, any faculty member, scholar, or researcher can use and reuse content, resources, and applications no matter where they reside, what their particular field of interest is, or what support may be available to them. Our goal is to better enable and foster academic innovation through sharing and collaboration.

Project Bamboo was funded by the Andrew W. Mellon Foundation.

March 24, 2008 Add Comment

Zotero News, Big and Small

So much for a modest, stealthy launch of Zotero. I promised a couple of weeks ago that I would return to my blog soon with a few updates about user feedback, some hints about new features, and perhaps some additional news items. With a modest private beta test and a few pages explaining the software on our new site, I assumed that Zotero would quietly and slowly enter into public consciousness. Little did I know that within two weeks I would get over 400 emails asking to join the beta test, help develop and extend Zotero, make it work better with resources on the web, and evangelize it on campuses and in offices around the globe. (Sorry to those I haven’t responded to yet; I’m still working on my email backlog.) Better yet, we received some fantastic news about support for the project, which is where I’ll begin this update.

The big news is that the Center for History and New Media has received an incredibly generous grant from the Andrew W. Mellon Foundation to help build major new features into the 2.0 release of Zotero (coming in 2007). Included in this substantial upgrade are great capabilities that beta testers are already clamoring for (as I’ll describe below). I’m deeply appreciative to the Mellon Foundation and especially Ira Fuchs and Chris Mackie for their support of the project, and we’re delighted to join the stable of other Mellon-funded, open-source projects that are trying to revolutionize higher education and the scholarly enterprise through the use of innovative information technology. We have a very ambitious set of goals we would like to accomplish in the next two years under Mellon funding, and we’re really excited to get started and push these advances out to an eager audience.

My thanks also to the beta testers who have reported bugs and sent in suggestions. (For a few early reviews and thoughts about Zotero, see posts on the blogs of Bill Turkel, Bruce D’Arcus (1, 2), Adrian Cooke, Jeanne Kramer-Smyth, and Mark Phillipson.) We’re planning on rolling all of the bug fixes and a few of the suggestions that we’ve already implemented into the public beta that will be released shortly. The most requested new features were auto-completion/suggestions for tags, better support for non-Western and institutional authors, full-text searches of articles that are saved into one’s Zotero collection, more import/export options, support for other online collections and resources, and the detection of duplicate records. The developers are working feverishly on all of these fronts, and I think the Beta 2 release (our public beta) will be considerably better because of all of this helpful feedback.

I have intentionally left out perhaps the most wanted feature: tools for collaboration. Some of those who have started to hack the software have noticed what we at the Center for History and New Media have been thinking about from the start—that it seems very easy to add ways to send and receive information to and from Zotero (it does reside in the web browser, after all). What if you could share a folder of references and notes with a colleague across the country? What if you could receive a feed of new resources in your area of interest? What if you could synchronize your Zotero library with a server and access it from anywhere? What if you could send your personal collection to other web services, e.g., a mapping service or text analyzer or translation engine?

I’m glad so many of us are thinking alike. Those are the issues we’ve just started to work on, thanks to the Mellon Foundation. Stay tuned for the Zotero server and additional exciting extensions to the Zotero platform.

And despite my email backlog, please do contact me if you would like to join the Zotero movement.

September 19, 2006 Add Comment

Where Are the Noncommercial APIs?

Readers of this blog know that one of my pet peeves as someone trying to develop software tools for scholars, teachers, and students is the lack of application programming interfaces (APIs) for educational resources. APIs greatly facilitate the use of these resources and allow third parties to create new services on top of them, such as the Google Maps “mashups” that have become a phenomenon in the last year. (Please see my post “Do APIs Have a Place in the Digital Humanities?” as well as the Hurricane Digital Memory Bank for more on APIs and to see what a historical mashup looks like.) Now a clearing house for APIs shows the extent to which noncommercial resources—and especially those in the humanities—have been left out in the cold in this promising new phase of the web. Count with me the total number of noncommercial, educationally-oriented APIs out of the nearly 200 listed on Programmable Web.

That’s right, for the humanities the answer is one: the Library of Congress’s somewhat clunky SRU (Search/Retrieve via URL). Maybe in a broader definition you could count the API from the BBC archive, though it seems to be more about current events. The Internet Archive’s API is currently focused on facilitating uploads into its system rather than, say, historical data mining of the web. A potentially rich API for finding book information, ISBNdb.com, seems promising, but shouldn’t there be a noncommercial entity offering this service (I assume ISSNdb.com will eventually charge or limit this important service)?

By my count the only other noncommercial APIs are from large U.S. government scientific institutions such as NASA, NIH, and NOAA. Surely this long list is missing some other APIs out there, such as one for OAI-PMH. If so, let Programmable Web know—most “Web 2.0” developers are looking here first to get ideas for services, and we don’t need more mashups focusing on the real estate market.

March 10, 2006 Add Comment

Wikipedia vs. Encyclopaedia Britannica for Digital Research

In a prior post I argued that the recent coverage of Wikipedia has focused too much on one aspect of the online reference source’s openness—the ability of anyone to edit any article—and not enough on another aspect of Wikipedia’s openness—the ability of anyone to download or copy the entire contents of its database and use it in virtually any way they want (with some commercial exceptions). I speculated that, as I discovered in my data-mining work with H-Bot, which uses Wikipedia in its algorithms, having an open and free resource such as this could be very important for future digital research—e.g., finding all of the documents about the first President Bush in a giant, untagged corpus on the American presidency. For a piece I’m writing for D-Lib Magazine, I decided to test this theory by pulling out significant keywords and phrases from matching articles in Wikipedia and the Encyclopaedia Britannica on George H. W. Bush to see if one was better than the other for this purpose. Which resource is better? Here are the unedited term lists, derived by running plain text versions of each article through Yahoo’s Term Extraction web service. Vote on which one you think is a better profile, and I’ll reveal which list belongs to which reference work later this week.

Article #1
president bush
saddam hussein
fall of the berlin wall
tiananmen square
thanksgiving day
american troops
manuel noriega
halabja
invasion of panama
gulf war
help
saudi arabia
united nations
berlin wall

Article #2
president george bush
george bush
mikhail gorbachev
soviet union
collapse
reunification of germany
thurgood marshall
union
clarence thomas
joint chiefs of staff
cold war
manuel antonio noriega
iraq
george
nonaggression pact
david h souter
antonio noriega
president george

January 30, 2006 1 Comment