Critical Elements of Web Culture Scholars Should Understand

The Scholars’ Lab at the University of Virginia has posted audio recordings of sessions from “The Humanities in a Digital Age,” a symposium that took place in November at UVA’s new Institute of the Humanities and Global Cultures. My keynote at the symposium was entitled “Humanities Scholars and the Web: Past, Present, Future,” and focused on what I believe are three critical elements of the web that scholars tend to overlook, or that cause concern because they upset certain academic conventions:

1) The openness and standards of the web produce generative platforms. The magic of the web is that from relatively simple technical specifications and interoperability arise an incredibly varied and constantly innovative set of genres. For those wedded to traditional forms such as the book and article, this can be difficult to understand and accept.

2) Interfaces shape genres. Tracing the history of web applications used to make blogs, from early link aggregators to the blank page of WordPress 3’s full-screen writing environment, shows this in action. Humanities blogs shifted in helpful ways over the last 15 years, into modes that should be more acceptable to the academy, as these interfaces changed. Being in control of these interfaces is important as we continue to develop online scholarship.

3) Communities define practice. Conventions around web genres are created by those participating in them. This has serious implications for what the academy might be able to do with the web in the future.

You can hear about these three main points and much more in the talk, which is available as a podcast or audio stream near the bottom of this page. Part of the talk comes from chapter 1 of The Ivory Tower and the Open Web.

The Ivory Tower and the Open Web: Introduction: Burritos, Browsers, and Books [Draft]

[A draft of the introduction to my forthcoming book, The Ivory Tower and the Open Web, which looks at academic resistance to the modes and genres of the web, and how those modes and genres might actually reinvigorate the academy. I'll be posting drafts of chapters as well for open comment and criticism.]

In the summer of 2007, Nate Silver decided to conduct a rigorous assessment of the inexpensive Mexican restaurants in his neighborhood, Chicago’s Wicker Park. Figuring that others might be interested in the results of his study, and that he might be able to use some feedback from an audience, he took his project online.

Silver had no prior experience in such an endeavor. By day he worked as a statistician and writer at Baseball Prospectus—an innovator, to be sure, having created a clever new standard for empirically measuring the value of players, an advanced form of the “sabermetrics” vividly described by Michael Lewis in Moneyball.1 But Silver had no experience as a food critic, nor as a web developer.

In time, his appetite took care of the former and the open web took care of the latter. Silver knit together a variety of free services as the tapestry for his culinary project. He set up a blog, The Burrito Bracket, using Google’s free Blogger web application. Weekly posts consisted of his visits to local restaurants, and the scores (in jalapeños) he awarded in twelve categories.

Home page of Nate Silver’s Burrito Bracket
Ranking system (upper left quadrant)

Being a sports geek, he organized the posts as a series of contests between two restaurants. Satisfying his urge to replicate March Madness, he modified another free application from Google, generally intended to create financial or data spreadsheets, to produce the “bracket” of the blog’s title.

Google Spreadsheets used to create the competition bracket

Like many of the savviest users of the web, Silver started small and improved the site as he went along. For instance, he had started to keep a photographic record of his restaurant visits and decided to share this documentary evidence. So he enlisted the photo-sharing site Flickr, creating an off-the-rack archive to accompany his textual descriptions and numerical scores. On August 15, 2007, he added a map to the site, geolocating each restaurant as he went along and color-coding the winners and losers.

Flickr photo archive for The Burrito Bracket (flickr.com)
Silver’s Google Map of Chicago’s Wicker Park (shaded in purple) with the location of each Mexican restaurant pinpointed

Even with its do-it-yourself enthusiasm and the allure of carne asada, Silver had trouble attracting an audience. He took to Yelp, a popular site for reviewing restaurants, to plug The Burrito Bracket, and even thought about creating a Super Burrito Bracket to cover all of Chicago.2 But eventually he abandoned the site following the climactic “Burrito Bowl I.”

With his web skills improved and a presidential election year approaching, Silver decided to try his mathematical approach on that subject instead: “an opportunity for a sort of Moneyball approach to politics,” as he would later put it.3 Initially, and with a nod to his obsession with Mexican food, he posted his empirical analyses of politics under the chili-pepper pseudonym “Poblano,” on the liberal website Daily Kos, which hosts blogs for its engaged readers.

Then, in March 2008, Silver registered his own web domain, with a title that was simultaneously and appropriately mathematical and political: fivethirtyeight.com, a reference to the total number of electors in the United States electoral college. He launched the site with a slight one-paragraph post on a recent poll from South Dakota and a summary of other recent polling from around the nation. As with The Burrito Bracket it was a modest start, but one that was modular and extensible. Silver soon added maps and charts to bolster his text.

FiveThirtyEight two months after launch, in May 2008

Nate Silver’s real name and FiveThirtyEight didn’t remain obscure for long. His mathematical modeling of the competition between Barack Obama and Hillary Clinton for the Democratic presidential nomination proved strikingly, almost creepily, accurate. Clear-eyed, well-written, statistically rigorous posts began to be passed from browsers to BlackBerries, from bloggers to political junkies to Beltway insiders. From those wired early subscribers to his site, Silver found an increasingly large audience of those looking for data-driven, deeply researched analysis rather than the conventional reporting that presented political forecasting as more art than science.

FiveThirtyEight went from just 800 visitors a day in its first month to a daily audience of 600,000 by October 2008.4 On election day, FiveThirtyEight received a remarkable 3 million visitors, more than most daily newspapers.5

All of this attention for a site that most media coverage still called, with a hint of deprecation, a “blog,” or “aggregator” of polls, despite Silver’s rather obvious, if latent, journalistic skills. (Indeed, one of his roads not taken had been an offer, straight out of college, to become an assistant at The Washington Post.6) An article in the Colorado Daily on the emergent genre represented by FiveThirtyEight led with Ken Bickers, professor and chair of the political science department at the University of Colorado, saying that such sites were a new form of “quality blogs” (rather than, evidently, the uniformly second-rate blogs that had previously existed). The article then swerved into much more ominous territory, asking whether reading FiveThirtyEight and similar blogs was potentially dangerous, especially compared to the safe environs of the traditional newspaper. Surely these sites were superficial, and they very well might have a negative effect on their audience:

Mary Coussons-Read, a professor of psychology at CU Denver, says today’s quick turnaround of information helps to make it more compelling.

“Information travels so much more quickly,” she says. “(We expect) instant gratification. If people have a question, they want an answer.”

That real-time quality can bring with it the illusion that it’s possible to perceive a whole reality by accessing various bits of information.

“There’s this immediacy of the transfer of information that leads people to believe they’re seeing everything … and that they have an understanding of the meaning of it all,” she says.

And, Coussons-Read adds, there is pleasure in processing information.

“I sometimes feel like it’s almost a recreational activity and less of an information-gathering activity,” she says.

Is it addiction?

[Michele] Wolf says there is something addicting about all that data.

“I do feel some kind of high getting new information and being able to process it,” she says. “I’m also a rock climber. I think there are some characteristics that are shared. My addiction just happens to be information.”

While there’s no such mental-health diagnosis as political addiction, Jeanne White, chemical dependency counselor at Centennial Peaks Hospital in Louisville, says political information seeking could be considered an addictive process if it reaches an extreme.7

This stereotype of blogs as the locus of “information” rather than knowledge, of “recreation” rather than education, was—and is—a common one, despite the wide variety of blogs, including many with long-form, erudite writing. Perhaps in 2008 such a characterization of FiveThirtyEight was unsurprising given that Silver’s only other credits to date were the Player Empirical Comparison and Optimization Test Algorithm (PECOTA) and The Burrito Bracket. Clearly, however, here was an intelligent researcher who had set his mind on a new topic to write about, with a fresh, insightful approach to the material. All he needed was a way to disseminate his findings. His audience appreciated his extraordinarily clever methods—at heart, academic techniques—for cutting through the mythologies and inadequacies of standard political commentary. All they needed was a web browser to find him.

A few journalists saw past the prevailing bias against non-traditional outlets like FiveThirtyEight. In the spring of 2010, Nate Silver bumped into Gerald Marzorati, the editor of the New York Times Magazine, on a train platform in Boston. They struck up a conversation, which eventually turned into a discussion about how FiveThirtyEight might fit into the universe of the Times, which ultimately recognized the excellence of his work and wanted FiveThirtyEight to enhance their political reporting and commentary. That summer, a little more than two years after he had started FiveThirtyEight, Silver’s “blog” merged into the Times under a licensing deal.8 In less time than it takes for most students to earn a journalism degree, Silver had willed himself into writing for one of the world’s premier news outlets, taking a seat in the top tier of political analysis. A radically democratic medium had enabled him to do all of this, without the permission of any gatekeeper.

FiveThirtyEight on the New York Times website, 2010

* * *

The story of Nate Silver and FiveThirtyEight has many important lessons for academia, all stemming from the affordances of the open web. His efforts show the do-it-yourself nature of much of the most innovative work on the web, and how one can iterate toward perfection rather than publishing works in fully polished states. His tale underlines the principle that good is good, and that the web is extraordinarily proficient at finding and disseminating the best work, often through continual, post-publication, recursive review. FiveThirtyEight also shows the power of openness to foster that dissemination and the dialogue between author and audience. Finally, the open web enables and rewards unexpected uses and genres.

Undoubtedly it is true that the path from The Burrito Bracket to The New York Times may only be navigated by an exceptionally capable and smart individual. But the tools for replicating Silver’s work are just as open to anyone, and just as powerful. It was with that belief, and the desire to encourage other academics to take advantage of the open web, that Roy Rosenzweig and I wrote Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web.9 We knew that the web, although fifteen years old at the time, was still somewhat alien to many professors, graduate students, and even undergraduates (who might be proficient at texting but know nothing about HTML), and we wanted to make the medium more familiar and approachable.

What we did not anticipate was another kind of resistance to the web, based not on an unfamiliarity with the digital realm or on Luddism but on the remarkable inertia of traditional academic methods and genres—the more subtle and widespread biases that hinder the academy’s adoption of new media. These prejudices are less comical, and more deep-seated, than newspapers’ penchant for tales of internet addiction. This resistance has less to do with the tools of the web and more to do with the web’s culture. It was not enough for us to conclude Digital History by saying how wonderful the openness of the web was; for many academics, this openness was part of the problem, a sign that it might be like “playing tennis with the net down,” as my graduate school mentor worriedly wrote to me.10

In some respects, this opposition to the maximal use of the web is understandable. Almost by definition, academics have gotten to where they are by playing a highly scripted game extremely well. That means understanding and following self-reinforcing rules for success. For instance, in history and the humanities at most universities in the United States, there is a vertically integrated industry of monographs, beginning with the dissertation in graduate school—a proto-monograph—followed by the revisions to that work and the publication of it as a book to get tenure, followed by a second book to reach full professor status. Although we are beginning to see a slight liberalization of rules surrounding dissertations—in some places dissertations could be a series of essays or have digital components—graduate students infer that they would best be served on the job market by a traditional, analog monograph.

We thus find ourselves in a situation, now more than two decades into the era of the web, where the use of the medium in academia is modest, at best. Most academic journals have moved online but simply mimic their print editions, providing PDF facsimiles for download and having none of the functionality common to websites, such as venues for discussion. They are also largely gated, resistant not only to access by the general public but also to the coin of the web realm: the link. Similarly, when the Association of American University Presses recently asked its members about their digital publishing strategies, the presses tellingly remained steadfast in their fixation on the monograph. All of the top responses were about print-on-demand and the electronic distribution and discovery of their list, with a mere footnote for a smattering of efforts to host “databases, wikis, or blogs.”11 In other words, the AAUP members see themselves almost exclusively as book publishers, not as publishers of academic work in whatever form that may take. Surveys of faculty show comfort with decades-old software like word processors but an aversion to recent digital tools and methods.12 The professoriate may be more liberal politically than the most latte-filled ZIP code in San Francisco, but we are an extraordinarily conservative bunch when it comes to the progression and presentation of our own work. We have done far less than we should have by this point in imagining and enacting what academic work and communication might look like if it were digital first.

To be sure, as William Gibson has famously proclaimed, “The future is already here—it’s just not very evenly distributed.”13 Almost immediately following the advent of the web, which came out of the realm of physics, physicists began using the Los Alamos National Laboratory preprint server (later renamed ArXiv and moved to arXiv.org) to distribute scholarship directly to each other. Blogging has taken hold in some precincts of the academy, such as law and economics, and many in those disciplines rely on web-only outlets such as the Social Science Research Network. The future has had more trouble reaching the humanities, and perhaps this book is aimed slightly more at that side of campus than the science quad. But even among the early adopters, a conservatism reigns. For instance, one of the most prominent academic bloggers, the economist Tyler Cowen, still recommends to students a very traditional path for their own work.14 And far from being preferred by a large majority of faculty, quests to open scholarship to the general public often meet with skepticism.15

If Digital History was about the mechanisms for moving academic work online, this book is about how the digital-first culture of the web might become more widespread and acceptable to the professoriate and their students. It is, by necessity, slightly more polemical than Digital History, since it takes direct aim at the conservatism of the academy that twenty years of the web have laid bare. But the web and the academy are not doomed to an inevitable clash of cultures. Viewed properly, the open web is perfectly in line with the fundamental academic goals of research, sharing of knowledge, and meritocracy. This book—and it is a book rather than a blog or stream of tweets because pragmatically that is the best way to reach its intended audience of the hesitant rather than preaching to the online choir—looks at several core academic values and asks how we can best pursue them in a digital age.

First, it points to the critical academic ability to look at any genre without bias and asks whether we might be violating that principle with respect to the web. Upon reflection many of the best things we discover in scholarship are found by disregarding popularity and packaging, by approaching creative works without prejudice. We wouldn’t think much of the meandering novel Moby-Dick if Carl Van Doren hadn’t looked past decades of mixed reviews to find the genius in Melville’s writing. Art historians have similarly unearthed talented artists who did their work outside of the royal academies and the prominent schools of practice. As the unpretentious wine writer Alexis Lichine shrewdly said in the face of fancy labels and appeals to mythical “terroir”: “There is no substitute for pulling corks.”16

Good is good, no matter the venue of publication or what the crowd thinks. Scholars surely understand that on a deep level, yet many persist in valuing venue and medium over the content itself. This is especially true at crucial moments, such as promotion and tenure. Surely we can reorient ourselves to our true core value, to honor creativity and quality, which will still guide us to many traditionally published works but will also allow us to consider works in some nontraditional venues such as new open access journals, articles written and posted on a personal website or institutional repository, or digital projects.

The genre of the blog has been especially cursed by this lack of open-mindedness from the academy. Chapter 1, “What is a Blog?”, looks at the history of the blog and blogging, the anatomy and culture of a genre that is in many ways most representative of the open web. Saddled with an early characterization as being the locus of inane, narcissistic writing, the blog has had trouble making real inroads in academia, even though it is an extraordinarily flexible form and the perfect venue for a great deal of academic work. The chapter highlights some of the best examples of academic blogging and how they shape and advance arguments in a field. We can be more creative in thinking about the role of the blog within the academy, as a venue for communicating our work to colleagues as well as to a lay audience beyond the ivory tower.

This academic prejudice against the blog extends to other genres that have proliferated on the open web. Chapter 2, “Genres and the Open Web,” examines the incredible variety of those new forms, and how, with a careful eye, we might be able to import some of them profitably into the academy. Some of these genres, like the wiki, are well-known (thanks to Wikipedia, which academics have come to accept begrudgingly in the last five years). Other genres are rarer but take maximal advantage of the latitude of the open web: its malleability and interactivity. Rather than imposing the genres we know on the web—as we do when we post PDFs of print-first journal articles—we would do well to understand and adopt the web’s native genres, where helpful to scholarly pursuits.

But what of our academic interest in validity and excellence, enshrined in our peer review system? Chapter 3, “Good is Good,” examines the fundamental requirements of any such system: the necessity of highlighting only a minority of the total scholarly output, based on community standards, and of disseminating that minority of work to communities of thought and practice. The chapter compares print-age forms of vetting with native web forms of assessment and review, and proposes ways that digital methods can supplement—or even replace—our traditional modes of peer review.

“The Value, and Values, of Openness,” Chapter 4, broadly examines the nature of the web’s openness. Oddly, this openness is both the easiest trait of the web to understand and its most complex, once one begins to dig deeper. The web’s radical openness not only has led to calls for open access to academic work, which has complicated the traditional models of scholarly publishers and societies; it has also challenged our academic predisposition toward perfectionism—the desire to only publish in a “final” format, purged (as much as possible) of error. Critically, openness has also engendered unexpected uses of online materials—for instance, when Nate Silver refactored poll numbers from the raw data polling agencies posted.

Ultimately, openness is at the core of any academic model that can operate effectively on the web: it provides a way to disseminate our work easily, to assess what has been published, and to point to what’s good and valuable. Openness can naturally lead—indeed, is leading—to a fully functional shadow academic system for scholarly research and communication that exists beyond the more restrictive and inflexible structures of the past.

[Update, 7/29/11: I've answered Zach Schrag's criticism about the disciplinary scope of the book in a new paragraph beginning with "To be sure, as William Gibson..."]

[Update, 8/1/11: Added more about "good is good," beginning with the line on Alexis Lichine and continuing through the following paragraph, to address Sylvia Miller's point about promotion and tenure. Also fixed a few points of grammar, thanks to Sherman Dorn.]

  1. Nate Silver, “Introducing PECOTA,” in Gary Huckabay, Chris Kahrl, Dave Pease et al., eds., Baseball Prospectus 2003 (Dulles, VA: Brassey’s Publishers, 2003): 507-514. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton & Company, 2004).
  2. Frequently Asked Questions, The Burrito Bracket, http://burritobracket.blogspot.com/2007/07/faq.html
  3. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  4. Adam Sternbergh, “The Spreadsheet Psychic,” New York, Oct 12, 2008, http://nymag.com/news/features/51170/
  5. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  6. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  7. Cindy Sutter, “Hooked on information: Can political news really be addicting?” The Colorado Daily, November 3, 2008, http://www.coloradodaily.com/ci_13105998
  8. Nate Silver, “FiveThirtyEight to Partner with New York Times,” http://www.fivethirtyeight.com/2010/06/fivethirtyeight-to-partner-with-new.html
  9. Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (University of Pennsylvania Press, 2006).
  10. http://www.dancohen.org/2010/11/11/frank-turner-on-the-future-of-peer-review/
  11. Association of American University Presses, “Digital Publishing in the AAUP Community; Survey Report: Winter 2009-2010,” http://aaupnet.org/resources/reports/0910digitalsurvey.pdf, p. 2
  12. See, for example, Robert B. Townsend, “How Is New Media Reshaping the Work of Historians?”, Perspectives on History, November 2010, http://www.historians.org/Perspectives/issues/2010/1011/1011pro2.cfm
  13. National Public Radio, “Talk of the Nation” radio program, 30 November 1999, timecode 11:55, http://discover.npr.org/features/feature.jhtml?wfId=1067220
  14. “Tyler Cowen: Academic Publishing,” remarks at the Institute for Humane Studies Summer Research Fellowship weekend seminar, May 2011, http://vimeo.com/24124436
  15. Open access mandates have been tough sells on many campuses, passing only by slight majorities or failing entirely. For instance, such a mandate was voted down at the University of Maryland, with evidence of confusion and ambivalence. http://scholarlykitchen.sspnet.org/2009/04/28/umaryland-faculty-vote-no-oa/
  16. Quoted in Frank J. Prial, “Wine Talk,” New York Times, 17 August 1994, http://www.nytimes.com/1994/08/17/garden/wine-talk-983519.html.

Video: The Ivory Tower and the Open Web

Here’s the video of my plenary talk “The Ivory Tower and the Open Web,” given at the Coalition for Networked Information meeting in Washington in December, 2010. A general description of the talk:

The web is now over twenty years old, and there is no doubt that the academy has taken advantage of its tremendous potential for disseminating resources and scholarship. But a full accounting of the academic approach to the web shows that compared to the innovative vernacular forms that have flourished over the past two decades, we have been relatively meek in our use of the medium, often preferring to impose traditional ivory tower genres on the web rather than import the open web’s most successful models. For instance, we would rather digitize the journal we know than explore how blogs and social media might supplement or change our scholarly research and communication. What might happen if we reversed that flow and more wholeheartedly embraced the genres of the open web?

I hope the audience for this blog finds it worthy viewing. I enjoyed talking about burrito websites, Layer Tennis, aggregation and curation services, blog networks, Aaron Sorkin’s touchiness, scholarly uses of Twitter, and many other high- and low-brow topics all in one hour. (For some details in the images I put up on the screen, you might want to follow along with this PDF of the slides.) I’ll be expanding on the ideas in this talk in an upcoming book with the same title.

Eliminating the Power Cord

[My live talk at the Shape of Things to Come conference at the University of Virginia, March 27, 2010. It is a riff on a paper that will come out in the proceedings of the conference.]

As I noted in my paper for this conference, what I find interesting about this panel is that we got a chance to compare two projects by Ken Price: the Walt Whitman Archive and Civil War Washington. How their plans and designs differ tells us something about all digital humanities projects. I want to spend my brief time spinning out further what I said in the paper about control, flexibility, creativity, and reuse. It’s a tale of the tension between content creators and content users.

But before I get to Ken’s work, I’d like to start with another technological humanist, Jef Raskin, one of the first employees of Apple Computer and the designer, with Steve Jobs, of the first Macintosh. Just read the principles Raskin lays out in 1979 in “Design Considerations for an Anthropophilic Computer”:

This is an outline for a computer designed for the Person In The Street (or, to abbreviate: the PITS); one that will be truly pleasant to use, that will require the user to do nothing that will threaten his or her perverse delight in being able to say: “I don’t know the first thing about computers.”

You might think that any number of computers have been designed with these criteria in mind, but not so. Any system which requires a user to ever see the interior, for any reason, does not meet these specifications. There must not be additional ROMS, RAMS, boards or accessories except those that can be understood by the PITS as a separate appliance. As a rule of thumb, if an item does not stand on a table by itself, and if it does not have its own case, or if it does not look like a complete consumer item in [and] of itself, then it is taboo.

If the computer must be opened for any reason other than repair (for which our prospective user must be assumed incompetent) even at the dealer’s, then it does not meet our requirements.

Seeing the guts is taboo. Things in sockets is taboo. Billions of keys on the keyboard is taboo. Computerese is taboo. Large manuals, or many of them is taboo.

There must not be a plethora of configurations. It is better to manufacture versions in Early American, Contemporary, and Louis XIV than to have any external wires beyond a power cord.

And you get ten points if you can eliminate the power cord.

Many digital humanities projects implicitly believe strongly in Raskin’s design principle. They take care of what to the content creators and designers seems like hard and annoying work for the end users, freeing those users “to do what they do best.” These editorial projects bring together at once primary sources, middleware, user interfaces, and even tools.

Like the Macintosh, this can be a very good thing. I mostly agree with what Ken has just said, that in the case of Whitman, we probably cannot rely on a loose network of sites to provide canonical texts. Moreover, students new to Walt Whitman can clearly use the contextualization and criticism Ken and his colleagues provide on the Walt Whitman site. Similarly, scholars dipping for the first time into ethnomusicology will appreciate the total research environment provided by EVIA. As Matt Kirschenbaum noted in the last session, good user interfaces can enable new interpretations. I doubt that many scholars would be able to do Hypercities-grade geographical scholarship without a centralized Hypercities site.

But at the same time, like Raskin, sometimes these projects strive too hard to eliminate the power cord.

Raskin thought that the perfect computer would enable creativity at the very surface of the appliance. Access to the guts would not be permitted because to allow so would hinder the capacity of the user to be creative. The computer designers would take care of all of the creativity from the base of the hardware to the interface. But as Bethany Nowviskie discussed this morning, design decisions and user interface embody an argument. And so they also imply control. It’s worth thinking about the level of control the creators assume in each digital humanities project.

I would like to advance this principle: Scholars have uses for edited collections that the editors cannot anticipate. One of the joys of server logs is that we can actually see that principle in action (whereas print editorial projects have no idea how their volumes are being used, except in footnotes many years later). In the September 11 Digital Archive we assumed as historians that all uses of the archive would be related to social history. But we discovered later that many linguists were using the archive to study teen slang at the turn of the century, because it was a large open database that held many stories by teens. Anyone creating resources to serve scholars and scholarship needs to account for these unanticipated uses.

When we think through the principle of unanticipated uses, we begin to realize that there is a push and pull between the scholar and the editor. It is perhaps not a zero sum game, but surely there is a tension between the amount of intellectual work each party gets to do. Editors who put a major intellectual stamp on their collection through data massaging and design and user tools restrict the ability of the scholar to do flexible work on it. Alan Burdette of EVIA was thinking of this when he spoke about his fear of control vs. dynamism this morning.

Are digital humanities projects prepared to separate their interfaces from their primary content? What if Hypercities were just a set of KML files like Phil Ethington’s KML files of LA geography? What about the Grub Street Project? Or Ken’s Civil War Washington? This is a hard question for digital projects—freeing their content for reuse.

I believe Ken’s two projects, one a more traditional editorial project and one a labor of love, struggle with how much intellectual work to cede to the end user. Both projects have rather restrictive terms of use pages and admonishments about U.S. copyright law. Maybe I’m reading something into the terms of use page for the Civil War Washington site, but it seems more half-hearted. You can tell that here is a project that isn’t a holding place for fixed perfected primary resources like Whitman’s, but an evolving scholarly discussion that could easily involve others.

Why not then allow for the download of all the data on the site? I don’t think it would detract from Civil War Washington; indeed, it would probably increase the profile of the site. The site would not only have its own interpretations, but allow for other interpretations—off of the site. Why not let others have access to the guts that Raskin wished to cloak? This is the way networked scholarship works. And this is, I believe, what Roger Bagnall was getting at yesterday when he said “we need to think about the death of the [centralized website] project” as the greater success of digital humanities.

Jim Chandler and I have been formulating a rule of thumb for these editorial projects: the more a discipline is secure in its existence, its modes of interpretation, and its methods of creating scholarship, the more likely it is to produce stripped-down, exchangeable data sets. Thus scholars in papyrology just want to get at the raw sources; they would be annoyed by a Mac-like interface or silo. They have achieved what David Weinberger, in summarizing the optimal form of the web, called “small pieces, loosely joined.”

On the other hand, the newer and less confident disciplines, such as the digital geographic history of Civil War Washington, Hypercities, and Grub Street, feel that they need to have a Raskin-like environment—it’s part of the process of justifying their existence. They feel pressure to be judge, jury and executioner. If the Cohen-Chandler law holds true, we will see in the future fewer fancy interfaces and more direct, portable access to humanities materials.

Of course, as I note in my paper, the level of curation apparent in a digital project is related to the question of credit. The Whitman archive feels like a traditional editorial project and thus worthy of credit. If Ken instead produced KML files and raw newspaper scans, he would likely get less credit than a robust, comprehensive site like Civil War Washington.

The irony about the long-suffering debate about credit is that every day humanities scholars deal with complexity, parsing complicated texts, finding meaning in the opaque. And yet somehow when it comes to self-assessment, we are remarkably simple-minded. If we can understand Whitman’s Leaves of Grass, surely we can tease out questions of credit and the intellectual work that goes into, say, complex KML files.

To help spur this transition along, Christine Madsen made the important point this weekend that the separation of interface and data makes sustainability models easier to imagine (and suggests a new role for libraries). If art is long and life is short, data is longish and user interfaces are fleeting. Just look at how many digital humanities projects that rely on Flash are about to become useless on millions of iPads.

Finally, on sustainability, I made a comparison in my paper between the well-funded Whitman archive and the Civil War Washington site, which was produced through sweat equity. I believe that Ken has a trump card with the latter. Being a labor of love is worth thinking about, because it’s often the way that great scholarship happens. Scholars in the humanities are afflicted with an obsession that makes them wake up in the morning and research and write about topics that drive them and constantly occupy their thoughts. Scholars naturally want to spend their time doing things like Civil War Washington. Being a labor of love is often the best sustainability model.

Data on How Professors Use Technology

Rob Townsend, the Assistant Director of Research and Publications at the American Historical Association and the author of many insightful (and often indispensable) reports about the state of higher education, writes with some telling new data from the latest National Study of Postsecondary Faculty (conducted by the U.S. Department of Education roughly every five years since 1987). Rob focused on several questions about the use of technology in colleges and universities. The results are somewhat surprising and thought-provoking.

Here are two relatively new questions, exactly as they are written on the survey form (including the boldface in the first question; more on that later), which you can download from the Department of Education website. “[FILL INSTNAME]” is obviously replaced in the actual questionnaire by the faculty member’s institution.

Q39. During the 2003 Fall Term at [FILL INSTNAME], did you have one or more web sites for any of your teaching, advising, or other instructional duties? (Web sites used for instructional duties might include the syllabus, readings, assignments, and practice exams for classes; might enable communication with students via listservs or online forums; and might provide real-time computer-based instruction.)

Q41: During the 2003 Fall Term at [FILL INSTNAME], how many hours per week did you spend communicating by e-mail (electronic mail) with your students?

Using the Department of Education’s web service to create bar graphs from their large data set, Rob generated these two charts:

Rob points out that historians are on the low end of e-mail usage in the academy, though it seems not too far off from other disciplines in the humanities and social sciences. A more meaningful number to get (and probably impossible using this data set) would be the time spent on e-mail per student, since the number of students varies widely among the disciplines. [Update: Within hours of this post Rob had crunched the numbers and come up with an average of 2 minutes per student per week for history instructors (2.8 hours spent writing e-mail per week divided by an average of 83 students).]
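The per-student figure in the update is easy to check by hand: 2.8 hours is 168 minutes a week, spread across 83 students. Here is a minimal sketch of that arithmetic in Python. The function name is just an illustration, and the two inputs are the averages quoted in the update, not anything recomputed from the Department of Education data set.

```python
# Back-of-the-envelope check of the per-student e-mail figure.
# Both inputs are the averages quoted in the post's update.
HOURS_PER_WEEK = 2.8   # weekly e-mail time for history instructors
STUDENTS = 83          # average number of students per instructor


def minutes_per_student(hours_per_week: float, students: int) -> float:
    """Convert weekly hours to minutes and divide by the number of students."""
    return hours_per_week * 60 / students


print(round(minutes_per_student(HOURS_PER_WEEK, STUDENTS), 1))  # 2.0
```

The same function would work for any other discipline once its average hours and class sizes are known, which is what a real per-student comparison would need.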

For me, the surprising chart is the first one, on the adoption of the web in teaching, advising, or other instructional duties. Only about a 5-10% rise in the use of the web from 1998 to 2003 for most disciplines, and a decline for English and Literature? This, during a period of enormous, exponential growth in the web, a period that also saw many institutions of higher education mandate that faculty put their syllabi on the Internet (often paying for expensive course management software to do so)?

I have two theories about this chart, with the possibility that both theories are having an effect on the numbers. First, I wonder if that boldfaced “you” in Q39 made a number of professors answer “no” if technically they had someone else (e.g., a teaching assistant or department staffer) put their syllabus or other course materials online. I did some further research after hearing from Rob and noticed that buried in the 1998 survey questionnaire was a slightly different wording, with no boldface: “During the 1998 Fall Term, did you have websites for any of the classes you taught?” Maybe those wordsmiths in English and Literature were parsing the language of the 2003 question a little too closely (or maybe they were just reading it correctly, unlike faculty members from the other disciplines).

My second theory is a little more troubling for cyber-enthusiasts who believe that the Internet will take over the academy in the next decade, fully changing the face of research and instruction. Take a look at this chart from the Pew Internet and American Life Project:

Note how after an initial surge in Internet adoption in the late 1990s the rate of growth has slowed considerably. A minority, small but significant, will probably never adopt the Internet as an important, daily medium of interaction and information. If we believe the Department of Education numbers, within this minority is apparently a sizable segment of professors. According to additional data extracted by Rob Townsend, it looks like this segment is about 16% of history professors and about 21% of English and Literature professors. (These are faculty members who in the fall of 2003 did not use e-mail or the web at all in their instruction.) Remarkably, among all disciplines about a quarter (24.2%) of the faculty fall into this no-tech group. Seems to me it’s going to be a long, long time before that number is reduced to zero.

Kojo Nnamdi Show Questions

Roy Rosenzweig and I had a terrific time on The Kojo Nnamdi Show today. If you missed the radio broadcast you can listen to it online on the WAMU website. There were a number of interesting calls from the audience, and we promised several callers that we would answer a couple of questions off the air; here they are.

Barbara from Potomac, MD asks, “I’m wondering whether new products that claim to help compress and organize data (I think one is called “C-Gate” [Kathy, an alert reader of this blog, has pointed out that Barbara probably means the giant disk drive company Seagate]) help out [to solve the problem of storing digital data for the long run]? The ads claim that you can store all sorts of data—from PowerPoint presentations and music to digital files—in a two-ounce standalone disk or other device.”

As we say in the book, we’re skeptical of using rare and/or proprietary formats to store digital materials for the long run. Despite the claims of many companies about new and novel storage devices, it’s unclear whether these specialized devices will be accessible in ten or a hundred years. We recommend sticking with common, popular formats and devices (at this point, probably standard hard drives and CD- or DVD-ROMs) if you want to have the best odds of preserving your materials for the long run. The National Institute of Standards and Technology (NIST) provides a good summary of how to store optical media such as CDs and DVDs for long periods of time.

Several callers asked where they could go if they have materials on old media, such as reel-to-reel or 8-track tapes, that they want to convert to a digital format.

You can easily find online some of the companies we mentioned that will (for a fee) transfer your own media files onto new devices. Google for the media you have (e.g., “8-track tape”) along with the words “conversion services” or “transfer services.” I probably overestimated the cost for these services; most conversions will cost less than $100 per tape. However, the older the media the more expensive it will be. I’ll continue to look into places in the Washington area that might provide these services for free, such as libraries and archives.

Digital History on The Kojo Nnamdi Show

From the shameless plug dept.: Roy Rosenzweig and I will be discussing our book Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web this Tuesday, January 10, on The Kojo Nnamdi Show. The show is produced at Washington’s NPR station, WAMU. We’re on live from noon to 1 PM EST, and you’ll be able to ask us questions by phone (1-800-433-8850), via email (kojo@wamu.org), or through the web. The show will be replayed from 8-9 PM EST on Tuesday night, and syndicated via iTunes and other outlets as part of NPR’s terrific podcast series (look for The Kojo Nnamdi Show/Tech Tuesday). You’ll also be able to get the audio stream directly from the show’s website. I’ll probably answer some additional questions from the audience in this space.

Creating a Blog from Scratch, Part 5: What is XHTML, and Why Should I Care?

In prior posts in this series (1, 2, 3, and 4), I described with some glee my rash abandonment of common blogging software in favor of writing my own. For my purposes there seemed to be some key disadvantages to these popular packages, including an overemphasis on the calendar (I just saw the definition of a blog at the South by Southwest Interactive Festival—“a page with dated entries”—which, to paraphrase Woody Allen, is like calling War and Peace “a book about Russia”), a sameness to their designs, and comments that are rarely helpful and often filled with spam. But one of the greatest advantages of recent blog software packages is that they generally write standards-compliant code. More specifically, blog software like WordPress automatically produces XHTML. Some of you might be asking, what is XHTML, and who cares? And why would I want to spend a great deal of effort ensuring that this blog complied strictly with this language?

The large digital library contingent that reads this blog could probably enumerate many reasons why XHTML compliance is important, but I had two reasons in mind when I started this blog. (Actually, I had a third, more secretive reason that I’ll mention first: Roy Rosenzweig and I argue in our book Digital History that XHTML will likely be critical for digital humanists to adhere to in the future—don’t want to be accused of being a hypocrite.) For those for whom web acronyms are Greek, XHTML is a sibling of XML, a more rigorously structured and flexible language than the HTML that underlies most of the web. XHTML is better prepared than HTML to be platform-independent; because it separates formatting from content, XHTML (like XML) can be reconfigured easily for very different environments (using, e.g., different style sheets). HTML, with formatting and content inextricably combined, for the most part assumes that you are using a computer screen and a web browser. Theoretically XHTML can be dynamically and instantaneously recast to work on many different devices (including a personal computer). This flexibility is becoming an increasingly important feature as people view websites on a variety of platforms (not just a normal computer screen, e.g., but cell phones or audio browsers for the blind). Indeed, according to the server logs for this blog, 1.6% of visitors are using a smart phone, PDA, or other means to read this blog, a number that will surely grow. In short, XHTML seems better prepared than regular HTML to withstand the technological changes of the coming years, and theoretically should be more easily preserved than older methods of displaying information on the web. For these and other reasons a 2001 report the Smithsonian commissioned recommended the institution move to XHTML from HTML.

Of course, with standards compliance comes extra work. (And extra cost. Just ask webmasters at government agencies trying to make their websites comply with Section 508, the mandatory accessibility rules for federal information resources.) Aside from a brief flirtation with the what-you-see-is-what-you-get, write-the-HTML-for-you program Dreamweaver in the late 1990s, I’ve been composing web pages using a text editor (the superb BBEdit) for over ten years, so my hands are used to typing certain codes in HTML, in the same way you get used to a QWERTY keyboard. XHTML is not that dissimilar from HTML, but it still has enough differences to make life difficult for those used to HTML. You have to remember to close every tag; some attributes related to formatting are in strange new locations. One small example of the minor infractions I frequently trip up on writing XHTML: the oft-used break tag to add a line to a web page must “close itself” by adding a slash before the end bracket (not <br>, but <br />). But I figured doing this blog would give me a good incentive to start writing everything in strict XHTML.

Yeah, right. I clearly haven’t been paying enough attention to detail. The page you’re reading likely still has dozens of little coding errors that make it fail strict compliance with the World Wide Web Consortium’s XHTML standard. (If you would like a humbling experience that brings to mind receiving a pop quiz back from your third-grade teacher with lots of red ink on it, try the W3C’s XHTML Validator.) I haven’t had enough time to go back and correct all of those little missing slashes and quotation marks. WordPress users out there can now begin their snickering; their blog software does such mundane things for them, and many proudly (and annoyingly) display little “XHTML 1.0 compliant” badges on their sites. Go ahead, rub it in.
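The missing-slash problem is easy to see with a quick well-formedness test. The sketch below uses Python's standard XML parser as a rough stand-in. It is not the W3C validator and does not check a page against the XHTML DTD; it only shows that an unclosed break tag is not well-formed XML, which any XHTML page must be. The function name and the two sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML.

    Well-formedness is necessary for XHTML but is not full validation.
    """
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


html_style = "<p>First line<br>Second line</p>"     # old HTML habit
xhtml_style = "<p>First line<br />Second line</p>"  # self-closing tag

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

A check like this only catches unclosed tags and unquoted attributes. Rules about which elements and attributes are allowed where still need a real validator.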

After I realized that it would take serious effort to bring my code up to code, so to speak, I sat back and did the only thing I could do: rationalize. I didn’t really need strict XHTML compliance because through some design sleight-of-hand I had already been able to make this blog load well on a wide range of devices. I learned from other blog software that if you put the navigation on the right rather than the more common left you see on most websites, the body of each post shows up first on a PDA or smart phone. It also means that blind visitors don’t have to suffer through a long list of your other posts before getting to the article they want to read.

As far as XHTML is concerned, I’ll be brushing up on that this summer. Unless I move this blog to WordPress by then.

Part 6: One Year Later

Nature Compares Science Entries in Wikipedia with Encyclopaedia Britannica

In an article published tomorrow, but online now, the journal Nature reveals the results of a (relatively small) study it conducted to compare the accuracy of Wikipedia with Encyclopaedia Britannica—at least in the natural sciences. The results may strike some as surprising.

As Jim Giles summarizes in the special report: “Among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three…Only eight serious errors, such as misinterpretations of important concepts, were detected in the pairs of articles reviewed, four from each encyclopaedia. But reviewers also found many factual errors, omissions or misleading statements: 162 and 123 in Wikipedia and Britannica, respectively.”

These results, obtained by sending experts such as the Princeton historian of science Michael Gordin matching entries from the democratic/anarchical online source and the highbrow, edited reference work and having them go over the articles with a fine-toothed comb, should feed into the current debate over the quality of online information. My colleague Roy Rosenzweig has written a much more in-depth (and illuminating) comparison of Wikipedia with print sources in history, due out next year in the Journal of American History, which should spark an important debate in the humanities. I suspect that the Wikipedia articles in history are somewhat different than those in the sciences—it seems from Nature’s survey that there may be more professional scientists contributing to Wikipedia than professional historians—but a couple of the basic conclusions are the same: the prose on Wikipedia is not so terrific but most of its facts are indeed correct, to a far greater extent than Wikipedia’s critics would like to admit.
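As a quick check on the figures Giles quotes, the "around four" and "about three" averages follow from dividing the error totals (162 and 123) by the 42 entries reviewed. A small Python sketch of that division follows. The variable and function names are illustrative only, and the inputs are the numbers in the quotation, not Nature's underlying data.

```python
# Totals and entry count as quoted from the Nature special report.
ENTRIES_REVIEWED = 42
ERROR_TOTALS = {"Wikipedia": 162, "Britannica": 123}


def average_per_entry(total_errors: int, entries: int) -> float:
    """Mean number of errors, omissions, or misleading statements per entry."""
    return total_errors / entries


for source, total in ERROR_TOTALS.items():
    print(f"{source}: {average_per_entry(total, ENTRIES_REVIEWED):.2f}")
# Wikipedia: 3.86
# Britannica: 2.93
```

On these numbers Wikipedia averages a little under one more problem per entry than Britannica.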