Category Archives: Open Access

The Journal of Digital Humanities Hits Full Stride

If you haven’t checked out the Journal of Digital Humanities yet, now’s the time to do so. My colleagues Joan Fragaszy Troyano, Jeri Wieringa, and Sasha Hoffman, along with our new editors-at-large and the many scholars who have taken democratic ownership of this open-access journal, have quickly gotten the production model down to a science. There’s also an art to it, as you can see from these shots of the new issue (thanks, Sasha!):


As I’ve explained in this space before, there is no formal submission process for the journal. Instead, we look to “catch the good” from across the open web, and take the very best of the good to develop into JDH on a quarterly basis. We believe this leads not only to a high-quality journal that can hold its own against submit-and-wait academic serials but also to a better measure of what matters to, and engages, the entire digital humanities community.

But don’t take my word for it; judge for yourself at the Journal of Digital Humanities website, and pick your favorite format to read the journal in: HTML, ePub, iBook, or PDF.

Treading Water on Open Access

A statement from the governing council of the American Historical Association, September 2012:

The American Historical Association voices concerns about recent developments in the debates over “open access” to research published in scholarly journals. The conversation has been framed by the particular characteristics and economics of science publishing, a landscape considerably different from the terrain of scholarship in the humanities. The governing Council of the AHA has unanimously approved the following statement. We welcome further discussion…

In today’s digital world, many people inside and outside of academia maintain that information, including scholarly research, wants to be, and should be, free. Where people subsidized by taxpayers have created that information, the logic of free information is difficult to resist…

The concerns motivating these recommendations are valid, but the proposed solution raises serious questions for scholarly publishing, especially in the humanities and social sciences.

A statement from Roy Rosenzweig, the Vice President of Research of the American Historical Association, in May 2005:

Historical research also benefits directly (albeit considerably less generously [than science]) through grants from federal agencies like the National Endowment for the Humanities; even more of us are on the payroll of state universities, where research support makes it possible for us to write our books and articles. If we extend the notion of “public funding” to private universities and foundations (who are, of course, major beneficiaries of the federal tax codes), it can be argued that public support underwrites almost all historical scholarship.

Do the fruits of this publicly supported scholarship belong to the public? Should the public have free access to it? These questions pose a particular challenge for the AHA, which has conflicting roles as a publisher of history scholarship, a professional association for the authors of history scholarship, and an organization with a congressional mandate to support the dissemination of history. The AHA’s Research Division is currently considering the question of open—or at least enhanced—access to historical scholarship and we seek the views of members.

Two requests for comment from the AHA on open access, seven years apart. In 2005, the precipitating event for the AHA’s statement was the NIH report on “Enhancing Public Access to Publications Resulting from NIH-Funded Research”; yesterday it was the Finch report on “Accessibility, sustainability, excellence: how to expand access to research publications.” History has repeated itself.

We historians have been treading water on open access for the better part of a decade. This is not a particular failure of our professional organization, the AHA; it’s a collective failure by historians who believe—contrary to the lessons of our own research—that today will be like yesterday, and tomorrow like today. Article-centric academic journals, a relatively recent development in the history of publishing, apparently have existed, and will exist, forever, in largely the same form and with largely the same business model.

We can wring our hands about open access every seven years when something notable happens in science publishing, but there’s much to be said for actually doing something rather than sitting on the sidelines. The fact is that the scientists have been thinking and discussing but also doing for a long, long time. They’ve had a free preprint service for articles since the beginning of the web in 1991. In 2012, our field has almost no experience with how alternate online models might function.

If we’re solely concerned with the business model of the American Historical Review (more on that focus in a moment), the AHA had on the table possible economic solutions that married open access with sustainability over seven years ago, when Roy wrote his piece. Since then, other creative solutions have been proposed. I happen to prefer the library consortium model, in which large research libraries that are already paying millions of dollars for science journals are browbeaten into ponying up a tiny fraction of the science journal budget to continue to pay for open humanities journals. As a strong believer in the power of narcissism and shame, I could imagine a system in which libraries that pay would get exalted patron status on the home page for the journal, while free riders would face the ignominy of a red bar across the top of the browser whenever the journal was viewed on a campus that dropped support once the AHR went open access. (“You are welcome to read this open scholarship, but you should know that your university is skirting its obligation to the field.” The Shame Bar could be left off in places that cannot afford to pay.)

Regardless of the method and the model, the point is simply that we haven’t tried very hard. Too many of my colleagues, in the preferred professorial mode of focusing on the negative, have highlighted perceived problems with open access without actually engaging it. Yet somehow over 8,000 open access journals have flourished in the last decade. If the AHA’s response is that those journals aren’t flagship journals, well, I’m not sure that’s the one-percenter rhetoric they want to be associated with as representatives of the entire profession.

Furthermore, if our primary concern is indeed the economics of the AHR, wouldn’t it be fair game to look at the full economics of it—not just the direct costs on AHA’s side (“$460,000 to support the editorial processes”), but the other side, where much of the work gets done: the time professional historians take to write and vet articles? I would wager those in-kind costs are far larger than $460,000 a year. That’s partly what Roy was getting at in his appeal to the underlying funding of most historical scholarship. Any such larger economic accounting would trigger more difficult questions, such as Hugh Gusterson’s pointed query about why he’s being asked to give his peer-review labor for free but publishers are gating the final product in return—thanks for your gift labor, now pay up. That the AHA is a small non-profit publisher rather than a commercial giant doesn’t make this question go away.

There is no doubt that professional societies outside of the sciences are in a horrible bind between the drive toward open access and the need for sustainability. But history tells us that no institution has the privilege of remaining static. The American Historical Association can tinker with payments for the AHR as much as it likes under the assumption that the future will be like the past, just with a different spreadsheet. I’d like to see the AHA be bolder—supportive not only of its flagship but of the entire fleet, which now includes fledgling open access journals, blogs, and other nascent online genres.

Mostly, I’d like to see a statement that doesn’t read like this one does: anxious and reactive. I’d like to see a statement that says: “We stand ready to nurture and support historical scholarship whenever and wherever it might arise.”

Catching the Good

[Another post in my series on our need to focus more on the "demand side" of scholarly communication—how and why scholars engage with and contribute to publications—in addition to new models for the "supply side"—new production models for publications themselves. If you're new to this line of thought on my blog, you may wish to start here or here.]

As all parents discover when their children reach the “terrible twos” (a phase that evidently lasts until 18 years of age), it’s incredibly easy to catch your kids being bad, and to criticize them. Kids are constantly pushing boundaries and getting into trouble; it’s part of growing up, intellectually and emotionally. What’s harder for parents, but perhaps far more important, is “catching your child doing good,” to look over when your kid isn’t yelling or pulling the dog’s ear to say, “I like the way you’re doing that.”

Although I fear infantilizing scholars (wags would say that’s perfectly appropriate), whenever I talk about the publishing model at PressForward, I find myself referring back to this principle of “catching the good,” which of course goes by the fancier name of “positive reinforcement” in psychology. What appears in PressForward publications such as Digital Humanities Now isn’t submitted and threatened with criticism and rejection (negative reinforcement). Indeed, there is no submission process at all. Instead, we look to “catch the good” in whatever format, and wherever, it exists (positive reinforcement). Catching the good is not necessarily the final judgment upon a work, but an assessment that something is already quite worthy and might benefit from a wider audience.

It’s a useful exercise to consider the very different psychological modes of positive and negative reinforcement as they relate to scholarly (and non-scholarly) communication, and the kind of behavior these models encourage or suppress. Obviously PressForward has no monopoly on positive reinforcement; catching the good also happens when a sharp editor from a university press hears about a promising young scholar and cultivates her work for publication. And positive reinforcement is deeply embedded in the open web, where a blog post can either be ignored or reach thousands as a link is propagated by impressed readers.

In modes where negative reinforcement predominates, such as at journals with high rejection rates, scholars are much more hesitant to distribute their work until it is perfect or near-perfect. An aversion to criticism spreads, with both constructive and destructive effects. Authors work harder on publications, but also spend significant energy to tailor their work to please the paren, er, editors and blind reviewers who wait in judgment. Authors internalize the preferences of the academic community they strive to join, and curb experimentation or the desire to reach interdisciplinary or general audiences.

Positive-reinforcement models, especially those that involve open access to content, allow for greater experimentation in form and content. Interdisciplinary and general audiences are more likely to be reached, since a work can be highlighted or linked to by multiple venues at the same time. Authors feel at greater liberty to disseminate more of their work, including half-baked material as well as polished work, and audiences may find even the half-baked helpful to their thought processes. In other publications, that “partial” work might never see the light of day.

Finally, just as a kid who constantly strives to be a great baseball player might be unexpectedly told he has a great voice and should try out for the choir, positive reinforcement is more likely to push authors to contribute to fields in which they naturally excel. Positive reinforcement casts a wider net, doing a better job at catching scholars in all stations, or even outsiders, who might have ideas or approaches a discipline could use.

When mulling new outlets for their work, scholars implicitly model risk and reward, imagining the positive and negative reinforcement they will be subjected to. It would be worth talking about this psychology more explicitly. For instance, what if there were a low-risk, but potentially high-reward, outlet that focused more on positive reinforcement—published articles getting noticed and passed around based on merit after a relatively restricted phase of pre-publication criticism? If you want to know why PLoS ONE is the fastest-growing venue for scientific work, that’s the question they asked and successfully answered. And that’s what we’re trying to do with PressForward as well.

[My thanks to Joan Fragaszy Troyano and Mike O'Malley for reading an early version of this post.]

The Ivory Tower and the Open Web: Introduction: Burritos, Browsers, and Books [Draft]

[A draft of the introduction to my forthcoming book, The Ivory Tower and the Open Web, which looks at academic resistance to the modes and genres of the web, and how those modes and genres might actually reinvigorate the academy. I'll be posting drafts of chapters as well for open comment and criticism.]

In the summer of 2007, Nate Silver decided to conduct a rigorous assessment of the inexpensive Mexican restaurants in his neighborhood, Chicago’s Wicker Park. Figuring that others might be interested in the results of his study, and that he might be able to use some feedback from an audience, he took his project online.

Silver had no prior experience in such an endeavor. By day he worked as a statistician and writer at Baseball Prospectus—an innovator, to be sure, having created a clever new standard for empirically measuring the value of players, an advanced form of the “sabermetrics” vividly described by Michael Lewis in Moneyball.1 But Silver had no experience as a food critic, nor as a web developer.

In time, his appetite took care of the former and the open web took care of the latter. Silver knit together a variety of free services as the tapestry for his culinary project. He set up a blog, The Burrito Bracket, using Google’s free Blogger web application. Weekly posts consisted of his visits to local restaurants, and the scores (in jalapeños) he awarded in twelve categories.

Home page of Nate Silver’s Burrito Bracket
Ranking system (upper left quadrant)

Being a sports geek, he organized the posts as a series of contests between two restaurants. Satisfying his urge to replicate March Madness, he modified another free application from Google, generally intended to create financial or data spreadsheets, to produce the “bracket” of the blog’s title.

Google Spreadsheets used to create the competition bracket

Like many of the savviest users of the web, Silver started small and improved the site as he went along. For instance, he had started to keep a photographic record of his restaurant visits and decided to share this documentary evidence. So he enlisted the photo-sharing site Flickr, creating an off-the-rack archive to accompany his textual descriptions and numerical scores. On August 15, 2007, he added a map to the site, geolocating each restaurant as he went along and color-coding the winners and losers.

Flickr photo archive for The Burrito Bracket (flickr.com)
Silver’s Google Map of Chicago’s Wicker Park (shaded in purple) with the location of each Mexican restaurant pinpointed

Even with its do-it-yourself enthusiasm and the allure of carne asada, Silver had trouble attracting an audience. He took to Yelp, a popular site for reviewing restaurants, to plug The Burrito Bracket, and even thought about creating a Super Burrito Bracket, to cover all of Chicago.2 But eventually he abandoned the site following the climactic “Burrito Bowl I.”

With his web skills improved and a presidential election year approaching, Silver decided to try his mathematical approach on that subject instead: “an opportunity for a sort of Moneyball approach to politics,” as he would later put it.3 Initially, and with a nod to his obsession with Mexican food, he posted his empirical analyses of politics under the chili-pepper pseudonym “Poblano” on the liberal website Daily Kos, which hosts blogs for its engaged readers.

Then, in March 2008, Silver registered his own web domain, with a title that was simultaneously and appropriately mathematical and political: fivethirtyeight.com, a reference to the total number of electors in the United States electoral college. He launched the site with a slight one-paragraph post on a recent poll from South Dakota and a summary of other recent polling from around the nation. As with The Burrito Bracket it was a modest start, but one that was modular and extensible. Silver soon added maps and charts to bolster his text.

FiveThirtyEight two months after launch, in May 2008

Nate Silver’s real name and FiveThirtyEight didn’t remain obscure for long. His mathematical modeling of the competition between Barack Obama and Hillary Clinton for the Democratic presidential nomination proved strikingly, almost creepily, accurate. Clear-eyed, well-written, statistically rigorous posts began to be passed from browsers to BlackBerries, from bloggers to political junkies to Beltway insiders. Beyond those wired early readers, Silver found an increasingly large audience of people looking for data-driven, deeply researched analysis rather than the conventional reporting that presented political forecasting as more art than science.

FiveThirtyEight went from just 800 visitors a day in its first month to a daily audience of 600,000 by October 2008.4 On election day, FiveThirtyEight received a remarkable 3 million visitors, more than most daily newspapers.5

All of this attention went to a site that most media coverage still called, with a hint of deprecation, a “blog,” or “aggregator” of polls, despite Silver’s rather obvious, if latent, journalistic skills. (Indeed, one of his roads not taken had been an offer, straight out of college, to become an assistant at The Washington Post.6) An article in the Colorado Daily on the emergent genre represented by FiveThirtyEight led with Ken Bickers, professor and chair of the political science department at the University of Colorado, saying that such sites were a new form of “quality blogs” (rather than, evidently, the uniformly second-rate blogs that had previously existed). The article then swerved into much more ominous territory, asking whether reading FiveThirtyEight and similar blogs was potentially dangerous, especially compared to the safe environs of the traditional newspaper. Surely these sites were superficial, and they very well might have a negative effect on their audience:

Mary Coussons-Read, a professor of psychology at CU Denver, says today’s quick turnaround of information helps to make it more compelling.

“Information travels so much more quickly,” she says. “(We expect) instant gratification. If people have a question, they want an answer.”

That real-time quality can bring with it the illusion that it’s possible to perceive a whole reality by accessing various bits of information.

“There’s this immediacy of the transfer of information that leads people to believe they’re seeing everything … and that they have an understanding of the meaning of it all,” she says.

And, Coussons-Read adds, there is pleasure in processing information.

“I sometimes feel like it’s almost a recreational activity and less of an information-gathering activity,” she says.

Is it addiction?

[Michele] Wolf says there is something addicting about all that data.

“I do feel some kind of high getting new information and being able to process it,” she says. “I’m also a rock climber. I think there are some characteristics that are shared. My addiction just happens to be information.”

While there’s no such mental-health diagnosis as political addiction, Jeanne White, chemical dependency counselor at Centennial Peaks Hospital in Louisville, says political information seeking could be considered an addictive process if it reaches an extreme.7

This stereotype of blogs as the locus of “information” rather than knowledge, of “recreation” rather than education, was—and is—a common one, despite the wide variety of blogs, including many with long-form, erudite writing. Perhaps in 2008 such a characterization of FiveThirtyEight was unsurprising given that Silver’s only other credits to date were the Player Empirical Comparison and Optimization Test Algorithm (PECOTA) and The Burrito Bracket. Clearly, however, here was an intelligent researcher who had set his mind on a new topic to write about, with a fresh, insightful approach to the material. All he needed was a way to disseminate his findings. His audience appreciated his extraordinarily clever methods—at heart, academic techniques—for cutting through the mythologies and inadequacies of standard political commentary. All they needed was a web browser to find him.

A few journalists saw past the prevailing bias against non-traditional outlets like FiveThirtyEight. In the spring of 2010, Nate Silver bumped into Gerald Marzorati, the editor of the New York Times Magazine, on a train platform in Boston. They struck up a conversation that eventually turned to how FiveThirtyEight might fit into the universe of the Times, which recognized the excellence of Silver’s work and wanted FiveThirtyEight to enhance its political reporting and commentary. That summer, a little more than two years after he had started FiveThirtyEight, Silver’s “blog” merged into the Times under a licensing deal.8 In less time than it takes for most students to earn a journalism degree, Silver had willed himself into writing for one of the world’s premier news outlets, taking a seat in the top tier of political analysis. A radically democratic medium had enabled him to do all of this, without the permission of any gatekeeper.

FiveThirtyEight on the New York Times website, 2010

* * *


The story of Nate Silver and FiveThirtyEight has many important lessons for academia, all stemming from the affordances of the open web. His efforts show the do-it-yourself nature of much of the most innovative work on the web, and how one can iterate toward perfection rather than publishing works in fully polished states. His tale underlines the principle that good is good, and that the web is extraordinarily proficient at finding and disseminating the best work, often through continual, post-publication, recursive review. FiveThirtyEight also shows the power of openness to foster that dissemination and the dialogue between author and audience. Finally, the open web enables and rewards unexpected uses and genres.

Undoubtedly, the path from The Burrito Bracket to The New York Times could be navigated only by an exceptionally capable and smart individual. But the tools Silver used are just as open to anyone else, and just as powerful. It was with that belief, and the desire to encourage other academics to take advantage of the open web, that Roy Rosenzweig and I wrote Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web.9 We knew that the web, although fifteen years old at the time, was still somewhat alien to many professors, graduate students, and even undergraduates (who might be proficient at texting but know nothing about HTML), and we wanted to make the medium more familiar and approachable.

What we did not anticipate was another kind of resistance to the web, based not on an unfamiliarity with the digital realm or on Luddism but on the remarkable inertia of traditional academic methods and genres—the more subtle and widespread biases that hinder the academy’s adoption of new media. These prejudices are less comical, and more deep-seated, than newspapers’ penchant for tales of internet addiction. This resistance has less to do with the tools of the web and more to do with the web’s culture. It was not enough for us to conclude Digital History by saying how wonderful the openness of the web was; for many academics, this openness was part of the problem, a sign that it might be like “playing tennis with the net down,” as my graduate school mentor worriedly wrote to me.10

In some respects, this opposition to the maximal use of the web is understandable. Almost by definition, academics have gotten to where they are by playing a highly scripted game extremely well. That means understanding and following self-reinforcing rules for success. For instance, in history and the humanities at most universities in the United States, there is a vertically integrated industry of monographs, beginning with the dissertation in graduate school—a proto-monograph—followed by the revisions to that work and the publication of it as a book to get tenure, followed by a second book to reach full professor status. Although we are beginning to see a slight liberalization of rules surrounding dissertations—in some places dissertations could be a series of essays or have digital components—graduate students infer that they would best be served on the job market by a traditional, analog monograph.

We thus find ourselves in a situation, now more than two decades into the era of the web, where the use of the medium in academia is modest, at best. Most academic journals have moved online but simply mimic their print editions, providing PDF facsimiles for download and having none of the functionality common to websites, such as venues for discussion. They are also largely gated, resistant not only to access by the general public but also to the coin of the web realm: the link. Similarly, when the Association of American University Presses recently asked its members about their digital publishing strategies, the presses tellingly remained steadfast in their fixation on the monograph. All of the top responses were about print-on-demand and the electronic distribution and discovery of their list, with a mere footnote for a smattering of efforts to host “databases, wikis, or blogs.”11 In other words, the AAUP members see themselves almost exclusively as book publishers, not as publishers of academic work in whatever form that may take. Surveys of faculty show comfort with decades-old software like word processors but an aversion to recent digital tools and methods.12 The professoriate may be more liberal politically than the most latte-filled ZIP code in San Francisco, but we are an extraordinarily conservative bunch when it comes to the progression and presentation of our own work. We have done far less than we should have by this point in imagining and enacting what academic work and communication might look like if it were digital first.

To be sure, as William Gibson has famously proclaimed, “The future is already here—it’s just not very evenly distributed.”13 Almost immediately following the advent of the web, which came out of the realm of physics, physicists began using the Los Alamos National Laboratory preprint server (later renamed arXiv and moved to arXiv.org) to distribute scholarship directly to each other. Blogging has taken hold in some precincts of the academy, such as law and economics, and many in those disciplines rely on web-only outlets such as the Social Science Research Network. The future has had more trouble reaching the humanities, and perhaps this book is aimed slightly more at that side of campus than the science quad. But even among the early adopters, a conservatism reigns. For instance, one of the most prominent academic bloggers, the economist Tyler Cowen, still recommends to students a very traditional path for their own work.14 And far from being preferred by a large majority of faculty, quests to open scholarship to the general public often meet with skepticism.15

If Digital History was about the mechanisms for moving academic work online, this book is about how the digital-first culture of the web might become more widespread and acceptable to the professoriate and their students. It is, by necessity, slightly more polemical than Digital History, since it takes direct aim at the conservatism of the academy that twenty years of the web have laid bare. But the web and the academy are not doomed to an inevitable clash of cultures. Viewed properly, the open web is perfectly in line with the fundamental academic goals of research, sharing of knowledge, and meritocracy. This book—and it is a book rather than a blog or stream of tweets because pragmatically that is the best way to reach its intended audience of the hesitant rather than preaching to the online choir—looks at several core academic values and asks how we can best pursue them in a digital age.

First, it points to the critical academic ability to look at any genre without bias and asks whether we might be violating that principle with respect to the web. Upon reflection, many of the best things we discover in scholarship are found by disregarding popularity and packaging, by approaching creative works without prejudice. We wouldn’t think much of the meandering novel Moby-Dick if Carl Van Doren hadn’t looked past decades of mixed reviews to find the genius in Melville’s writing. Art historians have similarly unearthed talented artists who did their work outside of the royal academies and the prominent schools of practice. As the unpretentious wine writer Alexis Lichine shrewdly said in the face of fancy labels and appeals to mythical “terroir”: “There is no substitute for pulling corks.”16

Good is good, no matter the venue of publication or what the crowd thinks. Scholars surely understand that on a deep level, yet many persist in valuing venue and medium over the content itself. This is especially true at crucial moments, such as promotion and tenure. Surely we can reorient ourselves to our true core value, the honoring of creativity and quality. That value will still guide us to many traditionally published works, but it will also allow us to consider works in nontraditional venues: new open access journals, articles posted on a personal website or institutional repository, or digital projects.

The genre of the blog has been especially cursed by this lack of open-mindedness from the academy. Chapter 1, “What is a Blog?”, looks at the history of the blog and blogging, the anatomy and culture of a genre that is in many ways most representative of the open web. Saddled with an early characterization as being the locus of inane, narcissistic writing, the blog has had trouble making real inroads in academia, even though it is an extraordinarily flexible form and the perfect venue for a great deal of academic work. The chapter highlights some of the best examples of academic blogging and how they shape and advance arguments in a field. We can be more creative in thinking about the role of the blog within the academy, as a venue for communicating our work to colleagues as well as to a lay audience beyond the ivory tower.

This academic prejudice against the blog extends to other genres that have proliferated on the open web. Chapter 2, “Genres and the Open Web,” examines the incredible variety of those new forms, and how, with a careful eye, we might be able to import some of them profitably into the academy. Some of these genres, like the wiki, are well-known (thanks to Wikipedia, which academics have come to accept begrudgingly in the last five years). Other genres are rarer but take maximal advantage of the latitude of the open web: its malleability and interactivity. Rather than imposing the genres we know on the web—as we do when we post PDFs of print-first journal articles—we would do well to understand and adopt the web’s native genres, where helpful to scholarly pursuits.

But what of our academic interest in validity and excellence, enshrined in our peer review system? Chapter 3, “Good is Good,” examines the fundamental requirements of any such system: the necessity of highlighting only a minority of the total scholarly output, based on community standards, and of disseminating that minority of work to communities of thought and practice. The chapter compares print-age forms of vetting with native web forms of assessment and review, and proposes ways that digital methods can supplement—or even replace—our traditional modes of peer review.

“The Value, and Values, of Openness,” Chapter 4, broadly examines the nature of the web’s openness. Oddly, this openness is both the easiest trait of the web to understand and its most complex, once one begins to dig deeper. The web’s radical openness not only has led to calls for open access to academic work, which has complicated the traditional models of scholarly publishers and societies; it has also challenged our academic predisposition toward perfectionism—the desire to only publish in a “final” format, purged (as much as possible) of error. Critically, openness has also engendered unexpected uses of online materials—for instance, when Nate Silver refactored poll numbers from the raw data that polling agencies had posted.

Ultimately, openness is at the core of any academic model that can operate effectively on the web: it provides a way to disseminate our work easily, to assess what has been published, and to point to what’s good and valuable. Openness can naturally lead—indeed, is leading—to a fully functional shadow academic system for scholarly research and communication that exists beyond the more restrictive and inflexible structures of the past.

[Update, 7/29/11: I've answered Zach Schrag's criticism about the disciplinary scope of the book in a new paragraph beginning with "To be sure, as William Gibson..."]

[Update, 8/1/11: Added more about "good is good," beginning with the line on Alexis Lichine and continuing through the following paragraph, to address Sylvia Miller's point about promotion and tenure. Also fixed a few points of grammar, thanks to Sherman Dorn.]

  1. Nate Silver, “Introducing PECOTA,” in Gary Huckabay, Chris Kahrl, Dave Pease et al., eds., Baseball Prospectus 2003 (Dulles, VA: Brassey’s Publishers, 2003): 507-514. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton & Company, 2004).
  2. “Frequently Asked Questions,” The Burrito Bracket, http://burritobracket.blogspot.com/2007/07/faq.html
  3. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  4. Adam Sternbergh, “The Spreadsheet Psychic,” New York, October 12, 2008, http://nymag.com/news/features/51170/
  5. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  6. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  7. Cindy Sutter, “Hooked on information: Can political news really be addicting?” The Colorado Daily, November 3, 2008, http://www.coloradodaily.com/ci_13105998
  8. Nate Silver, “FiveThirtyEight to Partner with New York Times,” http://www.fivethirtyeight.com/2010/06/fivethirtyeight-to-partner-with-new.html
  9. Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (University of Pennsylvania Press, 2006).
  10. http://www.dancohen.org/2010/11/11/frank-turner-on-the-future-of-peer-review/
  11. Association of American University Presses, “Digital Publishing in the AAUP Community; Survey Report: Winter 2009-2010,” http://aaupnet.org/resources/reports/0910digitalsurvey.pdf, p. 2
  12. See, for example, Robert B. Townsend, “How Is New Media Reshaping the Work of Historians?”, Perspectives on History, November 2010, http://www.historians.org/Perspectives/issues/2010/1011/1011pro2.cfm
  13. National Public Radio, “Talk of the Nation” radio program, 30 November 1999, timecode 11:55, http://discover.npr.org/features/feature.jhtml?wfId=1067220
  14. “Tyler Cowen: Academic Publishing,” remarks at the Institute for Humane Studies Summer Research Fellowship weekend seminar, May 2011, http://vimeo.com/24124436
  15. Open access mandates have been tough sells on many campuses, passing only by slight majorities or failing entirely. For instance, such a mandate was voted down at the University of Maryland, with evidence of confusion and ambivalence. http://scholarlykitchen.sspnet.org/2009/04/28/umaryland-faculty-vote-no-oa/
  16. Quoted in Frank J. Prial, “Wine Talk,” New York Times, 17 August 1994, http://www.nytimes.com/1994/08/17/garden/wine-talk-983519.html.

A Conversation with Richard Stallman about Open Access

[An email exchange with Richard Stallman, father of free software, copyleft, GNU, and the GPL, reprinted here in redacted form with Stallman's permission. Stallman tutors me in the important details of open access and I tutor him in the peculiarities of humanities publishing.]

RS: [Your] posting ["Open Access Publishing and Scholarly Values"] doesn’t specify which definition of “open access” you’re arguing for — but that is a fundamental question.

When the Budapest Declaration defined open access, the crucial condition was that users be free to redistribute copies of the articles.  That is an ethical imperative in its own right, and a requisite for proper and safe archiving of the work.

People paid more attention to the other condition specified in the Budapest Declaration: that the publication site allow access by anyone. This is a good thing, but need not be explicitly required, because the other condition (freedom to redistribute) will have this as a consequence. Many universities and labs will set up mirror sites, and everyone will thus have access.

More recently, some have started using a modified definition of “open access” which omits the freedom to redistribute.  As a result, “open access” is no longer a clear rallying point.  I think we should now campaign for “redistributable publication.”

What are your thoughts on this?

DC: I probably should have been clearer in my post that I’m for the maximal access—and distribution—of which you speak. Alas, the situation is actually worse than you imagine, especially in the humanities, where I work, and which is about a decade behind the sciences in open access. Beyond the muddying of the waters through terms like “Green OA” and “Gold OA” is the fact that academic publishing is horribly wrapped up (again, more so in the humanities) with structural problems related to reputation, promotion, and tenure. So my colleagues worry more about truly open publications “counting” vs. publications that are simply open to reading on a commercial publisher’s website. That is why I think the big question is not the licensing or the technology of decentralized publishing, posting and free distribution of papers, etc., but the social realm in which academic publishing sits. I’m working now on pragmatic ways to change that very conservative realm.

Put another way: when software developers write good (open) code, other developers recognize that quality, independent of where the code resides; in humanities publishing, packaging (including the imprimatur of a press, the sense that a work has jumped some (often mythical) peer-review hurdle) counts for too much right now.

RS: ["Green OA" and "Gold OA"] are new to me — can you tell me what they mean?

So my colleagues worry more about truly open publications “counting” vs. publications that are simply open to reading on a commercial publisher’s website.

I don’t understand that sentence.

That is why I think the big question is not the licensing or the technology of decentralized publishing, posting and free distribution of papers, etc., but the social realm in which academic publishing sits.

Ethically speaking, what matters is the license used. That’s what determines whether the publishing is ethical or not. Are you saying that the social realm contains the obstacle to the adoption of ethical publication methods?

Put another way: when software developers write good (open) code, other developers recognize that quality, independent of where the code resides.

Programmers can tell if code is well-written, assuming they are allowed to read it, but how does that relate? Are you saying that in the humanities people often judge work based on where it is published, and have no other way to determine what is good or bad?

DC: Green O[pen] A[ccess] = when a professor deposits her finished article in a university repository after it is published. Theoretically that article will then be available (if people can find the website for the institution’s repository), even if the journal keeps it gated.

Gold OA = when the journal itself (rather than the repository) is open access; this may involve the author paying a fee (often around $1-3K). Still probably doesn’t have a redistribution license, but it’s not behind a publisher’s digital gates.

Counting = counting in the academic promotion and tenure process. Much of the problem here is (I believe misplaced) concern about the effect of open access on one’s career.

Are you saying that the social realm contains the obstacle to the adoption of ethical publication methods?

Correct. And much of it has to do with the meekness of academics (especially in the humanities, bastion of liberalism in most other ways) when it comes to challenging the system and creating a more ethical publication system, one controlled by the community of scholars rather than commercial publishers who profit from our work.

Are you saying that in the humanities people often judge work based on where it is published, and have no other way to determine what is good or bad?

Amazing as it may sound, many academics do indeed judge a work that way, especially in tenure and promotion processes. There are some departments that actually base promotion and tenure on the number of pages published in the top (mostly gated) journals.

RS: [Terms like "Green OA" and "Gold OA" provide] even more reason to reject the term “open access” and demand redistributable publication.

Maybe some leading scholars could be recruited to start a redistributable journal.  Their names would make it prestigious.

DC: That’s what PLoS did (http://plos.org) in the sciences. Unclear if the model is replicable in the humanities, but I’m trying.

UPDATE: This was an off-hand conversation with Stallman, and my apologies for the quick (and poor) descriptions of a couple of open access options. But I think the many commenters below who are focusing on the fine differences between kinds of OA are missing the central themes of this conversation.

Peer Review and the Most Influential Publications

Thanks to Josh Greenberg, I’ve been mulling over this fascinating paper I missed from last winter about the relative impact of science articles published in three different ways in the Proceedings of the National Academy of Sciences (PNAS). It speaks to the question of how important traditional peer review is, and how we might introduce other modes of scholarly communication and review.

PNAS now allows for three very different modes of article submission:

The majority of papers published in PNAS are submitted directly to the journal and follow the standard peer review process. The editorial board appoints an editor for each Direct submission, who then solicits reviewers. During the review process the authors are blinded to the identities of both the editor and the referees. PNAS refers to this publication method as “Track II”. In addition to the direct submission track, members of the National Academy of Sciences (NAS) are allowed to “Communicate” up to two papers per year for other authors. Here, authors send their paper to the NAS member, who then procures reviews from at least two other researchers and submits the paper and reviews to the PNAS editorial board for approval. As with Direct submissions, authors of Communicated papers are at least in theory blinded to the identity of their reviewers, but not to the identity of the editor. PNAS refers to this publication method as “Track I”. Lastly, NAS members are allowed to “Contribute” as many of their own papers per year as they wish. Here, NAS members choose their own referees, collect at least two reviews, and submit their paper along with the reviews to the PNAS editorial board. Peer review is no longer blind, as the authoring NAS member selects his or her own reviewers. PNAS refers to this publication method as “Track III”… Examining papers published in PNAS provides an opportunity to evaluate how these differences in the submission and peer review process within the same journal affect the impact of the papers finally published. The possibility that impact varies systematically across track has received a great deal of recent attention, particularly in light of the decision by PNAS to discontinue Track I. The citation analysis we now present provides a quantitative treatment of the quality of papers published through each track, a discussion which has hitherto been largely anecdotal in nature.

Here’s the eye-opening conclusion:

The analysis presented here clearly demonstrates variation in impact among papers published using different review processes at PNAS. We find that overall, papers authored by NAS members and Contributed to PNAS are cited significantly less than papers which are Direct submissions. Strikingly, however, we find that the 10% most cited Contributed papers receive significantly more citations than the 10% most cited Direct submissions. Thus the Contributed track seems to yield less influential papers on average, but is more likely to produce truly exceptional papers. [emphasis mine]

I suspect this will hold true for many new kinds of scholarly communication that are liberated from traditional peer review. Due to their more open and freewheeling nature, these genres, like blogging, will undoubtedly contain much dreck, and thus be negatively stereotyped by many in the professoriate, who (as I have noted in this space) are inordinately conservative when it comes to scholarly communication. But in that sea of nontraditionally reviewed material will be many of the most creative and influential publications. I’m willing to bet this pattern will be even more pronounced in the humanities, where traditional peer review is particularly adept at homogenizing scholarly work.

Just a thought for Open Access Week.

Eliminating the Power Cord

[My live talk at the Shape of Things to Come conference at the University of Virginia, March 27, 2010. It is a riff on a paper that will come out in the proceedings of the conference.]

As I noted in my paper for this conference, what I find interesting about this panel is that we got a chance to compare two projects by Ken Price: the Walt Whitman Archive and Civil War Washington. How their plans and designs differ tells us something about all digital humanities projects. I want to spend my brief time spinning out further what I said in the paper about control, flexibility, creativity, and reuse. It’s a tale of the tension between content creators and content users.

But before I get to Ken’s work, I’d like to start with another technological humanist, Jef Raskin, one of the first employees of Apple Computer and the designer, with Steve Jobs, of the first Macintosh. Just read the principles Raskin laid out in 1979 in “Design Considerations for an Anthropophilic Computer”:

This is an outline for a computer designed for the Person In The Street (or, to abbreviate: the PITS); one that will be truly pleasant to use, that will require the user to do nothing that will threaten his or her perverse delight in being able to say: “I don’t know the first thing about computers.”

You might think that any number of computers have been designed with these criteria in mind, but not so. Any system which requires a user to ever see the interior, for any reason, does not meet these specifications. There must not be additional ROMS, RAMS, boards or accessories except those that can be understood by the PITS as a separate appliance. As a rule of thumb, if an item does not stand on a table by itself, and if it does not have its own case, or if it does not look like a complete consumer item in [and] of itself, then it is taboo.

If the computer must be opened for any reason other than repair (for which our prospective user must be assumed incompetent) even at the dealer’s, then it does not meet our requirements.

Seeing the guts is taboo. Things in sockets is taboo. Billions of keys on the keyboard is taboo. Computerese is taboo. Large manuals, or many of them is taboo.

There must not be a plethora of configurations. It is better to manufacture versions in Early American, Contemporary, and Louis XIV than to have any external wires beyond a power cord.

And you get ten points if you can eliminate the power cord.

Many digital humanities projects implicitly subscribe to Raskin’s design principle. They take on what the content creators and designers see as hard and annoying work for end users, freeing those users “to do what they do best.” These editorial projects bring together at once primary sources, middleware, user interfaces, and even tools.

Like the Macintosh, this can be a very good thing. I mostly agree with what Ken has just said, that in the case of Whitman, we probably cannot rely on a loose network of sites to provide canonical texts. Moreover, students new to Walt Whitman can clearly use the contextualization and criticism Ken and his colleagues provide on the Walt Whitman site. Similarly, scholars dipping for the first time into ethnomusicology will appreciate the total research environment provided by EVIA. As Matt Kirschenbaum noted in the last session, good user interfaces can enable new interpretations. I doubt that many scholars would be able to do Hypercities-grade geographical scholarship without a centralized Hypercities site.

But at the same time, like Raskin, sometimes these projects strive too hard to eliminate the power cord.

Raskin thought that the perfect computer would enable creativity at the very surface of the appliance. Access to the guts would not be permitted because allowing it would hinder the capacity of the user to be creative. The computer designers would take care of all of the creativity from the base of the hardware to the interface. But as Bethany Nowviskie discussed this morning, design decisions and user interface embody an argument. And so they also imply control. It’s worth thinking about the level of control the creators assume in each digital humanities project.

I would like to advance this principle: Scholars have uses for edited collections that the editors cannot anticipate. One of the joys of server logs is that we can actually see that principle in action (whereas print editorial projects have no idea how their volumes are being used, except in footnotes many years later). In the September 11 Digital Archive we assumed as historians that all uses of the archive would be related to social history. But we discovered later that many linguists were using the archive to study teen slang at the turn of the century, because it was a large open database that held many stories by teens. Anyone creating resources to serve scholars and scholarship needs to account for these unanticipated uses.

When we think through the principle of unanticipated uses, we begin to realize that there is a push and pull between the scholar and the editor. It is perhaps not a zero sum game, but surely there is a tension between the amount of intellectual work each party gets to do. Editors that put a major intellectual stamp on their collection through data massaging and design and user tools restrict the ability of the scholar to do flexible work on it. Alan Burdette of EVIA was thinking of this when he spoke about his fear of control vs. dynamism this morning.

Are digital humanities projects prepared to separate their interfaces from their primary content? What if Hypercities were just a set of KML files like Phil Ethington’s KML files of LA geography? What about the Grub Street Project? Or Ken’s Civil War Washington? This is a hard question for digital projects—freeing their content for reuse.

I believe Ken’s two projects, one a more traditional editorial project and one a labor of love, struggle with how much intellectual work to cede to the end user. Both projects have rather restrictive terms of use pages and admonishments about U.S. copyright law. Maybe I’m reading something into the terms of use page for Civil War Washington site, but it seems more half-hearted. You can tell that here is a project that isn’t a holding place for fixed perfected primary resources like Whitman’s, but an evolving scholarly discussion that could easily involve others.

Why not then allow for the download of all the data on the site? I don’t think it would detract from Civil War Washington; indeed, it would probably increase the profile of the site. The site would not only have its own interpretations, but allow for other interpretations—off of the site. Why not let others have access to the guts that Raskin wished to cloak? This is the way networked scholarship works. And this is, I believe, what Roger Bagnall was getting at yesterday when he said “we need to think about the death of the [centralized website] project” as the greater success of digital humanities.

Jim Chandler and I have been formulating a rule of thumb for these editorial projects: the more a discipline is secure in its existence, its modes of interpretation, and its methods of creating scholarship, the more likely it is to produce stripped-down, exchangeable data sets. Thus scholars in papyrology just want to get at the raw sources; they would be annoyed by a Mac-like interface or silo.  They have achieved what David Weinberger, in summarizing the optimal form of the web, called “small pieces, loosely joined.”

On the other hand, the newer and less confident disciplines, such as the digital geographic history of Civil War Washington, Hypercities, and Grub Street feel that they need to have a Raskin-like environment—it’s part of the process of justifying their existence. They feel pressure to be judge, jury and executioner. If the Cohen-Chandler law holds true, we will see in the future fewer fancy interfaces and more direct, portable access to humanities materials.

Of course, as I note in my paper, the level of curation apparent in a digital project is related to the question of credit. The Whitman archive feels like a traditional editorial project and thus worthy of credit. If Ken instead produced KML files and raw newspaper scans, he would likely get less credit than he would for a robust, comprehensive site like Civil War Washington.

The irony about the long-suffering debate about credit is that every day humanities scholars deal with complexity, parsing complicated texts, finding meaning in the opaque. And yet somehow when it comes to self-assessment, we are remarkably simple-minded. If we can understand Whitman’s Leaves of Grass, surely we can tease out questions of credit and the intellectual work that goes into, say, complex KML files.

To help spur this transition along, Christine Madsen made the important point this weekend that the separation of interface and data makes sustainability models easier to imagine (and suggests a new role for libraries). If art is long and life is short, data is longish and user interfaces are fleeting. Just look at how many digital humanities projects that rely on Flash are about to become useless on millions of iPads.

Finally, on sustainability, I made a comparison in my paper between the well-funded Whitman archive and the Civil War Washington site, which was produced through sweat equity. I believe that Ken has a trump card with the latter. Being a labor of love is worth thinking about, because it’s often the way that great scholarship happens. Scholars in the humanities are afflicted with an obsession that makes them wake up in the morning and research and write about topics that drive them and constantly occupy their thoughts. Scholars naturally want to spend their time doing things like Civil War Washington. Being a labor of love is often the best sustainability model.

Idealism and Pragmatism in the Free Culture Movement

[A review of Gary Hall's Digitize This Book! The Politics of New Media, or Why We Need Open Access Now (University of Minnesota Press, 2009). Appeared in the May/June 2009 issue of Museum.]

Beginning in the late 1970s with Richard Stallman’s irritation at being unable to inspect or alter the code of software he was using at MIT, and accelerating with 22-year-old Linus Torvalds’s release of the whimsically named Linux operating system and the rise of the World Wide Web in the early 1990s, with its emphasis on openly available, interlinked documents, the free software and open access movements are among the most important developments of our digital age.

These movements can no longer be considered fringe. Two-thirds of all websites run on open source software, and although many academic resources remain closed behind digital gates, the Directory of Open Access Journals reports that nearly 4,000 publications are available to anyone via the Web, a number that grows rapidly each year. In the United States, the National Institutes of Health mandated recently that all articles produced under an NIH grant—a significant percentage of current medical research—must be available for free online.

But if the movement toward shared digital openness seems like a single groundswell, it masks an underlying tension between pragmatism and idealism. If Stallman was a seer and the intellectual justifier of “free software” (“free” meaning “liberated”), it was Torvalds’s focus on the practical as well as a less radical name—“open source”—that convinced tech giant IBM to commit billions of dollars to Linux starting in the late 1990s. Similarly, open access efforts like the science article sharing site arXiv.org have flourished because they provide useful services—including narcissistic ones such as establishing scientific precedent—while furthering idealistic goals. Successful movements need both Stallmans and Torvalds, as uneasily as they may coexist.

Gary Hall’s Digitize This Book! clearly falls more on the idealistic side of today’s open movements than the pragmatic side. Although he acknowledges the importance of practice—and he has practiced open access himself—Hall emphasizes that theory must be primary, since, unlike any particular website or technology, theory contains the full potential of what digitization might bring. He pursues this idealism by drawing from the critical theory—and the critical posture—of cultural studies, one of the most vociferous antagonists to traditional structures in higher education and politics.

Hall’s book is less accessible than others on the topic because of long stretches involving this cultural theory, with some chapters rife with the often opaque language developed by Jacques Derrida and his disciples. Digitize This Book! gets its name, of course, from Abbie Hoffman’s 1971 hippie classic, Steal This Book, which provided practical advice on a variety of uniformly shady (and often illegal) methods for rebelling against The Man. But Digitize This Book! reads less like a Hoffmanesque handbook for the digital age and more like a throw-off-your-chains political manifesto couched in academic lingo.

Those unaccustomed to the lingo and associated theoretical constructions might find the book off-putting, but its impressive intellectual ambition makes Digitize This Book! an important addition to a growing literature on the true significance of digital openness. Hall imagines open access not merely in terms of the goods of universal availability and the greater dissemination of knowledge, but as potentially leading to energetic opposition to the “marketization and managerialization of the university,” that is, the growing approach by administrations to treat universities as businesses rather than as places of learning and free intellectual exchange, a development that has upset many well beyond cultural studies departments. Similar worries, of course, cloud cultural heritage institutions such as museums and libraries.

Despite his emphasis on theory, Hall knows that any positive transformation must ultimately come from effective action in addition to advocacy. As Stallman unhappily discovered after starting the Free Software Foundation in 1985 and working for many years on his revolutionary software called GNU, it was Torvalds, a clever tactician and amiable community builder rather than a theoretician or firebrand, who helped (along with others of similar disposition) to break open source into the mainstream by finding pathways for his Linux operating system to insinuate itself into institutions and companies that normally might have rejected the mere idea of it out of hand.

Hall does understand this pragmatism, and much to his credit he has real experience with creating open access materials rather than simply thinking about how they might affect the academy. He is a co-founder of the Open Humanities Press, a founder and co-editor of the open access journal Culture Machine, and is director of CSeARCH, an arXiv.org for cultural studies.

Yet Hall sees his efforts as ongoing “experiments,” not the final (digital) word. Indeed, he worries that his compatriots in the open access and open source software movements are congratulating themselves too early, and for accomplishing lesser goals. Yes, open source software has made significant inroads, Hall acknowledges, but it has also been “coopted” by the giants of industry, as the IBM investment shows. (The book would have benefited from a more comprehensive analysis of open source, especially in the Third World, where free software is more radically challenging the IBMs and Microsofts.) Similarly, Hall claims, open access journals are flourishing, but too often these journals merely bring online the structures and strictures of traditional academia.

Here is where Hall’s true radicalism comes to the fore, building toward a conclusion with more expansive aims (and more expansive words, such as “hypercyberdemocracy” and “hyperpolitics”). He believes that open access provides a rare opportunity to completely rethink and remake the university, including its internal and external relationships. Paper journals ratified what and who was important in ways we may not want to replicate online, Hall argues. Even if one disagrees with his (hyper)politics, Hall’s insight that new media forms are often little more than unimaginative digital reproductions of the past, which bring forward old conventions and inequities, seems worthy of consideration.

A wag might note at this point that Digitize This Book! is oddly not itself available as a digital reproduction. (As part of the research for this review, I looked in the shadier parts of the Internet but could not locate a free electronic download of the book, even in the shadows.) Other recent books on the open access movement are available for free online (legally), including James Boyle’s The Public Domain: Enclosing the Commons of the Mind (Yale University Press) and John Willinsky’s The Access Principle: The Case for Open Access to Research and Scholarship (MIT Press). Drawing attention to this disconnect is less a cheap knock against Hall than a recognition that the actualization of open access and its transformative potential are easier said than done.

Assuming things will not change overnight and that few professors, curators, or librarians are ready to move, like Abbie Hoffman, to a commune (though many might applaud the lack of administrators there), the key questions are, How does one take concrete steps toward a system in which open access is the normal mode of publishing? Which structures must be dissolved and which created, and how to convince various stakeholders to make this transition together?

These are the kinds of practical—political—questions that advocates of open access must address. Gary Hall has helpfully provided the academic purveyors of open access much food for thought. Now comes the difficult work of crafting recipes to reach the future he so richly imagines.

Mass Digitization of Books: Exit Microsoft, What Next?

So Microsoft has left the business of digitizing millions of books—apparently because they saw it as no business at all.

This leaves Microsoft’s partner (and our partner on the Zotero project), the Internet Archive, somewhat in the lurch, although Microsoft has done the right thing and removed the contractual restrictions on the books they digitized so they may become part of IA’s fully open collection (as part of the broader Open Content Alliance), which now has about 400,000 volumes. Also still on the playing field is the Universal Digital Library (a/k/a the Million Books Project), which has 1.5 million volumes.

And then there’s Google and its Book Search program. For those keeping score at home, my sources tell me that Google, which coyly likes to say it has digitized “over a million books” so far, has actually finished scanning five million. It will be hard for non-profits like IA to catch up with Google without some game-changing funding or major new partnerships.

Foundations like the Alfred P. Sloan Foundation have generously made substantial (million-dollar) grants to add to the digital public domain. But with the cost of digitizing 10 million pre-1923 books at around $300 million, where might this scale of funds and new partners come from? To whom can the Open Content Alliance turn to replace Microsoft?

Frankly, I’ve never understood why institutions such as Harvard, Yale, and Princeton haven’t made a substantial commitment to a project like OCA. Each of these universities has seen its endowment grow into the tens of billions in the last decade, and each has the means and (upon reflection) the motive to do a mass book digitization project of Google’s scale. $300 million sounds like a lot, but it’s less than 1% of Harvard’s endowment, and my guess is that the amount is considerably less than all three universities are spending to build and fund laboratories for cutting-edge sciences like genomics. And a 10-million-book public-domain digitization project is just the kind of outrageously grand project HYP should be doing, especially if they value the humanities as much as the sciences.

Moreover, Harvard, Yale, and Princeton find themselves under enormous pressure to spend more of their endowments for a variety of purposes, including tuition remission and the public good. (Full and rather vain disclosure: I have some relationship to all three institutions; I complain because I love.) Congress might even get into the act, mandating that universities like HYP spend a more generous minimum percentage of their endowments every year, just like private foundations, which benefit (as does HYP, though in an indirect way) from the federal tax code.

In one stroke HYP could create enormous goodwill with a moon-shot program to rival Google’s: free books for the world. (HYP: note the generous reaction to, and the great press for, MIT’s OpenCourseWare program.) And beyond access, the project could enable new forms of scholarship through computational access to a massive corpus of full texts.

Alas, Harvard and Princeton partnered with Google long ago. Princeton has committed to digitizing about one million volumes with Google; Harvard’s number is unclear, but probably smaller. The terms of the agreement with Google are non-exclusive; Harvard and Princeton could initiate their own digitization projects or form other partnerships. But I suspect that would be politically difficult since the two universities are getting free digitization services from Google and would have to explain to their overseers why they want to replace free with very expensive. (The answer sounds like Abbott and Costello: the free program produces something that’s not free, while the expensive one is free.)

If Google didn’t exist, Harvard would probably be the most obvious candidate to pull off the Great Digitization of Widener. Not only does it have the largest endowment; historian Robert Darnton, a leader in thinking about the future (and the past) of the book, is now the director of the Harvard library system. Harvard also recently passed an open access mandate for the publications of its faculty.

Princeton has the highest per-student endowment of any university, and could easily undertake a mass digitization project of this scale. Perhaps some of the many Princeton alumni who went on to vast riches on the Web, such as eBay’s Meg Whitman (who has already given $100 million to Princeton) or Amazon’s Jeff Bezos, could pitch in.

But Harvard’s and Princeton’s “non-exclusive” partnerships with Google make these outcomes unlikely, as does the general resistance in these universities to spending science-scale funds outside of the sciences (unless it’s for a building).

That leaves Yale. Yale chose Microsoft last year to do its digitization, and has now been abandoned right in the middle of its project. Since Microsoft is apparently leaving its equipment and workflow in place at partner institutions, Yale could probably pick up the pieces with an injection of funding from its endowment or from targeted alumni gifts. Yale just spent an enormous amount of money on a new campus for the sciences, and this project could be seen as a counterbalance for the humanities.

Or, HYP could band together and put in a mere $100 million each to get the job done.

Is this likely to happen? Of course not. HYP and other wealthy institutions are being asked to spend their prodigious endowments on many other things, and are reluctant to raise their spending rate at all. But I believe a HYP or HYP-like solution is much more likely than the kind of public funding the Human Genome Project received.

Digital Campus #26 – Free for All

On this episode of the Digital Campus podcast we wrestle with how to keep open access/open source educational resources and tools sustainable for the long run. Mills elaborates on some of his ideas about a “freemium” business model for higher ed, and Tom and I explain the dilemma from the perspective of large academic software projects. We also debate whether laptops are a distraction in the classroom, among other topics in the news roundup and picks of the week.