Dan Cohen

Archive for the ‘Books’ Category

The Social Contract of Scholarly Publishing

Friday, March 5th, 2010

When Roy Rosenzweig and I finished writing a full draft of our book Digital History, we sat down at a table and looked at the stack of printouts.

“So, what now?” I said to Roy naively. “Couldn’t we just publish what we have on the web with the click of a button? What value does the gap between this stack and the finished product have? Isn’t it 95% done? What’s the last five percent for?”

We stared at the stack some more.

Roy finally broke the silence, explaining the magic of the last stage of scholarly production between the final draft and the published book: “What happens now is the creation of the social contract between the authors and the readers. We agree to spend considerable time ridding the manuscript of minor errors, and the press spends additional time on other corrections and layout, and readers respond to these signals—a lack of typos, nicely formatted footnotes, a bibliography, specialized fonts, and a high-quality physical presentation—by agreeing to give the book a serious read.”

I have frequently replayed that conversation in my mind, wondering about the constitution of this social contract in scholarly publishing, which is deeply related to questions of academic value and reward.

For the ease of conversation, let’s call the two sides of the social contract of scholarly publishing the supply side and the demand side. The supply side is the creation of scholarly works, including writing, peer review, editing, and the form of publication. The demand side is much more elusive—the mental state of the audience that leads them to “buy” what the supply side has produced. In order for the social contract to work, for engaged reading to happen and for credit to be given to the author (or editor of a scholarly collection), both sides need to be aligned properly.

The social contract of the book is profoundly entrenched and powerful—almost mythological—especially in the humanities. As John Updike put it in his diatribe against the digital (and most humanities scholars and tenure committees would still agree), “The printed, bound and paid-for book was—still is, for the moment—more exacting, more demanding, of its producer and consumer both. It is the site of an encounter, in silence, of two minds, one following in the other’s steps but invited to imagine, to argue, to concur on a level of reflection beyond that of personal encounter, with all its merely social conventions, its merciful padding of blather and mutual forgiveness.”

As academic projects have experimented with the web over the past two decades, we have seen intense thinking about the supply side. Robust academic work has been reenvisioned in many ways: as topical portals, interactive maps, deep textual databases, new kinds of presses, primary source collections, and even software. Most of these projects strive to reproduce the magic of the traditional social contract of the book, even as they experiment with form.

The demand side, however, has languished. Far fewer efforts have been made to influence the mental state of the scholarly audience. The unspoken assumption is that the reader is more or less unchangeable in this respect, only able to respond to, and validate, works that have the traditional marks of the social contract: having survived a strong filtering process, near-perfect copyediting, the imprimatur of a press.

We need to work much more on the demand side if we want to move the social contract forward into the digital age. Despite Updike’s ode to the book, there are social conventions surrounding print that are worth challenging. Much of the reputational analysis that occurs in the professional humanities relies on cues beyond the scholarly content itself. The act of scanning a CV is an act fraught with these conventions.

Can we change the views of humanities scholars so that they may accept, as some legal scholars already do, the great blog post as being as influential as the great law review article? Can we get humanities faculty, as many tenured economists already do, to publish more in open access journals? Can we accomplish the humanities equivalent of FiveThirtyEight.com, which provides in-depth political analysis as good as, if not better than, that of most newspapers, earning the grudging respect of journalists and political theorists? Can we get our colleagues to recognize outstanding academic work wherever and however it is published?

I believe that to do so, we may have to think less like humanities scholars and more like social scientists. Behavioral economists know that although the perception of value can come from the intrinsic worth of the good itself (e.g., the quality of a wine, already rather subjective), it is often influenced by many other factors, such as price and packaging (the wine bottle, how the wine is presented for tasting). These elements trigger a reaction based on stereotypes—if it’s expensive and looks well-wrapped, it must be valuable. The book and article have an abundance of these value triggers from generations of use, but we are just beginning to understand equivalent value triggers online—thus the critical importance of web design, and why the logo of a trusted institution or a university press can still matter greatly, even if it appears on a website rather than a book.

Social psychologists have also thought deeply about the potent grip of these idols of our tribe. They are aware of how cultural norms establish and propagate themselves, and tell us how the imposition of limits creates hierarchies of recognition. Thinking in their way, along with the way the web works, one potential solution on the demand side might come not from the scarcity of production, as it did in a print world, but from the scarcity of attention. That is, value will be perceived in any community-accepted process that narrows the seemingly limitless texts to read or websites to view. Curation becomes more important than publication once publication ceases to be limited.

[image credit: Priki]

Digital Campus #45 – Wave Hello

Wednesday, October 14th, 2009

If you’ve wondered what an academic trying to podcast while on Google Wave might sound like, you need listen no farther than the latest Digital Campus podcast. In addition to an appraisal of Wave, we cover the FTC ruling on bloggers accepting gifts (such as free books from academic presses), the great Kindle-on-campus experiment, and (of course) another update on the Google Books (un)settlement. Joining Tom, Mills, and me is another new irregular, Lisa Spiro. She’s the intelligent one who’s paying attention rather than muttering while watching Google waves go by. [Subscribe to this podcast.]

Digital Campus #44 – Unsettled

Thursday, October 1st, 2009

The latest edition of the Digital Campus podcast marks a break from the past. After three years of our small roundtable of Tom, Mills, and yours truly, we pull up a couple of extra seats for our first set of “irregulars,” Amanda French and Jeff McClurken. I think you’ll agree they greatly enliven the podcast and we’re looking forward to having them back on an irregular basis. On the discussion docket was the falling apart of the Google Books settlement, reCAPTCHA, Windows 7, and the future of libraries. [Subscribe to this podcast.]

Idealism and Pragmatism in the Free Culture Movement

Tuesday, May 12th, 2009

[A review of Gary Hall's Digitize This Book! The Politics of New Media, or Why We Need Open Access Now (University of Minnesota Press, 2009). Appeared in the May/June 2009 issue of Museum.]

Beginning in the late 1970s with Richard Stallman’s irritation at being unable to inspect or alter the code of software he was using at MIT, and accelerating with 22-year-old Linus Torvalds’s release of the whimsically named Linux operating system and the rise of the World Wide Web in the early 1990s, with its emphasis on openly available, interlinked documents, the free software and open access movements are among the most important developments of our digital age.

These movements can no longer be considered fringe. Two-thirds of all websites run on open source software, and although many academic resources remain closed behind digital gates, the Directory of Open Access Journals reports that nearly 4,000 publications are available to anyone via the Web, a number that grows rapidly each year. In the United States, the National Institutes of Health mandated recently that all articles produced under an NIH grant—a significant percentage of current medical research—must be available for free online.

But if the movement toward shared digital openness seems like a single groundswell, it masks an underlying tension between pragmatism and idealism. If Stallman was a seer and the intellectual justifier of “free software” (“free” meaning “liberated”), it was Torvalds’s focus on the practical as well as a less radical name—“open source”—that convinced tech giant IBM to commit billions of dollars to Linux starting in the late 1990s. Similarly, open access efforts like the science article sharing site arXiv.org have flourished because they provide useful services—including narcissistic ones such as establishing scientific precedent—while furthering idealistic goals. Successful movements need both Stallmans and Torvalds, as uneasily as they may coexist.

Gary Hall’s Digitize This Book! clearly falls more on the idealistic side of today’s open movements than the pragmatic side. Although he acknowledges the importance of practice—and he has practiced open access himself—Hall emphasizes that theory must be primary, since, unlike any particular website or technology, theory contains the full potential of what digitization might bring. He pursues this idealism by drawing from the critical theory—and the critical posture—of cultural studies, one of the most vociferous antagonists to traditional structures in higher education and politics.

Hall’s book is less accessible than others on the topic because of long stretches involving this cultural theory, with some chapters rife with the often opaque language developed by Jacques Derrida and his disciples. Digitize This Book! gets its name, of course, from Abbie Hoffman’s 1971 hippie classic, Steal This Book, which provided practical advice on a variety of uniformly shady (and often illegal) methods for rebelling against The Man. But Digitize This Book! reads less like a Hoffmanesque handbook for the digital age and more like a throw-off-your-chains political manifesto couched in academic lingo.

Those unaccustomed to the lingo and associated theoretical constructions might find the book offputting, but its impressive intellectual ambition makes Digitize This Book! an important addition to a growing literature on the true significance of digital openness. Hall imagines open access not merely in terms of the goods of universal availability and the greater dissemination of knowledge, but as potentially leading to energetic opposition to the “marketization and managerialization of the university,” that is, the growing approach by administrations to treat universities as businesses rather than as places of learning and free intellectual exchange—a development that has upset many, including many well beyond cultural studies departments. Similar worries, of course, cloud cultural heritage institutions such as museums and libraries.

Despite his emphasis on theory, Hall knows that any positive transformation must ultimately come from effective action in addition to advocacy. As Stallman unhappily discovered after starting the Free Software Foundation in 1985 and working for many years on his revolutionary software called GNU, it was Torvalds, a clever tactician and amiable community builder rather than theoretician or firebrand, who helped (along with others of similar disposition) to break open source into the mainstream by finding pathways for his Linux operating system to insinuate itself into institutions and companies that normally might have rejected the mere idea of it out of hand.

Hall does understand this pragmatism, and much to his credit he has real experience with creating open access materials rather than simply thinking about how they might affect the academy. He is a co-founder of the Open Humanities Press, a founder and co-editor of the open access journal Culture Machine, and is director of CSeARCH, an arXiv.org for cultural studies.

Yet Hall sees his efforts as ongoing “experiments,” not the final (digital) word. Indeed, he worries that his compatriots in the open access and open source software movements are congratulating themselves too early, and for accomplishing lesser goals. Yes, open source software has made significant inroads, Hall acknowledges, but it has also been “coopted” by the giants of industry, as the IBM investment shows. (The book would have benefited from a more comprehensive analysis of open source, especially in the Third World, where free software is more radically challenging the IBMs and Microsofts.) Similarly, Hall claims, open access journals are flourishing, but too often these journals merely bring online the structures and strictures of traditional academia.

Here is where Hall’s true radicalism comes to the fore, building toward a conclusion with more expansive aims (and more expansive words, such as “hypercyberdemocracy” and “hyperpolitics”). He believes that open access provides a rare opportunity to completely rethink and remake the university, including its internal and external relationships. Paper journals ratified what and who was important in ways we may not want to replicate online, Hall argues. Even if one disagrees with his (hyper)politics, Hall’s insight that new media forms are often little more than unimaginative digital reproductions of the past, which bring forward old conventions and inequities, seems worthy of consideration.

A wag might note at this point that Digitize This Book! is oddly not itself available as a digital reproduction. (As part of the research for this review, I looked in the shadier parts of the Internet but could not locate a free electronic download of the book, even in the shadows.) Other recent books on the open access movement are available for free online (legally), including James Boyle’s The Public Domain: Enclosing the Commons of the Mind (Yale University Press) and John Willinsky’s The Access Principle: The Case for Open Access to Research and Scholarship (MIT Press). Drawing attention to this disconnect is less a cheap knock against Hall than a recognition that the actualization of open access and its transformative potential are easier said than done.

Assuming things will not change overnight and that few professors, curators, or librarians are ready to move, like Abbie Hoffman, to a commune (though many might applaud the lack of administrators there), the key questions are, How does one take concrete steps toward a system in which open access is the normal mode of publishing? Which structures must be dissolved and which created, and how to convince various stakeholders to make this transition together?

These are the kinds of practical—political—questions that advocates of open access must address. Gary Hall has helpfully provided the academic purveyors of open access much food for thought. Now comes the difficult work of crafting recipes to reach the future he so richly imagines.

Sol LeWitt and the Soul of Creative and Intellectual Work

Sunday, December 7th, 2008

I won’t get there until the summer, but I’m already looking forward to experiencing the Sol LeWitt retrospective at the always entertaining and often thought-provoking Massachusetts Museum of Contemporary Art, better known as MASS MoCA. (For previous thoughts provoked by MASS MoCA, see my post “The Artistic and the Digital.”)

For those who can’t make it to the retrospective—and really, you have no excuse, since its limited engagement runs through 2033—the museum has just put online a terrific website for the retrospective (one that exhibits many of the principles of good design, including the use of small multiples).

The site also has mesmerizing timelapse films showing how some of the giant works of wall art were created. This being LeWitt, the works were of course created not by him but by a team of (sixty-five) artists, including many students. LeWitt died last year, but his wall drawings were always made in this way. He “merely” created the plan for a wall drawing; others carried it out, and most of the works at MASS MoCA have been produced multiple times, on walls of different sizes and in different contexts.

Among LeWitt’s many innovations was this utter disdain toward a particular instance of a creative or intellectual work. The “artwork” was not what was on the wall (or the many walls a specific design had been placed on); it was in the ideas and feelings the artist had and the communication of these ideas and feelings to the viewer. The notion of a nicely framed work of art, a work of art that gained its value from its trappings or price or uniqueness, seemed hopelessly traditional, sentimental, and superficial. It missed the point of art.

My thoughts naturally turned to Sol LeWitt and the lessons we might learn from him as I mulled over the future of books and music this weekend. On an interesting listserv I’m subscribed to, a debate raged about ebooks and the joys (the heft, the feel, the smell, the cover) of physical books; at the same time, the New York Times lionized Gabriel Roth, who is recreating classic soul and funk by eschewing digital technology and who speaks of the joys (the heft, the feel, the smell, the cover) of vinyl records.

My musical tastes happen to run toward classic soul and funk, but even I can’t help but feel that in Roth’s yearning for “real” vinyl and that rare 45, and in book lovers’ similar idealization of hardcovers and that rare edition, there is something odd going on that LeWitt would have instantly recognized and scorned: the fetishization of the object rather than its underlying ideas, a nostalgia that improperly finds authenticity in packaging.

When Gabriel Roth tells Cliff Driver, a 75-year-old keyboardist, to replace his electronic Roland with an upright piano, Driver calls him “an old, traditional type” and the Times reporter notes that “Driver and his peers would just as well leave [such analog sound] in the past with their Afros and bell-bottoms.”

The soul of soul isn’t in the vinyl; it’s in the talent and creativity of its makers. The soul of books isn’t in their format; it’s in the ideas of their authors. Sol LeWitt understood that.

Digital Campus #33 – Classroom Action Settlement

Monday, November 3rd, 2008

After an unplanned month off (our apologies, things have been more than a little busy around here), the Digital Campus podcast triumphantly returns to the airwaves with a discussion of the recent Google Book Search settlement. Also up for analysis are Microsoft’s move to the cloud, the new Google phone, and, as always, recommendations from Tom, Mills, and me about helpful sites, tools, and publications. [Subscribe to this podcast.]

First Impressions of the Google Books Settlement

Tuesday, October 28th, 2008

Just announced is the settlement of the class action lawsuit that the Authors Guild, the Association of American Publishers and individual authors and publishers filed against Google for its Book Search program, which has been digitizing millions of books from libraries. (Hard to believe, but the lawsuit was first covered on this blog all the way back in November 2005.) Undoubtedly this agreement is a critical one not only for Google and the authors and publishers, but for all of us in academia and others who care about the present and future of learning and scholarship.

It will obviously take some time to digest this agreement; indeed, the Google post on it is fairly sketchy and we still need to hear details, such as the cost structure for the full access the agreement now provides. But my first impressions of some key points:

The agreement really focuses on in-copyright but out-of-print books. That is, books that can’t normally be copied but also can’t be purchased anywhere. Highlighting these books (which are numerous; most academic books, e.g., are out-of-print and have virtually no market) was smart for Google since it seems to provide value without stepping on publishers’ toes.

A second (also smart, but probably more controversial) focus is on access to the Google Books collection via libraries:

We’ll also be offering libraries, universities and other organizations the ability to purchase institutional subscriptions, which will give users access to the complete text of millions of titles while compensating authors and publishers for the service. Students and researchers will have access to an electronic library that combines the collections from many of the top universities across the country. Public and university libraries in the U.S. will also be able to offer terminals where readers can access the full text of millions of out-of-print books for free.

Again, we need to hear more details about this part of the agreement. We also need to begin thinking about how this will impact libraries, e.g., in terms of their own book acquisition plans and their subscriptions to other online databases.

Finally, and perhaps most interesting and surprising to those of us in the digital humanities, is an all-too-brief mention of computational access to these millions of books:

In addition to the institutional subscriptions and the free public access terminals, the agreement also creates opportunities for researchers to study the millions of volumes in the Book Search index. Academics will be able to apply through an institution to run computational queries through the index without actually reading individual books.

For years in this space I have been arguing for the necessity of such access (first envisioned, to give due credit, by Cliff Lynch of CNI). Inside Google they have methods for querying and analyzing these books that we academics could greatly benefit from, and that could enable new kinds of digital scholarship.

Update: The Association of American Publishers now has a page answering frequently asked questions about the agreement (have we had time to ask?).

Digital Campus #29 – Making It Count

Friday, July 4th, 2008

Tom, Mills, and I take up the much-debated issue of whether and how digital work should count toward promotion and tenure on this episode of the podcast. We also examine the significance of university presses putting their books on Amazon’s Kindle device, and the release of better copyright records. [Subscribe to this podcast.]

Happy 4th of July!

Mass Digitization of Books: Exit Microsoft, What Next?

Thursday, May 29th, 2008

So Microsoft has left the business of digitizing millions of books—apparently because they saw it as no business at all.

This leaves Microsoft’s partner (and our partner on the Zotero project), the Internet Archive, somewhat in the lurch, although Microsoft has done the right thing and removed the contractual restrictions on the books they digitized so they may become part of IA’s fully open collection (as part of the broader Open Content Alliance), which now has about 400,000 volumes. Also still on the playing field is the Universal Digital Library (a/k/a the Million Books Project), which has 1.5 million volumes.

And then there’s Google and its Book Search program. For those keeping score at home, my sources tell me that Google, which coyly likes to say it has digitized “over a million books” so far, has actually finished scanning five million. It will be hard for non-profits like IA to catch up with Google without some game-changing funding or major new partnerships.

Foundations like the Alfred P. Sloan Foundation have generously made substantial (million-dollar) grants to add to the digital public domain. But with the cost of digitizing 10 million pre-1923 books at around $300 million, where might this scale of funds and new partners come from? To whom can the Open Content Alliance turn to replace Microsoft?

Frankly, I’ve never understood why institutions such as Harvard, Yale, and Princeton haven’t made a substantial commitment to a project like OCA. Each of these universities has seen its endowment grow into the tens of billions in the last decade, and each has the means and (upon reflection) the motive to do a mass book digitization project of Google’s scale. $300 million sounds like a lot, but it’s less than 1% of Harvard’s endowment and my guess is that the amount is considerably less than all three universities are spending to build and fund laboratories for cutting-edge sciences like genomics. And digitizing 10 million public-domain books is just the kind of outrageously grand project HYP should be doing, especially if they value the humanities as much as the sciences.
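A quick back-of-the-envelope sketch in Python makes the scale of these figures concrete. It uses only the two numbers cited above ($300 million, 10 million books); the variable names are illustrative, and the "minimum endowment" is simply the size an endowment must exceed for $300 million to be less than 1% of it, not a reported figure.

```python
# Figures taken from the text above; everything else is derived arithmetic.
total_cost_usd = 300_000_000   # estimated cost of the whole project
books = 10_000_000             # pre-1923 books to be digitized

# Implied cost of digitizing a single book.
cost_per_book_usd = total_cost_usd // books

# For the project to cost less than 1% of an endowment,
# the endowment must be larger than 100 times the project cost.
min_endowment_usd = total_cost_usd * 100

print(f"Cost per book: ${cost_per_book_usd}")
print(f"Endowment must exceed: ${min_endowment_usd:,}")
```

In other words, the estimate works out to $30 per book, and the "less than 1%" claim holds for any endowment above $30 billion, which is consistent with endowments in the tens of billions.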

Moreover, Harvard, Yale, and Princeton find themselves under enormous pressure to spend more of their endowment for a variety of purposes, including tuition remission and the public good. (Full and rather vain disclosure: I have some relationship to all three institutions; I complain because I love.) Congress might even get into the act, mandating that universities like HYP spend a more generous minimum percentage of their endowment every year, just like private foundations that benefit (as does HYP, though in an indirect way) from the federal tax code.

In one stroke HYP could create enormous good will with a moon-shot program to rival Google’s: free books for the world. (HYP: note the generous reaction to, and the great press for, MIT’s OpenCourseWare program.) And beyond access, the project could enable new forms of scholarship through computational access to a massive corpus of full texts.

Alas, Harvard and Princeton partnered with Google long ago. Princeton has committed to digitizing about one million volumes with Google; Harvard’s number is unclear, but probably smaller. The terms of the agreement with Google are non-exclusive; Harvard and Princeton could initiate their own digitization projects or form other partnerships. But I suspect that would be politically difficult since the two universities are getting free digitization services from Google and would have to explain to their overseers why they want to replace free with very expensive. (The answer sounds like Abbott and Costello: the free program produces something that’s not free, while the expensive one is free.)

If Google didn’t exist, Harvard would probably be the most obvious candidate to pull off the Great Digitization of Widener. Not only does it have the largest endowment; historian Robert Darnton, a leader in thinking about the future (and the past) of the book, is now the director of the Harvard library system. Harvard also recently passed an open access mandate for the publications of its faculty.

Princeton has the highest per-student endowment of any university, and could easily undertake a mass digitization project of this scale. Perhaps some of the many Princeton alumni who went on to vast riches on the Web, such as eBay’s Meg Whitman (who has already given $100 million to Princeton) or Amazon’s Jeff Bezos, could pitch in.

But Harvard’s and Princeton’s “non-exclusive” partnerships with Google make these outcomes unlikely, as does the general resistance in these universities to spending science-scale funds outside of the sciences (unless it’s for a building).

That leaves Yale. Yale chose Microsoft last year to do its digitization, and has now been abandoned right in the middle of its project. Since Microsoft is apparently leaving its equipment and workflow in place at partner institutions, Yale could probably pick up the pieces with an injection of funding from its endowment or from targeted alumni gifts. Yale just spent an enormous amount of money on a new campus for the sciences, and this project could be seen as a counterbalance for the humanities.

Or, HYP could band together and put in a mere $100 million each to get the job done.

Is this likely to happen? Of course not. HYP and other wealthy institutions are being asked to spend their prodigious endowments on many other things, and are reluctant to up their spending rate at all. But I believe a HYP or HYP-like solution is much more likely than public funding of the kind the Human Genome Project received.

Still Waiting for a Real Google Book Search API

Monday, March 31st, 2008

For years on this blog, at conferences, and even in direct conversations with Google employees I have been agitating for an API (application programming interface) for Google Book Search. (For a summary of my thoughts on the matter, see my imaginatively titled post, “Why Google Books Should Have an API.”) With the world’s largest collection of scanned books, I thought such an API would have major implications for doing research in the humanities. And I looked forward to building applications on top of the API, as I had done with my Syllabus Finder.

So why was I disappointed when Google finally released an API for their book scanning project a couple of weeks ago?

My suspicion began with the name of the API itself. Even though the URL for the API is http://code.google.com/apis/books/, suggesting that this is the long-awaited API for the kind of access to Google Books that I’ve been waiting for, the rather prosaic and awkward title of the API suggests otherwise: The Google Book Search Book Viewability API. From the API’s home page:

The Google Book Search Book Viewability API enables developers to:

  • Link to Books in Google Book Search using ISBNs, LCCNs, and OCLC numbers
  • Know whether Google Book Search has a specific title and what the viewability of that title is
  • Generate links to a thumbnail of the cover of a book
  • Generate links to an informational page about a book
  • Generate links to a preview of a book

These are remarkably modest goals. Certainly the API will be helpful for online library catalogs and other book services (such as LibraryThing) that wish to embed links to Google’s landing pages for books and (when copyright law allows) links to the full texts. The thumbnails of book covers will make OPACs look prettier.
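To show how little a client gets back, here is a minimal Python sketch of working with a viewability-style lookup. It is an illustration under stated assumptions, not Google's reference code: the endpoint, the `bibkeys`/`jscmd`/`callback` parameters, and the field names in the sample response are recalled from the API's documentation and should be checked against it; the sample payload and its example.org URLs are invented, and the helper names are hypothetical. No network request is made.

```python
import json
import re
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names as recalled from Google's docs.
BASE_URL = "http://books.google.com/books"


def viewability_url(bibkeys, callback="handle"):
    """Build a lookup URL for identifiers such as 'ISBN:0451526538'."""
    params = {
        "bibkeys": ",".join(bibkeys),  # ISBNs, LCCNs, or OCLC numbers
        "jscmd": "viewapi",
        "callback": callback,          # the response is wrapped as JSONP
    }
    return BASE_URL + "?" + urlencode(params)


def parse_jsonp(body):
    """Strip the callback wrapper from a JSONP body and decode the JSON."""
    match = re.search(r"\((.*)\)\s*;?\s*$", body, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


def summarize(records):
    """Reduce each record to its viewability level and landing-page link."""
    return {
        key: (rec.get("preview", "unknown"), rec.get("info_url"))
        for key, rec in records.items()
    }


# HYPOTHETICAL response body, shaped like the documented format.
SAMPLE_BODY = (
    'handle({"ISBN:0451526538": {'
    '"bib_key": "ISBN:0451526538", '
    '"info_url": "http://example.org/info", '
    '"preview_url": "http://example.org/preview", '
    '"thumbnail_url": "http://example.org/thumb.jpg", '
    '"preview": "partial"}});'
)

url = viewability_url(["ISBN:0451526538"])
summary = summarize(parse_jsonp(SAMPLE_BODY))
```

Everything in such a response is a link or a status flag. Nothing in it carries the text of the book itself.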

But this API does nothing to advance the kind of digital scholarship I have advocated for in this space. To do that the API would have to provide direct access to the full OCRed text of the books, to provide the ability to mine these texts for patterns and to combine them with other digital tools and corpora. Undoubtedly copyright concerns are part of the story here, hobbling what Google can do. But why not give full access to pre-1923 books through the API?
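As a rough illustration of what "mining these texts for patterns" could mean if full OCRed text were available, the following Python sketch counts how often a term appears per thousand words in each decade. It assumes a hypothetical corpus of (publication year, text) pairs already in hand; no such Google API call exists in the text above, and the three tiny strings in the toy corpus are invented placeholders, not real OCR output.

```python
import re
from collections import Counter


def term_frequency_by_decade(corpus, term):
    """Return {decade: occurrences of `term` per 1,000 words}.

    `corpus` is an iterable of (year, text) pairs. How those texts are
    obtained is outside this sketch (ASSUMPTION: full text is available).
    """
    hits = Counter()
    totals = Counter()
    target = term.lower()
    for year, text in corpus:
        decade = (year // 10) * 10
        # Crude tokenizer: lowercase runs of ASCII letters only.
        words = re.findall(r"[a-z]+", text.lower())
        totals[decade] += len(words)
        hits[decade] += words.count(target)
    return {
        decade: 1000 * hits[decade] / totals[decade]
        for decade in sorted(totals)
        if totals[decade] > 0
    }


# Toy stand-in for a corpus of pre-1923 books (invented strings).
toy_corpus = [
    (1851, "the whale and the sea"),
    (1859, "the origin of species"),
    (1902, "science and hypothesis"),
]

frequencies = term_frequency_by_decade(toy_corpus, "the")
```

The same loop, run over millions of volumes and combined with other tools and corpora, is the sort of computational query that a links-and-thumbnails API cannot support.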

I’m not hopeful that there are additional Google Book Search APIs coming. If that were the case the URL for the viewability API would be http://code.google.com/apis/books/viewability/. The result is that this API simply seems like a way to drive traffic to Google Books, rather than to help academia or to foster an external community of developers, as other Google APIs have done.