Dan Cohen

Archive for the ‘Open Access’ Category

Eliminating the Power Cord

Sunday, March 28th, 2010

[My live talk at the Shape of Things to Come conference at the University of Virginia, March 27, 2010. It is a riff on a paper that will come out in the proceedings of the conference.]

As I noted in my paper for this conference, what I find interesting about this panel is that we got a chance to compare two projects by Ken Price: the Walt Whitman Archive and Civil War Washington. How their plans and designs differ tell us something about all digital humanities projects. I want to spend my brief time spinning out further what I said in the paper about control, flexibility, creativity, and reuse. It’s a tale of the tension between content creators and content users.

But before I get to Ken’s work, I’d like to start with another technological humanist, Jef Raskin, one of the first employees of Apple Computer and the designer, with Steve Jobs, of the first Macintosh. Just read the principles Raskin lays out in 1979 in “Design Considerations for an Anthropophilic Computer”:

This is an outline for a computer designed for the Person In The Street (or, to abbreviate: the PITS); one that will be truly pleasant to use, that will require the user to do nothing that will threaten his or her perverse delight in being able to say: “I don’t know the first thing about computers.”

You might think that any number of computers have been designed with these criteria in mind, but not so. Any system which requires a user to ever see the interior, for any reason, does not meet these specifications. There must not be additional ROMS, RAMS, boards or accessories except those that can be understood by the PITS as a separate appliance. As a rule of thumb, if an item does not stand on a table by itself, and if it does not have its own case, or if it does not look like a complete consumer item in [and] of itself, then it is taboo.

If the computer must be opened for any reason other than repair (for which our prospective user must be assumed incompetent) even at the dealer’s, then it does not meet our requirements.

Seeing the guts is taboo. Things in sockets is taboo. Billions of keys on the keyboard is taboo. Computerese is taboo. Large manuals, or many of them is taboo.

There must not be a plethora of configurations. It is better to manufacture versions in Early American, Contemporary, and Louis XIV than to have any external wires beyond a power cord.

And you get ten points if you can eliminate the power cord.

Many digital humanities projects implicitly believe strongly in Raskin’s design principle. They take care of what to the content creators and designers seems like hard and annoying work for the end users, freeing those users “to do what they do best.” These editorial projects bring together at once primary sources, middleware, user interfaces, and even tools.

Like the Macintosh, this can be a very good thing. I mostly agree with what Ken has just said, that in the case of Whitman, we probably cannot rely on a loose network of sites to provide canonical texts. Moreover, students new to Walt Whitman can clearly use the contextualization and criticism Ken and his colleagues provide on the Walt Whitman site. Similarly, scholars dipping for the first time into ethnomusicology will appreciate the total research environment provided by EVIA. As Matt Kirschenbaum noted in the last session, good user interfaces can enable new interpretations. I doubt that many scholars would be able to do Hypercities-grade geographical scholarship without a centralized Hypercities site.

But at the same time, like Raskin, sometimes these projects strive too hard to eliminate the power cord.

Raskin thought that the perfect computer would enable creativity at the very surface of the appliance. Access to the guts would not be permitted because to allow so would hinder the capacity of the user to be creative. The computer designers would take care of all of the creativity from the base of the hardware to the interface. But as Bethany Nowviskie discussed this morning, design decisions and user interface embody an argument. And so they also imply control. It’s worth thinking about the level of control the creators assume in each digital humanities project.

I would like to advance this principle: Scholars have uses for edited collections that the editors cannot anticipate. One of the joys of server logs is that we can actually see that principle in action (whereas print editorial projects have no idea how their volumes are being used, except in footnotes many years later). In the September 11 Digital Archive we assumed as historians that all uses of the archive would be related to social history. But we discovered later that many linguists were using the archive to study teen slang at the turn of the century, because it was a large open database that held many stories by teens. Anyone creating resources to serve scholars and scholarship needs to account for these unanticipated uses.

When we think through the principle of unanticipated uses, we begin to realize that there is a push and pull between the scholar and the editor. It is perhaps not a zero sum game, but surely there is a tension between the amount of intellectual work each party gets to do. Editors that put a major intellectual stamp on their collection through data massaging and design and user tools restrict the ability of the scholar to do flexible work on it. Alan Burdette of EVIA was thinking of this when he spoke about his fear of control vs. dynamism this morning.

Are digital humanities projects prepared to separate their interfaces from their primary content? What if Hypercities was just a set of KML files like Phil Ethington’s KML files of LA geography? What about the Grub Street Project? Or Ken’s Civil War Washington? This is a hard question for digital projects—freeing their content for reuse.

I believe Ken’s two projects, one a more traditional editorial project and one a labor of love, struggle with how much intellectual work to cede to the end user. Both projects have rather restrictive terms of use pages and admonishments about U.S. copyright law. Maybe I’m reading something into the terms of use page for Civil War Washington site, but it seems more half-hearted. You can tell that here is a project that isn’t a holding place for fixed perfected primary resources like Whitman’s, but an evolving scholarly discussion that could easily involve others.

Why not then allow for the download of all the data on the site? I don’t think it would detract from Civil War Washington; indeed, it would probably increase the profile of the site. The site would not only have its own interpretations, but allow for other interpretations—off of the site. Why not let others have access to the guts that Raskin wished to cloak? This is the way networked scholarship works. And this is, I believe, what Roger Bagnall was getting at yesterday when he said “we need to think about the death of the [centralized website] project” as the greater success of digital humanities.

Jim Chandler and I have been formulating a rule of thumb for these editorial projects: the more a discipline is secure in its existence, its modes of interpretation, and its methods of creating scholarship, the more likely it is to produce stripped-down, exchangeable data sets. Thus scholars in papyrology just want to get at the raw sources; they would be annoyed by a Mac-like interface or silo.  They have achieved what David Weinberger, in summarizing the optimal form of the web, called “small pieces, loosely joined.”

On the other hand, the newer and less confident disciplines, such as the digital geographic history of Civil War Washington, Hypercities, and Grub Street feel that they need to have a Raskin-like environment—it’s part of the process of justifying their existence. They feel pressure to be judge, jury and executioner. If the Cohen-Chandler law holds true, we will see in the future fewer fancy interfaces and more direct, portable access to humanities materials.

Of course, as I note in my paper, the level of curation apparent in a digital project is related to the question of credit. The Whitman archive feels like a traditional editorial project and thus worthy of credit. If Ken instead produced KML files and raw newspaper scans, he would likely get less credit than a robust, comprehensive site like Civil War Washington.

The irony about the long-suffering debate about credit is that every day humanities scholars deal with complexity, parsing complicated texts, finding meaning in the opaque. And yet somehow when it comes to self-assessment, we are remarkably simple-minded. If we can understand Whitman’s Leaves of Grass, surely we can tease out questions of credit and the intellectual work that goes into, say, complex KML files.

To help spur this transition along, Christine Madsen has made this weekend the important point that the separation of interface and data makes sustainability models easier to imagine (and suggests a new role for libraries). If art is long and life is short, data is longish and user interfaces are fleeting. Just look at how many digital humanities projects that rely on Flash are about to become useless on millions of iPads.

Finally, on sustainability, I made a comparison in my paper between the well-funded Whitman archive and the Civil War Washington site, which was produced through sweat equity. I believe that Ken has a trump card with the latter. Being a labor of love is worth thinking about, because it’s often the way that great scholarship happens. Scholars in the humanities are afflicted with an obsession that makes them wake up in the morning and research and write about topics that drive them and constantly occupy their thoughts. Scholars naturally want to spend their time doing things like Civil War Washington. Being a labor of love is often the best sustainability model.

Idealism and Pragmatism in the Free Culture Movement

Tuesday, May 12th, 2009

[A review of Gary Hall's Digitize This Book! The Politics of New Media, or Why We Need Open Access Now (University of Minnesota Press, 2009). Appeared in the May/June 2009 issue of Museum.]

Beginning in the late 1970s with Richard Stallman’s irritation at being unable to inspect or alter the code of software he was using at MIT, and accelerating with 22-year-old Linus Torvalds’s release of the whimsically named Linux operating system and the rise of the World Wide Web in the early 1990s, with its emphasis on openly available, interlinked documents, the free software and open access movements are among the most important developments of our digital age.

These movements can no longer be considered fringe. Two-thirds of all websites run on open source software, and although many academic resources remain closed behind digital gates, the Directory of Open Access Journals reports that nearly 4,000 publications are available to anyone via the Web, a number that grows rapidly each year. In the United States, the National Institutes of Health mandated recently that all articles produced under an NIH grant—a significant percentage of current medical research—must be available for free online.

But if the movement toward shared digital openness seems like a single groundswell, it masks an underlying tension between pragmatism and idealism. If Stallman was a seer and the intellectual justifier of “free software” (“free” meaning “liberated”), it was Torvalds’s focus on the practical as well as a less radical name—“open source”—that convinced tech giant IBM to commit billions of dollars to Linux starting in the late 1990s. Similarly, open access efforts like the science article sharing site arXiv.org have flourished because they provide useful services—including narcisstic ones such as establishing scientific precedent—while furthering idealistic goals. Successful movements need both Stallmans and Torvalds, as uneasily as they may coexist.

Gary Hall’s Digitize This Book! clearly falls more on the idealistic side of today’s open movements than the pragmatic side. Although he acknowledges the importance of practice—and he has practiced open access himself—Hall emphasizes that theory must be primary, since unlike any particular website or technology theory contains the full potential of what digitization might bring. He pursues this idealism by drawing from the critical theory—and the critical posture—of cultural studies, one of the most vociferous antagonists to traditional structures in higher education and politics.

Hall’s book is less accessible than others on the topic because of long stretches involving this cultural theory, with some chapters rife with the often opaque language developed by Jacques Derrida and his disciples. Digitize This Book! gets its name, of course, from Abbie Hoffman’s 1971 hippie classic, Steal This Book, which provided practical advice on a variety of uniformly shady (and often illegal) methods for rebelling against The Man. But Digitize This Book! reads less like a Hoffmanesque handbook for the digital age and more like a throw-off-your-chains political manifesto couched in academic lingo.

Those unaccustomed to the lingo and associated theoretical constructions might find the book offputting, but its impressive intellectual ambition makes Digitize This Book! an important addition to a growing literature on the true significance of digital openness. Hall imagines open access not merely in terms of the goods of universal availability and the greater dissemination of knowledge, but as potentially leading to energetic opposition to the “marketization and managerialization of the university,” that is, the growing approach by administrations to treat universities as businesses rather than as places of learning and free intellectual exchange—a development that has upset many, including well beyond cultural studies departments. Similar worries, of course, cloud cultural heritage institutions such as museums and libraries.

Despite his emphasis on theory, Hall knows that any positive transformation must ultimately come from effective action in addition to advocacy. As Stallman unhappily discovered after starting the Free Software Foundation in 1985 and working for many years on his revolutionary software called GNU, it was Torvalds, a clever tactician and amiable community builder rather than theoretician or firebrand, who helped (along with others of similar disposition) to break open source into the mainstream by finding pathways for his Linux operating system to insinuate itself into institutions and companies that normally might have rejected the mere idea of it out of hand.

Hall does understand this pragmatism, and much to his credit he has real experience with creating open access materials rather than simply thinking about how they might affect the academy. He is a co-founder of the Open Humanities Press, a founder and co-editor of the open access journal Culture Machine, and is director of CSeARCH, an arXiv.org for cultural studies.

Yet Hall sees his efforts as ongoing “experiments,” not the final (digital) word. Indeed, he worries that his compatriots in the open access and open source software movements are congratulating themselves too early, and for accomplishing lesser goals. Yes, open source software has made significant inroads, Hall acknowledges, but it has also been “coopted” by the giants of industry, as the IBM investment shows. (The book would have benefited from a more comprehensive analysis of open source, especially in the Third World, where free software is more radically challenging the IBMs and Microsofts.) Similarly, Hall claims, open access journals are flourishing, but too often these journals merely bring online the structures and strictures of traditional academia.

Here is where Hall’s true radicalism comes to the fore, building toward a conclusion with more expansive aims (and more expansive words, such as “hypercyberdemocracy” and “hyperpolitics”). He believes that open access provides a rare opportunity to completely rethink and remake the university, including its internal and external relationships. Paper journals ratified what and who was important in ways we may not want to replicate online, Hall argues. Even if one disagrees with his (hyper)politics, Hall’s insight that new media forms are often little more than unimaginative digital reproductions of the past, which bring forward old conventions and inequities, seems worthy of consideration.

A wag might note at this point that Digitize This Book! is oddly not itself available as a digital reproduction. (As part of the research for this review, I looked in the shadier parts of the Internet but could not locate a free electronic download of the book, even in the shadows.) Other recent books on the open access movement are available for free online (legally), including James Boyle’s The Public Domain: Enclosing the Commons of the Mind (Yale University Press) and John Willinsky’s The Access Principle: The Case for Open Access to Research and Scholarship (MIT Press). Drawing attention to this disconnect is less a cheap knock against Hall than a recognition that the actualization of open access and its transformative potential are easier said than done.

Assuming things will not change overnight and that few professors, curators, or librarians are ready to move, like Abbie Hoffman, to a commune (though many might applaud the lack of administrators there), the key questions are, How does one take concrete steps toward a system in which open access is the normal mode of publishing? Which structures must be dissolved and which created, and how to convince various stakeholders to make this transition together?

These are the kinds of practical—political—questions that advocates of open access must address. Gary Hall has helpfully provided the academic purveyors of open access much food for thought. Now comes the difficult work of crafting recipes to reach the future he so richly imagines.

Mass Digitization of Books: Exit Microsoft, What Next?

Thursday, May 29th, 2008

So Microsoft has left the business of digitizing millions of books—apparently because they saw it as no business at all.

This leaves Microsoft’s partner (and our partner on the Zotero project), the Internet Archive, somewhat in the lurch, although Microsoft has done the right thing and removed the contractual restrictions on the books they digitized so they may become part of IA’s fully open collection (as part of the broader Open Content Alliance), which now has about 400,000 volumes. Also still on the playing field is the Universal Digital Library (a/k/a the Million Books Project), which has 1.5 million volumes.

And then there’s Google and its Book Search program. For those keeping score at home, my sources tell me that Google, which coyly likes to say it has digitized “over a million books” so far, has actually finished scanning five million. It will be hard for non-profits like IA to catch up with Google without some game-changing funding or major new partnerships.

Foundations like the Alfred P. Sloan Foundation have generously made substantial (million-dollar) grants to add to the digital public domain. But with the cost of digitizing 10 million pre-1923 books at around $300 million, where might this scale of funds and new partners come from? To whom can the Open Content Alliance turn to replace Microsoft?

Frankly, I’ve never understood why institutions such as Harvard, Yale, and Princeton haven’t made a substantial commitment to a project like OCA. Each of these universities has seen its endowment grow into the tens of billions in the last decade, and each has the means and (upon reflection) the motive to do a mass book digitization project of Google’s scale. $300 million sounds like a lot, but it’s less than 1% of Harvard’s endowment and my guess is that the amount is considerably less than all three universities are spending to build and fund laboratories for cutting-edge sciences like genomics. And a 10 million public-domain book digitization project is just the kind of outrageously grand project HYP should be doing, especially if they value the humanities as much as the sciences.

Moreover, Harvard, Yale, and Princeton find themselves under enormous pressure to spend more of their endowment for a variety of purposes, including tuition remission and the public good. (Full and rather vain disclosure: I have some relationship to all three institutions; I complain because I love.) Congress might even get into the act, mandating that universities like HYP spend a more generous minimum percentage of their endowment every year, just like private foundations who benefit (as does HYP, though in an indirect way) from the federal tax code.

In one stroke HYP could create enormous good will with a moon-shot program to rival Google’s: free books for the world. (HYP: note the generous reaction to, and the great press for, MIT’s OpenCourseWare program.) And beyond access, the project could enable new forms of scholarship through computational access to a massive corpora of full texts.

Alas, Harvard and Princeton partnered with Google long ago. Princeton has committed to digitizing about one million volumes with Google; Harvard’s number is unclear, but probably smaller. The terms of the agreement with Google are non-exclusive; Harvard and Princeton could initiate their own digitization projects or form other partnerships. But I suspect that would be politically difficult since the two universities are getting free digitization services from Google and would have to explain to their overseers why they want to replace free with very expensive. (The answer sounds like Abbott and Costello: the free program produces something that’s not free, while the expensive one is free.)

If Google didn’t exist, Harvard would probably be the most obvious candidate to pull off the Great Digitization of Widener. Not only does it have the largest endowment; historian Robert Darnton, a leader in thinking about the future (and the past) of the book, is now the director of the Harvard library system. Harvard also recently passed an open access mandate for the publications of its faculty.

Princeton has the highest per-student endowment of any university, and could easily undertake a mass digitization project of this scale. Perhaps some of the many Princeton alumni who went on to vast riches on the Web, such as EBay‘s Meg Whitman (who has already given $100 million to Princeton) or Amazon‘s Jeff Bezos, could pitch in.

But Harvard’s and Princeton’s Google “non-exclusive” partnership makes these outcomes unlikely, as does the general resistance in these universities to spending science-scale funds outside of the sciences (unless it’s for a building).

That leaves Yale. Yale chose Microsoft last year to do its digitization, and has now been abandoned right in the middle of its project. Since Microsoft is apparently leaving its equipment and workflow in place at partner institutions, Yale could probably pick up the pieces with an injection of funding from its endowment or from targeted alumni gifts. Yale just spent an enormous amount of money on a new campus for the sciences, and this project could be seen as a counterbalance for the humanities.

Or, HYP could band together and put in a mere $100 million each to get the job done.

Is this likely to happen? Of course not. HYP and other wealthy institutions are being asked to spend their prodigious endowments on many other things, and are reluctant to up their spending rate at all. But I believe a HYP or HYP-like solution is much more likely than public funding for this kind of project, as the Human Genome Project received.

Digital Campus #26 – Free for All

Thursday, May 8th, 2008

On this episode of the Digital Campus podcast we wrestle with how to keep open access/open source educational resources and tools sustainable for the long run. Mills elaborates on some of his ideas about a “freemium” business model for higher ed, and Tom and I explain the dilemma from the perspective of large academic software projects. We also debate whether laptops are a distraction in the classroom, among other topics in the news roundup and picks of the week. [Subscribe to this podcast.]

Where Are the Open Humanities Textbooks?

Wednesday, April 16th, 2008

Textbooks on the ShelvesTake a look at this list of free and open textbooks. (Found this page a couple of clicks away from a helpful post at Peter Suber’s Open Access News.) Now note the stark imbalance between the number of science textbooks listed here and the number of humanities textbooks. Why is this?

It seems to me like there is a great opportunity here for funders, with potentially an incredible return on investment. Texas alone spends hundreds of millions of dollars a year on textbooks like the U.S. History survey. For less than a million dollars a high-quality free and open textbook could be produced, with print on demand producing paper copies where needed and with a slight markup on those printed versions possibly covering ongoing expenses for updating the work.

[More on open source textbooks from Inside Higher Ed today.]

[Creative Commons image credit.]

Digital Campus #22 – Demanding Print on Demand?

Thursday, February 28th, 2008

This week on the podcast we look at the merits of print on demand, and investigate whether it can have an impact on academia. The podcast includes a wide-ranging interview with Yakov Shafranovich, a software developer who specializes in print on demand services including PublicDomainReprints.org, covered in several prior Digital Campus episodes. We also debate the importance of Harvard’s move toward open access to its faculty’s scholarship.

Harvard Faculty Approves Open Access Policy

Wednesday, February 13th, 2008

According to the Chronicle of Higher Education, Harvard’s Faculty of Arts and Sciences adopted the open access policy I mentioned yesterday. Peter Suber has the link to the full text of the faculty motion.

A Quartet of Open Access Arguments

Tuesday, February 12th, 2008

On the day that Harvard’s faculty votes on a strong open access proposal (I’m still looking for the actual text of the proposal; please add a link in the comments if you are aware of it), here are a few of the better arguments this week about the open access movement:

Digital Campus #20 – Open to Change

Wednesday, January 30th, 2008

Are open educational resources such as iTunes U and thought-provoking dot-coms such as BigThink.com a distraction from the mission of professors and universities, or the wave of the future? We debate the merits of “open access” intellectual content in the feature story on our twentieth Digital Campus podcast. Also, I report on the mostly good (if a little odd) experience of buying a book from PublicDomainReprints.org, and we discuss the MacBook Air, Flickr Commons, and a variety of tools for manipulating RSS feeds.

The Case for Open Access Books

Thursday, January 24th, 2008

Open BookThis month’s First Monday has one of the most pragmatic, sensible articles I’ve read about the promise and perils of open access books. In “Open access book publishing in writing studies: A case study,” by Charles Bazerman, David Blakesley, Mike Palmquist, and David Russell, the authors describe their experience deciding to eschew a traditional publication arrangement with an academic press (what supposedly gives our monographs the sheen of value and gets us tenure). Instead they publish an edited volume straight to the web.

Along the way the authors discover that many of the concerns that humanities scholars have about publishing in a free and open way are either overblown or simply myths. Only one junior scholar (out of the 20 scholars asked to contribute) worries about promotion and tenure. And indeed all of the scholars who contribute to the edited volume receive credit for their chapters. More important, the editors and contributors are surprised to discover that the book makes its way rapidly and powerfully into the consciousness of their field:

[The] initial reaction [to the book] did not prepare us for the acceptance the book ultimately received from the academic communities to which it was addressed.

Since its publication, the Writing Selves/Writing Societies Web page has been visited more than 85,000 times by more than 36,000 unique visitors. The trend, interestingly, has been a steady increase in visits over the past four years, with more than 30,000 occurring in the past 12 months. Since its publication, the book has been downloaded in its entirety more than 36,000 times. Individual essays have been downloaded more than 108,000 times. In terms of perceived quality of the scholarly work in the collection, the book has been well received by the field. Within six months of publication, the book was positively reviewed by four journals: two print and two electronic. One year after its publication, in the keynote address to the Conference on College Composition and Communication, the major annual conference in writing studies, Kathleen Blake Yancey quoted extensively from chapters in the book. And the book has continued to figure prominently in scholarly work subsequently published in the field of composition and rhetoric.

According to a search of Google Scholar, which indexes scholarly publications available on the Web (29 September 2006), the book or individual chapters in it has been cited 68 times, according to a search of Google Scholar. Although we do not have comprehensive comparison data for print publications, we suspect that this is a higher rate. A print–only collection with about the same number of chapters (15) published in the same year as Writing Selves/Writing Societies (and winner of a best book award given by a leading journal in the field), had far fewer citations: 10. Our experience suggests that open access scholarly books follow a pattern of citation similar to journals, which indicate that open access journal articles in a wide range of fields are both more likely to be cited and likely to be cited more quickly. Our experience with Writing Selves/Writing Societies supports this…

Overall, Writing Selves/Writing Societies appears to have entered into the system of book publishing neatly, in spite of the fact that it was not published by a traditional academic publisher and was being offered at no charge.

Beyond the questions of business models, scholarly influence, and promotion and tenure, there is also the nagging question Roy Rosenzweig posed in “Should Historical Scholarship Be Free?” At the time Roy was the Vice President for Research at the American Historical Association, and was pushing for open access to the American Historical Review. (Ultimately he got the powers that be to agree to put AHR articles online for free, although the book reviews remain behind gates.)

Besides the ethical good of publishing in an open access model—sharing educational and scholarly materials—Roy noted that the work of most scholars is funded, directly or indirectly, by the public. Noting the National Institutes of Health‘s recent mandate that grantees share their work openly with the public, Roy wrote:

The new policy affects few historians, but its implications ought to give us serious pause. After all, historical research also benefits directly (albeit considerably less generously) through grants from federal agencies like the National Endowment for the Humanities; even more of us are on the payroll of state universities, where research support makes it possible for us to write our books and articles. If we extend the notion of “public funding” to private universities and foundations (who are, of course, major beneficiaries of the federal tax codes), it can be argued that public support underwrites almost all historical scholarship.

Do the fruits of this publicly supported scholarship belong to the public? Should the public have free access to it?

Roy, of course, thought this meant that like NIH grantees we should provide open access to our articles, such as those in the AHR. But doesn’t the same argument hold true for books?

[Postscript: Some scientists have been wondering the same thing.]

[Image credit]