Dan Cohen

Archive for the ‘Books’ Category

Some Thoughts on the Hacking the Academy Process and Model

Thursday, September 8th, 2011

I’m delighted that the edited version of Hacking the Academy is now available on the University of Michigan’s DigitalCultureBooks site. Here are some of my quick thoughts on the process of putting the book together. (For more, please read the preface Tom Scheinfeldt and I wrote.)

1) Be careful what you wish for. Although we heavily promoted the submission process for HTA, Tom and I had no idea we would receive over 300 contributions from nearly 200 authors. This put an enormous, unexpected burden on us; it obviously takes a long time to read through that many submissions. Tom and I had to set up a collaborative spreadsheet for assessing the contributions, and it took several months to slog through the mass. We also had to make tough decisions about what kind of work to include, since we were not overly prescriptive about what we were looking for. A large number of well-written, compelling pieces (including many from friends of ours) had to be left out of the volume, unfortunately, because they didn’t quite match our evolving criteria, or didn’t fit with other pieces in the same chapter.

2) Set aside dedicated time and people. Other projects that have crowdsourced volumes, such as Longshot Magazine, have well-defined crunch times for putting everything together, using an expanded staff and a lot of coffee. I think it’s fair to say (and I hope not haughty to say) that Tom and I are incredibly busy people and we had to do the assembly and editing in bits and pieces. I wish we could have gotten it done much sooner to sustain the energy of the initial week. We probably could have included others in the editing process, although I think we have good editorial consistency and smooth transitions because of the more limited control.

3) Get the permissions set from the beginning. One of the delays on the edited volume was making sure we had the rights to all of the materials. HTA has made us appreciate even more the importance of pushing for Creative Commons licenses (especially the simple CC-BY) in academia; many of our contributors are dedicated to open access and already had licensed their materials under a permissive reproduction license, but we had to annoy everyone else (and by “we,” I mean the extraordinary helpful and capable Shana Kimball at MPublishing). This made the HTA process a little more like a standard publication, where the press has to hound contributors for sign-offs, adding friction along the way.

4) Let the writing dictate the form, not vice versa. I think one of the real breakthroughs that Tom and I had in this process is realizing that we didn’t need to adhere to a standard edited-volume format of same-size chapters. After reading through odd-sized submissions and thinking about form, we came up with an array of “short, medium, long” genres that could fit together on a particular theme. Yes, some of the good longer pieces could stand as more-or-less standard essays, but others could be paired together or set into dialogues. It was liberating to borrow some conventions from, e.g., magazines and the way they handle shorter pieces. In some cases we also got rather aggressive about editing down articles so that they would fit into useful spaces.

5) This is a model that can be repeated. Sure, it’s not ideal for some academic cases, and speed is not necessarily of the essence. But for “state of the field” volumes, vibrant debates about new ideas, and books that would benefit from blended genres, it seems like an improvement upon the staid “you have two years to get me 8,000 words for a chapter” model of the edited book.

Using WordPress as a Book-Writing Platform

Thursday, July 28th, 2011

I’ve had a few people ask about the writing environment I’m using for The Ivory Tower and the Open Web (introduction posted a couple of days ago). I’m writing the book entirely in WordPress, which really has matured into a terrific authoring platform. Some notes:

1) The addition of the TinyMCE WYSIWYG text-editing tools made WordPress today’s version of the beloved Word 5.1, the lean, mean, writing machine that Word used to be before Microsoft bloated it beyond recognition.

2) WordPress 3.2 joined the distraction-free trend mainstreamed by apps like Scrivener and Instapaper, where computer administrative debris (as Edward Tufte once called the layers of eye-catching controls that frame most application windows) fades away. If you go into full-screen mode in the editor everything disappears but your text. WordPress devs even thoughtfully added a zen “Just write” prompt to get you going. Go full-screen in your browser for extra zen.

3) For footnotes, I’m using the excellent WP-Footnotes plugin, which is not only easy to use but (perhaps critically for the future) degrades gracefully into parenthetical embedded citations outside of WordPress.

4) I’m of course using Zotero to insert and format those footnotes, using one of the features that makes Zotero better (IMHO) than other research managers: the ability to drag and drop formatted citations right from the Zotero interface into a textarea in the browser. (WP-Footnotes handles the automatic numbering.)

5) I’ve done a few tweaks to WordPress’s wp-admin CSS to customize the writing environment (there’s an “editorcontainer” that styles the textarea). In particular, I found the default width too wide for comfortable writing or reading. So I resized it to 500 pixels, which is roughly the line width of a standard book.

The Ivory Tower and the Open Web: Introduction: Burritos, Browsers, and Books [Draft]

Tuesday, July 26th, 2011

[A draft of the introduction to my forthcoming book, The Ivory Tower and the Open Web, which looks at academic resistance to the modes and genres of the web, and how those modes and genres might actually reinvigorate the academy. I'll be posting drafts of chapters as well for open comment and criticism.]

In the summer of 2007, Nate Silver decided to conduct a rigorous assessment of the inexpensive Mexican restaurants in his neighborhood, Chicago’s Wicker Park. Figuring that others might be interested in the results of his study, and that he might be able to use some feedback from an audience, he took his project online.

Silver had no prior experience in such an endeavor. By day he worked as a statistician and writer at Baseball Prospectus—an innovator, to be sure, having created a clever new standard for empirically measuring the value of players, an advanced form of the “sabermetrics” vividly described by Michael Lewis in Moneyball.1 But Silver had no experience as a food critic, nor as a web developer.

In time, his appetite took care of the former and the open web took care of the latter. Silver knit together a variety of free services as the tapestry for his culinary project. He set up a blog, The Burrito Bracket, using Google’s free Blogger web application. Weekly posts consisted of his visits to local restaurants, and the scores (in jalapeños) he awarded in twelve categories.

Home page of Nate Silver’s Burrito Bracket
Ranking system (upper left quadrant)

Being a sports geek, he organized the posts as a series of contests between two restaurants. Satisfying his urge to replicate March Madness, he modified another free application from Google, generally intended to create financial or data spreadsheets, to produce the “bracket” of the blog’s title.

Google Spreadsheets used to create the competition bracket

Like many of the savviest users of the web, Silver started small and improved the site as he went along. For instance, he had started to keep a photographic record of his restaurant visits and decided to share this documentary evidence. So he enlisted the photo-sharing site Flickr, creating an off-the-rack archive to accompany his textual descriptions and numerical scores. On August 15, 2007, he added a map to the site, geolocating each restaurant as he went along and color-coding the winners and losers.

Flickr photo archive for The Burrito Bracket (flickr.com)
Silver’s Google Map of Chicago’s Wicker Park (shaded in purple) with the location of each Mexican restaurant pinpointed

Even with its do-it-yourself enthusiasm and the allure of carne asada, Silver had trouble attracting an audience. He took to Yelp, a popular site for reviewing restaurants to plug The Burrito Bracket, and even thought about creating a Super Burrito Bracket, to cover all of Chicago.2 But eventually he abandoned the site following the climactic “Burrito Bowl I.”

With his web skills improved and a presidential election year approaching, Silver decided to try his mathematical approach on that subject instead—”an opportunity for a sort of Moneyball approach to politics,” as he would later put it.3 Initially, and with a nod to his obsession with Mexican food, he posted his empirical analyses of politics under the chili-pepper pseudonym “Poblano,” on the liberal website Daily Kos, which hosts blogs for its engaged readers.

Then, in March 2008, Silver registered his own web domain, with a title that was simultaneously and appropriately mathematical and political: fivethirtyeight.com, a reference to the total number of electors in the United States electoral college. He launched the site with a slight one-paragraph post on a recent poll from South Dakota and a summary of other recent polling from around the nation. As with The Burrito Bracket it was a modest start, but one that was modular and extensible. Silver soon added maps and charts to bolster his text.

FiveThirtyEight two months after launch, in May 2008

Nate Silver’s real name and FiveThiryEight didn’t remain obscure for long. His mathematical modeling of the competition between Barack Obama and Hillary Clinton for the Democratic presidential nomination proved strikingly, almost creepily, accurate. Clear-eyed, well-written, statistically rigorous posts began to be passed from browsers to BlackBerries, from bloggers to political junkies to Beltway insiders. From those wired early subscribers to his site, Silver found an increasingly large audience of those looking for data-driven, deeply researched analysis rather than the conventional reporting that presented political forecasting as more art than science.

FiveThiryEight went from just 800 visitors a day in its first month to a daily audience of 600,000 by October 2008.4 On election day, FiveThiryEight received a remarkable 3 
million 
visitors, more than most daily newspapers
.5

All of this attention for a site that most media coverage still called, with a hint of deprecation, a “blog,” or “aggregator” of polls, despite Silver’s rather obvious, if latent, journalistic skills. (Indeed, one of his roads not taken had been an offer, straight out of college, to become an assistant at The Washington Post.6 ) An article in the Colorado Daily on the emergent genre represented by FiveThirtyEight led with Ken Bickers, professor and chair of the political science department at the University of Colorado, saying that such sites were a new form of “quality blogs” (rather than, evidently, the uniformly second-rate blogs that had previously existed). The article then swerved into much more ominous territory, asking whether reading FiveThirtyEight and similar blogs was potentially dangerous, especially compared to the safe environs of the traditional newspaper. Surely these sites were superficial, and they very well might have a negative effect on their audience:

Mary Coussons-Read, a professor of psychology at CU Denver, says today’s quick turnaround of information helps to make it more compelling.

“Information travels so much more quickly,” she says. “(We expect) instant gratification. If people have a question, they want an answer.”

That real-time quality can bring with it the illusion that it’s possible to perceive a whole reality by accessing various bits of information.

“There’s this immediacy of the transfer of information that leads people to believe they’re seeing everything … and that they have an understanding of the meaning of it all,” she says.

And, Coussons-Read adds, there is pleasure in processing information.

“I sometimes feel like it’s almost a recreational activity and less of an information-gathering activity,” she says.

Is it addiction?

[Michele] Wolf says there is something addicting about all that data.

“I do feel some kind of high getting new information and being able to process it,” she says. “I’m also a rock climber. I think there are some characteristics that are shared. My addiction just happens to be information.”

While there’s no such mental-health diagnosis as political addiction, Jeanne White, chemical dependency counselor at Centennial Peaks Hospital in Louisville, says political information seeking could be considered an addictive process if it reaches an extreme.7

This stereotype of blogs as the locus of “information” rather than knowledge, of “recreation” rather than education, was—and is—a common one, despite the wide variety of blogs, including many with long-form, erudite writing. Perhaps in 2008 such a characterization of FiveThirtyEight was unsurprising given that Silver’s only other credits to date were the Player Empirical Comparison and Optimization Test Algorithm (PECOTA) and The Burrito Bracket. Clearly, however, here was an intelligent researcher who had set his mind on a new topic to write about, with a fresh, insightful approach to the material. All he needed was a way to disseminate his findings. His audience appreciated his extraordinarily clever methods—at heart, academic techniques—for cutting through the mythologies and inadequacies of standard political commentary. All they needed was a web browser to find him.

A few journalists saw past the prevailing bias against non-traditional outlets like FiveThirtyEight. In the spring of 2010, Nate Silver bumped into Gerald Marzorati, the editor of the New York Times Magazine, on a train platform in Boston. They struck up a conversation, which eventually turned into a discussion about how FiveThirtyEight might fit into the universe of the Times, which ultimately recognized the excellence of his work and wanted FiveThirtyEight to enhance their political reporting and commentary. That summer, a little more than two years after he had started FiveThirtyEight, Silver’s “blog” merged into the Times under a licensing deal.8 In less time than it takes for most students to earn a journalism degree, Silver had willed himself into writing for one of the world’s premier news outlets, taking a seat in the top tier of political analysis. A radically democratic medium had enabled him to do all of this, without the permission of any gatekeeper.

FiveThirtyEight on the New York Times website, 2010

* * *

 

The story of Nate Silver and FiveThirtyEight has many important lessons for academia, all stemming from the affordances of the open web. His efforts show the do-it-yourself nature of much of the most innovative work on the web, and how one can iterate toward perfection rather than publishing works in fully polished states. His tale underlines the principle that good is good, and that the web is extraordinarily proficient at finding and disseminating the best work, often through continual, post-publication, recursive review. FiveThirtyEight also shows the power of openness to foster that dissemination and the dialogue between author and audience. Finally, the open web enables and rewards unexpected uses and genres.

Undoubtedly it is true that the path from The Burrito Bracket to The New York Times may only be navigated by an exceptionally capable and smart individual. But the tools for replicating Silver’s work are just as open to anyone, and just as powerful. It was with that belief, and the desire to encourage other academics to take advantage of the open web, that Roy Rosenzweig and I wrote Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web.9 We knew that the web, although fifteen years old at the time, was still somewhat alien to many professors, graduate students, and even undergraduates (who might be proficient at texting but know nothing about HTML), and we wanted to make the medium more familiar and approachable.

What we did not anticipate was another kind of resistance to the web, based not on an unfamiliarity with the digital realm or on Luddism but on the remarkable inertia of traditional academic methods and genres—the more subtle and widespread biases that hinder the academy’s adoption of new media. These prejudices are less comical, and more deep-seated, than newspapers’ penchant for tales of internet addiction. This resistance has less to do with the tools of the web and more to do with the web’s culture. It was not enough for us to conclude Digital History by saying how wonderful the openness of the web was; for many academics, this openness was part of the problem, a sign that it might be like “playing tennis with the net down,” as my graduate school mentor worriedly wrote to me.10

In some respects, this opposition to the maximal use of the web is understandable. Almost by definition, academics have gotten to where they are by playing a highly scripted game extremely well. That means understanding and following self-reinforcing rules for success. For instance, in history and the humanities at most universities in the United States, there is a vertically integrated industry of monographs, beginning with the dissertation in graduate school—a proto-monograph—followed by the revisions to that work and the publication of it as a book to get tenure, followed by a second book to reach full professor status. Although we are beginning to see a slight liberalization of rules surrounding dissertations—in some places dissertations could be a series of essays or have digital components—graduate students infer that they would best be served on the job market by a traditional, analog monograph.

We thus find ourselves in a situation, now more than two decades into the era of the web, where the use of the medium in academia is modest, at best. Most academic journals have moved online but simply mimic their print editions, providing PDF facsimiles for download and having none of the functionality common to websites, such as venues for discussion. They are also largely gated, resistant not only to access by the general public but also to the coin of the web realm: the link. Similarly, when the Association of American University Presses recently asked its members about their digital publishing strategies, the presses tellingly remained steadfast in their fixation on the monograph. All of the top responses were about print-on-demand and the electronic distribution and discovery of their list, with a mere footnote for a smattering of efforts to host “databases, wikis, or blogs.”11 In other words, the AAUP members see themselves almost exclusively as book publishers, not as publishers of academic work in whatever form that may take. Surveys of faculty show comfort with decades-old software like word processors but an aversion to recent digital tools and methods.12 The professoriate may be more liberal politically than the most latte-filled ZIP code in San Francisco, but we are an extraordinarily conservative bunch when in comes to the progression and presentation of our own work. We have done far less than we should have by this point in imagining and enacting what academic work and communication might look like if it was digital first.

To be sure, as William Gibson has famously proclaimed, “The future is already here—it’s just not very evenly distributed.”13 Almost immediately following the advent of the web, which came out of the realm of physics, physicists began using the Los Alamos National Laboratory preprint server (later renamed ArXiv and moved to arXiv.org) to distribute scholarship directly to each other. Blogging has taken hold in some precincts of the academy, such as law and economics, and many in those disciplines rely on web-only outlets such as the Social Science Research Network. The future has had more trouble reaching the humanities, and perhaps this book is aimed slightly more at that side of campus than the science quad. But even among the early adopters, a conservatism reigns. For instance, one of the most prominent academic bloggers, the economist Tyler Cowen, still recommends to students a very traditional path for their own work.14 And far from being preferred by a large majority of faculty, quests to open scholarship to the general public often meet with skepticism.15

If Digital History was about the mechanisms for moving academic work online, this book is about how the digital-first culture of the web might become more widespread and acceptable to the professoriate and their students. It is, by necessity, slightly more polemical than Digital History, since it takes direct aim at the conservatism of the academy that twenty years of the web have laid bare. But the web and the academy are not doomed to an inevitable clash of cultures. Viewed properly, the open web is perfectly in line with the fundamental academic goals of research, sharing of knowledge, and meritocracy. This book—and it is a book rather than a blog or stream of tweets because pragmatically that is the best way to reach its intended audience of the hesitant rather than preaching to the online choir—looks at several core academic values and asks how we can best pursue them in a digital age.

First, it points to the critical academic ability to look at any genre without bias and asks whether we might be violating that principle with respect to the web. Upon reflection many of the best things we discover in scholarship are found by disregarding popularity and packaging, by approaching creative works without prejudice. We wouldn’t think much of the meandering novel Moby-Dick if Carl Van Doren hadn’t looked past decades of mixed reviews to find the genius in Melville’s writing. Art historians have similarly unearthed talented artists who did their work outside of the royal academies and the prominent schools of practice. As the unpretentious wine writer Alexis Lichine shrewdly said in the face of fancy labels and appeals to mythical “terroir”: “There is no substitute for pulling corks.”16

Good is good, no matter the venue of publication or what the crowd thinks. Scholars surely understand that on a deep level, yet many persist in the valuing venue and medium over the content itself. This is especially true at crucial moments, such as promotion and tenure. Surely we can reorient ourselves to our true core value—to honor creativity and quality—which will still guide us to many traditionally published works but will also allow us to consider works in some nontraditional venues such as new open access journals or articles written and posted on a personal website or institutional repository, or digital projects.

The genre of the blog has been especially cursed by this lack of open-mindedness from the academy. Chapter 1, “What is a Blog?”, looks at the history of the blog and blogging, the anatomy and culture of a genre that is in many ways most representative of the open web. Saddled with an early characterization as being the locus of inane, narcissistic writing, the blog has had trouble making real inroads in academia, even though it is an extraordinarily flexible form and the perfect venue for a great deal of academic work. The chapter highlights some of the best examples of academic blogging and how they shape and advance arguments in a field. We can be more creative in thinking about the role of the blog within the academy, as a venue for communicating our work to colleagues as well as to a lay audience beyond the ivory tower.

This academic prejudice against the blog extends to other genres that have proliferated on the open web. Chapter 2, “Genres and the Open Web,” examines the incredible variety of those new forms, and how, with a careful eye, we might be able to import some of them profitably into the academy. Some of these genres, like the wiki, are well-known (thanks to Wikipedia, which academics have come to accept begrudgingly in the last five years). Other genres are rarer but take maximal advantage of the latitude of the open web: its malleability and interactivity. Rather than imposing the genres we know on the web—as we do when we post PDFs of print-first journal articles—we would do well to understand and adopt the web’s native genres, where helpful to scholarly pursuits.

But what of our academic interest in validity and excellence, enshrined in our peer review system? Chapter 3, “Good is Good,” examines the fundamental requirements of any such system: the necessity of highlighting only a minority of the total scholarly output, based on community standards, and of disseminating that minority of work to communities of thought and practice. The chapter compares print-age forms of vetting with native web forms of assessment and review, and proposes ways that digital methods can supplement—or even replace—our traditional modes of peer review.

“The Value, and Values, of Openness,” Chapter 4, broadly examines the nature of the web’s openness. Oddly, this openness is both the easiest trait of the web to understand and its most complex, once one begins to dig deeper. The web’s radical openness not only has led to calls for open access to academic work, which has complicated the traditional models of scholarly publishers and societies; it has also challenged our academic predisposition toward perfectionism—the desire to only publish in a “final” format, purged (as much as possible) of error. Critically, openness has also engendered unexpected uses of online materials—for instance, when Nate Silver refactored poll numbers from the raw data polling agencies posted.

Ultimately, openness is at the core of any academic model that can operate effectively on the web: it provides a way to disseminate our work easily, to assess what has been published, and to point to what’s good and valuable. Openness can naturally lead—indeed, is leading—to a fully functional shadow academic system for scholarly research and communication that exists beyond the more restrictive and inflexible structures of the past.

[Update, 7/29/11: I've answered Zach Schrag's criticism about the disciplinary scope of the book in a new paragraph beginning with "To be sure, as William Gibson..."]

[Update, 8/1/11: Added more about "good is good," beginning with the line on Alexis Lichine and continuing through the following paragraph, to address Sylvia Miller's point about promotion and tenure. Also fixed a few points of grammar, thanks to Sherman Dorn.]

  1. Nate Silver, “Introducing PECOTA,” in Gary Huckabay, Chris Kahrl, Dave Pease et al., eds., Baseball Prospectus 2003 (Dulles, VA: Brassey’s Publishers, 2003): 507-514. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton & Company, 2004). []
  2. Frequently Asked Questions, The Burrito Bracket, http://burritobracket.blogspot.com/2007/07/faq.html []
  3. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf []
  4. Adam Sternbergh, The Spreadsheet Psychic, New York, Oct 12, 2008, http://nymag.com/news/features/51170/ []
  5. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf []
  6. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf []
  7. Cindy Sutter, “Hooked on information: Can political news really be addicting?” The Colorado Daily, November 3, 2008, http://www.coloradodaily.com/ci_13105998 []
  8. Nate Silver, “FiveThirtyEight to Partner with New York Times, http://www.fivethirtyeight.com/2010/06/fivethirtyeight-to-partner-with-new.html []
  9. Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (University of Pennsylvania Press, 2006). []
  10. http://www.dancohen.org/2010/11/11/frank-turner-on-the-future-of-peer-review/ []
  11. Association of American University Presses, “Digital Publishing in the AAUP Community; Survey Report: Winter 2009-2010,” http://aaupnet.org/resources/reports/0910digitalsurvey.pdf, p. 2 []
  12. See, for example, Robert B. Townsend, “How Is New Media Reshaping the Work of Historians?”, Perspectives on History, November 2010, http://www.historians.org/Perspectives/issues/2010/1011/1011pro2.cfm []
  13. National Public Radio, “Talk of the Nation” radio program, 30 November 1999, timecode 11:55, http://discover.npr.org/features/feature.jhtml?wfId=1067220 []
  14. “Tyler Cowen: Academic Publishing,” remarks at the Institute for Humane Studies Summer Research Fellowship weekend seminar, May 2011, http://vimeo.com/24124436 []
  15. Open access mandates have been tough sells on many campuses, passing only by slight majorities or failing entirely. For instance, such a mandate was voted down at the University of Maryland, with evidence of confusion and ambivalence. http://scholarlykitchen.sspnet.org/2009/04/28/umaryland-faculty-vote-no-oa/ []
  16. Quoted in Frank J. Prial, “Wine Talk,” New York Times, 17 August 1994, http://www.nytimes.com/1994/08/17/garden/wine-talk-983519.html. []

Initial Thoughts on the Google Books Ngram Viewer and Datasets

Sunday, December 19th, 2010

First and foremost, you have to be the most jaded or cynical scholar not to be excited by the release of the Google Books Ngram Viewer and (perhaps even more exciting for the geeks among us) the associated datasets. In the same way that the main Google Books site has introduced many scholars to the potential of digital collections on the web, Google Ngrams will introduce many scholars to the possibilities of digital research. There are precious few easy-to-use tools that allow one to explore text-mining patterns and anomalies; perhaps only Wordle has the same dead-simple, addictive quality as Google Ngrams. Digital humanities needs gateway drugs. Kudos to the pushers on the Google Books team.

Second, on the concurrent launch of “Culturomics“: Naming new fields is always contentious, as is declaring precedence. Yes, it was slightly annoying to have the Harvard/MIT scholars behind this coinage and the article that launched it, Michel et al., stake out supposedly new ground without making sufficient reference to prior work and even (ahem) some vaguely familiar, if simpler, graphs and intellectual justifications. Yes, “Culturomics” sounds like an 80s new wave band. If we’re going to coin neologisms, let’s at least go with Sean Gillies’ satirical alternative: Freakumanities. No, there were no humanities scholars in sight in the Culturomics article. But I’m also sure that longtime “humanities computing” scholars consider advocates of “digital humanities” like me Johnnies-come-lately. Luckily, digital humanities is nice, and so let us all welcome Michel et al. to the fold, applaud their work, and do what we can to learn from their clever formulations. (But c’mon, Cantabs, at least return the favor by following some people on Twitter.)

Third, on the quality and utility of the data: To be sure, there are issues. Some big ones. Mark Davies makes some excellent points about why his Corpus of Historical American English (COHA) might be a better choice for researchers, including more nuanced search options and better variety and normalization of the data. Natalie Binder asks some tough questions about Google’s OCR. On Twitter many of us were finding serious problems with the long “s” before 1800 (Danny Sullivan got straight to the naughty point with his discourse on the history of the f-bomb). But the Freakumanities, er, Culturomics guys themselves talk about this problem in their caveats, as does Google.

Moreover, the data will improve. The Google n-grams are already over a year old, and the plan is to release new data as soon as it can be compiled. In addition, unlike text-mining tools like COHA, Google Ngrams is multilingual. For the first time, historians working on Chinese, French, German, and Spanish sources can do what many of us have been doing for some time. Professors love to look a gift horse in the mouth. But let’s also ride the horse and see where it takes us.

So where does it take us? My initial tests on the viewer and examination of the datasets—which, unlike the public site, allow you to count words not only by overall instances but, critically, by number of pages those instances appear on and number of works they appear in—hint at much work to be done:

1) The best possibilities for deeper humanities research are likely in the longer n-grams, not in the unigrams. While everyone obsesses about individuals words (guilty here too of unigramism) or about proper names (which are generally bigrams), more elaborate and interesting interpretations are likelier in the 4- and 5-grams since they begin to provide some context. For instance, if you want to look at the history of marriage, charting the word itself is far less interesting than seeing if it co-occurs with words like “loving” or “arranged.” (This is something we learned in working on our NEH-funded grant on text mining for historians.)

2) We should remember that some of the best uses of Google’s n-grams will come from using this data along with other data. My gripe with the “Culturomics” name was that it implied (from “genomics”) that some single massive dataset, like the human genome, will be the be-all and end-all for cultural research. But much of the best digital humanities work has come from mashing up data from different domains. Creative scholars will find ways to use the Google n-grams in concert with other datasets from cultural heritage collections.

3) Despite my occasional griping about the Culturomists, they did some rather clever things with statistics in the latter part of their article to tease out cultural trends. We historians and humanists should be looking carefully at the more complex formulations of Michel et al., when they move beyond linguistics and unigram patterns to investigate in shrewd ways topics like how fleeting fame is and whether the suppression of authors by totalitarian regimes works. Good stuff.

4) For me, the biggest problem with the viewer and the data is that you cannot seamlessly move from distant reading to close reading, from the bird’s eye view to the actual texts. Historical trends often need to be investigated in detail (another lesson from our NEH grant), and it’s not entirely clear if you move from Ngram Viewer to the main Google Books interface that you’ll get the book scans the data represents. That’s why I have my students use Mark Davies’ Time Magazine Corpus when we begin to study historical text mining—they can easily look at specific magazine articles when they need to.

How do you plan to use the Google Books Ngram Viewer and its associated data? I would love to hear your ideas for smart work in history and the humanities in the comments, and will update this post with my own further thoughts as they occur to me.

New York Times Covers Victorian Books Project

Friday, December 3rd, 2010

Patricia Cohen of the New York Times has been working on an excellent series on digital humanities, and her second article focuses on our text mining work on Victorian books, which was directly enabled by a grant from Google and more broadly enabled by a previous grant from the National Endowment for the Humanities to explore text mining in history. I’m glad Cohen (no relation) captured the nuances and caveats as well as the potential of digital methods. I also liked how the graphics department did a great job converting and explaining some of our graphs.

I previously posted a rough transcript of my talk on Victorian history and literature that Cohen mentions in the piece. She also covered my work earlier this year in an article on peer review that was much debated in academia.

Thoughts on One Week | One Tool

Thursday, August 5th, 2010

Well that just happened. It’s hard to believe that last Sunday twelve scholars and software developers were arriving at the brand-new Mason Inn on our campus and now have created and launched a tool, Anthologize, that created a frenzy on social and mass media.

If you haven’t already done so, you should first read the many excellent reports from those who participated in One Week | One Tool (and watched it from afar). One Week | One Tool was an intense institute sponsored by the National Endowment for the Humanities that strove to convey the Center for History and New Media‘s knowledge about building useful scholarly software. As the name suggests, the participants had to conceive, build, and disseminate their own tool in just one week. To the participants’ tired voices I add a few thoughts from the aftermath.

Less Talk, More Grok

One Week director (and Center for History and New Media managing director) Tom Scheinfeldt and I grew up listening to WAAF in Boston, which had the motto (generally yelled, with reverb) “Less Talk, More Rock!” (This being Boston, it was actually more like “Rahwk!”) For THATCamp I spun that call-to-action into “Less Talk, More Grok!” since it seemed to me that the core of THATCamp is its antagonism toward the deadening lectures and panels of normal academic conferences and its attempt to maximize knowledge transfer with nonhierarchical, highly participatory, hands-on work. THATCamp is exhausting and exhilarating because everyone is engaged and has something to bring to the table.

Not to over-philosophize or over-idealize THATCamp, but for academic doubters I do think the unconference is making an argument about understanding that should be familiar to many humanists: the importance of “tacit knowledge.” For instance, in my field, the history of science, scholars have come to realize in the last few decades that not all of science consists of cerebral equations and concepts that can be taught in a textbook; often science involves techniques and experiential lessons that must be acquired in a hands-on way from someone already capable in that realm.

This is also true for the digital humanities. I joked with emissaries from the National Endowment for the Humanities, which took a huge risk in funding One Week, that our proposal to them was like Jerry Seinfeld’s and George Costanza’s pitch to NBC for a “show about nothing.” I’m sure it was hard for reviewers of our proposal to see its slightly sketchy syllabus. (“You don’t know what will be built ahead of time?!”) But this is the way in which the digital humanities is close to the lab sciences. There can of course be theory and discussion, but there will also have to be a lot of doing if you want to impart full knowledge of the subject. Many times during the week I saw participants and CHNMers convey things to each other—everything from little shortcuts to substantive lessons—that wouldn’t have occurred to us ahead of time, without the team being engaged in actually building something.

MTV Cops

The low point of One Week was undoubtedly my ham-fisted attempt at something of a keynote while the power was out on campus, killing the lights, the internet, and (most seriously) the air conditioning. Following “Less Talk, More Grok,” I never should have done it. But one story I told at the beginning did seem to have modest continuing impact over the week (if frequently as the source of jokes).

Hollywood is famous for great (and laughable) idea pitches—which is why that Seinfeld episode was amusing—but none is perhaps better than Brandon Tartikoff’s brilliantly concise pitch for Miami Vice: “MTV cops.” I’m a firm believer that it’s important to be able to explain a digital tool with something close to the precision of “MTV cops” if you want a significant number of people to use it. Some might object that we academics are smart folks, capable of understanding sophisticated, multivalent tools, but people are busy, and with digital tools there are so many clamoring for attention and each entails a huge commitment (often putting your scholarship into an entirely new system). Scholars, like everyone else, are thus enormously resistant to tools that are hard to grasp. (Case in point: Google Wave.)

I loved the 24 hours of One Week from Monday afternoon to Tuesday afternoon where the group brainstormed potential tools to build and then narrowed them down to “MTV Cops” soundbites. Of course the tools were going to be more complex than these reductionistic soundbites, but those soundbites gave the process some focus and clarity. It also allowed us to ask Twitter followers to vote on general areas of interest (e.g., “Better timelines”) to gauge the market. We tweeted “Blog->Book” for idea #1, which is what became Anthologize.

And what were most of the headlines on launch day? Some variant on the crystal-clear ReadWriteWeb headline: “Scholars Build Blog-to-eBook Tool in One Week.”

Speed Doesn’t Kill

We’ve gotten occasional flak at the Center for History and New Media for some recent efforts that seem more carnival than Ivory Tower, because they seem to throw out the academic emphasis on considered deliberation. (However, it should be noted that we also do many multi-year, sweat-and-tears, time-consuming projects like the National History Education Clearinghouse, putting online the first fifteen years of American history, and creating software used by millions of people.)

But the experience of events like One Week makes me question whether the academic default to deliberation is truly wise. One Weekers could have sat around for a week, a month, a year, and still I suspect that the tool they decided to build was the best choice, with the greatest potential impact. As programmers in the real world know, it’s much better to have partial, working code than to plan everything out in advance. Just by launching Anthologize in alpha and generating all that excitement, the team opened up tremendous reserves of good will, creativity, and problem-solving from users and outside developers. I saw at least ten great new use cases for Anthologize on Twitter in the first day. How are you supposed to come up with those ideas from internal deliberation or extensive planning?

There was also something special about the 24/7 focus the group achieved. The notion that they had to have a tool in one week (crazy on the face of it) demanded that the participants think about that tool all of the time (even in their sleep, evidently). I’ll bet there was the equivalent of several months worth of thought that went on during One Week, and the time limit meant that participants didn’t have the luxury of overthinking certain choices that were, at the end of the day, either not that important or equally good options. Eric Johnson, observing One Week on Twitter, called this the power of intense “singular worlds” to get things done. Paul Graham has similarly noted the importance of environments that keep one idea foremost in your mind.

There are probably many other areas where focus, limits, and, yes, speed might help us in academia. Dissertations, for instance, often unhealthily drag on as doctoral students unwisely aim for perfection, or feel they have to write 300 pages even though their breakthrough thesis is contained in a single chapter. I wonder if a targeted writing blitz like the successful National Novel Writing Month might be ported to the academy.

Start Small, Dream Big

As dissertations become books through a process of polish and further thought, so should digital tools iterate toward perfection from humble beginnings. I’ve written in this space about the Center for History and New Media’s love of Voltaire’s dictum that “the perfect is the enemy of the good [enough],” and we communicated to One Week attendees that it was fine to start with a tool that was doable in a week. The only caveat was that tool should be conceived with such modularity and flexibility that it could grow into something very powerful. The Anthologize launch reminds me of what I said in this space about Zotero on its launch: it was modest, but it had ambition. It was conceived not just as a reference manager but as an extensible platform for research. The few early negative comments about Anthologize similarly misinterpreted it myopically as a PDF-formatter for blogs. Sure, it will do that, as can other services. But like Zotero (and Omeka) Anthologize is a platform that can be broadly extended and repurposed. Most people thankfully got that—it sparked the imagination of many, even though it’s currently just a rough-around-the-edges alpha.

Congrats again to the whole One Week team. Go get some rest.

Crowdsourcing the Title of My Next Book

Wednesday, August 4th, 2010

Already put this out on Twitter but will reblog here:

I’m crowdsourcing the title of my next book, which is about the way in which common web tech/methods should influence academia, rather than academia thinking it can impose its methods and genres on the web. The title should be a couplet like “The X and the Y” where X can be “Highbrow Humanities” “Elite Academia” “The Ivory Tower” “Deep/High Thought” [insert your idea] and Y can be “Lowbrow Web” “Common Web” “Vernacular Technology/Web” “Public Web” [insert your idea]. so possible titles are “The Highbrow Humanities and the Lowbrow Web” or “The Ivory Tower and the Wild Web” etc. What’s your choice? Thanks in advance for the help and suggestions.

Introducing Anthologize

Monday, August 2nd, 2010

A long-running theme of this blog has been the perceived gulf between new forms of online scholarship—including the genre of the blog itself—and traditional forms such as the book and journal. I’m obviously delighted, then, about the outcome of One Week | One Tool, a week-long institute funded by the National Endowment for the Humanities and run by the Center for History and New Media at George Mason University. As the name suggests, twelve humanities scholars with technical chops hunkered down for one week to produce a digital tool they thought could have an impact in the humanities and beyond.

Today marks the launch of this effort: Anthologize, software that converts the popular open-source WordPress system into a full-fledged book-production platform. Using Anthologize, you can take online content such as blogs, feeds, and images (and soon multimedia), and organize it, edit it, and export it into a variety of modern formats that will work on multiple devices. Have a poetry blog? Anthologize it into a nice-looking ePub ebook and distribute it to iPads the world over. A museum with an RSS feed of the best items from your collection? Anthologize it into a coffee table book. Have a group blog on a historical subject? Anthologize the best pieces quarterly into a print or e-journal, or archive it in TEI. Get all the delicious details on the newly revealed Anthologize website.

Anthologize is free and open source software. Obviously in one week it’s impossible to have feature-complete, polished software. There will be a few rough edges. But it works right now (see below) and it’s just the start of a major effort. The grant from NEH anticipates more work for the One Week team over the next year to refine the tool, culminating in a follow-up meeting at THATCamp 2011.

I suspect there will be many users and uses for Anthologize, and developers can extend the software to work in different environments and for different purposes. I see the tool as part of a wave of “reading 2.0″ software that I’ve come to rely on for packaging online content for long-form consumption and distribution, including the Readability browser plugin and Instapaper. This class of software is particularly important for the humanities, which remains very bookish, but it is broadly applicable. Anthologize is flexible enough to handle different genres of writing and content, opening up new possibilities for scholarly communication. Personally, I plan to use Anthologize to run a journal and to edit and write two upcoming books.

Credit for Anthologize goes to the amazing team that produced it: Jason Casden, Boone Gorges, Kathie Gossett, Scott Hanrath, Effie Kapsalis, Doug Knox, Zachary McCune, Julie Meloni, Patrick Murray-John, Steve Ramsay, Patrick Rashleigh, and Jana Remy. It is notable that the One Weekers ranged from a recent college grad to tenured professors, programmers and designers and interface experts who also are humanities scholars, and professionals from libraries, museums, and instructional technology. Remarkably, they first met last Sunday night and had production-ready code by Saturday morning, a website to market and support the software, an outreach plan, and a vision for the future of the software beyond its original state. Not to mention a logo to go on nice-looking swag (personally, I’ll take the book bag).

Credit also goes to the great Center for History and New Media team that instructed and supported the One Weekers in the ways we like to conceive, design, and build digital humanities tools: Sharon Leon, Jeremy Boggs, Sheila Brennan, Trevor Owens, and many others who dropped in to help out. Two huge final credits: one to Tom Scheinfeldt for conceiving and running the structured madness that was One Week | One Tool, and the National Endowment for the Humanities, which took a big risk on a very untraditional institute. We hope they, and others, like the idea and the execution of Anthologize.

And just to give you some idea of what Anthologize can do, here’s the Anthologize ePub version of this blog post on an iPad, created in five minutes:

One Week, One Book: Hacking the Academy

Friday, May 21st, 2010

[Reblogged from the THATCamp website. Please note that you don't need to be a THATCamper to participate. We are soliciting submissions from everyone, worldwide. Join us by writing something in the next week, or if you've already written something you think deserves to be included, let us know!]

Tom Scheinfeldt and I have been brewing a proposal for an edited book entitled Hacking the Academy. Let’s write it together, starting at THATCamp this weekend. And let’s do it in one week.

Can an algorithm edit a journal? Can a library exist without books? Can students build and manage their own learning management platforms? Can a conference be held without a program? Can Twitter replace a scholarly society?

As recently as the mid-2000s, questions like these would have been unthinkable. But today serious scholars are asking whether the institutions of the academy as they have existed for decades, even centuries, aren’t becoming obsolete. Every aspect of scholarly infrastructure is being questioned, and even more importantly, being <em>hacked</em>. Sympathetic scholars of traditionally disparate disciplines are cancelling their association memberships and building their own networks on Facebook and Twitter. Journals are being compiled automatically from self-published blog posts. Newly-minted Ph.D.’s are foregoing the tenure track for alternative academic careers that blur the lines between research, teaching, and service. Graduate students are looking beyond the categories of the traditional C.V. and building expansive professional identities and popular followings through social media. Educational technologists are “punking” established technology vendors by rolling their own open source infrastructure.

“Hacking the Academy” will both explore and contribute to ongoing efforts to rebuild scholarly infrastructure for a new millenium. Contributors can write on these topics, which will form chapters:

  • Lectures and classrooms
  • Scholarly societies
  • Conferences and meetings
  • Journals
  • Books and monographs
  • Tenure and academic employment
  • Scholarly Identity and the CV
  • Departments and disciplines
  • Educational technology
  • Libraries

In keeping with the spirit of hacking, the book will itself be an exercise in reimagining the edited volume. Any blog post, video response, or other media created for the volume and tweeted (or tagged) with the hashtag #hackacad will be aggregated at hackingtheacademy.org. The best pieces will go into the published volume (we are currently in talks with a publisher to do an open access version of this final volume). The volume will also include responses such as blog comments and tweets to individual pieces. If you’ve already written something that you would like included, that’s fine too, just be sure to tweet or tag it (or email us the link to where it’s posted).

You have until midnight on May 28, 2010. Ready, set, go!

UPDATE: [5/23/10] 48 hours in, we have 65 contributions to the book. There’s a running list of contributions.

The Social Contract of Scholarly Publishing

Friday, March 5th, 2010

When Roy Rosenzweig and I finished writing a full draft of our book Digital History, we sat down at a table and looked at the stack of printouts.

“So, what now?” I said to Roy naively. “Couldn’t we just publish what we have on the web with the click of a button? What value does the gap between this stack and the finished product have? Isn’t it 95% done? What’s the last five percent for?”

We stared at the stack some more.

Roy finally broke the silence, explaining the magic of the last stage of scholarly production between the final draft and the published book: “What happens now is the creation of the social contract between the authors and the readers. We agree to spend considerable time ridding the manuscript of minor errors, and the press spends additional time on other corrections and layout, and readers respond to these signals—a lack of typos, nicely formatted footnotes, a bibliography, specialized fonts, and a high-quality physical presentation—by agreeing to give the book a serious read.”

I have frequently replayed that conversation in my mind, wondering about the constitution of this social contract in scholarly publishing, which is deeply related to questions of academic value and reward.

For the ease of conversation, let’s call the two sides of the social contract of scholarly publishing the supply side and the demand side. The supply side is the creation of scholarly works, including writing, peer review, editing, and the form of publication. The demand side is much more elusive—the mental state of the audience that leads them to “buy” what the supply side has produced. In order for the social contract to work, for engaged reading to happen and for credit to be given to the author (or editor of a scholarly collection), both sides need to be aligned properly.

The social contract of the book is profoundly entrenched and powerful—almost mythological—especially in the humanities. As John Updike put it in his diatribe against the digital (and most humanities scholars and tenure committees would still agree), “The printed, bound and paid-for book was—still is, for the moment—more exacting, more demanding, of its producer and consumer both. It is the site of an encounter, in silence, of two minds, one following in the other’s steps but invited to imagine, to argue, to concur on a level of reflection beyond that of personal encounter, with all its merely social conventions, its merciful padding of blather and mutual forgiveness.”

As academic projects have experimented with the web over the past two decades we have seen intense thinking about the supply side. Robust academic work has been reenvisioned in many ways: as topical portals, interactive maps, deep textual databases, new kinds of presses, primary source collections, and even software. Most of these projects strive to reproduce the magic of the traditional social contract of the book, even as they experiment with form.

The demand side, however, has languished. Far fewer efforts have been made to influence the mental state of the scholarly audience. The unspoken assumption is that the reader is more or less unchangeable in this respect, only able to respond to, and validate, works that have the traditional marks of the social contract: having survived a strong filtering process, near-perfect copyediting, the imprimatur of a press.

We need to work much more on the demand side if we want to move the social contract forward into the digital age. Despite Updike’s ode to the book, there are social conventions surrounding print that are worth challenging. Much of the reputational analysis that occurs in the professional humanities relies on cues beyond the scholarly content itself. The act of scanning a CV is an act fraught with these conventions.

Can we change the views of humanities scholars so that they may accept, as some legal scholars already do, the great blog post as being as influential as the great law review article? Can we get humanities faculty, as many tenured economists already do, to publish more in open access journals? Can we accomplish the humanities equivalent of FiveThirtyEight.com, which provides as good, if not better, in-depth political analysis than most newspapers, earning the grudging respect of journalists and political theorists? Can we get our colleagues to recognize outstanding academic work wherever and however it is published?

I believe that to do so, we may have to think less like humanities scholars and more like social scientists. Behavioral economists know that although the perception of value can come from the intrinsic worth of the good itself (e.g., the quality of a wine, already rather subjective), it is often influenced by many other factors, such as price and packaging (the wine bottle, how the wine is presented for tasting). These elements trigger a reaction based on stereotypes—if it’s expensive and looks well-wrapped, it must be valuable. The book and article have an abundance of these value triggers from generations of use, but we are just beginning to understand equivalent value triggers online—thus the critical importance of web design, and why the logo of a trusted institution or a university press can still matter greatly, even if it appears on a website rather than a book.

Social psychologists have also thought deeply about the potent grip of these idols of our tribe. They are aware of how cultural norms establish and propagate themselves, and tell us how the imposition of limits creates hierarchies of recognition. Thinking in their way, along with the way the web works, one potential solution on the demand side might come not from the scarcity of production, as it did in a print world, but from the scarcity of attention. That is, value will be perceived in any community-accepted process that narrows the seemingly limitless texts to read or websites to view. Curation becomes more important than publication once publication ceases to be limited.

[image credit: Priki]