THATCamp 2011: Even Bigger, More Open, More Educational, More Fun

We decided to pull out all the stops for this year’s THATCamp (now called THATCamp Prime or THATCamp CHNM or that THATCamp since there are now so many regional THATCamps). From the THATCamp blog:

All year has been THATCamp time, seems like, but we’re now talking about that THATCamp, which will take place

June 3-5, 2011
Center for History and New Media, Fairfax, VA

We’ve instituted some changes this year:

  • THATCamp will be larger: we’re planning on having about 125 people who do all kinds of work related to the humanities and technology;
  • THATCamp will be truly open to all: instead of having an application process, we’ll be accepting all registrations up to 125 people until April 22;
  • THATCamp will have a BootCamp: the unconference will happen as usual on the weekend over a day and a half, but the Friday beforehand will be devoted to a series of workshops dedicated to improving technical skills; and
  • THATCamp is planning on at least two virtual sessions in which we get to talk to campers at THATCamp Liberal Arts Colleges and to Jon Voss about the outcome of his Linked Open Data in Libraries, Archives, and Museums Summit.

Needless to say, we’re psyched. See you there.

If you haven’t been to THATCamp yet, I can’t recommend it enough. It’s intense, fun, and you’ll learn more and meet more interesting, great people than anywhere else. There’s also a bit of Woodstock to it, and no big registration fee, just a very small suggested donation. We also have on-campus accommodations this year at the very nice new Mason Inn.

Register right now to reserve your slot. Hope to see you in June!

Defining Digital Humanities, Briefly

I’m participating in the Day of Digital Humanities this year, and the organizers have asked all participants to briefly define “digital humanities.” It’s a helpful exercise, and for those new to the field, it might be useful to give the many responses a quick scan. I wrote this one-sentence answer out fairly hastily, but think it’s not so bad:

Broadly construed, digital humanities is the use of digital media and technology to advance the full range of thought and practice in the humanities, from the creation of scholarly resources, to research on those resources, to the communication of results to colleagues and students.

The best answer to “How do you define digital humanities?” came from Lou Burnard: “With extreme reluctance.”

What Scholars Want from the Digital Public Library of America

[A rough transcript of my talk at the Digital Public Library of America meeting at Harvard on March 1, 2011. To permit unguarded, open discussion, we operated under the Chatham House Rule, which prevents attribution of comments, but I believe I'm allowed to violate my own anonymity.]

I was once at a meeting similar to this one, where technologists and scholars were discussing what a large digital library should look like. During a breakout session, the technologists huddled and talked about databases, indices, search mechanisms; the scholars, on the other side of the room, painted a vision of what the archive would look like online, in their view a graphical representation as close to the library as possible, where one could pull down boxes from the shelves, and then open those boxes and leaf through the folios one by one.

While the technologists debated digital infrastructure, the scholars were trying to replicate or maintain what they liked about the analog world they knew: a trusted order, the assurance of the physical, all of the cues they pick up from the shelf and the book. If we want to think about the Digital Public Library of America from the scholar’s point of view, we must think about how to replicate those signals while taking advantage of the technology. In short: the best of the single search box with the trust and feel of the bookshelf.

So how can this group translate those scholarly concerns into elements of the DPLA? I did what any rigorous, traditionally trained scholar would do: I asked my Twitter followers. Here are their thoughts, with my thanks for their help:

First, scholars want reliable metadata about scholarly objects like books. Close enough doesn’t count. Although Google has relatively few metadata errors (given that they handle literally a trillion pieces of metadata), these errors drive scholars mad, and make them skeptical of online collections.

Second, serendipity. Many works of scholarship come from the chance encounter of the scholar with primary sources. How can that be enhanced? Some in my feed suggested a user interface with links to “more like this,” “recent additions in your field,” or “sample collections.” Others advocated social cues, such as user-contributed notes on works in the library.

Third, there are different modes of scholarly research, and the interface has to reflect that: a simple discovery layer with a sophisticated advanced search underneath, faceted search, social search methods for collaborative practice, the ability to search within a collection or subcollection.

Fourth, connection with the physical. We need better representations of books online than the sameness of Google books, where everything looks like a PDF of the same size. Scholars also need the ability to go from the digital to the analog by finding a local copy of a work.

Finally, as I have often said, scholars have uses for libraries that libraries can’t anticipate. So we need the DPLA to enable other parties to build upon, reframe, and reuse the collection. In technical terms, this means open APIs.

Video: The Ivory Tower and the Open Web

Here’s the video of my plenary talk “The Ivory Tower and the Open Web,” given at the Coalition for Networked Information meeting in Washington in December, 2010. A general description of the talk:

The web is now over twenty years old, and there is no doubt that the academy has taken advantage of its tremendous potential for disseminating resources and scholarship. But a full accounting of the academic approach to the web shows that compared to the innovative vernacular forms that have flourished over the past two decades, we have been relatively meek in our use of the medium, often preferring to impose traditional ivory tower genres on the web rather than import the open web’s most successful models. For instance, we would rather digitize the journal we know than explore how blogs and social media might supplement or change our scholarly research and communication. What might happen if we reversed that flow and more wholeheartedly embraced the genres of the open web?

I hope the audience for this blog finds it worthy viewing. I enjoyed talking about burrito websites, Layer Tennis, aggregation and curation services, blog networks, Aaron Sorkin’s touchiness, scholarly uses of Twitter, and many other high- and low-brow topics all in one hour. (For some details in the images I put up on the screen, you might want to follow along with this PDF of the slides.) I’ll be expanding on the ideas in this talk in an upcoming book with the same title.

Web Design Job at CHNM

A great opportunity to join us at the Center for History and New Media:

Do you get as excited about clean mark-up as you do about the latest Photoshop effect? Do you want to be on the cutting edge of web design and digital humanities, and design websites that inform and engage end users?

If so, the Center for History and New Media wants to hear from you.

CHNM, known for innovative work in digital media, is seeking an energetic, well-organized, and creative web designer with front-end development skills or experience to work on a variety of innovative, web-based history projects.

This position is particularly appropriate for someone with a combined interest in technology and history or humanities. The successful applicant will be able to create mockups and wireframes for historical, cultural, and educational websites and bring those ideas to fruition using the latest and highest web development standards.

We are looking for a combination of the following skills:

  • fluency with current web design technologies (including ability to hand code HTML, CSS, and Javascript);
  • fluency in Photoshop and experience with Illustrator;
  • experience with web accessibility and web usability standards;
  • experience with or interest in designing for social media or online communities;
  • experience with common open source content management systems (WordPress, BuddyPress, Drupal, etc.);
  • familiarity with web-database technologies (MySQL, PHP);
  • familiarity with contemporary trends in web development (e.g., AJAX, jquery, Rails, css3/HTML5);
  • prior work in history or the digital humanities is a plus.

CHNM offers a casual, collaborative work environment, with excellent opportunities for professional growth and development.

This is a grant-funded, two-year position at the Center for History and New Media (http://chnm.gmu.edu), located in Fairfax, Virginia. CHNM is 15 miles from Washington, DC, and accessible by public transportation. Apply online (including resume, three references, links to prior web work, and a cover letter describing technology background and any interest in history) at http://jobs.gmu.edu for position #10376z. We will review applications as they arrive, and the job closes on January 31, 2011.

If you have questions, contact us at chnm@gmu.edu with subject line “Web Designer.”

Digital Humanities on the Kojo Nnamdi Show

I really enjoyed being on the Kojo Nnamdi Show today talking about digital humanities for an hour with Kojo, the NEH’s Brett Bobley, and UVA’s Bill Ferster. Kojo’s show is produced at Washington’s NPR station, WAMU, and syndicated nationally. It’s also available as an audio stream and a podcast.

Having done podcasts for four years now, I’ve come to understand how difficult it is to do a radio show—to ask the right questions, to not um and er a lot, and to stimulate informative conversation. Kojo really makes it look easy, which is even more impressive given the wide variety of topics he covers. As I left the studio today he immediately prepped to do a show on Eisenhower and the military-industrial complex.

Brett, Bill, and I talked about how to define digital humanities, the use of text mining, visualization, and digital mapping, problems associated with the abundant digital record, collaboration in the digital humanities, and questions of publishing, open access, and tenure. We also took numerous questions from callers. I thought the show had a good vibe.

So, worth a listen: The Kojo Nnamdi Show: “History Meets High-Tech: Digital Humanities”

Today was also a moment to reflect on the fact that the last time I was on the Kojo Nnamdi Show was exactly five years ago, with Roy Rosenzweig. Our book Digital History had just come out. It was just before Roy got sick. I probably said a lot on the broadcast today that Roy would have said.

Initial Thoughts on the Google Books Ngram Viewer and Datasets

First and foremost, you have to be the most jaded or cynical scholar not to be excited by the release of the Google Books Ngram Viewer and (perhaps even more exciting for the geeks among us) the associated datasets. In the same way that the main Google Books site has introduced many scholars to the potential of digital collections on the web, Google Ngrams will introduce many scholars to the possibilities of digital research. There are precious few easy-to-use tools that allow one to explore text-mining patterns and anomalies; perhaps only Wordle has the same dead-simple, addictive quality as Google Ngrams. Digital humanities needs gateway drugs. Kudos to the pushers on the Google Books team.

Second, on the concurrent launch of “Culturomics”: Naming new fields is always contentious, as is declaring precedence. Yes, it was slightly annoying to have the Harvard/MIT scholars behind this coinage and the article that launched it, Michel et al., stake out supposedly new ground without making sufficient reference to prior work and even (ahem) some vaguely familiar, if simpler, graphs and intellectual justifications. Yes, “Culturomics” sounds like an 80s new wave band. If we’re going to coin neologisms, let’s at least go with Sean Gillies’ satirical alternative: Freakumanities. No, there were no humanities scholars in sight in the Culturomics article. But I’m also sure that longtime “humanities computing” scholars consider advocates of “digital humanities” like me Johnnies-come-lately. Luckily, digital humanities is nice, and so let us all welcome Michel et al. to the fold, applaud their work, and do what we can to learn from their clever formulations. (But c’mon, Cantabs, at least return the favor by following some people on Twitter.)

Third, on the quality and utility of the data: To be sure, there are issues. Some big ones. Mark Davies makes some excellent points about why his Corpus of Historical American English (COHA) might be a better choice for researchers, including more nuanced search options and better variety and normalization of the data. Natalie Binder asks some tough questions about Google’s OCR. On Twitter many of us were finding serious problems with the long “s” before 1800 (Danny Sullivan got straight to the naughty point with his discourse on the history of the f-bomb). But the Freakumanities, er, Culturomics guys themselves talk about this problem in their caveats, as does Google.

Moreover, the data will improve. The Google n-grams are already over a year old, and the plan is to release new data as soon as it can be compiled. In addition, unlike text-mining tools like COHA, Google Ngrams is multilingual. For the first time, historians working on Chinese, French, German, and Spanish sources can do what many of us have been doing for some time. Professors love to look a gift horse in the mouth. But let’s also ride the horse and see where it takes us.

So where does it take us? My initial tests on the viewer and examination of the datasets—which, unlike the public site, allow you to count words not only by overall instances but, critically, by number of pages those instances appear on and number of works they appear in—hint at much work to be done:

1) The best possibilities for deeper humanities research are likely in the longer n-grams, not in the unigrams. While everyone obsesses about individual words (guilty here too of unigramism) or about proper names (which are generally bigrams), more elaborate and interesting interpretations are likelier in the 4- and 5-grams since they begin to provide some context. For instance, if you want to look at the history of marriage, charting the word itself is far less interesting than seeing if it co-occurs with words like “loving” or “arranged.” (This is something we learned in working on our NEH-funded grant on text mining for historians.)

2) We should remember that some of the best uses of Google’s n-grams will come from using this data along with other data. My gripe with the “Culturomics” name was that it implied (from “genomics”) that some single massive dataset, like the human genome, will be the be-all and end-all for cultural research. But much of the best digital humanities work has come from mashing up data from different domains. Creative scholars will find ways to use the Google n-grams in concert with other datasets from cultural heritage collections.

3) Despite my occasional griping about the Culturomists, they did some rather clever things with statistics in the latter part of their article to tease out cultural trends. We historians and humanists should be looking carefully at the more complex formulations of Michel et al., when they move beyond linguistics and unigram patterns to investigate in shrewd ways topics like how fleeting fame is and whether the suppression of authors by totalitarian regimes works. Good stuff.

4) For me, the biggest problem with the viewer and the data is that you cannot seamlessly move from distant reading to close reading, from the bird’s eye view to the actual texts. Historical trends often need to be investigated in detail (another lesson from our NEH grant), and it’s not entirely clear that, if you move from the Ngram Viewer to the main Google Books interface, you’ll get the book scans the data represents. That’s why I have my students use Mark Davies’ Time Magazine Corpus when we begin to study historical text mining, since they can easily look at specific magazine articles when they need to.
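
For the geeks among us, here is a rough Python sketch of the kind of co-occurrence counting described in point 1, using the per-work counts mentioned above rather than raw instances. It assumes the downloadable files are tab-separated with five columns (n-gram, year, instance count, page count, volume count); check the dataset documentation before relying on that layout. The function names and the sample rows are my own inventions for illustration, and the numbers are made up, not real Google data.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class NgramRow(NamedTuple):
    ngram: str
    year: int
    match_count: int   # total instances of the n-gram in that year
    page_count: int    # number of pages the n-gram appears on
    volume_count: int  # number of works the n-gram appears in


def parse_row(line: str) -> NgramRow:
    """Parse one tab-separated line (assumed five-column layout)."""
    ngram, year, matches, pages, volumes = line.rstrip("\n").split("\t")
    return NgramRow(ngram, int(year), int(matches), int(pages), int(volumes))


def volumes_by_year(lines: Iterable[str], target: str, companion: str) -> Dict[int, int]:
    """Sum volume counts per year over n-grams containing both words.

    Note: a single book can contain several matching n-grams, so these
    sums can overcount works; treat them as a rough signal only.
    """
    totals: Dict[int, int] = defaultdict(int)
    for line in lines:
        row = parse_row(line)
        words = row.ngram.lower().split()
        if target in words and companion in words:
            totals[row.year] += row.volume_count
    return dict(totals)


# Hypothetical sample rows, for illustration only.
SAMPLE = [
    "a loving marriage is the\t1850\t12\t11\t9",
    "an arranged marriage between the\t1850\t30\t28\t20",
    "a loving marriage is the\t1950\t80\t75\t60",
    "of a loving marriage and\t1950\t15\t15\t14",
    "an arranged marriage between the\t1950\t25\t24\t22",
]

loving = volumes_by_year(SAMPLE, "marriage", "loving")
arranged = volumes_by_year(SAMPLE, "marriage", "arranged")
print(loving)
print(arranged)
```

On real data you would stream the 5-gram files line by line instead of using an in-memory list, and you would probably want to normalize by the total number of works published each year before comparing decades.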

How do you plan to use the Google Books Ngram Viewer and its associated data? I would love to hear your ideas for smart work in history and the humanities in the comments, and will update this post with my own further thoughts as they occur to me.

New York Times Covers Victorian Books Project

Patricia Cohen of the New York Times has been working on an excellent series on digital humanities, and her second article focuses on our text mining work on Victorian books, which was directly enabled by a grant from Google and more broadly enabled by a previous grant from the National Endowment for the Humanities to explore text mining in history. I’m glad Cohen (no relation) captured the nuances and caveats as well as the potential of digital methods. I also liked how the graphics department did a great job converting and explaining some of our graphs.

I previously posted a rough transcript of my talk on Victorian history and literature that Cohen mentions in the piece. She also covered my work earlier this year in an article on peer review that was much debated in academia.

A Conversation with Richard Stallman about Open Access

[An email exchange with Richard Stallman, father of free software, copyleft, GNU, and the GPL, reprinted here in redacted form with Stallman's permission. Stallman tutors me in the important details of open access and I tutor him in the peculiarities of humanities publishing.]

RS: [Your] posting ["Open Access Publishing and Scholarly Values"] doesn’t specify which definition of “open access” you’re arguing for — but that is a fundamental question.

When the Budapest Declaration defined open access, the crucial condition was that users be free to redistribute copies of the articles.  That is an ethical imperative in its own right, and a requisite for proper and safe archiving of the work.

People paid more attention to the other condition specified in the Budapest Declaration: that the publication site allow access by anyone. This is a good thing, but need not be explicitly required, because the other condition (freedom to redistribute) will have this as a consequence. Many universities and labs will set up mirror sites, and everyone will thus have access.

More recently, some have started using a modified definition of “open access” which omits the freedom to redistribute.  As a result, “open access” is no longer a clear rallying point.  I think we should now campaign for “redistributable publication.”

What are your thoughts on this?

DC: I probably should have been clearer in my post that I’m for the maximal access—and distribution—of which you speak. Alas, the situation is actually worse than you imagine, especially in the humanities, where I work, and which is about a decade behind the sciences in open access. Beyond the muddying of the waters through terms like “Green OA” and “Gold OA” is the fact that academic publishing is horribly wrapped up (again, more so in the humanities) with structural problems related to reputation, promotion, and tenure. So my colleagues worry more about truly open publications “counting” vs. publications that are simply open to reading on a commercial publisher’s website. That is why I think the big question is not the licensing or the technology of decentralized publishing, posting and free distribution of papers, etc., but the social realm in which academic publishing sits. I’m working now on pragmatic ways to change that very conservative realm.

Put another way: when software developers write good (open) code, other developers recognize that quality, independent of where the code resides; in humanities publishing, packaging (including the imprimatur of a press, the sense that a work has jumped some (often mythical) peer-review hurdle) counts for too much right now.

RS: ["Green OA" and "Gold OA"] are new to me — can you tell me what they mean?

So my colleagues worry more about truly open publications “counting” vs. publications that are simply open to reading on a commercial publisher’s website.

I don’t understand that sentence.

That is why I think the big question is not the licensing or the technology of decentralized publishing, posting and free distribution of papers, etc., but the social realm in which academic publishing sits.

Ethically speaking, what matters is the license used. That’s what determines whether the publishing is ethical or not. Are you saying that the social realm contains the obstacle to the adoption of ethical publication methods?

Put another way: when software developers write good (open) code, other developers recognize that quality, independent of where the code resides.

Programmers can tell if code is well-written, assuming they are allowed to read it, but how does that relate? Are you saying that in the humanities people often judge work based on where it is published, and have no other way to determine what is good or bad?

DC: Green O[pen] A[ccess] = when a professor deposits her finished article in a university repository after it is published. Theoretically that article will then be available (if people can find the website for the institution’s repository), even if the journal keeps it gated.

Gold OA = when the journal itself (rather than the repository) is open access; may involve the author paying a submission fee (often around $1-3K). Still probably doesn’t have a redistribution license, but it’s not behind a publisher’s digital gates.

Counting = counting in the academic promotion and tenure process. Much of the problem here is (I believe misplaced) concern about the effect of open access on one’s career.

Are you saying that the social realm contains the obstacle to the adoption of ethical publication methods?

Correct. And much of it has to do with the meekness of academics (especially in the humanities, bastion of liberalism in most other ways) to challenge the system to create a more ethical publication system, one controlled by the community of scholars rather than commercial publishers who profit from our work.

Are you saying that in the humanities people often judge work based on where it is published, and have no other way to determine what is good or bad?

Amazing as it may sound, many academics do indeed judge a work that way, especially in tenure and promotion processes. There are some departments that actually base promotion and tenure on the number of pages published in the top (mostly gated) journals.

RS: [Terms like "Green OA" and "Gold OA" provide] even more reason to reject the term “open access” and demand redistributable publication.

Maybe some leading scholars could be recruited to start a redistributable journal.  Their names would make it prestigious.

DC: That’s what PLoS did (http://plos.org) in the sciences. Unclear if the model is replicable in the humanities, but I’m trying.

UPDATE: This was an off-hand conversation with Stallman, and my apologies for the quick (and poor) descriptions of a couple of open access options. But I think the many commenters below who are focusing on the fine differences between kinds of OA are missing the central themes of this conversation.

Frank Turner on the Future of Peer Review

As I mentioned in my memorial post for my mentor Frank Turner, we were having a deep discussion of the future of peer review when he suddenly passed away. I wish we could have finished this discussion; as with so many other things, he brought tremendous insight to the topic. Much of the discussion was about personal experiences with peer review that I can’t recount in this public space, but we also got into “strategic planning” for changing the peer review system.

Here are the powerful last few email messages I received from Frank, redacted of personal matters and some touchy subjects. I think all of us trying to reform the academy through digital means should heed his words.

On the practice of peer review (I limit my thoughts to the Humanities) I have several different and conflicting opinions. Numerous journals are really quite well edited in my experience…In theory and often in practice peer review is a good thing.

But the problems of peer review are also longstanding…Other journals have in the past and still do remain the proprietary reserves of academic cliques or worse. One of the problems of which peer review in the humanities is only one facet is the absence of any agreed-upon and widely accepted understanding of the professional procedures and expectations. (Some would say a lack of “professional ethics.”) This has been exacerbated by the vast expansion of the academy in the second half of the last century; the often undue research expectations put on faculty in institutions that cannot financially support significant research; the necessity of editors sending out all manuscripts no matter how clearly mediocre and/or undeveloped and hence expanding peer review expectations…

As you and others think through the peer review process, I would hope that you would keep several things in mind. First, you will need to avoid the appearance of playing tennis with the net down. Groups of friends or overly like-minded folks producing journals or collections of essays may disperse various views but do not necessarily make for tough-minded scholarship. Second, the kind of new reviewing processes you and others are suggesting could provide the opportunity really to establish widely accepted understandings of procedures and expectations. Such would be a major new departure, and it could benefit from the input of the editors of genuinely respected journals. Third, and I will return to this point below on another topic, as journals come to be published online (and I think within five years or less entirely on-line), they should make available to readers the possibility of commenting on articles. Again, there would need to be some kind of template so the comments are not like those on Amazon. But what would emerge would be a kind of scholarly community of commentary, revision, and correction. Fourth, at the end of the day, however, a new, open, collective peer review process will still need to indicate that some work is stronger, more deeply researched, and more profoundly analyzed than other work. I happen to think one of the benefits of studying the various areas of the humanities is achieving the capacity to make judgments. The peer review of humanities scholarship should avoid at all costs the appearance and the reality of not being able to make judgments regarding quality…

Let me expand the purview of what you and others are seeking to accomplish. The realm of peer reviewing of articles is really quite strong when compared to that of book reviewing in journals. Book reviews are published with essentially no peer review, little or no concern or indication of actual or potential conflict of interest, and little or no concern for factual correctness. Such reviews are then used across the country in promotion processes. Scholarly book reviewing stands in a near scandalous situation. Most people review books in order not to purchase them. Reviews tend to be quite brief and as I have indicated are generally unedited except for style. Many reviewers simply rehash the dust jacket. Your group could again add to your agenda the establishment of professional procedures and expectations regarding book reviewing. These would include all reviewers indicating any conflict of interest, e.g., having taught the author, friendship with the author, residing in the same academic department or institution with the author, having written or edited a similar or competing book, having published with the same press, or having some political, religious, or ideological point of view that informs their thinking. Furthermore, again with the establishment of almost entirely on-line journal publication, all authors could be permitted to comment on and correct reviews and other scholars could similarly comment on the review or the book reviewed…

Most of you who are looking toward new ways of peer reviewing are young or at the entry level of the profession. All of you have a clear interest in reforming the existing reviewing process. I hope you will add that to your agenda as well as the peer reviewing of journals.