Category Archives: Software

Digital Journalism and Digital Humanities

I’ve increasingly felt that digital journalism and digital humanities are kindred spirits, and that more commerce between the two could be mutually beneficial. That sentiment was confirmed by the extremely positive reaction on Twitter to a brief comment I made about the launch of Knight-Mozilla OpenNews, including responses from Jon Christensen (of the Bill Lane Center for the American West at Stanford, and formerly a journalist), Shana Kimball (MPublishing, University of Michigan), Tim Carmody (Wired), and Jenna Wortham (New York Times).

Here’s an outline of some of the main areas where digital journalism and digital humanities could profitably collaborate. It’s remarkable, upon reflection, how much overlap there now is, and I suspect these areas will only grow in common importance.

1) Big data, and the best ways to scan and visualize it. All of us are facing either present-day or historical archives of almost unimaginable abundance, and we need sophisticated methods for finding trends, anomalies, and specific documents that could use additional attention. We also require robust ways of presenting this data to audiences to convey theses and supplement narratives.

2) How to involve the public in our work. If confronted by big data, how and when should we use crowdsourcing, and through which mechanisms? Are there areas where pro-am work is especially effective, and how can we heighten its advantages while diminishing its disadvantages? Since both groups work on the open web rather than in the cloistered realms of the ivory tower, what are we to make of our sometimes helpful, sometimes rocky interactions with the public?

3) The narrative plus the archive. Journalists are now writing articles that link to or embed primary sources (e.g., using DocumentCloud). Scholars are now writing articles that link to or embed primary sources (e.g., using Omeka). Formerly hidden sources are now far more accessible to the reader.

4) Software developers and other technologists are our partners. They should no longer be relegated to secondary status as “the techies who make the websites”; we need to work intellectually and practically with those who understand how digital media and technology can advance our agenda and our content. For scholars, this also extends to technologically sophisticated librarians, archivists, and museum professionals. Moreover, the line between developer and journalist/scholar is already blurring, and will blur further.

5) Platforms and infrastructure. We care a great deal about common platforms, ranging from web and data standards, to open source software, to content management systems such as WordPress and Drupal. Developers we work with can create platforms with entirely novel functionality for news and scholarship.

6) Common tools. We are all writers and researchers. When the New York Times produces a WordPress plugin for editing, it affects academics looking to use WordPress as a scholarly communication platform. When our center updates Zotero, it affects many journalists who use that software for organizing their digital research.

7) A convergence of length. I’m convinced that something interesting and important is happening at the confluence of long-form journalism (say, 5,000 words or more) and short-form scholarship (ranging from long blog posts to Kindle Singles geared toward popular audiences). It doesn’t hurt that many journalists writing at this length could very well have been academics in a parallel universe, and vice versa. The prevalence of high-quality writing that is smart and accessible has never been greater.

This list is undoubtedly not comprehensive; please add your thoughts about additional common areas in the comments. It may be worth devoting substantial time to increasing the dialogue between digital journalists and digital humanists at the next THATCamp Prime, or perhaps at a special THATCamp focused on the topic. Let me know if you’re interested. And more soon in this space.

Introducing Anthologize

A long-running theme of this blog has been the perceived gulf between new forms of online scholarship—including the genre of the blog itself—and traditional forms such as the book and journal. I’m obviously delighted, then, about the outcome of One Week | One Tool, a week-long institute funded by the National Endowment for the Humanities and run by the Center for History and New Media at George Mason University. As the name suggests, twelve humanities scholars with technical chops hunkered down for one week to produce a digital tool they thought could have an impact in the humanities and beyond.

Today marks the launch of this effort: Anthologize, software that converts the popular open-source WordPress system into a full-fledged book-production platform. Using Anthologize, you can take online content such as blogs, feeds, and images (and soon multimedia), organize and edit it, and export it into a variety of modern formats that work on multiple devices. Have a poetry blog? Anthologize it into a nice-looking ePub ebook and distribute it to iPads the world over. Run a museum with an RSS feed of the best items from your collection? Anthologize it into a coffee table book. Have a group blog on a historical subject? Anthologize the best pieces quarterly into a print or e-journal, or archive it in TEI. Get all the delicious details on the newly revealed Anthologize website.

Anthologize is free and open source software. Obviously in one week it’s impossible to have feature-complete, polished software. There will be a few rough edges. But it works right now (see below) and it’s just the start of a major effort. The grant from NEH anticipates more work for the One Week team over the next year to refine the tool, culminating in a follow-up meeting at THATCamp 2011.

I suspect there will be many users and uses for Anthologize, and developers can extend the software to work in different environments and for different purposes. I see the tool as part of a wave of “reading 2.0” software that I’ve come to rely on for packaging online content for long-form consumption and distribution, including the Readability browser plugin and Instapaper. This class of software is particularly important for the humanities, which remain very bookish, but it is broadly applicable. Anthologize is flexible enough to handle different genres of writing and content, opening up new possibilities for scholarly communication. Personally, I plan to use Anthologize to run a journal and to edit and write two upcoming books.

Credit for Anthologize goes to the amazing team that produced it: Jason Casden, Boone Gorges, Kathie Gossett, Scott Hanrath, Effie Kapsalis, Doug Knox, Zachary McCune, Julie Meloni, Patrick Murray-John, Steve Ramsay, Patrick Rashleigh, and Jana Remy. It is notable that the One Weekers ranged from a recent college grad to tenured professors, programmers and designers and interface experts who also are humanities scholars, and professionals from libraries, museums, and instructional technology. Remarkably, they first met last Sunday night and had production-ready code by Saturday morning, a website to market and support the software, an outreach plan, and a vision for the future of the software beyond its original state. Not to mention a logo to go on nice-looking swag (personally, I’ll take the book bag).

Credit also goes to the great Center for History and New Media team that instructed and supported the One Weekers in the ways we like to conceive, design, and build digital humanities tools: Sharon Leon, Jeremy Boggs, Sheila Brennan, Trevor Owens, and many others who dropped in to help out. Two huge final credits: one to Tom Scheinfeldt for conceiving and running the structured madness that was One Week | One Tool, and one to the National Endowment for the Humanities, which took a big risk on a very untraditional institute. We hope they, and others, like both the idea and the execution of Anthologize.

And just to give you some idea of what Anthologize can do, here’s the Anthologize ePub version of this blog post on an iPad, created in five minutes:

Shakespeare’s Hard Drive

Congrats to Matt Kirschenbaum on his thought-provoking article in the Chronicle of Higher Education, “Hamlet.doc? Literature in a Digital Age.” Matt makes two excellent points. First, “born digital” literature presents incredible new opportunities for research, because manuscripts written on computers retain significant metadata and draft tracking that allow for major insights into an author’s thought and writing process. Second, scholars who wish to study such literature in the future need to push proactively for writing environments, digital standards, and archival storage that will keep these materials accessible and persistent.

Creating a Blog from Scratch, Part 9: The Conclusion

From its inception until today, this blog has been powered by code I wrote myself. Some people thought this took a lot of work; to be honest, it was just a few days of simple coding. As I noted at the beginning of this series on “Creating a Blog from Scratch,” rather than using existing software or services, such as WordPress or Blogger, I wanted to write my own blog code so that I could experiment with the form of the blog. In general, I found it to be a great exercise that I would highly recommend. It helped me understand the genre of the blog, challenge long-standing assumptions of form and function (like the tyranny of the calendar, now gone on most blogs), and think about ways one might customize a blog to fit academic needs.

But starting today, this blog will be powered by WordPress, not my own code. Am I a hypocrite? Well, yes and no. Yes, in that by switching to WordPress I have had to abandon some quirks of my original blog that had made it unique and that represented the accumulated wisdom of writing my own code. No, in that I feel I’ve learned enough over the last two years to bend WordPress to my will and satisfy my need to customize and adapt.

More important, I had other needs that I just didn’t have time to implement in my own code, and there were other features of WordPress (a terrific open-source project) that I really wanted:

  • It took two years, but after initially disparaging comments (sentiments echoed recently by some well-known bloggers), I’ve decided that they are actually important to a blog and that my critics were right that the blog suffered without them. So starting today I have comments at the end of each post. (My old posts will remain free of comments since I have left them in their original format.)
  • I had also worried that the blog comments would be a haven for spam, but after the release of the wonderful reCAPTCHA system–which helps the Open Content Alliance transcribe digitized books while preventing spam–I felt that relatively spam-free commenting was possible.
  • As successful open-source software, WordPress has engendered a universe of helpful plugins, modifications, and documentation. For instance, this blog is now Zotero-compatible, thanks to the WordPress COinS plugin by my colleague Sean Takats. And of course reCAPTCHA came with a plugin for WordPress too.
  • WordPress’s system for drafting and editing posts is far more advanced than the basic screens I created. Writing this post is taking me about half the time it would have taken in my old system.
  • For the past six months I have been using ma.gnolia to add small posts to my feed (and to the sidebar of my old blog under “Briefly Noted”). I now can do this just as quickly using WordPress, and plan to post much more frequently starting in September.
  • Despite my best efforts, my old blog code failed to output valid XHTML, which I believe is increasingly important in a world where non-computer devices (such as the iPhone) are browsing the web and RSS feeds. WordPress automatically writes pages in XHTML.

I suppose I should rip the badge of honor for my home-grown blogging software off my sleeve. But I prefer to see the switch to WordPress as just another step in the continual improvement of this blog, and I look forward to many more years of writing in this space.

Social and Semantic Computing for Historical Scholarship

Under the assumption that many readers of this blog don’t receive the American Historical Association’s magazine Perspectives, you might be interested in this article I wrote for the May 2007 issue. In the piece I discuss the Zotero project’s connection to several recent trends in computing, and think ahead to what the Zotero server might mean for academic fields like history.

2007 Mellon Awards for Technology Collaboration

The Andrew W. Mellon Foundation has launched the nominating process for the second annual Mellon Awards for Technology Collaboration (MATC). The awards, given by tech luminaries such as Tim Berners-Lee and Vint Cerf, honor not-for-profit organizations for leadership in the collaborative development of open source software tools with particular application to higher education and not-for-profit activities.

NINES Officially Launches

As someone keenly interested in the possibilities of digital scholarship as well as nineteenth-century British and American intellectual history, I’m delighted to hear of the official launch of NINES (Networked Infrastructure for Nineteenth-century Electronic Scholarship), which allows researchers to search, organize, and annotate over 60,000 texts and images. A screencast of how to use Collex, their powerful web application, would be helpful for new users.

Intelligence Analysts and Humanities Scholars

About halfway through the Chicago Colloquium on Digital Humanities and Computer Science last week, the always witty and insightful Martin Mueller humorously interjected: “I will go away from this conference with the knowledge that intelligence analysts and literary scholars are exactly the same.” As the chuckles from the audience died down, the core truth of the joke settled in—for those interested in advancing the still-nascent field of the digital humanities, are academic researchers indeed becoming clones of intelligence analysts by picking up the latter’s digital tools? What exactly is the difference between an intelligence analyst and a scholar who is scanning, sorting, and aggregating information from massive electronic corpora?

Mueller’s remark prods those of us exploring the frontiers of the digital humanities to do a better job describing how our pursuit differs from other fields making use of similar computational means. A good start would be to highlight a distinction. The intelligence analyst sifts through mountains of data looking for patterns, anomalies, and connections that might be “actionable” (in the euphemistic argot of the military, meaning that policy makers can piece together bits of intelligence and decide to take action). The digital humanities scholar, by contrast, should be looking for patterns, anomalies, and connections that strengthen or weaken existing theories in their field, or produce new theories. In other words, we not only uncover evidence, but come to overarching conclusions and make value judgments; we are at once the FBI, the district attorney, the judge, and the jury. (Perhaps the “National Intelligence Estimates” that are the highest form of synthesis in the intelligence community come closest to what academics do.)

The gentle criticism I gave to the Chicago audience at the end of the colloquium was that too many presentations seemed one (important) piece away from completing this interpretive whole. Through extraordinary ingenuity, a series of panelists showed how digital methods can determine the gender of Shakespeare’s interlocutors, reveal more clearly the repetition of key phrases in Gertrude Stein’s prose, or better map the ideology and interactions of FDR’s advisors during and after Pearl Harbor. But the real questions that need to be answered, the ones whose answers will make other humanities scholars stand up and take notice of digital methods, are how the identification of gender reshapes (or reinforces) our views of Shakespeare’s plays, how the use of repetition changes our perspectives on Gertrude Stein’s writings, or how a better understanding of presidential advisors alters our historical narrative of America’s entry into the Second World War.

In Chicago, I tried to give this critical, final moment of insight reached through digital means a name, the “John Snow moment,” in honor of the Victorian physician who uncovered how cholera spread by using a novel research tool unfamiliar to traditional medical science. In 1854 a cholera outbreak killed and sickened hundreds of people in London. Rather than looking at symptoms or other patient information on a case-by-case basis, Snow mapped every case of the disease by the patient’s street address, and quickly discovered that the cases clustered around a Soho water pump. The local parish authorities removed the pump’s handle, helping to curtail the outbreak and inaugurating a new era of epidemiology. Snow had shown that cholera was a waterborne disease. Now that’s actionable intelligence.

What can digital scholars do to reach this level of insight? A key first step, as my experience in Chicago reinforced, is for academics interested in the power of computational methods to forge tools that satisfy their interpretive needs rather than simply accepting the tools currently available from other domains of knowledge, like intelligence. Ostensibly the Chicago Colloquium was about bringing together computer scientists and humanities scholars to see how we might learn from each other and enable new forms of research in an age of millions of digitized books. But as I noted in my remarks on the closing panel, too often this interaction seemed like a one-way street, with humanities scholars applying existing computer science tools rather than engaging the computer scientists (or programming themselves) to create new tools better suited to their own needs. Hopefully such new tools will lead to more John Snow moments in the humanities in the near future.