
Digital Journalism and Digital Humanities

I’ve increasingly felt that digital journalism and digital humanities are kindred spirits, and that more commerce between the two could be mutually beneficial. That sentiment was confirmed by the extremely positive reaction on Twitter to a brief comment I made on the launch of Knight-Mozilla OpenNews, including from Jon Christensen (of the Bill Lane Center for the American West at Stanford, and formerly a journalist), Shana Kimball (MPublishing, University of Michigan), Tim Carmody (Wired), and Jenna Wortham (New York Times).

Here’s an outline of some of the main areas where digital journalism and digital humanities could profitably collaborate. It’s remarkable, upon reflection, how much overlap there now is, and I suspect these areas will only grow in common importance.

1) Big data, and the best ways to scan and visualize it. All of us are facing either present-day or historical archives of almost unimaginable abundance, and we need sophisticated methods for finding trends, anomalies, and specific documents that could use additional attention. We also require robust ways of presenting this data to audiences to convey theses and supplement narratives.

2) How to involve the public in our work. If confronted by big data, how and when should we use crowdsourcing, and through which mechanisms? Are there areas where pro-am work is especially effective, and how can we heighten its advantages while diminishing its disadvantages? Since we both do work on the open web rather than in the cloistered realms of the ivory tower, what are we to make of the sometimes helpful, sometimes rocky interactions with the public?

3) The narrative plus the archive. Journalists are now writing articles that link to or embed primary sources (e.g., using DocumentCloud). Scholars are now writing articles that link to or embed primary sources (e.g., using Omeka). Formerly hidden sources are now far more accessible to the reader.

4) Software developers and other technologists are our partners. No longer relegated to secondary status as “the techies who make the websites,” we need to work intellectually and practically with those who understand how digital media and technology can advance our agenda and our content. For scholars, this also extends to technologically sophisticated librarians, archivists, and museum professionals. Moreover, the line between developer and journalist/scholar is already blurring, and will blur further.

5) Platforms and infrastructure. We care a great deal about common platforms, ranging from web and data standards, to open source software, to content management systems such as WordPress and Drupal. Developers we work with can create platforms with entirely novel functionality for news and scholarship.

6) Common tools. We are all writers and researchers. When the New York Times produces a WordPress plugin for editing, it affects academics looking to use WordPress as a scholarly communication platform. When our center updates Zotero, it affects many journalists who use that software for organizing their digital research.

7) A convergence of length. I’m convinced that something interesting and important is happening at the confluence of long-form journalism (say, 5,000 words or more) and short-form scholarship (ranging from long blog posts to Kindle Singles geared toward popular audiences). It doesn’t hurt that many journalists writing at this length could very well have been academics in a parallel universe, and vice versa. The prevalence of high-quality writing that is smart and accessible has never been greater.

This list is undoubtedly not comprehensive; please add your thoughts about additional common areas in the comments. It may be worth devoting substantial time to increasing the dialogue between digital journalists and digital humanists at the next THATCamp Prime, or perhaps at a special THATCamp focused on the topic. Let me know if you’re interested. And more soon in this space.

Defining Digital Humanities, Briefly

I’m participating in the Day of Digital Humanities this year, and the organizers have asked all participants to briefly define “digital humanities.” It’s a helpful exercise, and for those new to the field, it might be useful to give the many responses a quick scan. I wrote this one-sentence answer out fairly hastily, but think it’s not so bad:

Broadly construed, digital humanities is the use of digital media and technology to advance the full range of thought and practice in the humanities, from the creation of scholarly resources, to research on those resources, to the communication of results to colleagues and students.

The best answer to “How do you define digital humanities?” came from Lou Burnard: “With extreme reluctance.”

Video: The Ivory Tower and the Open Web

Here’s the video of my plenary talk “The Ivory Tower and the Open Web,” given at the Coalition for Networked Information meeting in Washington in December, 2010. A general description of the talk:

The web is now over twenty years old, and there is no doubt that the academy has taken advantage of its tremendous potential for disseminating resources and scholarship. But a full accounting of the academic approach to the web shows that compared to the innovative vernacular forms that have flourished over the past two decades, we have been relatively meek in our use of the medium, often preferring to impose traditional ivory tower genres on the web rather than import the open web’s most successful models. For instance, we would rather digitize the journal we know than explore how blogs and social media might supplement or change our scholarly research and communication. What might happen if we reversed that flow and more wholeheartedly embraced the genres of the open web?

I hope the audience for this blog finds it worthy viewing. I enjoyed talking about burrito websites, Layer Tennis, aggregation and curation services, blog networks, Aaron Sorkin’s touchiness, scholarly uses of Twitter, and many other high- and low-brow topics all in one hour. (For some details in the images I put up on the screen, you might want to follow along with this PDF of the slides.) I’ll be expanding on the ideas in this talk in an upcoming book with the same title.

Searching for the Victorians

[A rough transcript of my keynote at the Victorians Institute Conference, held at the University of Virginia on October 1-3, 2010. The conference had the theme "By the Numbers." It was attended by "analog" Victorianists as well as some budding digital humanists, and I was delighted by the incredibly energetic reaction to this talk: many terrific questions and ideas for doing scholarly text mining from those who may have never considered it before. The talk incorporates work on historical text mining under an NEH grant, as well as the first results of a grant that Fred Gibbs and I were awarded from Google to mine their vast collection of books.]

Why did the Victorians look to mathematics to achieve certainty, and how might we understand the Victorians better with the mathematical methods they bequeathed to us? I want to relate the Victorian debate about the foundations of our knowledge to a debate that we are likely to have in the coming decade, a debate about how we know the past and how we look at the written record that I suspect will be of interest to literary scholars and historians alike. It is a philosophical debate about idealism, empiricism, induction, and deduction, but also a practical discussion about the methodologies we have used for generations in the academy.

Victorians and the Search for Truth

Let me start, however, with the Heavens. This is Neptune. It was seen for the first time through a telescope in 1846.

At the time, the discovery was hailed as a feat of pure mathematics, since two mathematicians, one from France, Urbain Le Verrier, and one from England, John Couch Adams, had independently calculated Neptune’s position using mathematical formulas. There were dozens of poems written about the discovery, hailing the way these mathematicians had, like “magicians” or “prophets,” divined the Truth (often written with a capital T) about Neptune.

But in the less-triumphal aftermath of the discovery, it could also be seen as a case of the impact of cold calculation and the power of a good data set. Although pure mathematics, to be sure, were involved—the equations of geometry and gravity—the necessary inputs were countless observations of other heavenly bodies, especially precise observations of perturbations in the orbit of Uranus caused by Neptune. It was intellectual work, but intellectual work informed by a significant amount of data.

The Victorian era saw tremendous advances in both pure and applied mathematics. Both were involved in the discovery of Neptune: the pure mathematics of the ellipse and of gravitational pull; the computational modes of plugging observed coordinates into algebraic and geometrical formulas.

Although often grouped together under the banner of “mathematics,” the techniques and attitudes of pure and applied forms diverged significantly in the nineteenth century. By the end of the century, pure mathematics and its associated realm of symbolic logic had become so abstract and removed from what the general public saw as math—that is, numbers and geometric shapes—that Bertrand Russell could famously conclude in 1901 (in a Seinfeldian moment) that mathematics was a science about nothing. It was a set of signs and operations completely divorced from the real world.

Meanwhile, the early calculating machines that would lead to modern computers were proliferating, prodded by the rise of modern bureaucracy and capitalism. Modern statistics arrived, with its very unpure notions of good-enough averages and confidence levels.

The Victorians thus experienced the very modern tension between pure and applied knowledge, art and craft. They were incredibly self-reflective about the foundations of their knowledge. Victorian mathematicians were often philosophers of mathematics as much as practitioners of it. They repeatedly asked themselves: How could they know truth through mathematics? Similarly, as Meegan Kennedy has shown, in putting patient data into tabular form for the first time—thus enabling the discernment of patterns in treatment—Victorian doctors began wrestling with whether their discipline should be data-driven or should remain subject to the “genius” of the individual doctor.

Two mathematicians I studied for Equations from God used their work in mathematical logic to assail the human propensity to come to conclusions using faulty reasoning or a small number of examples, or by an appeal to interpretive genius. George Boole (1815-1864), the humble father of the logic that is at the heart of our computers, was the first professor of mathematics at Queen’s College, Cork. He had the misfortune of arriving in Cork (from Lincoln, England) on the eve of the famine and increasing sectarian conflict and nationalism.

Boole spent the rest of his life trying to find a way to rise above the conflict he saw all around him. He saw his revolutionary mathematical logic as a way to dispassionately analyze arguments and evidence. His seminal work, The Laws of Thought, is as much a work of literary criticism as it is of mathematics. In it, Boole deconstructs texts to find the truth using symbolical modes.

The stained-glass window in Lincoln Cathedral honoring Boole includes the biblical story of Samuel, which the mathematician enjoyed. It’s a telling expression of Boole’s worry about how we come to know Truth. Samuel hears the voice of God three times, but each time cannot definitively understand what he is hearing. In his humility, he wishes not to jump to divine conclusions.

Not jumping to conclusions based on limited experience was also a strong theme in the work of Augustus De Morgan (1806-1871). De Morgan, co-discoverer of symbolic logic and the first professor of mathematics at University College London, had a similar outlook to Boole’s, but a much more abrasive personality. He rather enjoyed proving people wrong, and also loved to talk about how quickly human beings leap to opinions.

De Morgan would give this hypothetical: “Put it to the first comer, what he thinks on the question whether there be volcanoes on the unseen side of the moon larger than those on our side. The odds are, that though he has never thought of the question, he has a pretty stiff opinion in three seconds.” Human nature, De Morgan thought, was too inclined to make mountains out of molehills, conclusions from scant or no evidence. He put everyone on notice that their deeply held opinions or interpretations were subject to verification by the power of logic and mathematics.

As Walter Houghton highlighted in his reading of the Victorian canon, The Victorian Frame of Mind, 1830-1870, the Victorians were truth-seekers and skeptics. They asked how they could know better, and challenged their own assumptions.

Foundations of Our Own Knowledge

This attitude seems healthy to me as we present-day scholars add digital methods of research to our purely analog ones. Many humanities scholars have been satisfied, perhaps unconsciously, with the use of a limited number of cases or examples to prove a thesis. Shouldn’t we ask, like the Victorians, what can we do to be most certain about a theory or interpretation? If we use intuition based on close reading, for instance, is that enough?

Should we be worrying that our scholarship might be anecdotally correct but comprehensively wrong? Is 1 or 10 or 100 or 1000 books an adequate sample to know the Victorians? What might we do with all of Victorian literature: not a sample, or a few canonical texts, as in Houghton’s work, but all of it?

These questions were foremost in my mind as Fred Gibbs and I began work on our Google digital humanities grant that is attempting to apply text mining to our understanding of the Victorian age. If Boole and De Morgan were here today, how acceptable would our normal modes of literary and historical interpretation be to them?

As Victorianists, we are rapidly approaching the time when we have access—including, perhaps, computational access—to the full texts not of thousands of Victorian books, or hundreds of thousands, but virtually all books published in the Victorian age. Projects like Google Books, the Internet Archive’s OpenLibrary, and HathiTrust will become increasingly important to our work.

If we were to look at all of these books using the computational methods that originated in the Victorian age, what would they tell us? And would that analysis be somehow more “true” than looking at a small subset of literature, the books we all have read that have often been used as representative of the Victorian whole, or, if not entirely representative, at least indicative of some deeper Truth?

Fred and I have received back from Google a first batch of data. This first run is limited just to words in the titles of books, but even so is rather suggestive of the work that can now be done. This data covers the 1,681,161 books that were published in English in the UK in the long nineteenth century, 1789-1914. We have normalized the data in many ways, and for the most part the charts I’m about to show you graph the data from zero to one percent of all books published in a year so that they are on the same scale and can be visually compared.

Multiple printings of a book in a single year have been collapsed into one “expression.” (For the library nerds in the audience, the data has been partially FRBRized. One could argue that we should have accepted the accentuation of popular titles that went through many printings in a single year, but editions and printings in subsequent years do count as separate expressions. We did not go up to the level of “work” in the FRBR scale, which would have collapsed all expressions of a book into one data point.)
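To make the collapsing step concrete, here is a minimal Python sketch. The record layout (simple title and year pairs), the placeholder titles, and the function name are all assumptions for illustration; they are not the format of the data we received from Google.

```python
def collapse_printings(records):
    """Collapse multiple printings of a title within one year into one expression.

    `records` is an iterable of (title, year) pairs (a hypothetical layout).
    Printings in different years are kept as separate expressions.
    """
    seen = set()
    expressions = []
    for title, year in records:
        # Crude title normalization; real catalog data would need much more care.
        key = (title.strip().lower(), year)
        if key not in seen:
            seen.add(key)
            expressions.append((title, year))
    return expressions


# Invented sample records, for illustration only.
records = [
    ("Example Title", 1843),
    ("Example Title", 1843),  # second printing in the same year: collapsed
    ("Example Title", 1844),  # later year: kept as a separate expression
]
expressions = collapse_printings(records)
print(expressions)
```

Collapsing all the way up to the FRBR “work” level would correspond to keying on the title alone rather than on the title and year together.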

We plan to do much more; in the pipeline are analyses of the use of words in the full texts (not just titles) of those 1.7 million books, a comprehensive exploration of the use of the Bible throughout the nineteenth century, and more. And more could be done to further normalize the data, such as accounting for the changing meaning of words over time.


So what does the data look like even at this early stage? And does it seem valid? That is where we began our analysis, with graphs of the percent of all books published with certain words in the titles (y-axis) on a year by year basis (x-axis). Victorian intellectual life as it is portrayed in this data set is in many respects consistent with what we already know.
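For readers who want to see what sits behind each graph, the calculation can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the (title, year) record layout, the sample titles, and the function name `title_word_share` are invented for illustration and are not our actual pipeline.

```python
import re
from collections import Counter


def title_word_share(records, word):
    """Return {year: percent of that year's books whose title contains `word`}.

    `records` is an iterable of (title, year) pairs (a hypothetical layout).
    """
    # Match the word as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    totals = Counter()  # all books published per year
    hits = Counter()    # books per year with the word in the title
    for title, year in records:
        totals[year] += 1
        if pattern.search(title):
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


# Invented sample titles, for illustration only.
sample = [
    ("History of the Revolution", 1848),
    ("A Treatise on Algebra", 1848),
    ("Sermons", 1849),
    ("Revolution and Reform", 1849),
    ("Poems", 1849),
    ("Travels", 1849),
]
shares = title_word_share(sample, "revolution")
print(shares)
```

Dividing by the total number of books in each year is what puts every word on the same scale, so that a word’s rise is not simply an artifact of more books being published overall.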

The frequency chart of books with the word “revolution” in the title, for example, shows spikes where it should, around the French Revolution and the revolutions of 1848. (Keen-eyed observers will also note spikes for a minor, failed revolt in England in 1817 and the successful 1830 revolution in France.)

Books about science increase as they should, though with some interesting leveling off in the late Victorian period. (We are aware that the word “science” changes over this period, becoming more associated with natural science rather than generalized knowledge.)

The rise of factories…

and the concurrent Victorian nostalgia for the more sedate and communal Middle Ages…

…and the sense of modernity, a new phase beyond the medieval organization of society and knowledge that many Britons still felt in the eighteenth century.

The Victorian Crisis of Faith, and Secularization

Even more validation comes from some basic checks of key Victorian themes such as the crisis of faith. These charts are as striking as any portrayal of the secularization that took place in Great Britain in the nineteenth century.

Correlation Is (not) Truth

So it looks fairly good for this methodology. Except, of course, for some obvious pitfalls. Looking at the charts of a hundred words, Fred noticed a striking correlation between the publication of books on “belief,” “atheism,” and…“Aristotle”?

Obviously, we cannot simply take the data at face value. As I have put it on my blog, we have to be on guard for oversimplifications that are the equivalent of saying that War and Peace is about Russia. We have to marry these attempts at what Franco Moretti has called “distant reading” with more traditional close reading to find rigorous interpretations behind the overall trends.
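A small Python sketch shows how easily such a correlation can appear in the numbers. The two series below are invented values, not our data, and the Pearson coefficient is simply one common way to measure how closely two yearly series move together; I am not claiming it is the measure behind the charts.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented yearly percentages for two title words, for illustration only.
belief = [0.10, 0.12, 0.15, 0.13, 0.09]
aristotle = [0.021, 0.024, 0.031, 0.025, 0.018]

r = pearson(belief, aristotle)
print(round(r, 3))
```

A coefficient close to 1 says only that the two curves rise and fall together. It says nothing about why, which is exactly where close reading has to take over.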

In Search of New Interpretations

Nevertheless, even at this early stage of the Google grant, there are numerous charts that are suggestive of new research that can be done, or that expand on existing research. Correlation can, if we go from the macro level to the micro level, help us to illustrate some key features of the Victorian age better. For instance, the themes of Jeffrey von Arx’s Progress and Pessimism: Religion, Politics and History in Late Nineteenth Century Britain, in which he notes the undercurrent of depression in the second half of the century, are strongly supported and enhanced by the data.

And given the following charts, we can imagine writing much more about the decline of certainty in the Victorian age. “Universal” is probably the most striking graph of our first data set, but they all show telling slides toward relativism that begin before most interpretations in the secondary literature.

Rather than looking for what we expect to find, perhaps we can have the computer show us tens, hundreds, or even thousands of these graphs. Many will confirm what we already know, but some will be strikingly new and unexpected. Many of those may show false correlations or have other problems (such as the changing or multiple meaning of words), but some significant minority of them will reveal to us new patterns, and perhaps be the basis of new interpretations of the Victorian age.

What if I were to give you Victorianists hundreds of these charts?

I believe it is important to keep our eyes open about the power of this technique. At the very least, it can tell us, as Augustus De Morgan would, when we have made mountains out of molehills. If we do explore this new methodology, we might be able to find some charts that pique our curiosity as knowledgeable readers of the Victorians. We’re the ones that can accurately interpret the computational results.

We can see the rise of the modern work lifestyle…

…or explore the interaction between love and marriage, an important theme in the recent literature.

We can look back at the classics of secondary literature, such as Houghton’s Victorian Frame of Mind, and ask whether those works hold up to the larger scrutiny of virtually all Victorian books, rather than just the limited set of books those authors used. For instance, while in general our initial study supports Houghton’s interpretations, it also shows relatively few books on heroism, a theme Houghton adopts from Thomas Carlyle.

And where is the supposed Victorian obsession with theodicy in this chart on books about “evil”?

Even more suggestive are the contrasts and anomalies. For instance, publications on “Jesus” are relatively static compared to those on “Christ,” which drop from nearly 1 in 60 books in 1843 to less than 1 in 300 books 70 years later.
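The arithmetic behind that “Christ” comparison is easy to check. The snippet below converts the two ratios quoted above into percentages and computes the size of the drop; the variable names are my own, and since the later figure is “less than 1 in 300,” the result is a lower bound on the decline.

```python
from fractions import Fraction

start = Fraction(1, 60)   # share of books in 1843, as quoted above
end = Fraction(1, 300)    # upper bound on the share 70 years later

start_pct = float(start * 100)  # percent of all books in 1843
end_pct = float(end * 100)      # percent of all books 70 years later
ratio = start / end             # how many times smaller the later share is

print(round(start_pct, 2), round(end_pct, 2), ratio)
```

In other words, the share falls from roughly 1.67 percent of all books to no more than about 0.33 percent, a decline of at least fivefold.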

The impact of the ancient world on the Victorians can be contrasted (albeit with a problematic dual modern/ancient meaning for Rome)…

…as can the Victorians’ varying interest in the afterlife.

I hope that these charts have prodded you to consider the anecdotal versus the comprehensive, and the strengths and weaknesses of each. It is time we had, not just in the digital humanities but in the humanities more generally, the kind of serious debate about measurement and interpretation that the Victorians had. Can we be so confident in our methods of extrapolating from some literary examples to the universal whole?

This is a debate that we should have in the present, aided by our knowledge of what the Victorians struggled with in the past.

[Image credits (other than graphs): Wikimedia Commons]

NEH’s Office of Digital Humanities

What began as a plucky “initiative” has now become a permanent “office.” The National Endowment for the Humanities will announce in a few hours that their Digital Humanities Initiative has now been given a full home, in recognition of how important digital technology and media are for the future of the humanities. The DHI has become the Office of Digital Humanities, with a new website and a new RSS feed for news. From the ODH welcome message:

The Office of Digital Humanities (ODH) is an office within the National Endowment for the Humanities (NEH). Our primary mission is to help coordinate the NEH’s efforts in the area of digital scholarship. As in the sciences, digital technology has changed the way scholars perform their work. It allows new questions to be raised and has radically changed the ways in which materials can be searched, mined, displayed, taught, and analyzed. Technology has also had an enormous impact on how scholarly materials are preserved and accessed, which brings with it many challenging issues related to sustainability, copyright, and authenticity. The ODH works not only with NEH staff and members of the scholarly community, but also facilitates conversations with other funding bodies both in the United States and abroad so that we can work towards meeting these challenges.

Congrats to the NEH for this move forward.

Digital Humanities at the Annual Meetings, Winter 2007-2008

In addition to rising job opportunities, the rise of digital humanities was felt at the annual meetings of professional humanities organizations this winter. The Association for Computers and the Humanities compiled a list of the many sessions with digital humanities talks at the December 2007 Modern Language Association convention; at the American Philosophical Association’s annual meeting, the APA Committee on Philosophy and Computers coordinated special sessions on “The Ethics of Emerging Technologies” and “Technology in Support of Philosophy Research” (covered in Inside Higher Ed); and the American Historical Association had a number of events at its annual meeting ranging from teaching with new media, to digital archives, to “Tech Tools for Historians” (where yours truly spoke about Zotero to a large and thankfully quite excited crowd). Once again, a nice upward trend.

Symposium on the Future of Scholarly Communication

For those who missed it, between October 12 and 27, 2007, there was a very thoughtful and insightful online discussion of how the publication of scholarship is changing, or trying to change, in the digital age. Participating in the discussion were Ed Felten, David Robinson, Paul DiMaggio, and Andrew Appel from Princeton University (the symposium was hosted by the Center for Information Technology Policy at Princeton), Ira Fuchs of the Mellon Foundation, Peter Suber of the indispensable Open Access News blog (and philosophy professor at Earlham College), Stan Katz, the President Emeritus of the American Council of Learned Societies, and Laura Brown of Ithaka (and formerly the President of Oxford University Press USA).

The symposium is really worth reading from start to finish. (Alas, one of the drawbacks of hosting a symposium on a blog is that it keeps everything in reverse chronological order; it would be great if CITP could flip the posts now that the discussion has ended.) But for those of us in the humanities the most relevant point is that we are going to have a much harder transition to an online model of scholarship than in the sciences. The main reason for this is that for us the highest form of scholarship is the book, whereas in the sciences it is the article, which is far more easily put online, posted in various forms (including as pre- and e-prints), and networked to other articles (through, e.g., citation analysis). In addition, we’re simply not as technologically savvy. As Paul DiMaggio points out, “every computer scientist who received his or her Ph.D. in computer science after 1980 or so has a website” (on which they can post their scholarly production), whereas the number is about 40% for political scientists and I’m sure far less for historians and literature professors.

I’m planning a long post in this space on the possible ways for humanities professors to move from print to open online scholarship; this discussion is great food for thought.

Intelligence Analysts and Humanities Scholars

About halfway through the Chicago Colloquium on Digital Humanities and Computer Science last week, the always witty and insightful Martin Mueller humorously interjected: “I will go away from this conference with the knowledge that intelligence analysts and literary scholars are exactly the same.” As the chuckles from the audience died down, the core truth of the joke settled in—for those interested in advancing the still-nascent field of the digital humanities, are academic researchers indeed becoming clones of intelligence analysts by picking up the latter’s digital tools? What exactly is the difference between an intelligence analyst and a scholar who is scanning, sorting, and aggregating information from massive electronic corpora?

Mueller’s remark prods those of us exploring the frontiers of the digital humanities to do a better job describing how our pursuit differs from other fields making use of similar computational means. A good start would be to highlight that while the intelligence analyst sifts through mountains of data looking for patterns, anomalies, and connections that might be (in the euphemistic argot of the military) “actionable” (when policy makers piece together bits of intelligence and decide to take action), the digital humanities scholar should be looking for patterns, anomalies, and connections that strengthen or weaken existing theories in their field, or produce new theories. In other words, we not only uncover evidence, but come to overarching conclusions and make value judgments; we are at once the FBI, the district attorney, the judge, and the jury. (Perhaps the “National Intelligence Estimates” that are the highest form of synthesis in the intelligence community come closest to what academics do.)

The gentle criticism I gave to the Chicago audience at the end of the colloquium was that too many presentations seemed one (important) piece away from completing this interpretive whole. Through extraordinary guile, a series of panelists showed how digital methods can determine the gender of Shakespeare’s interlocutors, show more clearly the repetition of key phrases in Gertrude Stein’s prose, or more clearly map the ideology and interactions of FDR’s advisors during and after Pearl Harbor. But the real questions that need to be answered, with answers that will make other humanities scholars stand up and take notice of digital methods, are of course how the identification of gender reshapes (or reinforces) our views of Shakespeare’s plays, how the use of repetition changes our perspectives on Gertrude Stein’s writings, or how a better understanding of presidential advisors alters our historical narrative of America’s entry into the Second World War.

In Chicago, I tried to give this critical, final moment of insight reached through digital means a name, the “John Snow moment,” in honor of the Victorian physician who discovered the cause of cholera by using a novel research tool unfamiliar to traditional medical science. Rather than looking at symptoms or other patient information on a case-by-case basis as a cholera outbreak killed and sickened hundreds of people in London in 1854, Snow instead mapped all incidences of the disease by the street addresses of the patients, thus quickly discovering that the cases clustered around a Soho water pump. The city council removed the water pump’s handle, quickly curtailing the disease and inaugurating a new era of epidemiology. Snow proved that cholera was a waterborne disease. Now that’s actionable intelligence.

What can digital scholars do to reach this level of insight? A key first step, reinforced by my experience in Chicago, is that academics interested in the power of computational methods must work to forge tools that satisfy their interpretive needs rather than simply accepting the tools that are currently available from other domains of knowledge, like intelligence. Ostensibly the Chicago Colloquium was about bringing together computer scientists and humanities scholars to see how we might learn from each other and enable new forms of research in an age of millions of digitized books. But as I noted in my remarks on the closing panel, too often this interaction seemed like a one-way street, with humanities scholars applying existing computer science tools rather than engaging the computer scientists (or programming themselves) to create new tools that would be better suited to their own needs. Hopefully such new tools will lead to more John Snow moments in the humanities in the near future.