Tools – Dan Cohen

Digital Journalism and Digital Humanities

I’ve increasingly felt that digital journalism and digital humanities are kindred spirits, and that more commerce between the two could be mutually beneficial. That sentiment was confirmed by the extremely positive reaction on Twitter to a brief comment I made on the launch of Knight-Mozilla OpenNews, including from Jon Christensen (of the Bill Lane Center for the American West at Stanford, and formerly a journalist), Shana Kimball (MPublishing, University of Michigan), Tim Carmody (Wired), and Jenna Wortham (New York Times).

Here’s an outline of some of the main areas where digital journalism and digital humanities could profitably collaborate. It’s remarkable, upon reflection, how much overlap there now is, and I suspect these areas will only grow in common importance.

1) Big data, and the best ways to scan and visualize it. All of us are facing either present-day or historical archives of almost unimaginable abundance, and we need sophisticated methods for finding trends, anomalies, and specific documents that could use additional attention. We also require robust ways of presenting this data to audiences to convey theses and supplement narratives.

2) How to involve the public in our work. If confronted by big data, how and when should we use crowdsourcing, and through which mechanisms? Are there areas where pro-am work is especially effective, and how can we heighten its advantages while diminishing its disadvantages? Since we both do work on the open web rather than in the cloistered realms of the ivory tower, what are we to make of the sometimes helpful, sometimes rocky interactions with the public?

3) The narrative plus the archive. Journalists are now writing articles that link to or embed primary sources (e.g., using DocumentCloud). Scholars are now writing articles that link to or embed primary sources (e.g., using Omeka). Formerly hidden sources are now far more accessible to the reader.

4) Software developers and other technologists are our partners. They are no longer relegated to secondary status as “the techies who make the websites”; we need to work intellectually and practically with those who understand how digital media and technology can advance our agenda and our content. For scholars, this also extends to technologically sophisticated librarians, archivists, and museum professionals. Moreover, the line between developer and journalist/scholar is already blurring, and will blur further.

5) Platforms and infrastructure. We care a great deal about common platforms, ranging from web and data standards, to open source software, to content management systems such as WordPress and Drupal. Developers we work with can create platforms with entirely novel functionality for news and scholarship.

6) Common tools. We are all writers and researchers. When the New York Times produces a WordPress plugin for editing, it affects academics looking to use WordPress as a scholarly communication platform. When our center updates Zotero, it affects many journalists who use that software for organizing their digital research.

7) A convergence of length. I’m convinced that something interesting and important is happening at the confluence of long-form journalism (say, 5,000 words or more) and short-form scholarship (ranging from long blog posts to Kindle Singles geared toward popular audiences). It doesn’t hurt that many journalists writing at this length could very well have been academics in a parallel universe, and vice versa. The prevalence of high-quality writing that is smart and accessible has never been greater.

This list is undoubtedly not comprehensive; please add your thoughts about additional common areas in the comments. It may be worth devoting substantial time to increasing the dialogue between digital journalists and digital humanists at the next THATCamp Prime, or perhaps at a special THATCamp focused on the topic. Let me know if you’re interested. And more soon in this space.

February 8, 2012 15 Comments

Using WordPress as a Book-Writing Platform

I’ve had a few people ask about the writing environment I’m using for The Ivory Tower and the Open Web (introduction posted a couple of days ago). I’m writing the book entirely in WordPress, which really has matured into a terrific authoring platform. Some notes:

1) The addition of the TinyMCE WYSIWYG text-editing tools made WordPress today’s version of the beloved Word 5.1, the lean, mean, writing machine that Word used to be before Microsoft bloated it beyond recognition.

2) WordPress 3.2 joined the distraction-free trend mainstreamed by apps like Scrivener and Instapaper, where computer administrative debris (as Edward Tufte once called the layers of eye-catching controls that frame most application windows) fades away. If you go into full-screen mode in the editor everything disappears but your text. WordPress devs even thoughtfully added a zen “Just write” prompt to get you going. Go full-screen in your browser for extra zen.

3) For footnotes, I’m using the excellent WP-Footnotes plugin, which is not only easy to use but (perhaps critically for the future) degrades gracefully into parenthetical embedded citations outside of WordPress.

4) I’m of course using Zotero to insert and format those footnotes, using one of the features that makes Zotero better (IMHO) than other research managers: the ability to drag and drop formatted citations right from the Zotero interface into a textarea in the browser. (WP-Footnotes handles the automatic numbering.)

5) I’ve done a few tweaks to WordPress’s wp-admin CSS to customize the writing environment (there’s an “editorcontainer” that styles the textarea). In particular, I found the default width too wide for comfortable writing or reading. So I resized it to 500 pixels, which is roughly the line width of a standard book.

July 28, 2011 11 Comments

Introducing Anthologize

A long-running theme of this blog has been the perceived gulf between new forms of online scholarship—including the genre of the blog itself—and traditional forms such as the book and journal. I’m obviously delighted, then, about the outcome of One Week | One Tool, a week-long institute funded by the National Endowment for the Humanities and run by the Center for History and New Media at George Mason University. As the name suggests, twelve humanities scholars with technical chops hunkered down for one week to produce a digital tool they thought could have an impact in the humanities and beyond.

Today marks the launch of this effort: Anthologize, software that converts the popular open-source WordPress system into a full-fledged book-production platform. Using Anthologize, you can take online content such as blogs, feeds, and images (and soon multimedia), and organize it, edit it, and export it into a variety of modern formats that will work on multiple devices. Have a poetry blog? Anthologize it into a nice-looking ePub ebook and distribute it to iPads the world over. A museum with an RSS feed of the best items from your collection? Anthologize it into a coffee table book. Have a group blog on a historical subject? Anthologize the best pieces quarterly into a print or e-journal, or archive it in TEI. Get all the delicious details on the newly revealed Anthologize website.

Anthologize is free and open source software. Obviously in one week it’s impossible to have feature-complete, polished software. There will be a few rough edges. But it works right now (see below) and it’s just the start of a major effort. The grant from NEH anticipates more work for the One Week team over the next year to refine the tool, culminating in a follow-up meeting at THATCamp 2011.

I suspect there will be many users and uses for Anthologize, and developers can extend the software to work in different environments and for different purposes. I see the tool as part of a wave of “reading 2.0” software that I’ve come to rely on for packaging online content for long-form consumption and distribution, including the Readability browser plugin and Instapaper. This class of software is particularly important for the humanities, which remains very bookish, but it is broadly applicable. Anthologize is flexible enough to handle different genres of writing and content, opening up new possibilities for scholarly communication. Personally, I plan to use Anthologize to run a journal and to edit and write two upcoming books.

Credit for Anthologize goes to the amazing team that produced it: Jason Casden, Boone Gorges, Kathie Gossett, Scott Hanrath, Effie Kapsalis, Doug Knox, Zachary McCune, Julie Meloni, Patrick Murray-John, Steve Ramsay, Patrick Rashleigh, and Jana Remy. It is notable that the One Weekers ranged from a recent college grad to tenured professors, programmers and designers and interface experts who also are humanities scholars, and professionals from libraries, museums, and instructional technology. Remarkably, they first met last Sunday night and had production-ready code by Saturday morning, a website to market and support the software, an outreach plan, and a vision for the future of the software beyond its original state. Not to mention a logo to go on nice-looking swag (personally, I’ll take the book bag).

Credit also goes to the great Center for History and New Media team that instructed and supported the One Weekers in the ways we like to conceive, design, and build digital humanities tools: Sharon Leon, Jeremy Boggs, Sheila Brennan, Trevor Owens, and many others who dropped in to help out. Two huge final credits: one to Tom Scheinfeldt for conceiving and running the structured madness that was One Week | One Tool, and one to the National Endowment for the Humanities, which took a big risk on a very untraditional institute. We hope they, and others, like the idea and the execution of Anthologize.

And just to give you some idea of what Anthologize can do, here’s the Anthologize ePub version of this blog post on an iPad, created in five minutes:

August 2, 2010 13 Comments

New Horizons Keynote

For readers of this blog within easy travel distance of Charlottesville, Virginia, I’ll be giving the keynote address on May 19 at the second annual New Horizons conference at the University of Virginia, showcasing technology in teaching, research, and scholarship. My talk is entitled “Creating Scholarly Tools and Resources for the Digital Ecosystem,” and will include much of what I’ve learned in the Zotero project.

May 14, 2008 3 Comments

Vertov Brings Video Annotation to Zotero

From the beginning of the Zotero project, I’ve said that we have bigger fish to fry than citation management, although Zotero does that quite well, thank you very much. (Case in point: Zotero recently beat EndNote, RefWorks, and all of the other big citation managers in head-to-head competition at CiteFest.)

Zotero aims to be a digital research platform, and an extensible one at that. That’s why it’s gratifying and exciting to see the brilliant and incredibly useful Vertov plugin for Zotero. Vertov allows Zotero users to cut video and audio files into clips, annotate the clips, and integrate their annotations with other research sources and notes stored in Zotero. It has terrific functionality and should be ideal for use in the classroom as well as by film scholars and other researchers.

[Image: Vertov screenshot]

Congrats and many thanks to Concordia University’s Digital History Lab, led by Elena Razlogova, for conceptualizing and executing this great plugin.

[Image: PC Magazine Best Free Software issue]

And since it’s been a little while since I’ve done shameless cheerleading for Zotero: it’s humbling that PC Magazine has, for the second year in a row, declared Zotero one of the best free software applications.

March 18, 2008 3 Comments

Enhancing Historical Research With Text-Mining and Analysis Tools

[Image: open book]

I’m delighted to announce that beginning this summer the Center for History and New Media will undertake a major two-year study of the potential of text-mining tools for historical (and by extension, humanities) scholarship. The project, entitled “Scholarship in the Age of Abundance: Enhancing Historical Research With Text-Mining and Analysis Tools,” has just received generous funding from the National Endowment for the Humanities.

In the last decade the library community and other providers of digital collections have created an incredibly rich digital archive of historical and cultural materials. Yet most scholars have not yet figured out ways to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers who now find themselves overwhelmed by the sheer quantity of available material. Meanwhile, some of the most profound insights lurking in these digital corpora remain locked up.

For some time computer scientists have been pursuing text mining as a solution to the problem of abundance, and there have even been a few attempts at bringing text-mining tools to the humanities (such as the MONK project). Yet there is not as much research as one might hope on what scholars who are not technically savvy (especially historians) might actually want and use in their research, and how we might integrate sophisticated text analysis into the workflow of these scholars.

We will first conduct a survey of historians to examine closely their use of digital resources and prospect for particularly helpful uses of digital technology. We will then explore three main areas where text mining might help in the research process: locating documents of interest in the sea of texts online; extracting and synthesizing information from these texts; and analyzing large-scale patterns across these texts. A focus group of historians will be used to assess the efficacy of different methods of text mining and analysis in real-world research situations in order to offer recommendations, and even some tools, for the most promising approaches.

In addition to other forms of dissemination, I will of course provide project updates in this space.

[Image credit: Matt Wright]

February 4, 2008 11 Comments

ScholarPress: WordPress Plugins for Education

As if CHNM Creative Lead Jeremy Boggs and web developer Dave Lester don’t have enough to do during the day building fantastic web applications like Omeka, they have somehow managed to create a couple of incredibly useful and highly polished WordPress plugins for academia, and have launched the ScholarPress site as a hub for CHNM work in this fertile area. (Another recent WordPress plugin that academics should take note of is the Institute for the Future of the Book’s CommentPress.)

ScholarPress’s inaugural plugins are Courseware and WPBook. Courseware (co-developed by New York Public Library’s Josh Greenberg) turns WordPress, normally a blogging platform, into a full-fledged course management system, including easy syllabus creation, assignments, bibliographies, and scheduling. (And yes, you can have a class blog too.) WPBook creates your very own Facebook application out of your WordPress blog, allowing it to be embedded in Facebook.

Want to bring your class right into Facebook, where your students spend most of their time online? Simply combine the two plugins and create a class Facebook app that your students can install. Brilliant.

November 15, 2007 2 Comments

The Strange Dynamics of Technology Adoption and Promotion in Academia

Kudos to Bruce D’Arcus for writing the blog post I’ve been meaning to write for a while. Bruce notes with some amazement the resistance that free and open source projects like Zotero meet when they encounter the institutional buying patterns and tech evangelism that are all too common in academia. The problem here seems to be that the people purchasing the software (often the libraries at colleges and universities for reference managers like EndNote or RefWorks, and the IT departments for course management systems) are not the end users, nor do they have the proper incentives to choose free alternatives.

As Roy Rosenzweig and I noted in Digital History, the exorbitant yearly licensing fee for Blackboard or WebCT (loathed by every professor I know) could be exchanged for an additional assistant professor–or another librarian. But for some reason a certain portion of academic technology purchasers feel they need to buy something for each of these categories (reference managers, CMS), and then, because they have invested the time and money and long-term contracts on those somethings, they feel they need to exclusively promote those tools without listening to the evolving needs and desires of the people they serve. Nor do they have the incentive to try new technologies or tools.

Any suggestions on how to properly align these needs and incentives? Break out the technology spending in students’ bills (“What, my university is spending that much on Blackboard?”)?

November 5, 2007 18 Comments

NYPL Labs Blog

Center for History and New Media alum and incredibly innovative digital thinker Josh Greenberg is now the Director of Digital Strategy and Scholarship at the New York Public Library. One of his first actions was to set up the NYPL Labs to produce and test new tools, technologies, and interfaces. It’s great to see they now have a blog that will expose these experiments in action.

November 5, 2007

Understanding reCAPTCHA

One of the things I added to this blog when I moved from my own software to WordPress was the red and yellow box in the comments section, which defends this blog against comment spam by asking commenters to decipher a couple of words. Such challenge-response systems are called CAPTCHAs (a tortured and unmellifluous acronym of “completely automated public Turing test to tell computers and humans apart”). What really caught my imagination about the CAPTCHA I’m using, called reCAPTCHA, is that it uses words from books scanned by the Internet Archive/Open Content Alliance. Thus, at the same time commenters solve the word problems, they are effectively serving as human OCR machines.

To date, about two million words have been deciphered using reCAPTCHA (see the article in Technology Review lauding reCAPTCHA’s mastermind, Luis von Ahn), which is a great start but by my calculation (100,000 words per average book) only the equivalent of about 20 books. Of course, it’s really much more than that because the words in reCAPTCHA are the hardest ones to decipher by machine and are sprinkled among thousands of books.

Indeed, that is the true genius of reCAPTCHA: it “tells computers and humans apart” by first using OCR software to find words computers can’t decipher, then feeding those words to humans, who can decipher them (proving themselves human). Therefore a spammer running OCR software (as many of them do to decipher lesser CAPTCHAs) will have great difficulty cracking it. If you would like an in-depth lesson about how reCAPTCHA (and CAPTCHAs in general) works, take a listen to Steve Gibson’s podcast on the subject.
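For readers who think better in code, here is a rough sketch of the idea in Python. It is not reCAPTCHA’s actual implementation. I am assuming that each challenge pairs a control word whose answer is already known with a scanned word that OCR could not read, and that a transcription is accepted once a couple of humans agree on it. The class names, the `min_votes` threshold, and the `scan-001` identifier are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Challenge:
    control_word: str      # assumed: a word whose correct answer is already known
    unknown_word_id: str   # assumed: ID of a scanned word that OCR failed to read


@dataclass
class Recaptcha:
    # unknown_word_id -> tally of the answers humans have typed for it
    votes: dict = field(default_factory=dict)

    def check(self, challenge, control_answer, unknown_answer):
        """Treat the responder as human only if the control word is right."""
        if control_answer.strip().lower() != challenge.control_word.lower():
            return False
        # The responder passed, so their reading of the unknown word counts as a vote.
        tally = self.votes.setdefault(challenge.unknown_word_id, Counter())
        tally[unknown_answer.strip().lower()] += 1
        return True

    def transcription(self, unknown_word_id, min_votes=2):
        """Return the agreed reading once enough humans match, else None."""
        tally = self.votes.get(unknown_word_id)
        if not tally:
            return None
        word, count = tally.most_common(1)[0]
        return word if count >= min_votes else None


service = Recaptcha()
challenge = Challenge(control_word="morning", unknown_word_id="scan-001")
service.check(challenge, "morning", "upon")   # first human vote
service.check(challenge, "Morning ", "upon")  # second human agrees
```

The point of the design is that the same answer does double duty: the control word keeps the spammers out, and the unknown word gets transcribed as a side effect.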

The brilliance of reCAPTCHA and its simultaneous assistance to the digital commons lead one to ponder: What other aspects of digitization, cataloging, and research could be aided by giving a large, distributed group of humans the bits that computers have great difficulty with?

And imagine the power of this system if all 60 million CAPTCHAs answered daily were reCAPTCHAs instead. Why not convert your blog or login system to reCAPTCHA today?
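The back-of-the-envelope numbers in this post can be reproduced in a few lines of Python. This is only a sketch: the figures for words deciphered, words per book, and daily CAPTCHAs come from the post, while the assumption that each answered CAPTCHA yields exactly one deciphered word is mine, made purely for illustration (a real system would likely need several people to agree on each word).

```python
WORDS_DECIPHERED = 2_000_000    # "about two million words" to date
WORDS_PER_BOOK = 100_000        # the post's estimate for an average book
CAPTCHAS_PER_DAY = 60_000_000   # CAPTCHAs answered daily, per the post


def book_equivalents(words, words_per_book=WORDS_PER_BOOK):
    """Convert a word count into a number of average-length books."""
    return words / words_per_book


# Words deciphered so far, expressed as books.
books_so_far = book_equivalents(WORDS_DECIPHERED)

# Assumption (not from the post): one deciphered word per answered CAPTCHA.
books_per_day = book_equivalents(CAPTCHAS_PER_DAY)

print(f"So far: about {books_so_far:.0f} books")
print(f"If every CAPTCHA were a reCAPTCHA: about {books_per_day:.0f} books per day")
```

Even if several people had to agree on each word, the daily figure would still dwarf the total to date.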

August 17, 2007 3 Comments