Dan Cohen

Archive for the ‘Software’ Category

Introducing Anthologize

Monday, August 2nd, 2010

A long-running theme of this blog has been the perceived gulf between new forms of online scholarship—including the genre of the blog itself—and traditional forms such as the book and journal. I’m obviously delighted, then, about the outcome of One Week | One Tool, a week-long institute funded by the National Endowment for the Humanities and run by the Center for History and New Media at George Mason University. As the name suggests, twelve humanities scholars with technical chops hunkered down for one week to produce a digital tool they thought could have an impact in the humanities and beyond.

Today marks the launch of this effort: Anthologize, software that converts the popular open-source WordPress system into a full-fledged book-production platform. Using Anthologize, you can take online content such as blogs, feeds, and images (and soon multimedia), and organize it, edit it, and export it into a variety of modern formats that will work on multiple devices. Have a poetry blog? Anthologize it into a nice-looking ePub ebook and distribute it to iPads the world over. A museum with an RSS feed of the best items from your collection? Anthologize it into a coffee table book. Have a group blog on a historical subject? Anthologize the best pieces quarterly into a print or e-journal, or archive it in TEI. Get all the delicious details on the newly revealed Anthologize website.

Anthologize is free and open source software. Obviously in one week it’s impossible to have feature-complete, polished software. There will be a few rough edges. But it works right now (see below) and it’s just the start of a major effort. The grant from NEH anticipates more work for the One Week team over the next year to refine the tool, culminating in a follow-up meeting at THATCamp 2011.

I suspect there will be many users and uses for Anthologize, and developers can extend the software to work in different environments and for different purposes. I see the tool as part of a wave of “reading 2.0” software that I’ve come to rely on for packaging online content for long-form consumption and distribution, including the Readability browser plugin and Instapaper. This class of software is particularly important for the humanities, which remain very bookish, but it is broadly applicable. Anthologize is flexible enough to handle different genres of writing and content, opening up new possibilities for scholarly communication. Personally, I plan to use Anthologize to run a journal and to edit and write two upcoming books.

Credit for Anthologize goes to the amazing team that produced it: Jason Casden, Boone Gorges, Kathie Gossett, Scott Hanrath, Effie Kapsalis, Doug Knox, Zachary McCune, Julie Meloni, Patrick Murray-John, Steve Ramsay, Patrick Rashleigh, and Jana Remy. It is notable that the One Weekers ranged from a recent college grad to tenured professors, programmers and designers and interface experts who also are humanities scholars, and professionals from libraries, museums, and instructional technology. Remarkably, they first met last Sunday night and had production-ready code by Saturday morning, a website to market and support the software, an outreach plan, and a vision for the future of the software beyond its original state. Not to mention a logo to go on nice-looking swag (personally, I’ll take the book bag).

Credit also goes to the great Center for History and New Media team that instructed and supported the One Weekers in the ways we like to conceive, design, and build digital humanities tools: Sharon Leon, Jeremy Boggs, Sheila Brennan, Trevor Owens, and many others who dropped in to help out. Two huge final credits: one to Tom Scheinfeldt for conceiving and running the structured madness that was One Week | One Tool, and one to the National Endowment for the Humanities, which took a big risk on a very untraditional institute. We hope they, and others, like the idea and the execution of Anthologize.

And just to give you some idea of what Anthologize can do, here’s the Anthologize ePub version of this blog post on an iPad, created in five minutes:

Shakespeare’s Hard Drive

Monday, August 13th, 2007

Congrats to Matt Kirschenbaum on his thought-provoking article in the Chronicle of Higher Education, “Hamlet.doc? Literature in a Digital Age.” Matt makes two excellent points. First, “born digital” literature presents incredible new opportunities for research, because manuscripts written on computers retain significant metadata and draft tracking that allows for major insights into an author’s thought and writing process. Second, scholars who wish to study such literature in the future need to be proactive in pushing for writing environments, digital standards, and archival storage that will provide accessibility and persistence for these advantages.

Creating a Blog from Scratch, Part 9: The Conclusion

Wednesday, July 25th, 2007

From its inception until today, this blog has been powered by code I had written myself. Some people thought this took a lot of work; to be honest, it was just a few days of simple coding. As I noted at the beginning of this series on “Creating a Blog from Scratch,” rather than using existing software or services, such as WordPress or Blogger, I wanted to write my own blog code so that I could experiment with the form of the blog. In general, I found it to be a great exercise that I would highly recommend. It helped me understand the genre of the blog, challenge long-standing assumptions of form and function (like the tyranny of the calendar, now gone on most blogs), and think about ways one might customize a blog to fit academic needs.

But starting today, this blog will be powered by WordPress, not my own code. Am I a hypocrite? Well, yes and no. Yes, in that by switching to WordPress I have had to abandon some quirks of my original blog that had made it unique and that represented the accumulated wisdom of writing my own code. No, in that I feel I’ve learned enough in the process of the last two years that I can bend WordPress to my will enough to satisfy my need to customize and adapt.

More important, I had other needs that I just didn’t have enough time to implement by writing more of my own code, and there were other features of WordPress–a terrific open-source project–that I really wanted:

  • It took two years, but after initially disparaging comments (sentiments echoed recently by some well-known bloggers), I have decided that they are important to a blog and that my critics were right that the blog suffered without them. So starting today I have comments at the end of each post. (My old posts will remain free of comments since I have left them in their original format.)
  • I had also worried that the blog comments would be a haven for spam, but after the release of the wonderful reCAPTCHA system–which helps the Open Content Alliance transcribe digitized books while preventing spam–I felt that relatively spam-free commenting was possible.
  • As successful open-source software, WordPress has engendered a universe of helpful plugins, modifications, and documentation. For instance, this blog is now Zotero-compatible, thanks to the WordPress COinS plugin by my colleague Sean Takats. And of course reCAPTCHA came with a plugin for WordPress too.
  • WordPress’s system for drafting and editing posts is far more advanced than the basic screens I created. Writing this post is taking me about half the time it would have taken in my old system.
  • For the past six months I have been using ma.gnolia to add small posts to my feed (and to the sidebar of my old blog under “Briefly Noted”). I now can do this just as quickly using WordPress, and plan to post much more frequently starting in September.
  • Despite my best efforts, my old blog code failed to output valid XHTML, which I believe is increasingly important in a world where non-computer devices (such as the iPhone) are browsing the web and RSS feeds. WordPress automatically writes pages in XHTML.

I suppose I should rip off of my sleeve the badge of honor from my home-grown blogging software. But I like to see the switch to WordPress as just another step in the continual improvement of this blog, and look forward to many more years of writing in this space.

Personal WorldCat Lists Now Zotero-Compatible

Thursday, July 5th, 2007

A great example of what I’ve been calling the “fluidity of bibliography.” WorldCat adds a feature that allows registered users to save and share lists of items they find in the WorldCat catalog. We tweak Zotero to work with it. Et voila–easy to find, save, share, grab, and re-share scholarly records.

Nora Project Screencast

Tuesday, June 19th, 2007

The Nora text analysis and visualization project has a screencast out explaining how to use a new web interface to their server-based software.

Social and Semantic Computing for Historical Scholarship

Monday, May 14th, 2007

On the assumption that many readers of this blog don’t receive the American Historical Association’s magazine Perspectives, I thought you might be interested in this article I wrote for the May 2007 issue. In the piece I discuss the Zotero project’s connection to several recent trends in computing, and think ahead to what the Zotero server might mean for academic fields like history.

2007 Mellon Awards for Technology Collaboration

Wednesday, March 14th, 2007

The Andrew W. Mellon Foundation has launched the nominating process for the second annual Mellon Awards for Technology Collaboration (MATC). The awards, given by tech luminaries such as Tim Berners-Lee and Vint Cerf, honor not-for-profit organizations for leadership in the collaborative development of open source software tools with particular application to higher education and not-for-profit activities.

NINES Officially Launches

Tuesday, February 20th, 2007

As someone keenly interested in the possibilities of digital scholarship as well as nineteenth-century British and American intellectual history, I’m delighted to hear of the official launch of NINES (Networked Infrastructure for Nineteenth-century Electronic Scholarship), which allows researchers to search, organize, and annotate over 60,000 texts and images. A screencast of how to use Collex, their powerful web application, would be helpful for new users.

Intelligence Analysts and Humanities Scholars

Monday, November 13th, 2006

About halfway through the Chicago Colloquium on Digital Humanities and Computer Science last week, the always witty and insightful Martin Mueller humorously interjected: “I will go away from this conference with the knowledge that intelligence analysts and literary scholars are exactly the same.” As the chuckles from the audience died down, the core truth of the joke settled in—for those interested in advancing the still-nascent field of the digital humanities, are academic researchers indeed becoming clones of intelligence analysts by picking up the latter’s digital tools? What exactly is the difference between an intelligence analyst and a scholar who is scanning, sorting, and aggregating information from massive electronic corpora?

Mueller’s remark prods those of us exploring the frontiers of the digital humanities to do a better job describing how our pursuit differs from other fields making use of similar computational means. A good start would be to highlight that while the intelligence analyst sifts through mountains of data looking for patterns, anomalies, and connections that might be (in the euphemistic argot of the military) “actionable” (when policy makers piece together bits of intelligence and decide to take action), the digital humanities scholar should be looking for patterns, anomalies, and connections that strengthen or weaken existing theories in their field, or produce new theories. In other words, we not only uncover evidence, but come to overarching conclusions and make value judgments; we are at once the FBI, the district attorney, the judge, and the jury. (Perhaps the “National Intelligence Estimates” that are the highest form of synthesis in the intelligence community come closest to what academics do.)

The gentle criticism I gave to the Chicago audience at the end of the colloquium was that too many presentations seemed one (important) piece away from completing this interpretive whole. Through extraordinary guile, a series of panelists showed how digital methods can determine the gender of Shakespeare’s interlocutors, show more clearly the repetition of key phrases in Gertrude Stein’s prose, or more clearly map the ideology and interactions of FDR’s advisors during and after Pearl Harbor. But the real questions that need to be answered, the ones whose answers will make other humanities scholars stand up and take notice of digital methods, are how the identification of gender reshapes (or reinforces) our views of Shakespeare’s plays, how the use of repetition changes our perspectives on Gertrude Stein’s writings, or how a better understanding of presidential advisors alters our historical narrative of America’s entry into the Second World War.

In Chicago, I tried to give this critical, final moment of insight reached through digital means a name, the “John Snow moment,” in honor of the Victorian physician who discovered the cause of cholera by using a novel research tool unfamiliar to traditional medical science. Rather than looking at symptoms or other patient information on a case-by-case basis as a cholera outbreak killed and sickened hundreds of people in London in 1854, Snow instead mapped all incidences of the disease by the street addresses of the patients, thus quickly discovering that the cases clustered around a Soho water pump. The city council removed the water pump’s handle, quickly curtailing the disease and inaugurating a new era of epidemiology. Snow proved that cholera was a waterborne disease. Now that’s actionable intelligence.

What can digital scholars do to reach this level of insight? A key first step, reinforced by my experience in Chicago, is that academics interested in the power of computational methods must work to forge tools that satisfy their interpretive needs rather than simply accepting the tools that are currently available from other domains of knowledge, like intelligence. Ostensibly the Chicago Colloquium was about bringing together computer scientists and humanities scholars to see how we might learn from each other and enable new forms of research in an age of millions of digitized books. But as I noted in my remarks on the closing panel, too often this interaction seemed like a one-way street, with humanities scholars applying existing computer science tools rather than engaging the computer scientists (or programming themselves) to create new tools that would be better suited to their own needs. Hopefully such new tools will lead to more John Snow moments in the humanities in the near future.

Creating a Blog from Scratch, Part 5: What is XHTML, and Why Should I Care?

Thursday, January 5th, 2006

In prior posts in this series (1, 2, 3, and 4), I described with some glee my rash abandonment of common blogging software in favor of writing my own. For my purposes there seemed to be some key disadvantages to these popular packages, including an overemphasis on the calendar (I just saw the definition of a blog at the South by Southwest Interactive Festival, “a page with dated entries,” which, to paraphrase Woody Allen, is like calling War and Peace “a book about Russia”), a sameness to their designs, and comments that are rarely helpful and often filled with spam. But one of the greatest advantages of recent blog software packages is that they generally write standards-compliant code. More specifically, blog software like WordPress automatically produces XHTML. Some of you might be asking, what is XHTML, and who cares? And why would I want to spend a great deal of effort ensuring that this blog complied strictly with this language?

The large digital library contingent that reads this blog could probably enumerate many reasons why XHTML compliance is important, but I had two reasons in mind when I started this blog. (Actually, I had a third, more secretive reason that I’ll mention first: Roy Rosenzweig and I argue in our book Digital History that XHTML will likely be critical for digital humanists to adhere to in the future—don’t want to be accused of being a hypocrite.) For those for whom web acronyms are Greek, XHTML is a sibling of XML, a more rigorously structured and flexible language than the HTML that underlies most of the web. XHTML is better prepared than HTML to be platform-independent; because it separates formatting from content, XHTML (like XML) can be reconfigured easily for very different environments (using, e.g., different style sheets). HTML, with formatting and content inextricably combined, for the most part assumes that you are using a computer screen and a web browser. Theoretically XHTML can be dynamically and instantaneously recast to work on many different devices (including a personal computer). This flexibility is becoming an increasingly important feature as people view websites on a variety of platforms (not just a normal computer screen, e.g., but cell phones or audio browsers for the blind). Indeed, according to the server logs for this blog, 1.6% of visitors are using a smart phone, PDA, or other means to read this blog, a number that will surely grow. In short, XHTML seems better prepared than regular HTML to withstand the technological changes of the coming years, and theoretically should be more easily preserved than older methods of displaying information on the web. For these and other reasons a 2001 report the Smithsonian commissioned recommended the institution move to XHTML from HTML.
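A minimal sketch of that reconfigurability, assuming Python and its standard-library `xml.etree.ElementTree` parser. The `POST` fragment and both function names are made up for illustration and are not taken from this blog's code. Because well-formed XHTML is also XML, a generic XML parser can read the same content and recast it for different devices without touching any formatting rules:

```python
import xml.etree.ElementTree as ET

# Hypothetical post fragment: content only, no presentational markup.
POST = """<div class="post">
  <h2>What is XHTML?</h2>
  <p>XHTML is a <em>stricter</em> relative of HTML.</p>
</div>"""


def as_plain_text(xhtml: str) -> str:
    """Recast the post as plain text, e.g. for an audio browser."""
    root = ET.fromstring(xhtml)
    blocks = []
    for child in root:
        # itertext() walks nested inline elements such as <em>.
        text = "".join(child.itertext())
        blocks.append(" ".join(text.split()))
    return "\n".join(blocks)


def headings_only(xhtml: str) -> list[str]:
    """Recast the post as a bare outline, e.g. for a very small screen."""
    root = ET.fromstring(xhtml)
    return [h.text for h in root.iter("h2")]


print(as_plain_text(POST))
print(headings_only(POST))
```

The same source string feeds both views; in a browser the equivalent job would normally be done with different style sheets rather than with a script like this.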

Of course, with standards compliance comes extra work. (And extra cost. Just ask webmasters at government agencies trying to make their websites comply with Section 508, the mandatory accessibility rules for federal information resources.) Aside from a brief flirtation with the what-you-see-is-what-you-get, write-the-HTML-for-you program Dreamweaver in the late 1990s, I’ve been composing web pages using a text editor (the superb BBEdit) for over ten years, so my hands are used to typing certain codes in HTML, in the same way you get used to a QWERTY keyboard. XHTML is not that dissimilar from HTML, but it still has enough differences to make life difficult for those used to HTML. You have to remember to close every tag; some attributes related to formatting are in strange new locations. One small example of the minor infractions I frequently trip up on writing XHTML: the oft-used break tag to add a line to a web page must “close itself” by adding a slash before the end bracket (not <br>, but <br />). But I figured doing this blog would give me a good incentive to start writing everything in strict XHTML.
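The break-tag rule is easy to check mechanically. Here is a rough sketch, assuming Python's standard-library XML parser; the helper name `is_well_formed` is hypothetical. It tests only well-formedness (every tag closed, every attribute quoted), which is necessary for strict XHTML but is not the same as full validation against the W3C's XHTML DTD:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a wrapper element."""
    try:
        # Wrap in a <div> so fragments with several top-level nodes parse.
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


print(is_well_formed("line one<br>line two"))    # False: <br> is never closed
print(is_well_formed("line one<br />line two"))  # True: <br /> closes itself
print(is_well_formed("<p class=note>hi</p>"))    # False: attribute value unquoted
```

A check like this catches the missing slashes and quotation marks, but a page can pass it and still fail the stricter rules of the XHTML standard.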

Yeah, right. I clearly haven’t been paying enough attention to detail. The page you’re reading likely still has dozens of little coding errors that make it fail strict compliance with the World Wide Web Consortium’s XHTML standard. (If you would like a humbling experience that brings to mind receiving a pop quiz back from your third-grade teacher with lots of red ink on it, try the W3C’s XHTML Validator.) I haven’t had enough time to go back and correct all of those little missing slashes and quotation marks. WordPress users out there can now begin their snickering; their blog software does such mundane things for them, and many proudly (and annoyingly) display little “XHTML 1.0 compliant” badges on their sites. Go ahead, rub it in.

After I realized that it would take serious effort to bring my code up to code, so to speak, I sat back and did the only thing I could do: rationalize. I didn’t really need strict XHTML compliance because through some design sleight-of-hand I had already been able to make this blog load well on a wide range of devices. I learned from other blog software that if you put the navigation on the right rather than the more common left you see on most websites, the body of each post shows up first on a PDA or smart phone. It also means that blind visitors don’t have to suffer through a long list of your other posts before getting to the article they want to read.

As far as XHTML is concerned, I’ll be brushing up on that this summer. Unless I move this blog to WordPress by then.
