Video: The Ivory Tower and the Open Web

Here’s the video of my plenary talk “The Ivory Tower and the Open Web,” given at the Coalition for Networked Information meeting in Washington in December, 2010. A general description of the talk:

The web is now over twenty years old, and there is no doubt that the academy has taken advantage of its tremendous potential for disseminating resources and scholarship. But a full accounting of the academic approach to the web shows that compared to the innovative vernacular forms that have flourished over the past two decades, we have been relatively meek in our use of the medium, often preferring to impose traditional ivory tower genres on the web rather than import the open web’s most successful models. For instance, we would rather digitize the journal we know than explore how blogs and social media might supplement or change our scholarly research and communication. What might happen if we reversed that flow and more wholeheartedly embraced the genres of the open web?

I hope the audience for this blog finds it worthy viewing. I enjoyed talking about burrito websites, Layer Tennis, aggregation and curation services, blog networks, Aaron Sorkin’s touchiness, scholarly uses of Twitter, and many other high- and low-brow topics all in one hour. (For some details in the images I put up on the screen, you might want to follow along with this PDF of the slides.) I’ll be expanding on the ideas in this talk in an upcoming book with the same title.

The Social Contract of Scholarly Publishing

When Roy Rosenzweig and I finished writing a full draft of our book Digital History, we sat down at a table and looked at the stack of printouts.

“So, what now?” I said to Roy naively. “Couldn’t we just publish what we have on the web with the click of a button? What value does the gap between this stack and the finished product have? Isn’t it 95% done? What’s the last five percent for?”

We stared at the stack some more.

Roy finally broke the silence, explaining the magic of the last stage of scholarly production between the final draft and the published book: “What happens now is the creation of the social contract between the authors and the readers. We agree to spend considerable time ridding the manuscript of minor errors, and the press spends additional time on other corrections and layout, and readers respond to these signals—a lack of typos, nicely formatted footnotes, a bibliography, specialized fonts, and a high-quality physical presentation—by agreeing to give the book a serious read.”

I have frequently replayed that conversation in my mind, wondering about the constitution of this social contract in scholarly publishing, which is deeply related to questions of academic value and reward.

For the ease of conversation, let’s call the two sides of the social contract of scholarly publishing the supply side and the demand side. The supply side is the creation of scholarly works, including writing, peer review, editing, and the form of publication. The demand side is much more elusive—the mental state of the audience that leads them to “buy” what the supply side has produced. In order for the social contract to work, for engaged reading to happen and for credit to be given to the author (or editor of a scholarly collection), both sides need to be aligned properly.

The social contract of the book is profoundly entrenched and powerful—almost mythological—especially in the humanities. As John Updike put it in his diatribe against the digital (and most humanities scholars and tenure committees would still agree), “The printed, bound and paid-for book was—still is, for the moment—more exacting, more demanding, of its producer and consumer both. It is the site of an encounter, in silence, of two minds, one following in the other’s steps but invited to imagine, to argue, to concur on a level of reflection beyond that of personal encounter, with all its merely social conventions, its merciful padding of blather and mutual forgiveness.”

As academic projects have experimented with the web over the past two decades we have seen intense thinking about the supply side. Robust academic work has been reenvisioned in many ways: as topical portals, interactive maps, deep textual databases, new kinds of presses, primary source collections, and even software. Most of these projects strive to reproduce the magic of the traditional social contract of the book, even as they experiment with form.

The demand side, however, has languished. Far fewer efforts have been made to influence the mental state of the scholarly audience. The unspoken assumption is that the reader is more or less unchangeable in this respect, only able to respond to, and validate, works that have the traditional marks of the social contract: having survived a strong filtering process, near-perfect copyediting, the imprimatur of a press.

We need to work much more on the demand side if we want to move the social contract forward into the digital age. Despite Updike’s ode to the book, there are social conventions surrounding print that are worth challenging. Much of the reputational analysis that occurs in the professional humanities relies on cues beyond the scholarly content itself. The act of scanning a CV is an act fraught with these conventions.

Can we change the views of humanities scholars so that they may accept, as some legal scholars already do, the great blog post as being as influential as the great law review article? Can we get humanities faculty, as many tenured economists already do, to publish more in open access journals? Can we accomplish the humanities equivalent of, which provides as good, if not better, in-depth political analysis than most newspapers, earning the grudging respect of journalists and political theorists? Can we get our colleagues to recognize outstanding academic work wherever and however it is published?

I believe that to do so, we may have to think less like humanities scholars and more like social scientists. Behavioral economists know that although the perception of value can come from the intrinsic worth of the good itself (e.g., the quality of a wine, already rather subjective), it is often influenced by many other factors, such as price and packaging (the wine bottle, how the wine is presented for tasting). These elements trigger a reaction based on stereotypes—if it’s expensive and looks well-wrapped, it must be valuable. The book and article have an abundance of these value triggers from generations of use, but we are just beginning to understand equivalent value triggers online—thus the critical importance of web design, and why the logo of a trusted institution or a university press can still matter greatly, even if it appears on a website rather than a book.

Social psychologists have also thought deeply about the potent grip of these idols of our tribe. They are aware of how cultural norms establish and propagate themselves, and tell us how the imposition of limits creates hierarchies of recognition. Thinking in their way, along with the way the web works, one potential solution on the demand side might come not from the scarcity of production, as it did in a print world, but from the scarcity of attention. That is, value will be perceived in any community-accepted process that narrows the seemingly limitless texts to read or websites to view. Curation becomes more important than publication once publication ceases to be limited.

[image credit: Priki]

The Pirate Problem

Last summer, a few blocks from my house, a new pub opened. Normally this would not be worth noting, except for the fact that this bar is staffed completely by pirates, with eye patches, swords, and even the occasional bird on the shoulder. These are not real pirates, of course, but modern men and women dressed up as pirates. But they wear the pirate garb with no hint of irony or thespian affect whatsoever; these are dedicated, earnest pirates.

At this point I should note that I do not live in Orlando, Florida, or any other place devoted to make-believe, but in a sleepy suburb of Washington, D.C., that is filled with Very Serious Professionals. When the pirate pub opened, the neighborhood VSPs (myself very much included) concluded that it was strange and silly and that it was an incontrovertible fact that no one would patronize the place. Or if they did, it would be as a lark.

We clung to this belief for approximately 24 hours, until, upon a casual stroll by the storefront, we witnessed six pirate-garbed pubgoers outside. Singing sea chanteys. Without sheet music. The tavern has been filled ever since.

Such an experience is a useful reminder that there are ways of acting and thinking that we can’t understand or anticipate. Who knew that there was a highly developed pirate subculture, and that it thrived among the throngs of politicos and think-tankers and professors of Washington? Who are these people?

My thoughts turned to pirates during my experience at a workshop at the University of North Carolina at Chapel Hill a week ago, which was devoted to the digitization of the unparalleled Southern Historical Collection, and—in a less obvious way—to thinking about the past and future of humanities scholarship. Dozens of historians came to the workshop to discuss the way in which the SHC, the source of so many books and articles about the South and the home of 16 million archival documents, should be put on the web.

I gave the keynote, which I devoted to prodding the attendees into recognizing that the future of archives and research might not be like the past, and I showed several examples from my work and the work of CHNM that used different ways of searching and analyzing documents that are in digital, rather than analog, forms. Longtime readers of this blog will remember some of the examples, including an updated riff on what a future historian might learn about the state of religion in turn-of-the-century America by data mining our September 11 Digital Archive.

The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like “a crab being lowered into the warm water of the pot.” Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.

This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC:

With few exceptions, interviewees believed that the structure of the collection in the virtual space should replicate, not obscure, the arrangement of the physical collection. Thus, navigating a manuscript collection online would mimic the experience of navigating the physical collection, and the virtual document containers—e.g., folders—and digital facsimiles would map clearly back to the physical containers and documents they represent. [Laura Clark Brown and David Silkenat, “Extending the Reach of Southern Sources,” p. 10]

In other words, in the age of Google and advanced search tools and techniques, most historians just want to do their research the way they’ve always done it, by taking one letter out of the box at a time. One historian told of a critical moment in her archival work, when she noticed a single word in a letter that touched off the thought that became her first book.

So in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars. We must remember that these scholars have had—for generations and still in today’s graduate schools—a very clear path for how they do their work, publish, and get rewarded. Visit archive; do careful reading; find examples in documents; conceptualize and analyze; write monograph; get tenure.

We threaten all of this. For every time we focus on text mining and pattern recognition, traditionalists can point to the successes of close reading—on the power of a single word. We propose new methods of research when the old ones don’t seem broken. The humanities have an order, and we, mateys, threaten to take that calm ship into unknown waters.

[Image credit: &y.]

Measuring the Audience of a Digital Humanities Project

Karen Motylewski of the Institute of Museum and Library Services recently pressed an audience of recent IMLS grantees to think about how they might measure the success of their digital projects. As she was well aware, academics often bristle at the quantitative measurement of the audience for their websites because it smacks of commercialism. Also, we professors and librarians and curators generally avoid taking classes in such base topics as marketing. But Karen has a point. Indeed, Roy Rosenzweig and I devote an entire chapter in Digital History to how to build an audience—not for commercial or narcissistic reasons, but because an academic digital project should be, as we say, “useful and used.”

I started this blog to explain in greater depth some of the projects and research I’m working on in the digital humanities, but I also did it (as readers of my five-part series on “Creating a Blog from Scratch” will know) to learn first-hand about the composition of blogs and the technologies behind them. Writing my own code for this blog forced me to examine in detail—and occasionally rethink—some blogging conventions (technical, design, and content). And one of the benefits of doing so has been a realization that I have significantly underestimated the power of RSS. I now think it may be the best measurement of utility for an academic website, far better than server logs or other quantitative measurements. Let me explain why.

Think of your reading habits—specifically, periodicals. You probably subscribe to a newspaper, a magazine or two (or three), and perhaps some academic or specialist journals. Every time you go to the dentist, you also probably voraciously read all of those salacious magazines and lifestyle handbooks you don’t subscribe to. If you’re in a particularly bad waiting room, you probably read anything that’s lying around, even if you would never buy those magazines at a newsstand. As anyone in the magazine or newspaper business will tell you, what they really want is subscribers, not casual, one-time readers. Subscribers have shown a level of interest in, and dedication to, a periodical that is several levels above all other readers.

Now look carefully at web server logs—the trail of a website’s readers. Most visitors to a typical website are like the third type of magazine reader—simply passing through on the way to get their cavities filled. They generally come from search engines, quickly scan a page, and leave, their IP address never to be seen again.

Moreover, up to three-quarters of traffic to most websites is from bots (e.g., Google’s indexing spider)—a machine audience that you probably care little about, except as a way to drive traffic to your site from search requests. On this site in March 2006, the human audience looked at about 10,000 pages; machines requested over 26,000 pages. This doesn’t even take into account “server spam,” which consists of fake requests to your server to make it look like another website is sending a lot of traffic your way. In March, was the number one “referrer” to this blog. Great.
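To make the split between human and machine readers concrete, here is a minimal Python sketch of how one might separate the two in a server log. It assumes the log is in the common "combined" format, where the user agent is the last quoted field on each line, and it guesses at bots with a short list of substrings. The marker list, the sample lines, and the function names are all illustrative assumptions, not a description of how this site's statistics were actually produced.

```python
import re
from collections import Counter

# Substrings that often appear in crawler user-agent strings.
# Illustrative assumption only; real bot detection needs a longer list.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")

# Combined log format: "request" status bytes "referrer" "user agent"
LOG_TAIL = re.compile(
    r'"(?P<request>[^"]*)" \d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def classify(line: str) -> str:
    """Label one log line as 'bot', 'human', or 'unparsed'."""
    match = LOG_TAIL.search(line.strip())
    if match is None:
        return "unparsed"
    agent = match.group("agent").lower()
    return "bot" if any(marker in agent for marker in BOT_MARKERS) else "human"

def audience_summary(lines) -> Counter:
    """Count how many requests fall into each category."""
    return Counter(classify(line) for line in lines)

def human_share(human_pages: int, machine_pages: int) -> float:
    """Fraction of all page requests that came from people."""
    return human_pages / (human_pages + machine_pages)

# Two made-up log lines: one browser, one crawler.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Mar/2006:13:55:36 -0500] "GET /blog/ HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Macintosh) Firefox/1.5"',
    '192.0.2.2 - - [10/Mar/2006:13:55:40 -0500] "GET /blog/ HTTP/1.1" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    print(audience_summary(SAMPLE_LINES))
    # The March 2006 figures quoted above: 10,000 human pages, 26,000 machine pages.
    print(round(human_share(10_000, 26_000), 2))  # about 0.28
```

By this rough arithmetic, people accounted for a little over a quarter of the page requests in March, before subtracting server spam and one-time visitors.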

So now we are down to about 10% of the top line number of “visitors” to your website. You are likely getting depressed. But here’s where another point Roy and I make comes into play: “think about community, not numbers of visitors.” That other 10% includes a number of people who love your site and what it has to offer but only visit every once in a while.

Then there are the subscribers. RSS truly provides an online analog to periodical subscriptions; “subscriptions” is a very good word for it since subscribers receive each update automatically. RSS finally allows digital humanities projects to assess how many people are really committed to a site. Notably, this number may or may not follow overall site traffic patterns. For instance, here’s a comparison of server logs for this site with RSS subscriptions:

In the noise of all of the bot traffic and uninterested visitors (top chart; the orange bar represents unique visitors, the dark blue is page views), I’m grateful that subscriptions to this blog (bottom chart) have climbed steadily since its inception four months ago. Should this blog have the enormous traffic of a BoingBoing? No. That’s not why I started it. I’m trying to reach a fairly specific audience that is several orders of magnitude smaller than the big tech/geek audience for BoingBoing. Success means reaching and having a conversation with those people—the people who I believe are doing critical work for the future of education, libraries, and the humanities—not with a mass audience. I hope this site is slowly creeping toward that modest goal. By tracking RSS subscriptions, other digital humanities projects can also see if they’re reaching their envisioned audience.

But how do you use RSS if your site isn’t a blog? If your site is a digital collection or archive, you can add a “news about this site” or “new features/new additions” RSS feed, as we have done for the Hurricane Digital Memory Bank. If your project involves software development, you can put code update announcements into an RSS feed. Even if your site is relatively static, new services such as will send out notifications of site changes to interested parties. Once you have an RSS feed (you should link to it from your home page so that RSS-aware browsers can find it quickly), you can then use services such as Feedburner to track RSS subscriptions more carefully.
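For a collection or archive that has no blog software behind it, a "new additions" feed can be a small generated file. The following Python sketch builds a bare-bones RSS 2.0 document with the standard library. The function name, the site title, and the example.org addresses are placeholders I have made up for illustration; a real feed would likely carry more fields (item descriptions, GUIDs) than this one does.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_feed(site_title, site_url, description, updates):
    """Return a minimal RSS 2.0 feed as a string.

    `updates` is a list of dicts with 'title', 'link', and 'published'
    (a timezone-aware datetime) keys. These key names are assumptions
    made for this sketch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = description
    for update in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = update["title"]
        ET.SubElement(item, "link").text = update["link"]
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(update["published"])
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    feed = build_feed(
        "Example Digital Archive: New Additions",
        "https://example.org/",
        "Announcements of newly added items and features.",
        [
            {
                "title": "200 new photographs added",
                "link": "https://example.org/news/new-photographs",
                "published": datetime(2006, 3, 10, 12, 0, tzinfo=timezone.utc),
            }
        ],
    )
    print(feed)
```

Saving the output to a file on the site and linking to it from the home page is enough for feed readers to start subscribing.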

Given all of the server log’s faults and problems, I suspect we will soon be saying, “The server log is dead.” Long live RSS.

The Final Four’s Impact on Websites

I work at George Mason University. Unless you live off the grid (and if so, how are you reading this?), you’ve probably heard that our basketball team is in the Final Four this weekend. There has been a great deal of talk around campus about the impact this astonishing feat will have on the university’s stature and undergraduate admissions. But what about its effect on Mason’s websites? A bit of unscientific evidence from Alexaholic, which creates website traffic graphs using data from’s Alexa web service:

Our domain has gone from being about the 5300th most popular on the web to about 2100th since Mason was selected (controversially) for the tournament on March 12. OK, we’re not exactly in Yahoo territory, but we’ve bypassed dozens of other universities in our steep two-week climb.

Search Engine Optimization for Smarties

A Google search for “Sputnik” gives you an authoritative site from NASA in the top ten search results, but also a web page from the skydiver and ballroom-dancing enthusiast Michael Wright. This wildly democratic mix of sources perennially leads some educators to wring their hands about the state of knowledge, as yet another op-ed piece in the New York Times does today (“Searching for Dummies” by Edward Tenner). It’s a strange moment for the Times to publish this kind of lament; it seems like an op-ed left over from 1997, and as I’ve previously written in this space (and elsewhere with Roy Rosenzweig), contrary to Tenner’s one example of searching in vain for “World History,” online historical information is actually getting better, not worse (especially if you assess the web as a whole rather than complain about a few top search results). Anyway, Tenner does make one very good point: “More owners of free high-quality content should learn the tradecraft of tweaking their sites to improve search engine rankings.” This “tradecraft” is generally called “search engine optimization,” and I’ve long thought I should let those in academia (and other creators of reliable, noncommercial digital resources) in on the not-so-secret ways you can move your website higher up in the Google rankings (as well as in the rankings of other search engines).

1. Start with an appropriate domain name. Ideally, your domain should contain the top keywords you expect people searching for your topic to type into Google. At CHNM we love the name “Echo” for our history of science website, but we probably should have made the URL rather than Professors like to name digital projects something esoteric or poetic, preferably in Greek or Latin. That’s fine. But make the URL something more meaningful (and yes, more prosaic, if necessary) for search engines. If you read Google’s Web Search API documentation, you’ll realize that their spider can actually parse domain names for keywords, even if you run these words together.

2. If you’ve already launched your website, don’t change its address if it already has a lot of links to it. “Inbound” links are the currency of Google rankings. (You can check on how many links there are to your site by typing “link:[your domain name here]” into Google.) We can’t change Echo’s address now, because it’s already got hundreds of links to it, and those links count for a lot. (Despite the poetic name, we’re in the top ten for “history of science.”) There are some fancy ways to “redirect” sites from an old domain to a new one, but it’s tricky.

3. Get as many links to your site as you can from high-quality, established, prominent websites. Here’s where academics and those working in museums and libraries are at an advantage. You probably already have access to some very high-ranking, respected sites. Work at the Smithsonian or the Library of Congress? Want an extremely high-ranking website on any topic? Simply link to the new website (appropriately named, of course) from the home page of your main site (the home page is generally the best page to get a link from). Wait a month or two and you’re done, because and wield enormous power in Google’s mathematical ranking system. A related point is…

4. Ask other sites to link to your site using the keywords you want. If you have a site on the Civil War, a bad link is one that says, “Like the Civil War? Check out this site.” A helpful link is one that says, “This is a great site on the Civil War.” If you use the Google Sitemap service, it will tell you what the most popular keywords are in links to your site.

5. Include keywords in file names and directory names across your site, and don’t skimp on the letters. This point is similar to #1, only for subtopics and pages on your site. Have a bibliography of Civil War books? Name the file “civilwarbibliography.html” rather than just “biblio.html” or some nonsense letters or numbers.

6. Speaking of nonsense letters and numbers, if your site is database-driven, recast ungainly numbers and letters in the URL (known in geek-speak as the “query string”), e.g., change to Have someone who knows how to do “URL rewriting” change those URLs to readable strings (if you use the Apache web server software, as 70% of sites do, the software that does this is called “mod_rewrite”; it still keeps those numbers and letters in memory, but doesn’t let the human or machine audiences see them).

7. Be very careful about hiring someone to optimize your site, and don’t do anything shifty like putting white text with your keywords on a white background. Read Google’s warning about search engine optimization and shady methods and their propensity to ban sites for subterfuge.

8. Don’t bother with metatags. Google and other search engines don’t care about these old, hidden HTML tags that were supposed to tell search engines what a web page was about.

9. Be patient. For most sites, it’s a slow rise to the top, accumulating links, awareness in the real world and on the web, etc. Moreover, there is definitely a first-mover advantage—being highly ranked creates a virtuous circle, because by being in the top ten, other sites link to your site because they find it more easily than others. Thus Michael Wright’s page on Sputnik, which is nine years old, remains stubbornly in the top ten. But one of the advantages a lot of academic and nonprofit sites have over the Michael Wrights of the world is that we’re part of institutions that are in it for the long run (and don’t have ballroom dancing classes). I’m more sanguine than Edward Tenner that in the near future, great sites, many of them from academia, will rise to the top, and be found by all of those Google-centric students the educators worry about.

But these sites (and their producers) could use a little push. Hope this helps.

(You might also want to read the chapter Roy and I wrote on building an audience for your website in Digital History, especially the section that includes a discussion of how Google works, as well as another section of the book on “Site Structure and Good URLs.”)
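Points 5 and 6 above can be sketched in a few lines of Python. This is not mod_rewrite itself, which is configured inside Apache. It is only an illustration of the same idea: the visitor and the search engine see a readable address, and the server quietly translates it back into the database query. The path pattern, the script name view.php, the slug table, and the record number are all hypothetical.

```python
import re
from urllib.parse import urlencode

def keyword_filename(title: str, extension: str = "html") -> str:
    """Run the words of a page title together into a keyword-rich file name."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "".join(words) + "." + extension

# Hypothetical readable address pattern, e.g. /exhibits/civil-war-letters
READABLE_ROUTE = re.compile(r"^/exhibits/(?P<slug>[a-z0-9-]+)/?$")

# Hypothetical lookup from readable slug to database record number.
SLUG_TO_ID = {"civil-war-letters": 3571}

def internal_url(readable_path: str):
    """Translate a readable path into the query-string URL the database
    script expects, or return None if the path is not recognized."""
    match = READABLE_ROUTE.match(readable_path)
    if match is None:
        return None
    record_id = SLUG_TO_ID.get(match.group("slug"))
    if record_id is None:
        return None
    return "/view.php?" + urlencode({"id": record_id})

if __name__ == "__main__":
    print(keyword_filename("Civil War Bibliography"))
    print(internal_url("/exhibits/civil-war-letters"))
```

The first call produces the same "civilwarbibliography.html" name suggested in point 5. The second shows the ungainly query string staying on the server side, where neither readers nor search engine spiders have to look at it.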

First Impressions of Amazon Connect

Having already succumbed to the siren’s song that prodded me narcissistically to create a blog, I had very little resistance left when emailed me to ask if I might like to join the beta of program that allows authors to reach potential buyers and existing owners of their books by writing blog-like posts. Called “Amazon Connect,” this service will soon be made available to the authors of all of the books available for purchase on Amazon. Here are some notes about my experience joining the program (and how you can join if you’re an author), some thoughts about what Amazon Connect might be able to do, and some insider information about their upcoming launch.

First, the inside scoop. As far as I can tell, Amazon Connect began around Thanksgiving 2005 with a pilot that enlisted about a dozen authors. It has been slowly expanding since then but is still in beta, and a quiet beta at that. It’s unlikely you’ve seen an Amazon Connect section on one of their web pages. However, I recently learned from the Amazon Connect team that in early February the service will have its official launch, with a big publicity push.

After that point, each post an author makes will appear on the page for his or her book(s). I found out by writing a post of my own that this feature is actually already enabled, as you can see by looking at the page for Digital History (scroll down the page a bit to see my post).

But the launch will also entail a much more significant change—to the home page of itself, which is of course individualized for each user. Starting in February, on the home page of every Amazon user who has purchased your book(s), your posts will show up immediately. Since it’s unlikely that a purchaser of a book will return to that book’s buy page, this appearance on the Amazon home page is important: Authors will effectively gain the ability to send messages to a sizable number of their readers.

Since generally it has been impossible to compile a decent contact list for those who buy a specific book (unless you’re in the NSA or CIA), Amazon’s idea is intriguing. While Amazon Connect is clearly intended to sell more books, and the writing style they advocate is less than academic (“a conversational, first-person tone”), it’s remarkable to think that the author of a scholarly monograph might be able to reach a good portion of their audience this way. Indeed, I suspect that for authors of academic press books that might not sell hundreds of thousands of copies, the proportion of buyers of their book that use Amazon is much higher than for popular books (since those books are sold in a higher percentage at physical Barnes & Noble and Borders stores, and increasingly at Costco and Wal-Mart). Could Amazon Connect foster smaller communities of authors and readers, for more esoteric topics?

If you are an author and would like to join the Amazon Connect beta in time for the February launch, here’s what you need to do:

1) First, you must have an Amazon account. If you already have one, go to the special Amazon Connect website, log in, and claim your book(s) using the “Register Your Bibliography” link. This involves listing the contact info for your publisher, editor, publicist, or other third party that can verify that you are actually the author of the book(s) you list. About a week later you’ll get an email confirming that you have been verified.

2) Create a profile. You are required to upload a photo, write a short biography, and provide some other information about yourself (such as your email address) that you can choose to share with your audience (I didn’t fill a lot of this out, such as my favorite movies).

3) Once you’ve been added to the system, you can start writing posts. Good luck saying hello to your readers, and remember Amazon Connect rule #5: “No boring content”!