<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dan Cohen's Digital Humanities Blog &#187; Archives</title>
	<atom:link href="http://www.dancohen.org/category/archives/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dancohen.org</link>
	<description>Covering the intersection of digital technology and research, teaching, and learning in the humanities, including search, data mining, website development and design, and programming.</description>
	<lastBuildDate>Fri, 28 May 2010 01:34:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Digital Ephemera and the Calculus of Importance</title>
		<link>http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/</link>
		<comments>http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/#comments</comments>
		<pubDate>Mon, 17 May 2010 15:04:58 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/?p=880</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+Ephemera+and+the+Calculus+of+Importance&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Twitter&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2010-05-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/&amp;rft.language=English"></span>
[Thoughts prompted by an invitation to write a piece on the significance of "Notes, Lists, and Everyday Inscriptions" for The New Everyday, an innovative experiment in web publishing sponsored by MediaCommons. Since the editors of this edition of The New Everyday asked for something out of the ordinary for their curated collection, I thought it [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Digital+Ephemera+and+the+Calculus+of+Importance&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Twitter&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2010-05-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/&amp;rft.language=English"></span>
<p>[<em>Thoughts prompted by an invitation to write a piece on the significance of "Notes, Lists, and Everyday Inscriptions" for </em><a href="http://mediacommons.futureofthebook.org/the-new-everyday/about">The New Everyday</a><em>, an innovative experiment in web publishing sponsored by <a href="http://mediacommons.futureofthebook.org/">MediaCommons</a>. Since the editors of this edition of </em>The New Everyday<em> asked for something out of the ordinary for their curated collection, I thought it was time to unveil my Gladwell-esque theory of how criminal profiling and archival priorities share a mathematical foundation.</em>]</p>
<p>How important are small written ephemera such as notes, especially now that we create an almost incalculable number of them on digital services such as Twitter? Ever since the Library of Congress surprised many with <a href="http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/">its announcement</a> that it would accession the billions of public tweets since 2006, the subject has been one of significant debate. Critics lamented what they felt was a lowering of standards by the library—a trendy, presentist diversion from its national mission of saving historically valuable knowledge. In their minds, Twitter is a mass of worthless and mundane musings by the unimportant, and thus obviously unworthy of an archivist&#8217;s attention. The humorist Andy Borowitz summarized this cultural critique in a mocking headline: &#8220;<a href="http://twitter.com/BorowitzReport/status/12180322899">Library of  Congress to Acquire Entire Twitter Archive; Will Rename Itself &#8216;Museum  of Crap.&#8217;</a>&#8221;</p>
<p>Few readers of this blog will be surprised to find that I take a rather different view of the matter. How could we not want to preserve a vast record of everyday life and thoughts from tens of millions of people, however mundane? (For more on my views of the  Twitter/Library of Congress debate, and  to inflate my ego, please  consult articles from the <em><a href="http://www.nytimes.com/2010/05/02/business/02digi.html">New   York Times</a></em>, the <em><a href="http://www.washingtonpost.com/wp-dyn/content/article/2010/05/05/AR2010050505309.html">Washington   Post</a></em>, and <em><a href="http://www.slate.com/id/2251429">Slate</a></em>.)</p>
<p>As any practicing historian knows, some of the most critical collections of primary sources are ephemera that someone luckily saved for the future. For example, historians of the English Civil War are deeply thankful that <span>Humphrey Bartholomew had the presence of mind to save 50,000 pamphlets (once considered throwaway pieces of hack writing) from the seventeenth century and give them to a library at Oxford. </span>Similarly, I recently discovered during a behind-the-scenes tour of the Cambridge University Library that the library&#8217;s off-limits tower, long rumored by undergraduates to be filled with pornography, is actually stocked with old genre fiction such as Edwardian spy novels. (See photographic evidence, below.) Undoubtedly the librarians of 1900 were embarrassed by the stuff; today, social historians and literary scholars can rejoice that they didn&#8217;t throw these cheap volumes out. As I have argued in this space, <em>scholars have uses for archives  that archivists cannot anticipate</em>.</p>
<p><img class="alignnone size-full wp-image-891" title="genre_fiction_cambridge_library" src="http://www.dancohen.org/wp/wp-content/uploads/2010/05/genre_fiction_cambridge_library.jpg" border="0" alt="" width="500" /></p>
<p>But let me set aside for a moment my optimistic disposition about the Twitter archive and instead meet the critics halfway. Suppose that we really don&#8217;t know if the archive will be useful or not, or worse, that we are relatively sure it will be utterly worthless. Does that necessarily mean that the Library of Congress should not have accessioned it? I was thinking about this fair-minded version of the &#8220;What to save?&#8221; conundrum recently when I remembered a penetrating article about criminal profiling, which, of all things, helpfully reveals the correct calculus about the importance of digital ephemera such as tweets.</p>
<p style="text-align: center;">* * *</p>
<p>The act of stopping certain air travelers for additional checks—to give them more costly attention—is a difficult task riven by conflicting theories of whom to check and (as mathematicians know) associated search algorithms. Do utterly random checks work best? Should the extra searches focus on certain groups or certain bits of information (one-way tickets, cash purchases)? Many on the right (which is also home, I suspect, to many of the critics who scoff at the Twitter archive) believe in strong profiling—that is, spending nearly the entire budget and time of the Transportation Security Administration profiling Middle Easterners and Muslims. Many on the left counter that this strong profiling leads to insidious  stereotyping.</p>
<p>A more powerful critique of strong profiling was advanced last year by the computational statistician <a href="http://www.nr.com/whp/">William Press</a> in &#8220;<a href="http://dx.doi.org/10.1073/pnas.0813202106">Strong Profiling is Not Mathematically Optimal  for Discovering Rare Malfeasors</a>&#8221; (Proceedings of the National Academy of Sciences, 2009). Press acknowledges that the issue of profiling (whether for terrorists at the airport or for criminals in a traffic stop) has enormous social and political implications. But he seeks to answer a more basic question: does strong profiling actually work? Or is there a more optimal mathematical formula for spending scarce time and resources to achieve the desired outcome?</p>
<p>Press examines two idealized mathematical cases. The first, the &#8220;authoritarian&#8221; strategy, assumes that we have perfect surveillance of society and precisely know the odds that someone will be a criminal (and thus worthy of additional screening). The second, the &#8220;democratic&#8221; strategy, assumes that our knowledge of people is messy and incomplete. In that case of imperfect information the mathematics is much more complex, because we can&#8217;t assign a reliable probability of criminality to each person and then give them security attention at an intensity commensurate to that value. It turns out that in the democratic case, the fuzzier mathematics strongly suggest a broader range of attention.</p>
<p>Moreover, even beyond the obvious fact that the democratic model is closest to real life, <em>the democratic algorithm for profiling is better than the authoritarian model, even if that state of perfect knowledge were achievable</em>. Even if we had Minority Report-style knowledge, or even if we believed that the universe of potential criminals was entirely a subset of a particular group, it would be unwise to fully rely on this knowledge. To do so would lead to &#8220;oversampling,&#8221; an inefficient overemphasis on particular individuals. Of course we should pay attention to those with the maximum probability of being a criminal. But we also have to mix into our algorithm some attention to those who are seemingly innocent to achieve the best outcome, which is to stop the most crimes.</p>
<p>Through some mathematics we need not get into here, Press concludes that the optimal formula for paying attention to subjects is to avoid using the straight probability that each person is a criminal and instead use the square root of that value. For instance, if you feel Person A is 100 times more likely to be a terrorist than Person B, you should spend 10 times, not 100 times, the resources on Person A over Person B. Moreover, as our certainty about potential suspects decreases, the democratic sampling model <a href="http://www.pnas.org/content/106/6/1716/F1.expansion.html">becomes increasingly more efficient</a> compared to the authoritarian model.</p>
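<p>For readers who like to see the arithmetic, here is a minimal sketch of the square-root rule as described above. It is not Press&#8217;s derivation, only the proportionality itself; the function names and the two example probabilities are my own hypothetical choices for illustration.</p>

```python
import math


def proportional_shares(probabilities):
    """Strong profiling: effort in direct proportion to each prior."""
    total = sum(probabilities)
    return [p / total for p in probabilities]


def square_root_shares(probabilities):
    """Square-root sampling: effort in proportion to sqrt(prior)."""
    roots = [math.sqrt(p) for p in probabilities]
    total = sum(roots)
    return [r / total for r in roots]


# Hypothetical priors: Person A is judged 100 times likelier than Person B.
priors = [0.01, 0.0001]

strong = proportional_shares(priors)
gentle = square_root_shares(priors)

# Ratio of effort spent on Person A versus Person B under each rule.
print("strong profiling ratio:", round(strong[0] / strong[1], 6))
print("square-root ratio:", round(gentle[0] / gentle[1], 6))
```

<p>Under these assumed numbers the first ratio comes out at roughly 100 and the second at roughly 10, which is the gap described in the paragraph above.</p>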
<p>Although couched in the language of crime prevention, what Press is really talking about is <em>the calculus of importance</em>. As Press himself notes, &#8220;The idea of sampling by square-root   probabilities is quite general and  can have many other applications.&#8221;</p>
<p style="text-align: center;">* * *</p>
<p>As it turns out, the calculus of importance is the same for the Transportation Security Administration and for the Library of Congress. Press&#8217;s conclusions apply directly to the archivist&#8217;s dilemma of  how to spend limited resources on saving objects in a digital age. The  criminals in our library scenario are people or documents likely to be important  to future researchers; innocents are those whom future historians will  find uninteresting. Additional screening is the act of archiving—that  is, selection for greater attention.</p>
<p>What does this mean for the archiving of digital ephemera such as status updates, those little, seemingly worthless online notes? It means we should continue to expend the majority of resources on those documents and people of most likely future interest, but not to the exclusion of objects and figures that currently seem unimportant.</p>
<p>In other words, if you believe that the notebooks of a known writer are likely to be 100 times more important to future historians and researchers than the blog of a nobody, you should spend 10, not 100, times the resources in preserving those notebooks over the blog. It&#8217;s still a considerable gap, but much less than the traditional (authoritarian) model would suggest. The calculus of importance thus implies that libraries and archives should consciously pursue contents such as those in the Cambridge University Library tower, even if they feel it runs counter to common sense.</p>
<p>So even if the skeptics are right and the Twitter archive is a boondoggle for the Library of Congress, it is the correct kind of bet on the future value of digital ephemera, the equivalent of the TSA spending 10% of their budget to examine more closely threats other than those posed by twentysomething Arabs.</p>
<p>The accessioning of the Twitter archive  by the Library of Congress is  not an expensive affair. Tweets are small  digital objects, and even  billions of them fit on a few cheap drives. Even  with digital asset management, IT labor across time, and  electricity costs, storing billions of tweets is economical, especially compared to the cost of storing physical books.  University of Michigan Librarian Paul Courant <a href="http://shapeofthings.org/papers/PCourant.docx">has  calculated</a> [Word doc] that the present value of the cost to store  a book on  library shelves in perpetuity is about $100 (mostly in physical plant costs). An equivalent electronic text costs just $5.</p>
<p>This vast disparity only serves to reinforce the calculus of importance and archival imperatives of institutions such as the Library of Congress. The library and other keepers of our cultural heritage should be doing much more to save the digital ephemera of our age, no matter what we contemporaries think of these scrawls on the web. You never know when a historian will pan a bit of gold out of that seemingly worthless stream.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Virtual Museum of the Gulag Seized</title>
		<link>http://www.dancohen.org/2008/12/29/virtual-museum-of-the-gulag-seized/</link>
		<comments>http://www.dancohen.org/2008/12/29/virtual-museum-of-the-gulag-seized/#comments</comments>
		<pubDate>Mon, 29 Dec 2008 21:44:42 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[History]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/?p=540</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Virtual+Museum+of+the+Gulag+Seized&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=History&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-12-29&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/12/29/virtual-museum-of-the-gulag-seized/&amp;rft.language=English"></span>
Depressing and not getting enough notice: masked police recently raided the office of the Russian human rights group Memorial, which has been digitally cataloguing the artifacts and names of those affected by the Soviet Gulag. The police took drives containing biographical information on more than 50,000 victims of Stalinist repression and over 10,000 digital photographs, [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Virtual+Museum+of+the+Gulag+Seized&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=History&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-12-29&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/12/29/virtual-museum-of-the-gulag-seized/&amp;rft.language=English"></span>
<p>Depressing and not getting enough notice: masked police <a href="http://www.timesonline.co.uk/tol/news/world/europe/article5333440.ece">recently raided</a> the office of the Russian human rights group Memorial, which has been digitally cataloguing the artifacts and names of those affected by the Soviet Gulag. The police took drives containing biographical information on more than 50,000 victims of Stalinist repression and over 10,000 digital photographs, among other unique archival documents. We worked with Memorial on our <a href="http://www.gulaghistory.org">Gulag history project</a>. (Thanks to Elena Razlogova for bringing this to my attention.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2008/12/29/virtual-museum-of-the-gulag-seized/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Pirate Problem</title>
		<link>http://www.dancohen.org/2008/04/22/the-pirate-problem/</link>
		<comments>http://www.dancohen.org/2008/04/22/the-pirate-problem/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 16:12:00 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Academia]]></category>
		<category><![CDATA[Archives]]></category>
		<category><![CDATA[Audience]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Scholarship]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/?p=284</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+Pirate+Problem&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Academia&amp;rft.subject=Archives&amp;rft.subject=Audience&amp;rft.subject=Research&amp;rft.subject=Scholarship&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-04-22&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/04/22/the-pirate-problem/&amp;rft.language=English"></span>
Last summer, a few blocks from my house, a new pub opened. Normally this would not be worth noting, except for the fact that this bar is staffed completely by pirates, with eye patches, swords, and even the occasional bird on the shoulder. These are not real pirates, of course, but modern men and women [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+Pirate+Problem&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Academia&amp;rft.subject=Archives&amp;rft.subject=Audience&amp;rft.subject=Research&amp;rft.subject=Scholarship&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-04-22&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/04/22/the-pirate-problem/&amp;rft.language=English"></span>
<p><img class="alignleft size-full wp-image-293" title="160460760_b2a957955c_m" src="http://www.dancohen.org/wp/wp-content/uploads/2008/04/160460760_b2a957955c_m.jpg" alt="Jolly Roger Flag" hspace="10" width="240" height="160" align="left" />Last summer, a few blocks from my house, a new pub opened. Normally this would not be worth noting, except for the fact that <em>this bar is staffed completely by pirates</em>, with eye patches, swords, and even the occasional bird on the shoulder. These are not real pirates, of course, but modern men and women dressed up as pirates. But they wear the pirate garb with no hint of irony or thespian affect whatsoever; these are <em>dedicated, earnest</em> pirates.</p>
<p>At this point I should note that I do not live in Orlando, Florida, or any other place devoted to make-believe, but in a sleepy suburb of Washington, D.C., that is filled with Very Serious Professionals. When the pirate pub opened, the neighborhood VSPs (myself very much included) concluded that it was strange and silly and that it was an incontrovertible fact that no one would patronize the place. Or if they did, it would be as a lark.</p>
<p>We clung to this belief for approximately 24 hours, until, upon a casual stroll by the storefront, we witnessed six pirate-garbed pubgoers outside. Singing sea chanteys. <em>Without sheet music</em>. The tavern has been filled ever since.</p>
<p>Such an experience usefully reminds us that there are ways of acting and thinking that we can&#8217;t understand or anticipate. Who knew that there was a highly developed pirate subculture, and that it thrived among the throngs of politicos and think-tankers and professors of Washington? Who are these people?</p>
<p>My thoughts turned to pirates during my experience at a workshop at the <a href="http://www.unc.edu">University of North Carolina at Chapel Hill</a> a week ago, which was <a href="http://www.lib.unc.edu/mss/archivalmassdigitization/">devoted to the digitization of the unparalleled Southern Historical Collection</a>, and—in a less obvious way—to thinking about the past and future of humanities scholarship. Dozens of historians came to the workshop to discuss the way in which the SHC, the source of so many books and articles about the South and the home of 16 million archival documents, should be put on the web.</p>
<p>I gave the keynote, which I devoted to prodding the attendees into recognizing that the future of archives and research might not be like the past, and I showed several examples from my work and the work of <a href="http://chnm.gmu.edu">CHNM</a> that used different ways of searching and analyzing documents that are in digital, rather than analog, forms. Longtime readers of this blog will remember some of the examples, including an updated riff on <a href="http://www.dancohen.org/2006/08/08/mapping-what-americans-did-on-september-11/">what a future historian might learn</a> about the state of religion in turn-of-the-century America by data mining our <a href="http://911digitalarchive.org">September 11 Digital Archive</a>.</p>
<p>The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like &#8220;a crab being lowered into the warm water of the pot.&#8221; Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.</p>
<p>This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC:</p>
<blockquote><p>With few exceptions, interviewees believed that the structure of the collection in the virtual space should replicate, not obscure, the arrangement of the physical collection. Thus, navigating a manuscript collection online would mimic the experience of navigating the physical collection, and the virtual document containers—e.g., folders—and digital facsimiles would map clearly back to the physical containers and documents they represent. [Laura Clark Brown and David Silkenat, "Extending the Reach of Southern Sources," p. 10]</p></blockquote>
<p>In other words, in the age of Google and advanced search tools and techniques, most historians just want to do their research the way they&#8217;ve always done it, by taking one letter out of the box at a time. One historian told of a critical moment in her archival work, when she noticed <em>a single word</em> in a letter that touched off the thought that became her first book.</p>
<p>So in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars. We must remember that these scholars have had—for generations and still in today&#8217;s graduate schools—a very clear path for how they do their work, publish, and get rewarded. Visit archive; do careful reading; find examples in documents; conceptualize and analyze; write monograph; get tenure.</p>
<p>We threaten all of this. For every time we focus on text mining and pattern recognition, traditionalists can point to the successes of close reading—on the power of a single word. We propose new methods of research when the old ones don&#8217;t seem broken. The humanities have an order, and we, mateys, threaten to take that calm ship into unknown waters.</p>
<p>[<em>Image credit:<a href="http://flickr.com/photos/smull/160460760/"> &amp;y</a>.</em>]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2008/04/22/the-pirate-problem/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>The American Historical Association&#8217;s Archives Wiki</title>
		<link>http://www.dancohen.org/2008/02/09/the-american-historical-associations-archives-wiki/</link>
		<comments>http://www.dancohen.org/2008/02/09/the-american-historical-associations-archives-wiki/#comments</comments>
		<pubDate>Sun, 10 Feb 2008 03:03:23 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Wikis]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2008/02/09/the-american-historical-associations-archives-wiki/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+American+Historical+Association%26%238217%3Bs+Archives+Wiki&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Wikis&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-02-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/02/09/the-american-historical-associations-archives-wiki/&amp;rft.language=English"></span>
The American Historical Association has come up with a great idea for a wiki: a website that details the contents of historical archives around the world and includes information about visiting and using those archives. As with any wiki, historians and other researchers can improve the contents of the site by collaboratively editing pages. The [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+American+Historical+Association%26%238217%3Bs+Archives+Wiki&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Wikis&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-02-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/02/09/the-american-historical-associations-archives-wiki/&amp;rft.language=English"></span>
<p>The <a href="http://www.historians.org">American Historical Association</a> has come up with a great idea for a <a href="http://en.wikipedia.org/wiki/Wiki">wiki</a>: a <a href="http://archiveswiki.historians.org/index.php/Main_Page">website that details the contents of historical archives</a> around the world and includes information about visiting and using those archives. As with any wiki, historians and other researchers can improve the contents of the site by collaboratively editing pages. The site should prove to be an important resource for scholars to consult before making expensive and time-consuming trips. It launches with information about <a href="http://archiveswiki.historians.org/index.php/Category:All_Archives">nearly 100 archives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2008/02/09/the-american-historical-associations-archives-wiki/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Research Jobs at JSTOR</title>
		<link>http://www.dancohen.org/2008/01/11/research-jobs-at-jstor/</link>
		<comments>http://www.dancohen.org/2008/01/11/research-jobs-at-jstor/#comments</comments>
		<pubDate>Fri, 11 Jan 2008 18:43:04 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Jobs]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2008/01/11/research-jobs-at-jstor/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Research+Jobs+at+JSTOR&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Jobs&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-01-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/01/11/research-jobs-at-jstor/&amp;rft.language=English"></span>
JSTOR is continuing to work on making its critical archive more helpful and dynamic for scholars and students. They recently posted two research positions that might be of interest to readers of this blog.
]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Research+Jobs+at+JSTOR&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Jobs&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2008-01-11&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2008/01/11/research-jobs-at-jstor/&amp;rft.language=English"></span>
<p><a href="http://jstor.org">JSTOR</a> is continuing to work on making its critical archive more helpful and dynamic for scholars and students. They recently posted <a href="http://www.jstor.org/about/job_postings.html">two research positions</a> that might be of interest to readers of this blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2008/01/11/research-jobs-at-jstor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding reCAPTCHA</title>
		<link>http://www.dancohen.org/2007/08/17/understanding-recaptcha/</link>
		<comments>http://www.dancohen.org/2007/08/17/understanding-recaptcha/#comments</comments>
		<pubDate>Fri, 17 Aug 2007 13:40:31 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Digitization]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2007/08/17/understanding-recaptcha/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Understanding+reCAPTCHA&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Digitization&amp;rft.subject=Tools&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-08-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/08/17/understanding-recaptcha/&amp;rft.language=English"></span>
One of the things I added to this blog when I moved from my own software to WordPress was the red and yellow box in the comments section, which defends this blog against comment spam by asking commenters to decipher a couple of words. Such challenge-response systems are called CAPTCHAs (a tortured and unmellifluous acronym [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Understanding+reCAPTCHA&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Digitization&amp;rft.subject=Tools&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-08-17&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/08/17/understanding-recaptcha/&amp;rft.language=English"></span>
<p><a href="http://www.dancohen.org/wp/wp-content/uploads/2007/08/recaptcha.gif" title="reCAPTCHA"><img src="http://www.dancohen.org/wp/wp-content/uploads/2007/08/recaptcha.gif" alt="reCAPTCHA" align="left" border="0" hspace="10" /></a>One of the things I added to this blog when <a href="http://www.dancohen.org/2007/07/25/creating-a-blog-from-scratch-part-10-the-conclusion/">I moved from my own software</a> to <a href="http://www.wordpress.org">WordPress</a> was the red and yellow box in the comments section, which defends this blog against comment spam by asking commenters to decipher a couple of words. Such challenge-response systems are called CAPTCHAs (a tortured and unmellifluous acronym of &#8220;completely automated public Turing test to tell computers and humans apart&#8221;). What really caught my imagination about the CAPTCHA I&#8217;m using, called <a href="http://recaptcha.net">reCAPTCHA</a>, is that it uses words from books scanned by the <a href="http://www.archive.org">Internet Archive</a>/<a href="http://www.opencontentalliance.org">Open Content Alliance</a>. Thus at the same time commenters solve the word problems they are effectively serving as human OCR machines.</p>
<p>To date, about two million words have been deciphered using reCAPTCHA (see <a href="http://www.technologyreview.com/tr35/Profile.aspx?Cand=T&amp;TRID=631">the article in <em>Technology Review</em></a> lauding reCAPTCHA&#8217;s mastermind, Luis von Ahn), which is a great start but by my calculation (100,000 words per average book) only the equivalent of about 20 books. Of course, it&#8217;s really much more than that because the words in reCAPTCHA are the hardest ones to decipher by machine and are sprinkled among thousands of books.</p>
<p>Indeed, that is the true genius of reCAPTCHA: it &#8220;tells computers and humans apart&#8221; by first using OCR software to find words computers can&#8217;t decipher, then feeding those words to humans, who can decipher them (proving themselves human). Therefore a spammer running OCR software (as many of them do to decipher lesser CAPTCHAs) will have great difficulty cracking it. If you would like an in-depth lesson about how reCAPTCHA (and CAPTCHAs in general) works, take a listen to <a href="http://www.twit.tv/sn101">Steve Gibson&#8217;s podcast on the subject</a>.</p>
<p>The brilliance of reCAPTCHA and its simultaneous assistance to the digital commons leads one to ponder: What other aspects of digitization, cataloging, and research could be aided by giving a large, distributed group of humans the bits that computers have great difficulty with?</p>
<p>And imagine the power of this system if all 60 million CAPTCHAs answered <em>daily</em> were reCAPTCHAs instead. Why not <a href="http://recaptcha.net/whyrecaptcha.html">convert your blog or login system</a> to reCAPTCHA today?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2007/08/17/understanding-recaptcha/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Shakespeare&#8217;s Hard Drive</title>
		<link>http://www.dancohen.org/2007/08/13/shakespeares-hard-drive/</link>
		<comments>http://www.dancohen.org/2007/08/13/shakespeares-hard-drive/#comments</comments>
		<pubDate>Tue, 14 Aug 2007 01:39:54 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Scholarship]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Standards]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2007/08/13/shakespeares-hard-drive/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Shakespeare%26%238217%3Bs+Hard+Drive&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Scholarship&amp;rft.subject=Software&amp;rft.subject=Standards&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-08-13&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/08/13/shakespeares-hard-drive/&amp;rft.language=English"></span>
Congrats to Matt Kirschenbaum on his thought-provoking article in the Chronicle of Higher Education, &#8220;Hamlet.doc? Literature in a Digital Age.&#8221; Matt makes two excellent points. First, &#8220;born digital&#8221; literature presents incredible new opportunities for research, because manuscripts written on computers retain significant metadata and draft tracking that allow for major insights into an author&#8217;s thought [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Shakespeare%26%238217%3Bs+Hard+Drive&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Scholarship&amp;rft.subject=Software&amp;rft.subject=Standards&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-08-13&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/08/13/shakespeares-hard-drive/&amp;rft.language=English"></span>
<p>Congrats to <a href="http://www.otal.umd.edu/~mgk/blog/">Matt Kirschenbaum</a> on his thought-provoking article in the Chronicle of Higher Education, &#8220;<a href="http://chronicle.com/free/v53/i50/50b00801.htm">Hamlet.doc? Literature in a Digital Age</a>.&#8221; Matt makes two excellent points. First, &#8220;born digital&#8221; literature presents incredible new opportunities for research, because manuscripts written on computers retain significant metadata and draft tracking that allow for major insights into an author&#8217;s thought and writing process. Second, scholars who wish to study such literature in the future need to be proactive in pushing for writing environments, digital standards, and archival storage that will provide accessibility and persistence for these advantages.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2007/08/13/shakespeares-hard-drive/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;The Object of History&#8221; Site Launches</title>
		<link>http://www.dancohen.org/2007/02/07/the-object-of-history-site-launches/</link>
		<comments>http://www.dancohen.org/2007/02/07/the-object-of-history-site-launches/#comments</comments>
		<pubDate>Thu, 08 Feb 2007 00:50:57 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Digitization]]></category>
		<category><![CDATA[Museums]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2007/02/07/the-object-of-history-site-launches/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=%26%238220%3BThe+Object+of+History%26%238221%3B+Site+Launches&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Digitization&amp;rft.subject=Museums&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-02-07&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/02/07/the-object-of-history-site-launches/&amp;rft.language=English"></span>
Thanks to the hard work of my colleagues at the Center for History and New Media, led by Sharon Leon, you can now go behind the scenes with the curators of the National Museum of American History. This month the discussion begins with the famous Greensboro Woolworth&#8217;s lunch counter and the origins of the Civil [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=%26%238220%3BThe+Object+of+History%26%238221%3B+Site+Launches&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Digitization&amp;rft.subject=Museums&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-02-07&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/02/07/the-object-of-history-site-launches/&amp;rft.language=English"></span>
<p>Thanks to the hard work of my colleagues at the Center for History and New Media, led by Sharon Leon, you can now <a href="http://objectofhistory.org/">go behind the scenes with the curators</a> of the National Museum of American History. This month the discussion begins with the famous Greensboro Woolworth&#8217;s lunch counter and the origins of the Civil Rights movement. Each month will highlight a new object and its corresponding context, delivered in rich multimedia and with the opportunity to chat with the curators themselves.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2007/02/07/the-object-of-history-site-launches/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Closer Look at the National Archives-Footnote Agreement</title>
		<link>http://www.dancohen.org/2007/02/05/a-closer-look-at-the-national-archives-footnote-agreement/</link>
		<comments>http://www.dancohen.org/2007/02/05/a-closer-look-at-the-national-archives-footnote-agreement/#comments</comments>
		<pubDate>Mon, 05 Feb 2007 18:09:12 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Digitization]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2007/02/05/a-closer-look-at-the-national-archives-footnote-agreement/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A+Closer+Look+at+the+National+Archives-Footnote+Agreement&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Copyright&amp;rft.subject=Digitization&amp;rft.subject=Open+Access&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-02-05&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/02/05/a-closer-look-at-the-national-archives-footnote-agreement/&amp;rft.language=English"></span>
I&#8217;ve spent the past two weeks trying to get a better understanding of the agreement signed by the National Archives and Footnote, about which I raised several concerns in my last post. Before making further (possibly unfounded) criticisms I thought it would be a good idea to talk to both NARA and Footnote. So I picked [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A+Closer+Look+at+the+National+Archives-Footnote+Agreement&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Copyright&amp;rft.subject=Digitization&amp;rft.subject=Open+Access&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-02-05&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/02/05/a-closer-look-at-the-national-archives-footnote-agreement/&amp;rft.language=English"></span>
<p>I&#8217;ve spent the past two weeks trying to get a better understanding of <a href="http://www.archives.gov/iarchives/iarchives-digitization-agreement.html">the agreement</a> signed by the National Archives and Footnote, about which I raised several concerns <a href="http://www.dancohen.org/blog/posts/national_archives_footnote_agreement">in my last post</a>. Before making further (possibly unfounded) criticisms I thought it would be a good idea to talk to both NARA and Footnote. So I picked up the phone and found several people eager to clarify things. At NARA, Jim Hastings, director of access programs, was particularly helpful in explaining their perspective. (Alas, NARA&#8217;s public affairs staff seemed to have only the sketchiest sense of key details.) Most helpful&#8212;and most eager to rebut my earlier post&#8212;were Justin Schroepfer and Peter Drinkwater, the marketing director and product lead at Footnote. Much to their credit, Justin and Peter patiently answered most of my questions about the agreement and the operation of the Footnote website.</p>
<p>Surprisingly, everyone I spoke to at both NARA and Footnote emphasized that despite the seemingly set-in-stone language of the legal agreement, there is a great deal of latitude in how it is executed, and they asked me to spread the word about how historians and the general public can weigh in. It has received virtually no publicity, but NARA is currently in a public comment phase for the Footnote (a/k/a iArchives) agreement. Scroll down to the bottom of the &#8220;<a href="http://www.archives.gov/comment/index.html">Comment on Draft Policy</a>&#8221; page at NARA&#8217;s website and you&#8217;ll find a request for public comment (you should email your thoughts to <a href="mailto:Vision@nara.gov">Vision@nara.gov</a>). It&#8217;s a little odd to have a request for comment after the ink is dry on an agreement or policy, and this URL probably should have been included in the press release of the Footnote agreement, but I do think after speaking with them that both NARA and Footnote are receptive to hearing responses to the agreement. Indeed, in response to this post and my prior post on the agreement, Footnote has set up a web page, &#8220;<a href="http://blog.footnote.com/finding-the-right-balance/">Finding the Right Balance</a>,&#8221; to receive feedback from the general public on the issues I&#8217;ve raised. They also asked me to round up professional opinion on the deal.</p>
<p>I assume Footnote will explain their policies in greater depth on their blog, but we agreed that it would be helpful to record some important details of our conversations in this space. Here are the answers Justin and Peter gave to a few pointed questions.</p>
<p>When I first went to the Footnote site, I was unpleasantly surprised that it required registration even to look at &#8220;milestone&#8221; documents like Lincoln&#8217;s draft of the Gettysburg Address. (Unfortunately, Footnote doesn&#8217;t have a list of all of its free content yet, so it&#8217;s hard to find such documents.) Justin and Peter responded that when they launched the site there was an error in the document viewer, so they had to add authentication to all document views. A fix was rolled out on January 23, and it&#8217;s now possible to view these important documents without registering.</p>
<p>You do need to register, however, to print or download any document, whether it&#8217;s considered &#8220;free&#8221; or &#8220;premium.&#8221; Why? Justin and Peter candidly noted that although they have done digitization projects before, the National Archives project, which contains millions of critical&#8212;and public domain&#8212;documents, is a first for them. They are understandably worried about the &#8220;leakage&#8221; of documents from their site, and want to take it one step at a time. So to start they will track all downloads to see how much escapes, especially in large batches. I noted that downloading and even reusing these documents (even en masse) very well might be legal, despite Footnote&#8217;s terms of service, because the scans are &#8220;slavish&#8221; copies of the originals, which are not protected by copyright. Footnote lawyers are looking at copyright law and what other primary-source sites are doing, and they say that they view these initial months as a learning experience to see if the terms of service can or should change. Footnote&#8217;s stance on copyright law and terms of usage will clearly be worth watching.</p>
<p>Speaking of terms of usage, I voiced a similar concern about Footnote&#8217;s policies toward minors. As you&#8217;ll recall, Footnote&#8217;s terms of service say the site is intended for those 18 and older, thus seeming to turn away the many K-12 classes that could take advantage of it. Justin and Peter were most passionate on this point. They told me that Footnote would like to give free access to the site for the K-12 market, but pointed to the restrictiveness of U.S. child protection laws. Because the Footnote site allows users to upload documents as well as view them, they worry about what youngsters might find there in addition to the NARA docs. These laws also mandate the &#8220;over 18&#8221; clause because the site captures personal information. It seems to me that there&#8217;s probably a technical solution that could be found here, similar to the one PBS.org uses to provide K-12 teaching materials without capturing information from the students.</p>
<p>Footnote seems willing to explore such a possibility, but again, Justin and Peter chalked up problems to the newness of the agreement and their inexperience running an interactive site with primary documents such as these. Footnote&#8217;s lawyers consulted (and borrowed, in some cases) the boilerplate language from terms of service at other sites, like Ancestry.com. But again, the Footnote team emphasized that they are going to review the policies and look into flexibility under the laws. They expect to tweak their policies in the coming months.</p>
<p>So, now is your chance to weigh in on those potential changes. If you do send a comment to either Footnote or NARA, try to be specific in what you would like to see. For instance, at the Center for History and New Media we are exploring the possibility of mining historical texts, which will only be possible to do on these millions of NARA documents if the Archives receives not only the page images from Footnote but also the OCRed text. (The handwritten documents cannot be automatically transcribed using optical character recognition, of course, but there are many typescript documents that have been converted to machine-readable text.) NARA has not asked to receive the text for each document back from Footnote&#8212;only the metadata and a combined index of all documents. There was some discussion that NARA is not equipped to handle the flood of data that a full-text database would entail. Regardless, I believe it would be in the best interest of historical researchers to have NARA receive this database, even if they are unable to post it to the web right away.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2007/02/05/a-closer-look-at-the-national-archives-footnote-agreement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Flawed Agreement between the National Archives and Footnote, Inc.</title>
		<link>http://www.dancohen.org/2007/01/15/the-flawed-agreement-between-the-national-archives-and-footnote-inc/</link>
		<comments>http://www.dancohen.org/2007/01/15/the-flawed-agreement-between-the-national-archives-and-footnote-inc/#comments</comments>
		<pubDate>Tue, 16 Jan 2007 01:18:40 +0000</pubDate>
		<dc:creator>Dan Cohen</dc:creator>
				<category><![CDATA[Archives]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Digitization]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://www.dancohen.org/2007/01/15/the-flawed-agreement-between-the-national-archives-and-footnote-inc/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+Flawed+Agreement+between+the+National+Archives+and+Footnote%2C+Inc.&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Copyright&amp;rft.subject=Digitization&amp;rft.subject=Open+Access&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-01-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/01/15/the-flawed-agreement-between-the-national-archives-and-footnote-inc/&amp;rft.language=English"></span>
I suppose it&#8217;s not breaking news that libraries and archives aren&#8217;t flush with cash. So it must be hard for a director of such an institution when a large corporation, or even a relatively small one, comes knocking with an offer to digitize one&#8217;s holdings in exchange for some kind of commercial rights to the [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=The+Flawed+Agreement+between+the+National+Archives+and+Footnote%2C+Inc.&amp;rft.aulast=Cohen&amp;rft.aufirst=Dan&amp;rft.subject=Archives&amp;rft.subject=Copyright&amp;rft.subject=Digitization&amp;rft.subject=Open+Access&amp;rft.source=Dan+Cohen%27s+Digital+Humanities+Blog&amp;rft.date=2007-01-15&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://www.dancohen.org/2007/01/15/the-flawed-agreement-between-the-national-archives-and-footnote-inc/&amp;rft.language=English"></span>
<p>I suppose it&#8217;s not breaking news that libraries and archives aren&#8217;t flush with cash. So it must be hard for a director of such an institution when a <a href="http://www.google.com">large corporation</a>, or even a <a href="http://www.iarchives.com/index.shtml">relatively small one</a>, comes knocking with an offer to digitize one&#8217;s holdings in exchange for some kind of commercial rights to the contents. But as a historian worried about open access to our cultural heritage, I&#8217;m a little concerned about <a href="http://www.archives.gov/press/press-releases/2007/nr07-41.html">the new agreement</a> between <a href="http://www.footnote.com">Footnote, Inc.</a> and the <a href="http://www.archives.gov">United States National Archives</a>. And I&#8217;m surprised that somehow this agreement has thus far flown under the radar of all of those who attacked the troublesome <a href="http://www.historians.org/Perspectives/issues/2006/0605/0605nch1.cfm">Smithsonian/Showtime agreement</a>. Guess what? From now until 2012 it will cost you $100 a year, or even more offensively, $1.99 a page, for online access to critical historical documents such as the Papers of the Continental Congress.</p>
<p>This was <a href="http://www.archives.gov/press/press-releases/2007/nr07-41.html">the agreement</a> signed by Archivist of the United States Allen Weinstein and Footnote, Inc., a Utah-based digital archives company, on January 10, 2007. For the next five years, unless you have the time and money to travel to Washington, you&#8217;ll have to fork over money to Footnote to take a peek at Civil War pension documents or the case files of the early FBI. The National Archives says this agreement is &#8220;non-exclusive&#8221;&#8212;I suppose crossing their fingers that Google will also come along and make a deal&#8212;but researchers shouldn&#8217;t hold their breaths for other options.</p>
<p>Footnote.com, the website that provides access to these millions of documents, charges for anything more than viewing a small thumbnail of a page or photograph. Supposedly the value-added of the site (aside from being able to see detailed views of the documents) is that it allows you to save and annotate documents in your own library, and share the results of your research (though not the original documents). Hmm, I seem to remember that there&#8217;s <a href="http://www.zotero.org">a tool</a> being developed that will allow you to do all of that&#8212;for free, no less.</p>
<p>Moreover, you&#8217;ll also be subject to some fairly onerous <a href="http://www.footnote.com/termsandconditions.php">terms of usage</a> on Footnote.com, especially considering that this is our collective history and that all of these documents are out of copyright. (For a detailed description of the legal issues involved here, please see <a href="http://chnm.gmu.edu/digitalhistory/copyright/index.php">Chapter 7</a> of <a href="http://chnm.gmu.edu/digitalhistory/index.php"><i>Digital History</i></a>, <a href="http://chnm.gmu.edu/digitalhistory/copyright/index.php">&#8220;Owning the Past?&#8221;</a>, especially <a href="http://chnm.gmu.edu/digitalhistory/copyright/3.php">the section covering the often bogus claims</a> of copyright on scanned archival materials.) I&#8217;ll let the terms speak for themselves (plus one snide aside): &#8220;Professional historians and others conducting scholarly research may use the Website [gee, thanks], provided that they do so within the scope of their professional work, that they obtain written permission from us before using an image obtained from the Website for publication, and that they credit the source. You further agree that&#8230;you will not copy or distribute any part of the Website or the Service in any medium without Footnote.com&#8217;s prior written authorization.&#8221;</p>
<p>Couldn&#8217;t the National Archives have at least added a provision to the agreement with Footnote to allow students free access to these documents? I guess not; from the terms of usage: &#8220;The Footnote.com Website is intended for adults over the age of 18.&#8221; What next? Burly bouncers carding people who want to see the Declaration of Independence?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dancohen.org/2007/01/15/the-flawed-agreement-between-the-national-archives-and-footnote-inc/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
