Dan Cohen

Archive for the ‘Web’ Category

Data on How Professors Use Technology

Sunday, January 15th, 2006

Rob Townsend, the Assistant Director of Research and Publications at the American Historical Association and the author of many insightful (and often indispensible) reports about the state of higher education, writes with some telling new data from the latest National Study of Postsecondary Faculty (conducted by the U.S. Department of Education roughly every five years since 1987). Rob focused on several questions about the use of technology in colleges and universities. The results are somewhat surprising and thought-provoking.

Here are two relatively new questions, exactly as they are written on the survey form (including the boldface in the first question; more on that later), which you can download from the Department of Education website. “[FILL INSTNAME]” is obviously replaced in the actual questionnaire by the faculty member’s institution.

Q39. During the 2003 Fall Term at [FILL INSTNAME], did you have one or more web sites for any of your teaching, advising, or other instructional duties? (Web sites used for instructional duties might include the syllabus, readings, assignments, and practice exams for classes; might enable communication with students via listservs or online forums; and might provide real-time computer-based instruction.)

Q41: During the 2003 Fall Term at [FILL INSTNAME], how many hours per week did you spend
communicating by e-mail (electronic mail) with your students?

Using the Department of Education’s web service to create bar graphs from their large data set, Rob generated these two charts:

Rob points out that historians are on the low end of e-mail usage in the academy, though it seems not too far off from other disciplines in the humanities and social sciences. A more statistically significant number to get (and probably impossible using this data set) would be the time spent on e-mail per student, since the number of students varies widely among the disciplines. [Update: Within hours of this post Rob had crunched the numbers and came up with an average of 2 minutes per student for history instructors (average of 83 students divided by 2.8 hours spent writing e-mail per week).]

For me, the surprising chart is the first one, on the adoption of the web in teaching, advising, or other instructional duties. Only about a 5-10% rise in the use of the web from 1998 to 2003 for most disciplines, and a decline for English and Literature? This, during a period of enormous, exponential growth in the web, a period that also saw many institutions of higher education mandate that faculty put their syllabi on the Internet (often paying for expensive course management software to do so)?

I have two theories about this chart, with the possibility that both theories are having an effect on the numbers. First, I wonder if that boldfaced “you” in Q39 made a number of professors answer “no” if technically they had someone else (e.g., a teaching assistant or department staffer) put their syllabus or other course materials online. I did some further research after hearing from Rob and noticed that buried in the 1998 survey questionnaire was a slightly different wording, with no boldface: “During the 1998 Fall Term, did you have websites for any of the classes you taught?” Maybe those wordsmiths in English and Literature were parsing the language of the 2003 question a little too closely (or maybe they were just reading it correctly, unlike faculty members from the other disciplines).

My second theory is a little more troubling for cyber-enthusiasts who believe that the Internet will take over the academy in the next decade, fully changing the face of research and instruction. Take a look at this chart from the Pew Internet and American Life Project:

Note how after an initial surge in Internet adoption in the late 1990s the rate of growth has slowed considerably. A minority, small but significant, will probably never adopt the Internet as an important, daily medium of interaction and information. If we believe the Department of Education numbers, within this minority is apparently a sizable segment of professors. According to additional data extracted by Rob Townsend, it looks like this segment is about 16% of history professors and about 21% of English and Literature professors. (These are faculty members who in the fall of 2003 did not use e-mail or the web at all in their instruction.) Remarkably, among all disciplines about a quarter (24.2%) of the faculty fall into this no-tech group. Seems to me it’s going to be a long, long time before that number is reduced to zero.

Kojo Nnamdi Show Questions

Tuesday, January 10th, 2006

Roy Rosenzweig and I had a terrific time on The Kojo Nnamdi Show today. If you missed the radio broadcast you can listen to it online on the WAMU website. There were a number of interesting calls from the audience, and we promised several callers that we would answer a couple of questions off the air; here they are.

Barbara from Potomac, MD asks, “I’m wondering whether new products that claim to help compress and organize data (I think one is called “C-Gate” [Kathy, an alert reader of his blog, has pointed out that Barbara probably means the giant disk drive company Seagate]) help out [to solve the problem of storing digital data for the long run]? The ads claim that you can store all sorts of data—from PowerPoint presentations and music to digital files—in a two-ounce standalone disk or other device.”

As we say in the book, we’re skeptical of using rare and/or proprietary formats to store digital materials for the long run. Despite the claims of many companies about new and novel storage devices, it’s unclear whether these specialized devices will be accessible in ten or a hundred years. We recommend sticking with common, popular formats and devices (at this point, probably standard hard drives and CD- or DVD-ROMs) if you want to have the best odds of preserving your materials for the long run. The National Institute of Standards and Technology (NIST) provides a good summary of how to store optical media such as CDs and DVDs for long periods of time.

Several callers asked where they could go if they have materials on old media, such as reel-to-reel or 8-track tapes, that they want to convert to a digital format.

You can easily find online some of the companies we mentioned that will (for a fee) transfer your own media files onto new devices. Google for the media you have (e.g., “8-track tape”) along with the words “conversion services” or “transfer services.” I probably overestimated the cost for these services; most conversions will cost less than $100 per tape. However, the older the media the more expensive it will be. I’ll continue to look into places in the Washington area that might provide these services for free, such as libraries and archives.

Digital History on The Kojo Nnamdi Show

Sunday, January 8th, 2006

From the shameless plug dept.: Roy Rosenzweig and I will be discussing our book Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web this Tuesday, January 10, on The Kojo Nnamdi Show. The show is produced at Washington’s NPR station, WAMU. We’re on live from noon to 1 PM EST, and you’ll be able to ask us questions by phone (1-800-433-8850), via email (kojo@wamu.org), or through the web. The show will be replayed from 8-9 PM EST on Tuesday night, and syndicated via iTunes and other outlets as part of NPR’s terrific podcast series (look for The Kojo Nnamdi Show/Tech Tuesday). You’ll also be able to get the audio stream directly from the show’s website. I’ll probably answer some additional questions from the audience in this space.

Creating a Blog from Scratch, Part 5: What is XHTML, and Why Should I Care?

Thursday, January 5th, 2006

In prior posts in this series (1, 2, 3, and 4), I described with some glee my rash abandonment of common blogging software in favor of writing my own. For my purposes there seemed to be some key disadvantages to these popular packages, including an overemphasis on the calendar (I just saw the definition of a blog at the South by Southwest Interactive Festival—”a page with dated entries”—which, to paraphrase Woody Allen, is like calling War and Peace “a book about Russia”), a sameness to their designs, and comments that are rarely helpful and often filled with spam. But one of the greatest advantages of recent blog software packages is that they generally write standards-compliant code. More specifically, blog software like WordPress automatically produces XHTML. Some of you might be asking, what is XHTML, and who cares? And why would I want to spend a great deal of effort ensuring that this blog complied strictly with this language?

The large digital library contingent that reads this blog could probably enumerate many reasons why XHTML compliance is important, but I had two reasons in mind when I started this blog. (Actually, I had a third, more secretive reason that I’ll mention first: Roy Rosenzweig and I argue in our book Digital History that XHTML will likely be critical for digital humanists to adhere to in the future—don’t want to be accused of being a hypocrite.) For those for whom web acronyms are Greek, XHTML is a sibling of XML, a more rigorously structured and flexible language than the HTML that underlies most of the web. XHTML is better prepared than HTML to be platform-independent; because it separates formatting from content, XHTML (like XML) can be reconfigured easily for very different environments (using, e.g., different style sheets). HTML, with formatting and content inextricably combined, for the most part assumes that you are using a computer screen and a web browser. Theoretically XHTML can be dynamically and instantaneously recast to work on many different devices (including a personal computer). This flexibility is becoming an increasingly important feature as people view websites on a variety of platforms (not just a normal computer screen, e.g., but cell phones or audio browsers for the blind). Indeed, according to the server logs for this blog, 1.6% of visitors are using a smart phone, PDA, or other means to read this blog, a number that will surely grow. In short, XHTML seems better prepared than regular HTML to withstand the technological changes of the coming years, and theoretically should be more easily preserved than older methods of displaying information on the web. For these and other reasons a 2001 report the Smithsonian commissioned recommended the institution move to XHTML from HTML.

Of course, with standards compliance comes extra work. (And extra cost. Just ask webmasters at government agencies trying to make their websites comply with Section 508, the mandatory accessibility rules for federal information resources.) Aside from a brief flirtation with the what-you-see-is-what-you-get, write-the-HTML-for-you program Dreamweaver in the late 1990s, I’ve been composing web pages using a text editor (the superb BBEdit) for over ten years, so my hands are used to typing certain codes in HTML, in the same way you get used to a QWERTY keyboard. XHTML is not that dissimilar from HTML, but it still has enough differences to make life difficult for those used to HTML. You have to remember to close every tag; some attributes related to formating are in strange new locations. One small example of the minor infractions I frequently trip up on writing XHTML: the oft-used break tag to add a line to a web page must “close itself” by adding a slash before the end bracket (not <br>, but <br />). But I figured doing this blog would give me a good incentive to start writing everything in strict XHTML.

Yeah, right. I clearly haven’t been paying enough attention to detail. The page you’re reading likely still has dozens of little coding errors that make it fail strict compliance with the World Wide Web Consortium’s XHTML standard. (If you would like a humbling experience that brings to mind receiving a pop quiz back from your third-grade teacher with lots of red ink on it, try the W3C’s XHTML Validator.) I haven’t had enough time to go back and correct all of those little missing slashes and quotation marks. WordPress users out there can now begin their snickering; their blog software does such mundane things for them, and many proudly (and annoyingly) display little “XHTML 1.0 compliant” badges on their sites. Go ahead, rub it in.

After I realized that it would take serious effort to bring my code up to code, so to speak, I sat back and did the only thing I could do: rationalize. I didn’t really need strict XHTML compliance because through some design slight-of-hand I had already been able to make this blog load well on a wide range of devices. I learned from other blog software that if you put the navigation on the right rather than the more common left you see on most websites, the body of each post shows up first on a PDA or smart phone. It also means that blind visitors don’t have to suffer through a long list of your other posts before getting to the article they want to read.

As far as XHTML is concerned, I’ll be brushing up on that this summer. Unless I move this blog to WordPress by then.

Part 6: One Year Later

Nature Compares Science Entries in Wikipedia with Encyclopaedia Britannica

Wednesday, December 14th, 2005

In an article published tomorrow, but online now, the journal Nature reveals the results of a (relatively small) study it conducted to compare the accuracy of Wikipedia with Encyclopaedia Britannica—at least in the natural sciences. The results may strike some as surprising.

As Jim Giles summarizes in the special report: “Among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three…Only eight serious errors, such as misinterpretations of important concepts, were detected in the pairs of articles reviewed, four from each encyclopaedia. But reviewers also found many factual errors, omissions or misleading statements: 162 and 123 in Wikipedia and Britannica, respectively.”

These results, obtained by sending experts such as the Princeton historian of science Michael Gordin matching entries from the democratic/anarchical online source and the highbrow, edited reference work and having them go over the articles with a fine-toothed comb, should feed into the current debate over the quality of online information. My colleague Roy Rosenzweig has written a much more in-depth (and illuminating) comparison of Wikipedia with print sources in history, due out next year in the Journal of American History, which should spark an important debate in the humanities. I suspect that the Wikipedia articles in history are somewhat different than those in the sciences—it seems from Nature’s survey that there may be more professional scientists contributing to Wikipedia than professional historians—but couple of the basic conclusions are the same: the prose on Wikipedia is not so terrific but most of its facts are indeed correct, to a far greater extent than Wikipedia’s critics would like to admit.

First Monday is Second Tuesday This Month

Tuesday, December 13th, 2005

For those who have been asking about the article I wrote with Roy Rosenzweig on the reliability of historical information on the web (summarized in a previous post), it has just appeared on the First Monday website, perhaps a little belatedly given the name of the journal.

Reliability of Information on the Web

Monday, December 5th, 2005

Given the current obsession with the reliability (or more often in media coverage, the unreliability) of information on the web—the New York Times weighed in on the matter yesterday, and USA Today carried a scathing op-ed last week—I feel lucky that an article Roy Rosenzweig and I wrote entitled “Web of Lies? Historical Information on the Internet” happens to appear today in First Monday. If you’re interested in the subject, it’s probably best to read the full article, but I’ll provide a quick summary of our argument here.

Using my H-Bot software tool, Roy and I scanned the Internet to assess the quality of online information about history. In short, we found that while critics are correct that there are many error-riddled web pages, on the whole the web presents a relatively sound portrayal of historical facts through a process of consensus. With the right tools, these facts can be extracted from the web, leaving the more problematic web pages aside.

Moreover, this process of historical data mining on the web should prompt further discussion about the significance of all of this historical information online. To do some of our own prompting, we had a special multiple-choice test-taking version of H-Bot take the National Assessment of Educational Progress U.S. History exam using nothing but the web and some fancy algorithms borrowed from computer science. [Spoiler alert: it passed.] This raises new questions that move far beyond simple debates over the reliability of information on the web and into the very nature of teaching, learning, and research in our digital age.