Those who have heard me talk about the Digital Public Library of America over the past six months know that I’m fond of saying that DPLA is as much a social project as a technical project. Much of what we do focuses on collaboration and coordination, which involves looking not just at technical—or legal—elements, but social ones.
It’s much easier to think of an issue solely as a technical problem (we just need to figure out how to code that properly), or as a legal problem (we just need to bind everyone under a contractual arrangement to achieve the desired outcome), than as a social issue, since the latter requires attention to more amorphous aspects such as ethics and politics. But being more nuanced about the mix of the social, technical, and legal can pay dividends.
Take DPLA’s metadata. (Please. Take our metadata. It’s all freely available on our site.) One of the questions I frequently get is why the Digital Public Library of America requires the metadata for items in our collection to be donated under a CC0 license. That license is maximally permissive; as its longer name implies, CC0 is in fact a Public Domain Dedication.
Metadata obviously has elements of the technical and legal. Without a stringent technical standard into which we normalize data from over a thousand institutions, and a serious digital infrastructure to transform that metadata into interfaces such as maps and timelines, we couldn’t work much magic. And since we are conscious of the legal realm that many cultural heritage materials exist in, we do ask for a contract that specifies CC0 for the metadata. (However, many would argue that even a CC0 license is unnecessary and should not be demanded, since under U.S. law a purely descriptive set of metadata should not be copyrightable by its very nature. But this is a discussion for another day.)
But why not ask for the most modest of additional restrictions, such as a license where attribution is required—a license with a -BY attached to the right? If we wish to tip our hat to those who created or donated the metadata, why not legally mandate it?
Those who use, reuse, and commingle data know the complex issues that arise with even simple additional requirements such as this. Data that flows from many sources will pick up, like fallen branches in the stream, a variety of ensnaring reeds, adding significant friction and complexity to some applications. But well-meaning people still want to provide attribution, and individuals and institutions might have social expectations of receiving credit. What to do?
Move the attribution from the legal realm into the social or ethical realm by pairing a permissive license with a strong moral entreaty.
For instance, the Tate recently released metadata for 70,000 works of art and 3,500 artists. The license they put on the data was CC0. But right next to that license is this block on “Usage Guidelines”:
These usage guidelines are based on goodwill. They are not a legal contract but Tate requests that you follow these guidelines if you use Metadata from our Collection dataset.
The Metadata published by Tate is available free of restrictions under the Creative Commons Zero Public Domain Dedication.
This means that you can use it for any purpose without having to give attribution. However, Tate requests that you actively acknowledge and give attribution to Tate wherever possible. Attribution supports future efforts to release other data. It also reduces the amount of ‘orphaned data’, helping retain links to authoritative sources.
Europeana’s usage guidelines for metadata make a similar request:
These usage guidelines are based on goodwill, they are not a legal contract but Europeana requests that you follow these guidelines if you use metadata from Europeana.
All metadata published by Europeana are available free of restriction under the Creative Commons CC0 1.0 Universal Public Domain Dedication. However, Europeana requests that you actively acknowledge and give attribution to all metadata sources, such as the data providers (being a specific cultural heritage institution) and any data aggregators, including Europeana.
Give credit where credit is due.
DPLA does the same thing with our Data Best Use Practices page.
I have been calling this implied or ethical attribution. Or, if you like short and snappy symbols, think of it as CC0 (+BY) rather than CC-BY (or ODC-BY).
The cynics, of course, will say that bad actors will do bad things with all that open data. But here’s the thing about the open web: bad actors will do bad things, regardless. They will ignore whatever license you have asserted, or use technical means to circumvent your technical lock. And yes, with CC0, commercial entities might also come and take all of that metadata. But that data includes pointers back to items and scans at libraries, archives, and museums, which are (or should be) in the business of disseminating knowledge as widely as possible. By being free with our metadata, we do not devalue those nonprofit institutions, but rather emphasize more broadly the incredible contents they hold.
The flip side of worrying about bad actors is that we underestimate the number of good actors doing the right thing. In our experience, the many software developers (including commercial ones) who have used our data across the web and in DPLA-powered apps have all maintained proper attribution, even though the CC0 license theoretically means that they can do whatever they want with the data.
I think CC0 (+BY) is the best of both worlds: the data in a free-flowing environment that enables creativity and reuse, with attribution still maintained by the vast majority of people who consider themselves part of a social contract.