It's been around for a while. The problem has always existed. An orphan work is a book that's still protected by copyright, but for which you cannot find an author or the rights holder, somebody who can give you permission to scan the book.
We changed our copyright law in the last thirty years, and copyright became automatic, even if the work is not published, and there's no notice or registration. That means that there are a lot more copyrights in the world, and there are a lot more orphan works, especially now that copyright is defined by how long it lasts after the creator has died. It means that copyright is always by definition going to pass to heirs unless it's already been transferred to publishers. But most of the time it goes to heirs, and most of the time they don't know they hold the copyright.
The University of Michigan library is administering the process. They go through a process of trying to find the rights holder—who's the publisher, who is the author, was it registered for copyright, in whose name, and then searching directories and databases to try to find information about those entities. The last step for Hathi is to put out a list of things they think are orphan works and say, if we're wrong, tell us, and we won't release this work.
Obviously the process was flawed, and they know that. However, in one sense, it worked exactly the way it was supposed to. When the Authors Guild filed the suit, no works on the orphan list had actually been distributed. Michigan has now put together a small advisory group of attorneys, including me, to go over and tighten the process. And they're not going to start distributing any of the orphan works until they're much more comfortable than they are now with the process.
Hathi is first doing a lot of research to determine if there's a rights holder out there. If they think that there's a good chance that there's a rights holder interested in exploiting their copyright, they're not even putting the work on the list. So they're not demanding that rights holders opt out.
The biggest problem is knowing where to stop. In a massive project like this, what's enough research to identify rights holders? If there's no copyright record, you don't know if the author or the publisher was the rights holder. So you've got two different branches to research, and one of them is going to be a dead end. With the companies, there's usually a pretty easy endpoint—we can either find the company or its successor or we can't. With the author, how far do you go? Do you become a genealogist, who can spend years trying to track down a long lost relative? There has to be a stopping point, otherwise the project simply becomes unwieldy. So you have to find a reasonable balance.
The copies had already been made six or seven years ago. The lawsuit against Hathi doesn't name Google, which did the scanning. [The Authors Guild's] complaint is that the files that were given back to the universities for deposit in Hathi were unauthorized and infringing, and that therefore their distribution is illegal. [Duke has thus far not provided works to Google for scanning for the Google Books project.] They obviously also think the copying itself was illegal, but none of the defendants in the case did that.
We're making a very limited use [of these works] that falls under the general copyright exception of fair use. All we're doing is what's sometimes called "time and space shifting." Duke students would only get access to digital copies of books they could get if they came to the library and requested a physical copy.
There are a couple of ways. One is to go back to its previous proposal, which said, if there's been a good-faith attempt to determine who the rights holder is, but then a rights holder only comes forward later, that person is entitled to a reasonable license fee but not to damages. That would make very little difference for us because the fair-use provision already provides for the waiver of damages for educational institutions.
Another way Congress could do it is to authorize a collective-rights agency to license orphan works. Canada has done that. But that means there would be nowhere near the kind of access to orphan works that would have been possible. For a lot of these works, it's simply not worth paying even a nominal fee. Even though we think it would be of value to our community, we don't expect this to be something that everybody uses.
My personal opinion is that [a collective-rights agency] is what the Authors Guild would like to see. In the countries that have adopted this, the collecting agency holds onto license fees for some designated period of time in case a rights holder comes forward. If no one comes forward, the fees are simply distributed with other royalties that are collected for similar kinds of works. But that's not the same as saying the creators would benefit. If it's just paying the publishing industry in general, I don't think that serves the purpose of copyright law, which is to incentivize creativity.
There are two sort of contradictory effects. I think the impact on the commercial publishing industry would be negligible. As long as there's been a reasonable determination of the eligibility of orphan works, there's no impact on the publishing industry. And when a work turns out not to be an orphan, Hathi has already said we will immediately suspend access to it.
I think the impact for academia is significant. It will be possible to do word searches in these files. It will be possible to access more work from your desk or your dorm room. Some of those books are quite obscure, which is why nobody's exploiting them. We keep a lot of stuff on the shelf because it's going to be perfect for somebody doing some research at some point. But in the meantime, if we can make those works more accessible, it'll be that much easier for the person for whom that book is perfect to find it.
This interview was conducted, condensed, and edited by Elissa Lerner.
Copyrights and Copywrongs
Kevin Smith, Director of Scholarly Communications, Duke Libraries
November 30, 2011