Licensing Geographic Data

Saturday, April 4, 2009

The following is based on my understanding of United States law, the only law with which I am familiar. The article was written with free advice gratefully received from Shubha Ghosh, Professor of Law at the Wisconsin Law School. All errors are mine.


Whether or not data can be controlled by licenses is a commonly asked question. This question stems from a confusion about licenses and contracts, data and facts, and the role of intellectual property rights and laws. These terms are used loosely and interchangeably by most of us who are not lawyers, which leads to confusion (yes, IANAL).

A license is a legal instrument that conveys a right, accompanied by a promise by the grantor not to sue the grantee if that right is exercised. In the context of property law, a license is a unilateral permission to use someone else’s property. In the context of digital files, a license describes the conditions under which those files may be used. A license on a digital file can exist whether or not there are any corresponding users of that file. A user would have to abide by the license that covers the usage of that file, and if any of the conditions of usage described in that license are violated, then the user would have to cease using that file. Licenses are covered by federal copyright laws.

A contract is like a license, but requires at least two parties agreeing to it. Without at least two parties, a contract cannot exist. A contract specifically describes the obligations of both parties to the contract. For example, “If I give you this data file with experiment readings, you will give me a chart showing a scattergram” is a contract, provided both you and I agree to it. Contracts fall under the purview of state law.

So, one could have a digital file that is released under a particular license, but could also be given to someone under a contract to actually deliver something in return. For example, I could license my music mp3 under a Creative Commons 3.0 NC license and give it to you under a contract that obligates you to add a strings soundtrack to the file and give the file back to me. If you agree to the contract and take my mp3, then you will have to deliver that mp3 with a strings soundtrack added to it. If you fail to give me an mp3 with the promised strings soundtrack incorporated in it, you will be in breach of your contract. Of course, the license itself would allow you to do anything else with the mp3 as well, as long as it was not used in a commercial project (that condition comes from the NC clause in the CC 3.0 NC license). This example should make the difference between a license and a contract very clear.

It should be noted that a CC license can also take on the nature of a contract. For example, a CC 3.0 BY license obligates the user of the licensed item to give attribution to the creator of the item (that obligation stems from the BY clause in the CC 3.0 BY license). This can be problematic in the case of data sets because of attribution-stacking, whereby a user can become legally obligated to attribute all the contributors to a crowd-sourced data set.

Note that I have been using the term “digital file” in the previous paragraphs. That is because a digital file can consist of anything from facts to purely creative expression. For example, a digital file containing a list of comma-separated values (CSV) representing a list of houses and their geographic coordinates is pure data, which cannot be protected by copyright, while a digital file containing a photograph of the same houses taken from an airplane is a purely creative work of authorship that depicts the same features as the CSV data but is protected by copyright. To complicate matters further, a digital file containing software that can analyze the CSV data and create a visual representation is a creative work of authorship protected by copyright, and may also implement a process that could be patentable. In this sense, digital files are universal containers for a variety of products, from facts to original works of authorship.

Therein lies the other source of confusion—facts versus creative expression of facts—since, in reality, digital files containing “data” lie somewhere in between fact and expression. This is the problem with most data.

The United States Constitution protects facts from being treated as property, thereby ensuring that facts are available for use by everyone. The United States Supreme Court has upheld this in its decisions, most famously in Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991). At the same time, the United States also allows protection of creative expressions of facts and ideas, or “original works of authorship,” as stated by copyright law.

So now, our task boils down to determining if that digital file of ours contains facts or creative expression of facts. Unfortunately, this task is far from easy, and in fact, it is usually so difficult that it is probably better to not attempt it. Some examples are obvious: the lat/lon of Madison, WI, is 43.1°N and 89.5°W, and that is a fact, and that fact cannot be copyrighted. It cannot be licensed or protected in any other way, and usually, no one in their right mind would bind themselves to a contract in order to get that information. On the other hand, an artist could render the lat/lon values of Madison, WI in an artistic way using Photoshop, and the digital file containing that Photoshop image would certainly be protected by copyright.

Certainly, digital files containing software are covered by licenses. Keep in mind, while the process that the software implements could actually be patented, the software itself is treated as a literary work, and covered by copyright law.

Even more confusion arises from the common practice of applying “End User License Agreements (EULA)” or “Software License Agreements (SLA)” to software, and extracting a promise from the user of the software to be held to certain obligations such as not reverse-engineering or decompiling the software, or transferring a copy of the software to someone else. In spite of their names, these are not true licenses; instead, they are agreements or contracts. It is not a coincidence that these “licenses” have the word “agreement” in their names because they are contracts that look like licenses.

But, back to our problem of data: it is very difficult to determine where raw data, or facts, end and interpreted data, or creative expression of facts, begin. For example, sensor readings on a charge-coupled device (CCD) are certainly raw data. However, when such readings are acquired through a camera and stored on a reproducible medium, they become photographs that are protected by copyright. In the geospatial realm, raw data collected by remote sensing satellites are reflectance values of geographic features, that is, pure facts. These sensor readings are processed in image-processing programs to create colored photographs that can be protected by copyright.

Further complications arise vis-à-vis data in that while factual data themselves can’t be copyrighted, the manner in which the data are organized can indeed be copyrighted. Data, typically held in a database, are organized to optimize one or more of atomicity, consistency, isolation, and durability, as well as security and speed of access. This organization reflects creativity, and hence, it is protected by copyright. To the extent that the organization of data assists in the execution of a function or a process, it can be patented. And, to make matters even more complicated, data can even be protected as a trade secret. For example, names and addresses of customers can be of strategic value to a business, and the business can protect those data as a trade secret.

Back to our original question: can one license geographic data? Yes and no. One can’t apply a pure intellectual property license to geographic data to the extent that those data represent facts. To the extent that the data represent an “original work of authorship,” that is, an interpretation of facts, they can indeed be copyrighted. The database, that is, the container of the data, being an original work of authorship, can also be copyrighted. However, a copyright infringement claim for the database cannot be enforced if the data contained within are in the public domain. In fact, as established in Assessment Technologies of WI, LLC v. WIREdata, Inc., 350 F.3d 640 (7th Cir., 2003), if the actual data are in the public domain, and if they cannot be extracted without violating the copyright over the copyrightable elements of the database, then the copyright claim itself can be weakened and even nullified.

On the other hand, one can definitely attach a license agreement or a contract to geographic data that governs its usage and demands certain obligations from its user. This was shown in the earlier decision in ProCD, Inc. v. Zeidenberg, 86 F.3d 1447 (7th Cir., 1996), which held that the “shrinkwrap” license agreement that prevented a CD-ROM’s user from copying the CD containing public domain business telephone listings was enforceable against the user.

Since ProCD preceded Assessment Technologies by 7 years, ProCD is actually referenced in the Assessment Technologies decision. ProCD was a “contract law” case, and a contract binds only those who are party to the contract. Since the user of ProCD’s CD-ROM, by breaking the “shrinkwrap,” agrees to the contract, that user is bound by ProCD’s contract. The Assessment Technologies case, on the other hand, is a copyright case. Assessment Technologies and WIREdata were not bound by a contract, so Assessment Technologies could only sue WIREdata for violating a copyright. Because the underlying data were in the public domain, its case was thrown out.

The lesson here is that copyright applies to everyone whether or not they agree with it. For example, this article that you are reading is covered by my rights and licensed under a CC 3.0 BY license, which is both a license and a contract. My rights in this article will continue to exist whether or not you agree with them. On the other hand, a contract applies only to those who agree with it, and by reading this, you are obligated to give me credit were you to use parts of it in your work. If you don’t agree with the contract, well, then don’t use my work, and my contract won’t apply to you.

All this indicates the following: for the most part, law is not something written in stone. Instead, a law is a human construct that changes and evolves. Even more importantly, the interpretation of a law is constantly changing and evolving. The only creator of law is the U.S. Congress (or a state legislature), and the final word on interpreting any law rests with the U.S. (or state) Supreme Court. Until then, even the decision of a lower court judge can be challenged and overturned. In reality, most of the time the threat of a lawsuit, not the lawsuit itself, is what guides our behavior. Whether our data are raw or interpreted can only be determined with finality in a court of law, and perhaps not even then.

Since it is so difficult to determine what portion of our data is factual and what portion is interpreted, and hence protected by copyright, one school of thought believes that it is probably best to waive all the rights that one might have in one’s data. This would be similar to putting one’s work in the public domain. For an example of one such way, see the CC0 protocol.

Finally, while federal law ensures that data collected by federal agencies are made available freely to all for no more than the cost of reproducing the data, that guarantee doesn’t apply to state and local agencies, which can restrict access to data in many different ways, usually through contracts disguised as licenses.