Scientific Data License

Saturday, October 20, 2007

Any intellectual activity is based on, among other things, data, and these data are composed of factual and non-factual parts. The data behind a "creative" activity are likely to have more non-factual than factual content, while those behind a "scientific" activity are likely to be just the reverse: mostly factual content and very little, if any, non-factual content. Scientific data such as observations and measurements (air temperature, spatial coordinates, surface reflectivity, or any kind of physical, chemical, or biological property) are facts, and under US copyright law, facts can't be copyrighted. These facts, however, can be presented in creative ways, and those creative expressions can be copyrighted. For example, while the physical locations depicted on a map can't be copyrighted, the way the map is colored to make it aesthetically pleasing, or better suited to conveying particular information, can be. Likewise, while the actual data in a database can't be copyrighted, the way those data are broken up and arranged (normalized) into a set of related tables in a relational database can be.

Sir Isaac Newton, writing to physicist Robert Hooke in 1675, said "If I have seen a little further it is by standing on the shoulders of giants." The way of science is to stand on the shoulders of others, and this is reflected in the culture of attribution: every scientific publication ends with a list of citations to the works it draws upon. Not citing a work on which one's research draws is a very serious ethical lapse in science. Unless the research forms the basis for a business enterprise, being cited is the only, and a sufficient, reward a researcher expects when others draw upon his or her work. With the advent of computerized search and citation tracking, counting where and how many times one is cited, along with metrics built on such counts such as the Impact Factor, has become a measure of the worth of one's research.

A license is a permission to do something without requiring anything in return. A contract, on the other hand, lays down an obligation that must be performed in return for the licensed good or service. The two also differ in how they are enforced. Licenses are enforced under copyright law, at the Federal level. The licensee can't do anything more than what has been licensed, and if the terms of the license are broken, the license can be suspended, so the licensee must stop doing what it permitted; the violation can also result in payment of damages for harm that has already occurred. Contracts are enforced under contract law, which is interpreted at the State level.

The copyright law has become too complicated. While the Copyright Act of 1909 was 25 pages long, the current Act is more than ten times as long, weighing in at almost 300 pages. The entire text of Title 17 of the United States Code, which contains the Copyright Act of 1976, can be downloaded as a PDF. For the law to be usable by ordinary people who are not lawyers, it has to be understandable by them, but besides being long, the copyright law is also horribly opaque and complicated.

The workshop came to the following conclusions.

We start from the basis that:

  • facts are free;
  • contracts cannot apply to facts; and
  • citation, being a norm of the scholarly method, is the absolute minimum obligation required of the licensee.

Hence, the desired legal protocol should:

  • Be legally accurate. Since laws vary drastically from place to place, the only way for the license to be legally accurate is for it to be as simple as possible: the fewer moving parts it has, the less chance of it breaking;
  • Be applicable to multi-jurisdiction teams;
  • Have low transaction costs, that is, be simple for scientists to understand and implement;
  • Facilitate interoperability at the technological, semantic, and legal levels; and
  • Not impede commercial adoption and use.

Additional property (Principle of Least Harm): if the licensor makes a mistake in characterizing facts versus non-facts, the maximum recourse available would be to provide attribution, which is the norm in science anyway.

Waive rights to the portion of the data that is copyrightable, and ask for attribution.