Data Chain

Friday, October 21, 2011

The scientific data chain is long and convoluted. Stretching from, perhaps, raw data that come from a sensor on the one end to highly processed, interpreted, “cooked” analysis that is published in a scientific journal on the other, data go through many steps and transformations. Each step creates new insights and new information, which, in turn, become data for the next step of analysis.

Not every step in the chain may produce data suitable for general consumption. Not every level of data may be usable without specialized training, software, or understanding. Some interim manifestations may simply be temporary staging points for the next step of processing and analysis in the chain.

While the web is the sole medium of ubiquitous data transfer and access, publishing data on the web may not make sense for every step in the data chain. And, even when it does, web publication of data doesn’t preclude or co-opt other steps and manifestations in the data chain. Those wanting data in a rawer state than is available via a web app will still approach the scientist or do whatever they have done for centuries. Those wanting more analyzed versions of information will seek publications that make sense of the data. There will be, however, a large number of those who will take the data published via the web mechanism and make it their own. Perhaps they will use it as input to their own models, or make a web application which transforms the data and presents it differently, or import it into their spreadsheets or statistical analysis software, or mash it up with other data streams.

Nevertheless, understanding the data chain is a fundamental prerequisite to making assertions that “data should be free” or “data should be archived.” Below are a few examples of the data chain.

Seismology Data Chain

From voltage variations in a sensor stuck in the ground, to wiggles on the scientist’s computer screen, to an RSS feed of earthquake events, the seismology data chain is indeed long and convoluted.

Tree Allometry Data Chain

A shorter workflow for establishing allometric relationships, the tree allometry data chain is, nevertheless, not straightforward.

Photosynthetic Light Data Chain

Determining the effect of light on crops and its conversion into biomass requires taking light measurements in the field and validating them against farmer-reported yields. Learn more about the photosynthetic light data chain.

Stable Isotope Biogeochemistry Data Chain

Measuring the extent to which the biochemistry tends to concentrate C12 instead of C13 inside cyanobacteria fossils gives us an idea of the metabolic strategy, aka the photosynthetic strategy, of those organisms. Check out the stable isotope biogeochemistry data chain.