05 April 2013

Retrieving data day queries

Perhaps the most famous data retrieval case in the history of science comes from sixteenth-century orbital mechanics. Copernicus had laid the foundations for a viable heliocentric system; Kepler stood ready to finalise it. Between the two, both problem and solution, lay the mysteries of Mars: "the wanderer planet". The data which Kepler needed already existed, in a database of naked-eye observations painstakingly constructed over two decades by the Danish natural philosopher Tycho Brahe.
The problem was twofold. Brahe had nailed his colours to a mixed system at odds with that of Copernicus; and his data were his claim to posterity. He employed Kepler as an assistant, but jealously guarded access to the full observational data set.
Kepler did, eventually, gain access to the data. It was neither easy nor always amicable (though allegations that he murdered Brahe to achieve it have been discounted), but it was done. He still had to learn how to retrieve it productively, but six years of mining and analysis finally bridged the gap to produce a successful, validated model.
Things have changed almost unrecognisably over the four or five centuries since Copernicus, Kepler and Brahe, but some familiar features remain amid the new. Investment in research is balanced against the advantages of shared access. Boundaries, proprietary or otherwise, remain between researchers and data repositories. Murder and less extreme forms of espionage may be rare (though not unheard of) as means of gaining access to data stores, but Kepler would no doubt recognise the essence of the negotiation and persuasion which allow those boundaries to be crossed.
The biggest data retrieval issue of the early twenty-first century, however, is a different one. Acquiring data in large quantities is becoming ever easier. Storage is, in relative terms, becoming cheaper. The headache often becomes how to ensure that one retrieves the right data for a particular purpose from the ever-ballooning volumes now becoming available.
And then there is the problem of storage format obsolescence. Unlikely as it may seem, digital information, which is by definition recent and (you might think) ought to be more easily accessible and more carefully curated, can sometimes be harder to reach than older analogue stores.
