The tide is high

In the US, 2.5 petabytes of data are stored annually just for mammograms.

The volume of earth-observation data from the European Space Agency's satellites passed three petabytes in 2007. The projection for 2020? A seven-fold rise.

By 2020, the Square Kilometre Array radio telescope (still on the drawing board) could generate a petabyte of data every 20 seconds.

These astounding figures come from the High Level Expert Group on Scientific Data's 2010 submission to the European Commission, Riding the wave: How Europe can gain from the rising tide of scientific data.

The key questions the submission raises are:

  • How will we preserve all this data?
  • How will we protect its integrity and authenticity?
  • How will we convey the context and provenance of data?
  • How will we protect the privacy of individuals linked to the data?
  • And, most importantly, how will we pay for it?

As the submission states:

"Many of these issues involve trust. Data-intensive science operates at a distance and in a distributed way, often among people who have never met, never spoken, and, sometimes, never communicated directly in any form whatsoever.

They must share results, opinions and data as if they were in the same room.

But in truth, they have no real way of knowing for sure if, on the other end of the line, they will find man or machine, collaborator or competitor, reliable partner or con-artist, careful archivist or data slob."