Category: Open Data

Repository of the Week - the Data Hub

The Data Hub describes itself as "the easy way to get, use and share data".

The Data Hub is a community-driven catalogue of datasets on the Internet. It uses the open-source data cataloguing software CKAN, which gives each dataset record fields for description, formats, ownership, access and subject areas, among others.

Most of the data indexed is open data, which means it is openly licensed and free to use.

On the site, you can:

  • Find data - the Hub contains 3840 datasets that can be viewed or downloaded
  • Share data - sign up to add your own datasets

Datasets can also be browsed by group, such as Linking Open Data, which contains 81 datasets, and Bibliographic Data, which has 77 datasets.

In some cases, the Data Hub can also provide data storage and basic visualisation tools.
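For readers who would rather query the catalogue from a script, the following is a minimal sketch of how the record fields described above might be read programmatically. It assumes the site exposes CKAN's Action API (the `package_search` action) at the address given in this post, which may not be the case; the sample record, its values and the helper names `build_search_url` and `summarise` are made up for illustration. No network request is made.

```python
from urllib.parse import urlencode

# Base URL taken from this post; that it serves CKAN's Action API is an assumption.
BASE_URL = "http://thedatahub.org"


def build_search_url(query, rows=5, base_url=BASE_URL):
    """Build a CKAN Action API package_search URL (the request itself is not sent)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def summarise(record):
    """Pick out the fields this post mentions from a CKAN-style dataset record."""
    formats = {
        res.get("format", "")
        for res in record.get("resources", [])
        if res.get("format")
    }
    return {
        "title": record.get("title", ""),
        "description": record.get("notes", ""),
        "formats": sorted(formats),
        "owner": record.get("author", ""),
        "licence": record.get("license_id", ""),
        "subjects": [tag.get("name", "") for tag in record.get("tags", [])],
    }


# Hypothetical record, invented for illustration only.
SAMPLE_RECORD = {
    "title": "Example bibliographic dataset",
    "notes": "A made-up record showing the shape of a catalogue entry.",
    "author": "Example Library",
    "license_id": "cc-by",
    "resources": [{"format": "RDF"}, {"format": "CSV"}, {"format": "CSV"}],
    "tags": [{"name": "bibliographic"}, {"name": "library"}],
}

if __name__ == "__main__":
    print(build_search_url("bibliographic", rows=3))
    print(summarise(SAMPLE_RECORD))
```

The summary keeps only the description, formats, ownership, licence and subject fields, since those are the ones named in the text; a real record would carry many more.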

Visit the Data Hub at http://thedatahub.org/

Repository of the Week - The Atlas of Living Australia

The Atlas of Living Australia (Atlas) contains information on all the known species in Australia aggregated from a wide range of data providers: museums, herbaria, community groups, government departments, individuals and universities. The Atlas is the Australian node of the Global Biodiversity Information Facility (GBIF), and since 2001 the GBIF has been encouraging free and open access to biodiversity data through global online networks.

The Atlas of Living Australia can be used to:

There are 370 datasets available in the Atlas, and the licensing tags make it clear which data can be used and how. The site also provides extensive explanatory information and help pages, including overviews on how data are integrated and described.

Search for records in the Atlas or browse the site today at http://www.ala.org.au/.

The Atlas of Living Australia is an Australian Government Initiative and is licensed under the Creative Commons Attribution 3.0 Australia License.

Data - it's out there

With the current global emphasis on sharing research data with the public, you might wonder - what can the public actually do with data? How can they access it, understand it, or apply it? Why might it be of interest to them, or you?

The term 'data' refers to an item of information, or to items of information considered collectively for reference or analysis (OED). 'Data' applies across disciplines and could refer to statistics on the publication of comic books, DNA sequencing or marine life in the Arctic - it can refer to countless sets of information.

The purpose of this post, then, is to show readers some interesting and engaging ways in which data are currently being presented and used on the web to inform the public about current issues and about other information available to them. For starters, did you know that everyone contributes to the growing wealth of digital data, whether they work in research or not? Take a look at this infographic from Mashable - every owner of a mobile phone, email address or iTunes account produces data in the digital age. You can also check out the impact of real-time tweets across the world via A World of Tweets.

The UK newspaper The Guardian frequently experiments with public data and uses it to support current news stories. It has a dedicated 'Datablog' with the sole purpose of turning data on key issues, both in the UK and globally, into useful and easily understandable formats. Some examples include: Water leakages: which company is the worst?; The world's top 100 airports: listed, ranked and mapped; Freedom of Information request 2011: how many were there and which ones were turned down?; or What does 15 years of baby name data tell us about modern Britain?.
The Guardian cites all its data sources and makes the data freely available to every reader.

Other sites and services, particularly research centres, make data available for download so that you can use it in your own way, or create visual representations of it for the reader or researcher. Visualisation and infographics are the terms generally used to describe this process, and there are many tools available online that allow you to work with data in this way. A few examples of such tools include: Piktochart, Gephi, Tableau Public and Tagxedo.

So where can you get public data?

Researchers are increasingly required to share data, and many governments are also opening up data to the public (see http://data.gov.uk, http://www.data.gov/ and http://data.gov.au). You can also try datacatalogs.org - a list of open data catalogues throughout the world; The Data Hub - where you can find, share and collaborate on data; Google Public Data; or Freebase. There are many more places to acquire data if you do a simple online search or investigate your university's academic research centres and faculty websites.

Who is talking about data in the public realm?

Beyond the academic sphere, there are large communities online discussing data and its use in the public, as well as foundations geared towards data investigation. For example, the Knight Foundation in the US has organised the Civic Data Challenge whereby citizens of the US (aged thirteen and above) are invited to access, analyse, interpret and visualise data from Civic Health CPS datasets. In addition, there are many interested individuals proactively investigating, sharing and blogging about data. Here are a few sites worth checking and some blogs worth following: Visual.ly; Well-formed Data; Daily Infographic; Visualising Data; Visualising.org; The Guardian's Datablog; and a personal favourite - Information is Beautiful.

In this video, David McCandless of Information is Beautiful illustrates the importance (and perhaps playfulness) of contextualising research data and information through creative visual means.


NCBI - Meeting the challenge

The National Center for Biotechnology Information (NCBI) supports open data and data sharing to further collaboration and research in the biosciences.

A challenge that NCBI faces today is to transform the wealth of data emerging from laboratories worldwide into knowledge which will "lead to a better understanding of biological processes underlying both health and disease."

NCBI disseminates its resources to research and medical communities with a view to integrating data and shaping more meaningful views of this information. It has met this challenge by developing a large number of databases and shared datasets, all available from the NCBI site.

Two datasets of note are GenBank and dbGaP:

GenBank

The GenBank database is maintained by the National Institutes of Health and made available through NCBI. The database stores all known public DNA sequences. Data are submitted to GenBank by individual scientists and science centres involved with the Human Genome Project, and are also annotated and labelled by NCBI investigators.

dbGaP

dbGaP is the database of Genotypes and Phenotypes. It was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. dbGaP has two levels of access - open and controlled. The open-access data can be browsed online or downloaded.

NCBI also provides a variety of tools to use and explore the data, as well as a range of educational materials, how-to guides and training resources.