FAIR data: what is it and why is it important
By Sieuwert van Otterloo
FAIR data is a set of principles to make sure that any data that has been collected is stored properly. FAIR data was introduced for scientific data, but the principles are also useful for government data or company data. Any valuable data that is used by multiple organisations should be made FAIR.
History of FAIR data
FAIR data was introduced in 2016 through an article in the research journal Scientific Data. In this article, several researchers highlighted a problem with scientific data: many studies generate and use large amounts of data, and this data cannot be included in the scientific articles that are supposed to contain the research outcomes. Even if it could be printed, printed information is not easy to process. As a result, it is often hard to use scientific results, continue the research or validate the research. As more and more research is data driven, this lack of data availability is becoming a major issue.
FAIR data is a set of principles that ensure that data is available for further research or use. It is an acronym that stands for four aspects: Findability, Accessibility, Interoperability and Reusability. For each aspect there are multiple principles. Broadly speaking, the principles ensure that other people who need the data can find it in the right format for further use.
FAIR data is not just interesting for researchers, but also for businesses and other organisations that exchange data. Many organisations have problems using data collected in earlier projects, or exchanging information with other organisations. Location data, installation data, time series data, financial and administrative data can all benefit from being easy to use.
Relations to open data and open source
FAIR data is not the same as open data or open source, although all three initiatives aim to remove practical obstacles to using information. In the case of FAIR, the focus is on compatibility issues rather than legal issues.
Open data is aimed at making data publicly available free of cost. It removes financial obstacles for reusing data, and enables innovation.
Publishing data as open data does not make this data FAIR: if the data is not in the right format or descriptions are missing, the data might be open but can still be hard to use.
Currently, most FAIR data is research data and is also open. In the future, we can expect companies to also apply the principles to their commercial or closed data. The data will only be available to business partners, but in an accessible and interoperable format.
Metadata and open standards
Many of the FAIR principles depend on the availability of metadata. Metadata means the data ‘about the data’, and the word is used to refer to the explanation and documentation that you need to understand data. Take for example the image of a chart shown above. This data could be very useful, but only if you know what the ‘775’ means. Is it speed? Milligrams? Money? If so: what currency? Similarly, the 13:00 looks like a time of day, but without metadata we cannot be sure. Broadly speaking, the main problem with data sharing is that the context (the metadata) is often lost in transit. FAIR data means data with the right metadata to make the data clear and useful.
FAIRness is closely tied to open standards: open standards are required to make data accessible, since data in a closed or unknown standard is hard to use. One could say that FAIR is a much more elaborate explanation of the same concept as open standards. Many companies use open standards such as XML or JSON to make data accessible. However, just specifying such a generic standard is not enough to make data reusable. The other FAIR principles are also needed.
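To make the point about lost context concrete, here is a minimal sketch of the same two numbers with and without metadata. The field names (quantity, unit, timezone) and the example values are illustrative assumptions; they are not taken from the chart or from any FAIR specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A single value plus the context needed to interpret it."""
    value: float
    quantity: str   # what was measured (hypothetical field name)
    unit: str       # unit of the value
    time: str       # local time of the observation, HH:MM
    timezone: str   # time zone that makes the time unambiguous


def describe(m: Measurement) -> str:
    """Render the measurement as a sentence a reader can interpret."""
    return f"{m.quantity}: {m.value:g} {m.unit} at {m.time} ({m.timezone})"


# Without metadata: two numbers whose meaning is a guess.
bare = (775, "13:00")

# With metadata: the same numbers, now interpretable. The values are invented.
reading = Measurement(775, "power consumption", "kW", "13:00", "Europe/Amsterdam")
print(describe(reading))
```

The bare tuple and the described reading carry the same numbers; only the second one can be safely handed to another organisation.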
What are the FAIR principles?
The FAIR principles can be found at the GO FAIR website. Each of the four main FAIR principles (Findable, Accessible, Interoperable and Reusable) has a few sub-clauses:
Findable means that the data itself and the metadata are easy to find for humans and computers:
- F1. (Meta)data are assigned a globally unique and persistent identifier
- F2. Data are described with rich metadata (defined by R1 below)
- F3. Metadata clearly and explicitly include the identifier of the data they describe
- F4. (Meta)data are registered or indexed in a searchable resource
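As a rough illustration of F1 to F4, the sketch below registers a dataset in a small in-memory catalogue. The catalogue, the function names and the use of a UUID are assumptions made for this example; a UUID is unique, but it is only persistent if the organisation keeps resolving it, and real repositories often use other identifier schemes.

```python
import uuid

# identifier -> metadata record; stands in for a searchable index (F4)
catalogue = {}


def register(title: str, keywords: list[str]) -> str:
    """Assign a globally unique identifier (F1) and index the metadata (F4)."""
    identifier = f"urn:uuid:{uuid.uuid4()}"
    catalogue[identifier] = {
        "identifier": identifier,  # F3: the metadata names the data it describes
        "title": title,            # F2: descriptive metadata
        "keywords": [k.lower() for k in keywords],
    }
    return identifier


def search(term: str) -> list[str]:
    """Return identifiers of records whose title or keywords match the term."""
    term = term.lower()
    return [
        ident
        for ident, record in catalogue.items()
        if term in record["title"].lower() or term in record["keywords"]
    ]


# Hypothetical dataset, invented for the example.
dataset_id = register("Hourly energy readings", ["energy", "time series"])
print(search("energy") == [dataset_id])
```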
Accessible means that standard protocols are used:
- A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
- A1.1 The protocol is open, free, and universally implementable
- A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
- A2. Metadata are accessible, even when the data are no longer available
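The behaviour asked for by A1.2 and A2 can be sketched with a toy repository that stores metadata and content separately. Everything here (the class, the method names, the token check, the identifier) is a hypothetical illustration; the in-memory dictionaries stand in for whatever standard protocol a real repository would expose.

```python
class DataWithdrawn(Exception):
    """Raised when the content is gone but the metadata record remains."""


class Repository:
    """Toy repository that keeps metadata even after content is removed (A2)."""

    def __init__(self):
        self._metadata = {}
        self._content = {}
        self._tokens = {}

    def deposit(self, identifier, metadata, content, allowed_tokens=None):
        self._metadata[identifier] = dict(metadata)
        self._content[identifier] = content
        self._tokens[identifier] = set(allowed_tokens or ())

    def withdraw(self, identifier, reason):
        """Delete the content but keep the metadata, recording the reason."""
        del self._content[identifier]
        self._metadata[identifier]["status"] = "withdrawn: " + reason

    def get_metadata(self, identifier):
        return dict(self._metadata[identifier])

    def get_content(self, identifier, token=None):
        """Return content; restricted datasets require a known token (A1.2)."""
        if identifier not in self._content:
            raise DataWithdrawn(self._metadata[identifier].get("status", "unavailable"))
        allowed = self._tokens[identifier]
        if allowed and token not in allowed:
            raise PermissionError("authorisation required for " + identifier)
        return self._content[identifier]


repo = Repository()
repo.deposit("example:survey-1", {"title": "Customer survey"}, "q1,q2\n4,5\n",
             allowed_tokens={"partner-key"})
data = repo.get_content("example:survey-1", token="partner-key")
repo.withdraw("example:survey-1", "retention period ended")
print(repo.get_metadata("example:survey-1"))
```

After the withdrawal, a request for the content fails with a clear reason, while the metadata still tells a visitor what the dataset was and why it is gone.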
Interoperable means that it is easy to combine the data with existing data:
- I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- I2. (Meta)data use vocabularies that follow FAIR principles
- I3. (Meta)data include qualified references to other (meta)data
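One possible way to approach I1 to I3 is to describe a dataset in JSON-LD using a shared vocabulary. The sketch below uses schema.org terms as an example; the choice of vocabulary, the example.org identifiers and the helper function are assumptions for illustration, and whether a given vocabulary itself follows the FAIR principles (I2) has to be judged per community.

```python
import json

record = {
    "@context": "https://schema.org",   # shared vocabulary (assumed choice)
    "@type": "Dataset",
    "identifier": "https://example.org/dataset/energy-2016",    # hypothetical
    "name": "Hourly energy readings",
    # I3: the property name states HOW this dataset relates to the other one.
    "isBasedOn": "https://example.org/dataset/raw-meter-logs",  # hypothetical
}


def qualified_references(rec, relations=("isBasedOn", "isPartOf")):
    """List (relation, target) pairs so each link carries its meaning."""
    return [(rel, rec[rel]) for rel in relations if rel in rec]


# The record survives a round trip through a generic exchange format.
serialised = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialised)
print(qualified_references(restored))
```

The difference with a bare link is the relation name: a consumer learns not only that two datasets are connected, but also in what way.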
Reusable means that the data can be used later on. It includes both good descriptions and clear licences.
- R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (Meta)data are released with a clear and accessible data usage license
- R1.2. (Meta)data are associated with detailed provenance
- R1.3. (Meta)data meet domain-relevant community standards
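A simple way to apply the Reusable sub-clauses in practice is to check a metadata record for the fields they imply before publishing. The sketch below does this; the field names (license, provenance, conforms_to) and the example values are our own assumptions, not names prescribed by the FAIR principles.

```python
# Hypothetical field names mapped to the sub-clause they are meant to cover.
REQUIRED_FIELDS = {
    "license": "R1.1 clear and accessible data usage license",
    "provenance": "R1.2 detailed provenance",
    "conforms_to": "R1.3 domain-relevant community standard",
}


def missing_reuse_metadata(metadata: dict) -> list[str]:
    """Return the sub-clauses for which the record has no (non-empty) value."""
    return [
        description
        for field, description in REQUIRED_FIELDS.items()
        if not metadata.get(field)
    ]


# Invented example record: licence present, provenance empty, standard missing.
draft = {
    "title": "Hourly energy readings",
    "license": "CC-BY-4.0",
    "provenance": "",
}
for gap in missing_reuse_metadata(draft):
    print("missing:", gap)
```

A check like this only confirms that the fields are filled in; whether the licence is clear or the provenance is detailed enough still needs human review.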
There are two principles (A1.1 and R1.1) that are about licensing, so there is a link between open source and FAIR data. Most of the principles however deal with making sure that data is described in the right way.
Applications and next steps
We expect that FAIRness will become a standard requirement for certain IT projects, especially research projects and publicly funded projects. This would be a good step, since there is no reason to make data usage harder than it has to be. A lot of duplicate work can be avoided if data is stored in a reusable form. It is likely that FAIRness will become an important principle in enterprise architecture. Smaller organisations should strive to make all external data exchanges FAIR. Large enterprises probably need FAIR principles for internal data communication. The GO FAIR Foundation is a Dutch not-for-profit organisation that is working on various FAIR projects, including market development.
FAIR has a complicated relationship with privacy. GDPR gives data subjects a right to data portability (see this overview of data subjects’ rights), for example between social networks, which in practice calls for FAIR data. On the other hand, GDPR forbids sharing data between organisations without a clear purpose. We expect FAIR data principles to be applied to personal data as well, although such data is unlikely to become freely available.
FAIR is already normal within the life sciences. We expect other sectors to follow. The FAIR principles are domain-independent, so they can be applied to any domain: from log-files to CRM and financial data to map data and images. There are many interesting links to information security (log data FAIRification), but also to location data, financial data, bookkeeping and perhaps even accounting standards and Internet of Things.
Img src: Marcus Winkler via Unsplash.
Dr. Sieuwert van Otterloo is a court-certified IT expert with interests in agile, security, software research and IT-contracts.