Sharing data

In the context of research data management (RDM), data sharing refers to the practice of publicly sharing data from completed (parts of) research, i.e. outside your project or research team. It is different from exchanging data with collaborators while your research is active.

Why share data?

Making your finalised data (or snapshots of your data) available to others has a number of benefits, including:

  • Increasing transparency of your research
  • Accelerating scientific discovery by enabling new (types of) research
  • Enhancing the visibility and impact of your research
  • Creating new opportunities for collaboration

More and more publishers and journals, funders, and institutions (including Ghent University) expect research data – especially data resulting from publicly funded research – to be shared where possible.

Ghent University's RDM policy framework urges researchers to make relevant research data openly available in a timely manner, unless there are legitimate reasons for (temporarily) restricting data sharing.

Degrees of data sharing

Sharing research data is not an all-or-nothing choice, but a spectrum. It ranges from making data fully open on one end, to keeping them fully closed on the other, with various possible forms of restricted/controlled access in-between.

  • Open data: data that can be 'freely used, modified and shared by anyone for any purpose' (opendefinition.org).
  • Closed data: data that are temporarily under embargo, or that cannot be shared at all.
  • Restricted data: data that are not shared in a fully open way, but made available under more restricted access and use conditions. This means that there are limits on who can access and use the data, how, and/or for what purpose.

'As open as possible, as closed as necessary'

Which level of sharing you should choose largely depends on what is appropriate given the nature of your data, and on how well you planned for data sharing (e.g. so that you have the right permissions/consent in place).

In any case, there is a growing consensus among research funders, institutions and other stakeholders that access to research data should be 'as open as possible, as closed as necessary'. This principle is included in the European Code of Conduct for Research Integrity (2017), for example, which Ghent University subscribes to.

Restrictions on data sharing

Research data cannot always be shared (immediately) in a fully open way. Sometimes they can only be made available under more restricted conditions and/or after an embargo period, or – in some circumstances – not even at all.

Possible reasons for restricting the sharing of data are:

The data constitute or contain personal data, i.e. any information about a (directly or indirectly) identified or identifiable living natural person.

You otherwise have a duty or agreed to keep the data confidential (e.g. by signing a non-disclosure agreement, or an agreement containing a confidentiality clause).

The data could potentially cause harm (e.g. to endangered species, vulnerable sites or groups, public health, national security...) if made public.

The data are not generated in the course of your own research project, but are supplied to you by another party (e.g. a commercial provider, government agency...).

Research data, or rather the form in which they are expressed, may in certain circumstances be protected by copyright and/or database right. For example, data captured in an original textual or audio-visual form, or data creatively selected (from a larger whole), processed, and structured can be protected by copyright. Copying and sharing protected research data in principle requires permission from all rights owners.

The research data may constitute a patentable invention, or contain commercially valuable know-how. Sharing them (prematurely) could jeopardize your valorization efforts.

Research funders, institutions and reputable journals/publishers with data sharing mandates will normally allow you to opt out of their open data requirements for legitimate reasons such as the above. If you do so, you will often be expected to provide proper justification (e.g. in your Data Management Plan, or in a data availability statement included in your published article).

Ways of data sharing

In principle, there are various ways of sharing data beyond your project or research team, each with their pros and cons. For example, you can:

  • Email data upon request
  • Make them available via a personal or project website
  • Add them as supplementary materials to a journal article
  • Share data via a data repository/data archive

Advantages of sharing via a trusted data repository

Generally speaking, it is preferable not to adopt a DIY approach, but to share research data via a data repository. Even better is to share via a trusted, domain-specific data repository – if one is available for your research area. Using trusted repositories for data sharing has several advantages, such as:

  • They take away the burden of handling data reuse queries and managing data access.  
  • They offer more guarantees in terms of sustainable access to data.
  • They make your data discoverable and citable.
  • They can go a long way towards making your data FAIR (Findable, Accessible, Interoperable and Reusable).

Licensing data

When making research data publicly available, it is important to let potential users know in advance what they are allowed to do with those data. Licensing is an effective way to communicate such permissions.

A trusted data repository will normally apply a license to any dataset it holds, which you typically select (from a list of options) when depositing data.

Good practice is to apply a standard and open license for open research data, as it ensures legal interoperability and the widest possible reuse.

Among the standard licenses commonly used for research data is the suite of Creative Commons (CC) licenses, which offer different levels of permission. CC licenses conformant with the “Open Definition” are:

  • Public Domain Dedication (CC0 1.0): waives copyright and related rights (e.g. database rights).
  • Attribution (CC-BY-4.0): gives others maximum freedom to reuse (i.e. copy, redistribute, adapt) your work, provided they give appropriate credit.
  • Attribution Share-Alike (CC-BY-SA-4.0): same as CC-BY-4.0, but requires redistribution of derivative works under this same license.

Need help selecting an appropriate standard license? Check out this EUDAT license selector tool.
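
The choice between the three open licenses listed above can be sketched as a small lookup. This is only an illustrative sketch, not the logic of the EUDAT tool or of any repository: the function name and its two yes/no parameters are hypothetical, and it assumes you hold the rights needed to grant a license in the first place.

```python
# Open CC licenses as listed above, with a one-line summary of each.
OPEN_LICENSES = {
    "CC0 1.0": "Public Domain Dedication: waives copyright and related rights",
    "CC-BY-4.0": "Attribution: reuse allowed, provided appropriate credit is given",
    "CC-BY-SA-4.0": "Attribution Share-Alike: as CC-BY-4.0, derivatives under the same license",
}


def suggest_open_license(require_credit: bool, require_share_alike: bool) -> str:
    """Suggest one of the three open CC licenses (hypothetical helper).

    Share-alike is only offered together with attribution in this list,
    so asking for share-alike always yields CC-BY-SA-4.0.
    """
    if require_share_alike:
        return "CC-BY-SA-4.0"
    if require_credit:
        return "CC-BY-4.0"
    return "CC0 1.0"


if __name__ == "__main__":
    choice = suggest_open_license(require_credit=True, require_share_alike=False)
    print(choice, "-", OPEN_LICENSES[choice])
```

Such a helper only covers fully open data; data that need access restrictions fall outside it and call for a bespoke license.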

Note that in order to grant a license, you need to be the rights holder of the data (or have permission to act on the rights holder's behalf). For open data in which no copyright or related rights exist, the Public Domain Mark can be used.

For data requiring access restrictions, a standard license is usually not appropriate. In such cases a bespoke license will be needed instead (e.g. an ‘end user license’ or ‘user agreement’ as implemented by a trusted data repository) to make the data available.

Citing data

Research data can be cited in the same way as publications. In fact, the European Code of Conduct for Research Integrity (2017) stipulates that research data should be acknowledged as legitimate and citable products of research.

Making your data citable enables you to claim and receive credit for producing high-quality datasets, and enhances the potential impact of your research. When you reuse data from someone else, you should in turn also cite these in your publications to contribute to a culture of data citation.

Data citation requires data to have a persistent identifier (PID), such as a DOI, PURL, or Handle.

What is a PID?

A PID uniquely identifies a digital object and ensures that it can always be found, even if its web address (URL) changes. A central registry ensures that following the PID will point you to the digital object’s current location.

You can typically get a PID for your datasets by depositing them in a trusted data repository.

Digital Object Identifier (DOI)

The DOI is a commonly used identifier for research datasets. It always comprises:

  • A prefix: ‘10.’ + 4 or more digits; identifies the organisation that registered the DOI at DataCite.
  • A suffix: identifies the dataset.

Appending a DOI to the address of the resolver service, https://doi.org/ (the older http://dx.doi.org/ form also still resolves), takes you to the current location of the digital object in question. An example for a dataset held at the Dryad repository is: https://doi.org/10.5061/dryad.4h16331.
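
The prefix/suffix structure described above can be illustrated with a short Python sketch. It is a simplified check rather than a full implementation of the DOI syntax: the pattern and the function names are assumptions made for this example, and the resolver address is the https://doi.org/ form used in the Dryad example.

```python
import re

# Simplified pattern (an assumption for this sketch): prefix "10." plus four or
# more digits, optionally followed by dotted sub-parts, then "/" and a suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,}(?:\.\d+)*)/(\S+)$")
RESOLVER = "https://doi.org/"
KNOWN_RESOLVERS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def split_doi(doi: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a bare DOI or a DOI written as a resolver URL."""
    bare = doi.strip()
    for base in KNOWN_RESOLVERS:
        if bare.lower().startswith(base):
            bare = bare[len(base):]  # strip the resolver part, keep the DOI itself
            break
    match = DOI_PATTERN.match(bare)
    if match is None:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return match.group(1), match.group(2)


def doi_url(doi: str) -> str:
    """Build the resolvable link by appending the DOI to the resolver address."""
    prefix, suffix = split_doi(doi)
    return f"{RESOLVER}{prefix}/{suffix}"


if __name__ == "__main__":
    print(split_doi("10.5061/dryad.4h16331"))
    print(doi_url("10.5061/dryad.4h16331"))
```

For the Dryad example, `split_doi` yields the prefix `10.5061` and the suffix `dryad.4h16331`, and `doi_url` rebuilds the link shown above.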

Want to know more? Check out this video on PIDs and data citation.

A data citation should contain the following minimum elements:

  • Author (creator of the dataset)
  • Publication date
  • Title
  • Version (if applicable)
  • Publisher (the organisation hosting/distributing the dataset, i.e. the repository)
  • Identifier

Citations can contain additional elements such as resource type and location (a persistent URL for the dataset, e.g. DOI + resolver service). Data repositories will usually suggest the appropriate data citation format for the datasets they hold.

Example

Kavelaars, Marwa M.; Lens, Luc; Müller, Wendt (2019), Data from: Sharing the burden: on the division of parental care and vocalizations during incubation, Dryad, Dataset, https://doi.org/10.5061/dryad.4h16331.
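
As a minimal sketch, the citation elements listed above can be assembled programmatically. The class and field names below are hypothetical, and the layout simply mirrors the Dryad example; repositories may recommend a different citation format for the datasets they hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Minimum data citation elements, plus two optional ones (hypothetical names)."""
    authors: list[str]
    year: int
    title: str
    publisher: str                       # the repository hosting the dataset
    identifier: str                      # persistent identifier, e.g. a DOI link
    version: Optional[str] = None        # only included if applicable
    resource_type: Optional[str] = None  # additional element, e.g. "Dataset"

    def to_text(self) -> str:
        """Join the elements in the same order as the example citation."""
        parts = [f"{'; '.join(self.authors)} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)
        parts.append(self.identifier)
        return ", ".join(parts) + "."


if __name__ == "__main__":
    citation = DataCitation(
        authors=["Kavelaars, Marwa M.", "Lens, Luc", "Müller, Wendt"],
        year=2019,
        title=("Data from: Sharing the burden: on the division of parental "
               "care and vocalizations during incubation"),
        publisher="Dryad",
        identifier="https://doi.org/10.5061/dryad.4h16331",
        resource_type="Dataset",
    )
    print(citation.to_text())
```

Run as a script, this prints the same citation string as the example above.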

More information