Preserving data

Preserving research data refers to the practice of keeping data available and usable in the longer term, beyond the end of your research project.

Why preserve data?

The main reasons for data preservation are:

  • Ensuring that your research can be verified and reproduced
  • Maintaining data for future reuse, e.g. for further research or teaching

Funders, publishers, and institutions (including Ghent University) increasingly require certain research data to be retained for a specified period and/or for a specific purpose.

Ghent University's RDM policy framework expects 'relevant' research data to be preserved for a minimum of 5 years. In the first place, this means data that are reasonably needed to verify and reproduce published scientific claims. Data with high reuse potential are also relevant to keep for the longer term.
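As a rough illustration, the end of a minimum retention period can be worked out with simple date arithmetic. The sketch below is not an official calculation: the function name is hypothetical, and the choice of starting date (for example the end of the project or the date of publication) is an assumption you should check against the policy that applies to you.

```python
from datetime import date

# Minimum retention period mentioned above; other policies may set longer periods.
MIN_RETENTION_YEARS = 5


def earliest_disposal_date(start: date, years: int = MIN_RETENTION_YEARS) -> date:
    """Return the first date on which the minimum retention period has elapsed.

    `start` is whatever date the applicable policy counts from (an assumption here).
    """
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # `start` is 29 February and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


print(earliest_disposal_date(date(2024, 6, 30)))
```

Data kept for reuse rather than verification may of course warrant a much longer period than this minimum.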

Preserving vs. storing data

Preserving data from completed research is different from storing and backing up data files while your research is still ongoing. The latter typically involves data that are mutable; the former concerns data (or milestone versions of data) that are ‘frozen’ and not in active use.

Long-term preservation requires appropriate actions to prevent data from becoming unavailable and unusable over time, for example because of:  

  • Outdated software or hardware
  • Storage media degradation
  • A lack of sufficient descriptive and contextual information to keep data understandable

In other words, data preservation involves more than simply not deleting the data files created and stored in the course of your research project!
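One common safeguard against silent corruption caused by storage media degradation is a fixity check: record a checksum for every preserved file, then recompute and compare it at a later date. The following is a minimal sketch, not a prescribed procedure. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content no longer matches the manifest."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

In practice you would store the manifest separately from the data (for example as a text file kept with the documentation) and rerun the comparison periodically. Many data repositories perform checks of this kind on your behalf.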

What to keep?

Not all research data (nor every version or part of a dataset) can or need to be kept indefinitely.

Maintaining data in a usable form for the longer term takes effort and has a considerable cost. Selecting which (parts of) data to keep, and for how long, is therefore an essential component of data preservation.

As a researcher, you have a key role in deciding what to retain and what to discard, as you know your data best. Such decisions may depend on factors such as:

  • The type of data involved
  • The norms in your discipline
  • Whether you are keeping data for potential future reuse, for verification, or for other purposes. Depending on the purpose, you may need to keep the raw data or data in a more processed form (or perhaps you want to preserve different forms of the same dataset for different purposes, and for different retention periods).

Appraisal and selection of research data is still an evolving field, but some generic, high-level criteria are emerging to guide decisions on what to keep. Common criteria for keeping data include:

  • Legal or ethical requirements to keep (certain) data for a specified retention period (e.g. for clinical trials data)
  • Funder, institutional or publisher policies
  • High potential reuse value of the data
  • Great scientific, historical, or cultural significance of the data
  • The data are unique and/or cannot easily be re-created
  • The benefits outweigh the costs of data preservation

Conversely, there can be valid reasons for disposing of data (or parts or versions of them), either right after finishing your research (e.g. duplicate copies, superseded versions, …) or later on, once the applicable retention period has expired.

Without associated information, research data quickly become useless. For all data selected for preservation, you should therefore keep a data package consisting of:  

  • The research data files themselves
  • The necessary accompanying documentation and metadata to ensure that those data remain findable, comprehensible, and (re)usable
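A quick automated check can confirm that a folder prepared for preservation holds both components before it is deposited. This is only a sketch under stated assumptions: the documentation file names in `DOC_NAMES` and the function `check_package` are hypothetical, and your discipline or repository may expect different conventions.

```python
from pathlib import Path

# Hypothetical documentation file names; adapt these to your own conventions.
DOC_NAMES = {"readme.txt", "readme.md", "codebook.csv"}


def check_package(folder: Path) -> list[str]:
    """Return a list of problems found in a data package; empty means none found."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    docs = [p for p in files if p.name.lower() in DOC_NAMES]
    data = [p for p in files if p.name.lower() not in DOC_NAMES]

    problems = []
    if not data:
        problems.append("no data files found")
    if not docs:
        problems.append("no documentation file found")
    return problems
```

A check like this only confirms that documentation is present. Whether it is sufficient to keep the data comprehensible still requires human judgement.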

 

It is also important to document and justify your choices to keep or remove data, for example in your Data Management Plan.

Where to keep data?

Research data and documentation selected for retention should be kept in a suitable location and in a secure manner to ensure that they remain available and usable beyond the end of your project, with appropriate access rights.

Where appropriate, depositing data in an established, trustworthy research data repository (sometimes also called a data center, data archive or scientific database) is generally the preferred option. This has the added benefit of letting you make your data available to others at the same time.

Data repository types

There are different kinds of data repository, including:

  • General-purpose repositories: accept a wide range of data types (and sometimes other research outputs as well) from all disciplines. Examples include Zenodo and Open Science Framework.
  • Domain-specific repositories: focus on specific data types or data from specific research domains.
  • Institutional repositories: hold research data outputs from a particular research institution.

Need to find a data repository? An international, searchable register of existing research data repositories is available at re3data.org. You can also search for repositories/databases via FAIRsharing.org.

Data repositories are mostly suitable for research data that can be publicly shared – although that doesn’t necessarily have to mean sharing in a fully open way (see degrees of data sharing). Some data repositories can cater for data that cannot be made (immediately) available under full open access, for example by allowing temporary embargoes, or by offering more restricted or controlled levels of access.

However, sometimes it may not be possible or appropriate to deposit data in an external repository, e.g. for legal, ethical, contractual, practical, or other reasons. In such cases, research data selected for preservation will need to be kept in-house.

There are hundreds of existing data repositories or archives to choose from. Keep in mind, however, that not all repositories are created equal. Some repositories focus more on disseminating and making your data visible than on ensuring their preservation in the long term.

Basic tips

  • Check the list of repositories recommended by your journal/publisher. Many journals and publishers with data sharing policies recommend, and for some data types even require, the use of specific repositories. For example, see the list of recommended repositories from Springer Nature or PLOS.
  • Deposit data in a broadly recognised domain-specific repository if one is available for your specific domain or data type. Trusted domain repositories might not accept all individual datasets, however: they tend to focus on high-quality data with potential for reuse.
  • Select a general-purpose repository, such as Zenodo or Open Science Framework, if no established repository exists for your research domain.

Additional considerations

  • Does the repository match your data needs (e.g. in terms of accepted data types and formats, access levels, licenses, legal requirements for data protection…)?
  • Does it charge for its services?
  • Does it have an explicit commitment to long-term preservation?
  • Does it provide a landing page for each dataset, with publicly available metadata?
  • Does it assign persistent and unique identifiers?
  • Does it provide clarity about access levels and conditions?
  • Does it provide information about usage licenses?
  • Is it certified?
  • Is it community-based, or a commercial solution?

Looking for information about a specific repository? Check its characteristics via the re3data.org and FAIRsharing.org registries, and/or the website of the repository itself.

Non-digital research data and materials

RDM mostly focuses on digital research data. However, you may also collect analogue research data (e.g. surveys on paper…) as part of your research, or other non-digital materials that strictly speaking do not constitute research data (e.g. samples). Sometimes such non-digital data and materials also need to be retained after the end of your project.

  • Consider whether digitizing the data is an option (e.g. this may be worthwhile for data that will be kept permanently for future reuse).
  • If not, your Faculty, Department, research group, lab etc. may offer facilities to retain your data for verification or legal compliance purposes for a finite retention period. An example is the Faculty of Psychology and Educational Sciences’ Archive for Research Material.
  • Contact rdm.support@ugent.be if you have paper research data that could merit permanent preservation for future reuse.

There are also repositories for non-digital materials you can make use of. Examples include:  

Preparing data for preservation

There are certain preconditions for maintaining data in a usable form over time, such as:

Keeping data safe for the future therefore requires some preparation and effort on your part.

Established domain-specific repositories will usually only accept data that meet their standards for file formats, documentation and metadata, data quality… If you plan to use a data repository or archive, check in advance what the requirements are, so you can adequately prepare your data for deposit.

Preserving research data after the end of your project is made significantly easier if you properly plan for data management from the outset, and implement good RDM practices during the active research phase.
