File formats

What is a file format?

A file format describes how information is stored within a digital file. Each file format is unique, but the same type of information can often be stored in several formats (e.g. text can be stored in a plain text file as well as in a Word file). File formats differ at the following levels:

  • Simple vs complex formats: e.g. the .txt format is a very simple way of storing text, while a .docx file has more complex properties.
  • Open vs closed file formats: Closed (or proprietary) file formats cannot be freely used. They are often owned by a company or protected by patents. Open formats can be used and implemented by anyone.

Which formats to use?

The choice of which file formats to use for research data depends on:

  • discipline-specific standards and customs
  • planned data analyses
  • software availability/cost
  • hardware used – e.g. audio capture, fMRI scanner

Using a specific format can carry risks. For instance, a format that can only be opened with specific software makes the data vulnerable to that software becoming obsolete. This can lead to being locked out of one's own data.

Long-term access

File formats are more likely to remain accessible in the long term if they have the following characteristics:

  • Non-proprietary (not protected by trademark, patent or copyright)
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Lossless compression (as opposed to lossy compression)
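
The difference between lossless and lossy storage can be illustrated with a short Python sketch. It uses the standard-library zlib module for the lossless round trip. The "lossy" step is a deliberately crude stand-in (discarding the low bits of each byte), not a real audio or image codec, and the sample data is made up for illustration.

```python
import zlib

# Hypothetical sample data: 1000 bytes of a repeating ramp,
# standing in for a recording or an image.
original = bytes(i % 256 for i in range(1000))

# Lossless: after compressing and decompressing, every byte is back.
restored = zlib.decompress(zlib.compress(original))
lossless_ok = restored == original

# Lossy (crude stand-in, not a real codec): drop the 4 low bits of each byte.
# The discarded detail cannot be recovered afterwards.
degraded = bytes(b & 0xF0 for b in original)
lossy_ok = degraded == original

print("lossless round trip identical:", lossless_ok)
print("lossy round trip identical:", lossy_ok)
```

With a lossless format the original data can always be reconstructed exactly; with a lossy format some detail is gone for good, which is why lossless formats are preferred for preservation.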

Some examples of preferred file format choices include:

Type          Preferred format    Avoid
Text          .docx or .odt       .doc
Spreadsheet   .csv                .xls
Video         MPEG-4              QuickTime
Audio         .flac or .wav       .mp3
Picture       .tiff or .png       .gif
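
As a minimal sketch of how such a list could be put to use, the Python snippet below walks a project folder and reports files whose extension appears in the "Avoid" column. The function name find_risky_files is hypothetical, and mapping QuickTime to the .mov extension is an assumption, since the table names only the format.

```python
from pathlib import Path

# Extensions taken from the "Avoid" column of the table above.
# Assumption: ".mov" stands for QuickTime; the table names only the format.
AVOID = {".doc", ".xls", ".mov", ".mp3", ".gif"}


def find_risky_files(root):
    """Return the files under root whose extension is in AVOID, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AVOID
    )


if __name__ == "__main__":
    # Scan the current folder and list every file that may need converting.
    for path in find_risky_files("."):
        print(path)
```

The extension check is case-insensitive, so a file named REPORT.DOC is reported as well. An extension is only a hint about the actual format, so the result is a starting point for review rather than a definitive inventory.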

File format conversion

Be aware that converting data from one format to another can result in the loss of metadata or formatting. It is therefore good practice to choose your file formats with long-term access in mind.
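
One way to see this kind of loss is a round trip through CSV, sketched below in Python with the standard-library csv module. The sample row is hypothetical. CSV stores plain text only, so the information that one value was a date and another a number does not survive the conversion.

```python
import csv
import datetime
import io

# Hypothetical row holding a text, a date and a numeric value.
row = ["A1", datetime.date(2020, 1, 5), 3.5]

# Write the row to CSV (in memory here), then read it back.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
buffer.seek(0)
restored = next(csv.reader(buffer))

# Every value comes back as a plain string.
restored_types = [type(value).__name__ for value in restored]
print(restored)
print(restored_types)
```

The values themselves are still readable, but their types now have to be documented separately (for example in a README or codebook) if they are to be interpreted correctly later.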