
File formats

Preferred formats are file formats that DANS is confident will offer the best long-term guarantees of usability, accessibility and sustainability. DANS will always accept research data deposited in preferred formats.
Acceptable formats are widely used file formats, other than the preferred formats, that will be moderately to reasonably usable, accessible and robust in the long term. DANS favours the use of preferred formats, but in most cases acceptable formats will also be allowed.

As a general guideline, DANS believes that the file formats best suited for long-term sustainability and accessibility:
• Are frequently used
• Have open specifications
• Are independent of specific software, developers or vendors

In practice, it is not always possible to use formats which satisfy all of these criteria.

If your data are stored in formats other than those listed below, please contact DANS.

Type
  • Preferred format(s)
  • Acceptable format(s)
Text documents
  • PDF/A (.pdf)
  • ODT (.odt)
  • MS Word (.doc, .docx)
  • RTF (.rtf)
  • PDF (.pdf)
Plain text
  • Unicode text (.txt)
  • Non-Unicode text (.txt)
Markup language
  • XML (.xml)
  • HTML (.html)
  • Related files: .css, .xslt, .js, .es
  • SGML (.sgml)
Spreadsheets
  • ODS (.ods)
  • CSV (.csv)
  • MS Excel (.xls, .xlsx)
  • PDF/A (.pdf)
  • OOXML (.docx, .docm)
Databases
  • SQL (.sql)
  • SIARD (.siard)
  • DB tables (.csv)
  • MS Access (.mdb, .accdb) (v. 2000 or later)
  • dBase (.dbf)
  • HDF5 (.hdf5, .he5, .h5)
Statistical data
  • SPSS Portable (.por)
  • SPSS (.sav)
  • STATA (.dta)
  • DDI (.xml)
  • data (.csv) + setup (.txt)
  • SAS (.7dat; .sd2; .tpt)
  • R (* under examination)
Raster images
  • JPEG (.jpg, .jpeg)
  • TIFF (.tif, .tiff)
  • PNG (.png)
  • JPEG 2000 (.jp2)
  • DICOM (.dcm) (by mutual agreement)
Vector images
  • SVG (.svg)
  • Illustrator (.ai)
  • EPS (.eps)
Audio
Video
Computer Aided Design (CAD)
  • AutoCAD DXF v. R12 (.dxf)
  • AutoCAD other versions (.dwg, .dxf)
Geographical Information (GIS)
  • GML (.gml)
  • MIF/MID (.mif/.mid)
  • ESRI Shapefiles (.shp & related files)
  • MapInfo (.tab & related files)
  • KML (.kml)
Geo referenced images
  • GeoTIFF (.tif, .tiff)
  • TIFF World File (.tfw & .tif)
Raster GIS
  • ASCII GRID (.asc, .txt)
  • ESRI GRID (.grd & related files)
3D
  • WaveFront Object (.obj)
  • X3D (.x3d)
  • COLLADA (.dae)
  • Autodesk FBX (.fbx)
RDF
  • W3C standards
Computer Assisted Qualitative Data Analysis (CAQDAS)
  • Formats used in application, processed according to each individual file’s data type
  • Application’s export formats (ATLAS.TI copy bundle; NVIVO export project; …)
  • QuDEX
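
Before depositing, it can help to scan a folder for files whose extension does not appear in the table at all, since those are the cases where DANS asks to be contacted. The following is a minimal sketch under stated assumptions, not a DANS tool: the set `LISTED_EXTENSIONS` is an illustrative subset of the table, and the function name `needs_contact` is hypothetical. An extension alone does not establish the format (for example, `.pdf` covers both PDF/A and plain PDF), so a match is only a first check.

```python
from pathlib import PurePath

# Illustrative subset of the extensions that appear in the table above.
# Hypothetical helper data; extend it with the rows relevant to your dataset.
LISTED_EXTENSIONS = frozenset({
    ".pdf", ".odt", ".doc", ".docx", ".rtf", ".txt",
    ".xml", ".html", ".ods", ".csv", ".xls", ".xlsx",
    ".sql", ".siard", ".por", ".sav", ".dta",
    ".jpg", ".jpeg", ".tif", ".tiff", ".png", ".svg",
    ".dxf", ".gml", ".kml", ".obj", ".x3d",
})


def needs_contact(filenames):
    """Return the file names whose extension is not in LISTED_EXTENSIONS.

    The comparison is case-insensitive, so "REPORT.PDF" counts as ".pdf".
    """
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in LISTED_EXTENSIONS
    ]


if __name__ == "__main__":
    for name in needs_contact(["report.PDF", "notes.wpd", "data.csv"]):
        print(f"{name}: extension not in the table, check with the archive")
```

Files flagged by this check are not necessarily unsuitable; the check only shows that their extension is missing from the subset used here.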

Digital data are stored in file formats, often the standard formats of the software used to create them. The software and file format selected will usually depend on the user’s primary purpose.

To create a table, for instance, spreadsheet software will be used more often than a word processor, because data tables require specific properties that specialized software supports better: the ability to sort data, to use formulas, to set up a filter, and so on. When such information is saved from a spreadsheet application, the user may expect the file format to preserve these properties, or ‘significant characteristics’. If the table is created in a word processor, the software is less likely to support them. A word processing application, on the other hand, is more suitable for formatting an article, for instance with a functional table of contents and page numbers; a spreadsheet application will generally not support such features.

When information is saved from a software program, it is usually stored in that program’s standard file format. This is no guarantee, however, that the file contents can be used or displayed in the future in the way that was intended when the file was created. Formats may, for instance, depend on particular software, and software can become obsolete or support only certain versions of a format. Specific format properties may also work only in the software used, or even in only one version of that software. Files may also depend on expensive or exclusive software that is not accessible to everyone.

To reduce the risk of obsolescence and to keep important file properties accessible and sustainable, a number of measures can be taken. One of these measures is to use file formats that have a high probability of remaining usable for many years.
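
As one concrete instance of such a measure, the table lists Unicode text ahead of non-Unicode text for plain-text files. The sketch below re-encodes a text file to UTF-8, assuming the depositor already knows the original encoding; the function name `reencode_to_utf8` and the file names are hypothetical, and this is an illustration rather than a procedure prescribed by DANS.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Write a UTF-8 copy of src to dst.

    source_encoding must be the encoding the file was really written in
    (an assumption the caller has to verify). Working on bytes keeps the
    original line endings unchanged.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    original = Path("interview_latin1.txt")
    if original.exists():
        reencode_to_utf8(original, Path("interview_utf8.txt"), "latin-1")
```

Decoding with the wrong `source_encoding` can silently produce incorrect characters, so it is worth comparing the converted copy with the original and keeping the original until the result has been checked.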