Research Data Management: Storing

Guidance and support to staff, researchers and students at the University of Southampton

Storing, retaining & destroying

Research data will typically be stored, managed and shared during a project, but may also need to be retained longer for a variety of reasons. This guidance will help you understand some of the factors that influence the length of the retention period for data, so you can make the correct decisions when planning, selecting storage and depositing data.

Storing data costs money, as shown by the costing tool provided by the UK Data Archive. Extended data retention periods may incur additional costs that affect your project directly, or these may be covered by the full economic costing included in your proposal. Data retention periods will invariably outlive projects, so you may want to consider how this will be funded as part of your data management plan and/or your proposal. Check your funder's guidelines. In some cases the costs and benefits of data storage and retention decisions may need to be assessed and justified for funding purposes. The KRDS (Keeping Research Data Safe) Benefits Analysis Toolkit may be used for this.

More information about Storing Sensitive Data

See more information about retention.

See more information on destruction of data.

Requesting a DOI

We can register a DOI for your dataset through DataCite. This gives the dataset a persistent link and can make it easier to cite.

For more details see our DOI for data page.

Where to store your data

Not all research data has to be held within the University, but we recommend that you keep copies of your data on the University's networked storage. If your research data contains personal information from living individuals, you must store your data securely within the University's systems - see Research Data & GDPR for more information.

There are different types of networked storage:

  • Home Filestore
    • Also known as My Documents or H drive on Windows machines
    • Backed up every 2 hours (kept for 30 days)
    • Four-hourly snapshots (kept for 90 days)
    • To access network storage from Linux machines see http://linuxdesktops.soton.ac.uk/mount.html
  • OneDrive through Office 365
    • Cloud based storage
    • 5TB of storage, maximum file size 10GB
    • Restrictions on filenames
    • All data held in secure centres which are within the EEA
    • Do NOT use personal OneDrive
  • Shared Departmental Filestore
    The J:\ drive is a shared resource for work-related data and files that do not need to be held as private and confidential to the individual.
  • Research Filestore
    iSolutions provides secure storage for active research data in Research Filestore. If you would like to request Research Filestore, please contact ServiceLine in the first instance. Please note that requests greater than 1TB will require a business case, and allocations will be made on a case-by-case basis.
  • High Performance Computing (HPC)
    Any researcher who is limited by the computing capability of their desktop PC may find HPC beneficial. Contact your BRM or email ServiceLine if you think you may need to use this facility.
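
Before copying a project folder to OneDrive, it can help to check it against the limits listed above (5TB of storage, 10GB maximum file size). The following is a minimal sketch in Python, not an official tool. The function names are hypothetical, and treating GB and TB as decimal multiples is an assumption. It does not check the filename restrictions, which are not detailed here.

```python
from pathlib import Path

# Limits quoted in the storage list; decimal units are an assumption.
MAX_FILE_BYTES = 10 * 1000**3   # 10GB maximum file size
MAX_TOTAL_BYTES = 5 * 1000**4   # 5TB of storage


def scan(folder: str) -> dict[str, int]:
    """Map each file under `folder` (recursively) to its size in bytes."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def find_oversized(sizes: dict[str, int], limit: int = MAX_FILE_BYTES) -> list[str]:
    """Return the names of files larger than the per-file limit, sorted."""
    return sorted(name for name, size in sizes.items() if size > limit)


def fits_quota(sizes: dict[str, int], quota: int = MAX_TOTAL_BYTES) -> bool:
    """Return True if the combined size of all files fits within the quota."""
    return sum(sizes.values()) <= quota
```

For example, `find_oversized(scan("my_project"))` would list any files in a hypothetical `my_project` folder that are too large to upload individually.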

Related University Policies

University of Southampton Research Data Management Policy

Research Data Storage (University of Southampton Service Statement) - available from iSolutions KnowledgeBase

At the end of the research project, in a timely manner and in accordance with any funding requirements, research data should be deposited in an appropriate data repository. The best repository to choose for your research data will be a national data centre or discipline specialist data repository because they have the expertise and resources to deal with particular types of data.

Depositing @ Soton

You can deposit small datasets (gigabytes in size) in ePrints Soton for long-term storage. If your data is a terabyte or over in size, contact researchdata@soton.ac.uk to discuss how to deposit your data.
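
The size guidance above can be expressed as a simple rule. This is an illustrative sketch only: the function name is hypothetical, and using a decimal terabyte as the exact cut-off is an assumption, since the guidance gives only approximate sizes.

```python
TERABYTE = 1000**4  # decimal terabyte; the exact cut-off is an assumption


def deposit_route(size_bytes: int) -> str:
    """Suggest how to deposit a dataset of the given size at Southampton."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes >= TERABYTE:
        # A terabyte or over: discuss the deposit with the research data team.
        return "contact researchdata@soton.ac.uk"
    # Smaller datasets (gigabytes in size) can go into ePrints Soton.
    return "deposit in ePrints Soton"
```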

Data can be openly available or available on request, depending on the nature of the data.

We can organise a DOI for your dataset, which you can include in the funder acknowledgement and data access statement in your publications. Ideally you should request the DOI before submitting your manuscript. See DOI for Data.

Depositing elsewhere

Where possible we recommend using discipline-specific data repositories such as the Archaeology Data Service. You can find one for your subject via Re3data.org. Some funders expect data to be deposited in specific data centres, e.g. ESRC and NERC support dedicated data centres. Also consider whether any agreements with your collaborators include requirements for data deposit. If you have the option to deposit in a repository associated with your funder, or your publisher will pay for deposit in Dryad, it is worth considering this.

If you deposit your data elsewhere, please create a dataset catalogue entry in Pure with a link to where the data is stored.

You may be able to publish your data in a data journal. This is a growing and fast-moving area. Some publishers now require the deposit of supporting data with the article, while others require that a link to the data is provided. You will need to take this into account when considering how long to retain the data, and it may influence your choice of storage location.

Deciding what data to keep is not always easy, as multiple factors and interests may bear on whether data should be preserved and the means that should be used. The Digital Curation Centre has published a Checklist for appraising research data that can help you approach this question in a considered, systematic fashion.

The University's RDM policy states that 'significant' data should be kept. But how do you tell what is significant? One way is to treat data as significant if they fall into one of these categories:

  • Data that directly underpin research findings which have been or will be published, and that must be retained in order that these findings can be independently substantiated;
  • Data that have enduring value independent of any published findings. This may be value to yourself, in view of possible future research or the exploitation of any IPR; to other researchers, who might usefully interrogate them to generate further insights; or to other stakeholders, such as industrial collaborators, for whom the data may hold commercial interest;
  • Data that must be managed according to legal, ethical or contractual obligations irrespective of any research value they possess. This may affect what can be retained under what conditions, and what must be destroyed.

Selection criteria

These are some of the criteria you should apply when going through data appraisal and selection:

  • Do the data directly support research findings that have been or will be published?
  • What are the funder's or sponsor's requirements? Many funders require the preservation and sharing of key data underpinning research publications for set periods after the completion of the research. Check any relevant policies and contract terms;
  • What are the publisher's requirements? Some publishers, e.g. the Nature Publishing Group, require authors to make data available as a condition of publishing;
  • Do the data have long-term research value? Would you use them in future research? Would other researchers be able to use them?
  • What is the likely cost of retaining the data in relation to the cost of recreating the data or the impact of losing the data? The results of simulations may be easily reproducible; experimental data may be reproducible cheaply or only at great cost, depending on the nature of the experiment; time series data or survey findings are by their nature irreplaceable, and may be highly valuable;
  • Do you have permission to keep/publish the data? It may not be possible to retain or publish secondary data obtained under licence;
  • Are you legally entitled to keep all the data, or publish or otherwise process them? The Data Protection Act, for example, and the scope of any consents for collection and processing of personal data, might affect what data can be kept, and what can be done with them;
  • Can you afford to keep the data? Can the cost of archiving the data be recovered from the sponsor? The Research Councils as well as some other funders accept that data management costs may be recovered through grants;
  • Do you have space to keep the data? What preservation resources are available to you?

What to destroy and how to do it

As well as planning for the curation of your data, it is important to consider how it will be destroyed where this is required for legal or other reasons. Guidance on when research data may be destroyed, and who authorises its destruction, is given in our section on Retention Periods and in the University Research Data Management Policy.

See more information on destruction of data.

The largest share of data costs is incurred in preparing the data and ingesting it into the selected storage service, as shown by the costing tool provided by the UK Data Archive.

Extended data retention periods may incur additional costs that affect your project directly, or these may be covered by the full economic costing included in your proposal. Data retention periods will invariably outlive projects, so you may want to consider how this will be funded as part of your data management plan and/or your proposal. Check your funder's guidelines. Funders will usually only pay for costs incurred during a project, so archival storage costs have to be invoiced during the grant, when the data is deposited, rather than charged as a rolling annual cost.

Over time costs will be incurred for storage, typically based on the volume of data stored for a given retention period, and for additional services, for example active data management such as reformatting to counter possible format obsolescence. The latter is now regarded as less of a problem for popular formats, but may need to be considered for specialised data formats.
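
As a rough illustration of a volume-based charge, the sketch below estimates a single up-front cost from the volume stored, the retention period and a rate per terabyte per year, plus a one-off preparation and ingest cost. The function name, the linear pricing model and all figures are assumptions for illustration only. Real rates must come from your storage service or repository.

```python
def archive_cost(volume_tb: float, retention_years: int,
                 rate_per_tb_year: float, ingest_cost: float = 0.0) -> float:
    """Estimate a one-off archiving charge to be invoiced during the grant.

    Assumes storage is priced linearly: volume x years x rate (hypothetical model).
    """
    if volume_tb < 0 or retention_years < 0:
        raise ValueError("volume and retention period must be non-negative")
    storage = volume_tb * retention_years * rate_per_tb_year
    return ingest_cost + storage


# Hypothetical figures: 2 TB kept for 10 years at 50 per TB per year,
# plus 300 for preparation and ingest.
estimate = archive_cost(2, 10, rate_per_tb_year=50.0, ingest_cost=300.0)
print(estimate)
```

Because the whole retention period is costed at deposit, an estimate of this kind can be included in the proposal and invoiced while the grant is still active.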

Acknowledgements

Thanks to the University of Reading for the list of selection criteria.