
Research Data Management: Sharing

Guidance and support to staff, researchers and students at the University of Southampton

Sharing Data after the research

Data-sharing is encouraged by all UK Funding Councils, the University's Research Data Management Policy and, for COVID-related research, the statement on 'Sharing research data and findings relevant to the novel coronavirus (COVID-19) outbreak'. See more information on your obligations to share Covid-19 research.

Wherever possible data and data records should be made available with as few restrictions as possible. Sharing your data:

  • can increase your citations
  • allows data to be tested and validated
  • allows data to be re-used for further research or in teaching
  • is increasingly seen by funders as being in the public interest, in line with the OECD principles and guidelines for access to research data from public funding
  • reduces duplication of effort

Where sharing data is a requirement, the time of the release can be linked to a number of different points in the data life-cycle. These can include:

  • the date of creation
  • any publication based on the data
  • the end of the project, or a specified period after it

Data should be made available as openly as possible and as restricted as necessary.

Sharing, but not openly

Open data can be shared but not all shareable data can be open. There are some good reasons not to share data openly:

  • the data contains personal data which cannot be sufficiently anonymised
  • the data contains commercially sensitive information

Often you can still share personal or pseudonymised information ethically with other researchers if the original participants were informed. See the UK Data Archive’s Consent and Ethics section for more information and advice.

If you think your data should only be available on request, see Restricting Access for more details and contact to discuss the options available.

Sharing, but not immediately

You may not want to share data immediately on initial publication as you still have extra research to do on the dataset and don't want to be scooped, or you want to make a patent application (contact RIS for help with patent applications). Funders recognise the right for researchers to have a period of privileged access to the data they have collected.  Typically data would be shared in a repository which allows for embargo periods (the University's institutional repository does support embargoes). The metadata record would be publicly available but the data itself would not be released until after the embargo period ends.

Sharing data deposited elsewhere

You may not always deposit or store data locally. For instance, you may be working on a collaborative project where the other partner is the lead organisation. In this case you may be required to deposit data using their services and this should have been agreed at the start of any project.  Seek guidance on collaborative agreements from your contact in Research & Innovation Services.

Your funder may also require that you deposit data in a particular repository. Where this is the case, you should create a record in Pure with a link to where the data is held.

Research data is only useful if the data can be read and understood.

File Formats

Will you still be able to access your file in 20 years?

Where possible, use text files. Data will be usable even if formatting is lost. Consider exporting to CSV, XML, JSON or free-form text. Otherwise, use file formats with openly published specifications:

  • DOCX, RTF or ODT for textual data
  • XLSX or ODS for spreadsheet data
  • SVG for figures (an open-standard vector format)
  • PDF/A for PDFs (a standardised version of PDF)
  • FLAC for audio (open and lossless, unlike MP3 compression)
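As an illustration of the "use text files" advice, the following minimal sketch writes the same records to both CSV and JSON using only the Python standard library. The function name `export_rows`, the output file names and the example column names are hypothetical and chosen for illustration; they are not part of any University tool.

```python
import csv
import json
from pathlib import Path


def export_rows(rows, fieldnames, out_dir):
    """Write the same records as CSV and JSON text files; return both paths.

    File names are illustrative assumptions, not a University convention.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / "measurements.csv"
    json_path = out_dir / "measurements.json"

    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # UTF-8 JSON with indentation stays readable in any text editor.
    with json_path.open("w", encoding="utf-8") as handle:
        json.dump(rows, handle, indent=2, ensure_ascii=False)

    return csv_path, json_path
```

Writing an explicit UTF-8 encoding and a header row means the files remain interpretable without the software that produced them.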

For more information see the UK Data Service: Recommended File Formats for a quick overview, and the Library of Congress Recommended Formats Statement for more detailed analysis.

ReadMe Files

It is good practice to have a README file to accompany your dataset.  A README file should be a txt file and should contain the following information as a minimum:

  • Name/institution/address/email information for Research Group or Principal investigator or person responsible for collecting the data
  • Date of data collection (can be a single date, or a range)
  • Information about geographic location of data collection (if applicable)
  • Licenses or restrictions placed on the data
  • Links to publications that cite or use the data
  • Method description, links or references to publications or other documentation containing experimental design or protocols used in data collection (if applicable)
  • Any relationships between the data files
  • For each filename, a short description of what data it contains
  • For tabular data, definitions of column headings and row labels, data codes (including missing data) and measurement units (or embed those in the tabular file)
  • Definitions for codes or symbols used to record missing data
  • Specialized formats or abbreviations used
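The minimum contents listed above could be turned into a plain-text skeleton along the following lines. This is only a sketch: the field labels paraphrase the list, and `README_FIELDS` and `build_readme` are hypothetical names, not the University's ReadMe template.

```python
# Field labels paraphrase the minimum README contents listed in this guide.
README_FIELDS = [
    "Contact (name, institution, address, email)",
    "Date of data collection",
    "Geographic location of data collection",
    "Licences or restrictions",
    "Related publications",
    "Methods and protocols",
    "Relationships between files",
    "Column headings, codes and units",
    "Missing-data codes",
    "Specialised formats or abbreviations",
]


def build_readme(title, files):
    """Return README text with one heading per field and one line per data file.

    `files` maps each filename to a short description of its contents.
    """
    lines = [f"README for: {title}", ""]
    for field in README_FIELDS:
        lines += [f"{field}:", "  TODO", ""]
    lines.append("Files:")
    for name, description in sorted(files.items()):
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"
```

The returned string can be saved as `README.txt` alongside the data and the `TODO` placeholders filled in by hand.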

We have a ReadMe file template for University of Southampton datasets. These records have examples of ReadMe files:

We would recommend the following guides to writing README files:

Maintaining access

Where you have identified that your data has long-term value or that it needs to be held for a long period of time, e.g. because of a funder requirement, you need to consider whether there are any implications for on-going access. These may include the choice of format, which needs to be durable; the software used, bespoke or otherwise, where this is required to interpret the data; and the need to give permission for the data to be migrated to new formats over time. It is likely that this requirement will be included in the agreement to deposit in an external repository. If data is generated using specifically-developed software, it may be necessary to provide a copy of the software, noting operating requirements, with the data.

Removing personal direct and indirect identifiers

See Sensitive Data: Publication for further information on direct and indirect identifiers and Sensitive Data: Anonymisation on how to remove direct identifiers from your data. We recommend that for open publication no more than two indirect identifiers are left in the dataset. More indirect identifiers can remain in the data if it is available on restricted access, to bona fide researchers only with ethics clearance and who are bound by data sharing agreements.
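The "no more than two indirect identifiers" recommendation can be checked mechanically once you have decided which of your columns count as indirect identifiers. The sketch below assumes you supply that list yourself; the function name and the example column names are hypothetical, and the check is no substitute for a proper disclosure-risk review.

```python
def check_indirect_identifiers(columns, indirect_identifiers, limit=2):
    """Return (ok, found).

    `found` lists the dataset columns that appear in the caller-supplied
    collection of indirect identifiers; `ok` is True when there are at most
    `limit` of them (2 is the figure recommended above for open publication).
    """
    flagged = {name.lower() for name in indirect_identifiers}
    found = [col for col in columns if col.lower() in flagged]
    return len(found) <= limit, found
```

For example, a table with columns `age`, `postcode`, `occupation` and `score`, where the first three are treated as indirect identifiers, would fail the check for open publication but might still be suitable for restricted access.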




We recommend that datasets should have an explicit licence to allow re-use, and that Creative Commons licences are used for datasets. This will make it clear how the data can be reused (e.g. Creative Commons Attribution). If you do not add a licence to your data, we will add a CC-BY licence by default to open and embargoed data. This is an attribution licence and will ensure re-use of your data is as open as possible: “transparency, openness, verification and reproducibility are important to Open research” (UKRI 2021), which we promote at the University of Southampton.

Please see our single page guide explaining the basics of Creative Commons licences, Creative Commons (CC) licences: an introduction

See below for more information or read the DCC’s guide How to License Research Data for an in-depth discussion.

For software licences, see the guidance from the Software Sustainability Institute: Choosing an open-source licence

The Licences


Attribution (CC-BY)

This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licences offered. Recommended for maximum dissemination and use of licensed materials.

View Licence Deed | View Legal Code


Attribution-ShareAlike (CC-BY-SA)

This licence lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This licence is often compared to “copyleft” free and open source software licences. All new works based on yours will carry the same licence, so any derivatives will also allow commercial use. This is the licence used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.

View Licence Deed | View Legal Code


Attribution-NoDerivs (CC-BY-ND)

This licence allows for redistribution, commercial and non-commercial, as long as it is passed along unchanged and in whole, with credit to you.

View Licence Deed | View Legal Code


Attribution-NonCommercial (CC-BY-NC)

This licence lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.

View Licence Deed | View Legal Code


Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA)

This licence lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.

View Licence Deed | View Legal Code


Attribution-NonCommercial-NoDerivs (CC-BY-NC-ND)

This licence is the most restrictive of the six main Creative Commons licences, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.

View Licence Deed | View Legal Code


Creative Commons also provides tools that work in the “all rights granted” space of the public domain. Their CC0 tool allows licensors to waive all rights and place a work in the public domain, and their Public Domain Mark allows any web user to “mark” a work as being in the public domain.

GNU General Public Licence

This licence is usually applied to software. It allows others to copy, distribute and modify the software as long as they keep track of the changes in source files and keep modifications under the General Public Licence.

View Legal Code

Apache v2

This licence is usually applied to software. It gives a lot of freedom to others, including an explicit grant of patent rights. Any modifications to files must be stated in a notice if the software is redistributed.

View Legal Code

All Rights Reserved

This is a restrictive licence that means the author(s) of the work (e.g. dataset or software) retains all rights over it and it cannot be re-used in any way without explicit permission.




The nature or source of the data you create may mean there are moral, ethical, commercial and legal reasons for not sharing or for restricting access. Please note that those datasets identified as being unsuitable for open access still require to be held, safely and securely, in line with the University policy.

Levels of access in the University of Southampton Repository

Your dataset can be open or on request. It is possible to have a mixture of access levels within a dataset so that some data is openly available and some data is on request.

Datasets can be on request because:

  • the creator wants to know who is using the dataset (for impact),
  • the dataset is too sensitive to be openly available for ethical or commercial reasons,
  • the dataset is too large for Pure/ePrints.

Where the dataset is too sensitive to be openly available, there is a further choice of access conditions:

  1. Bona fide researchers, subject to registration and ethical approval, may request supporting data from the University of Southampton repository.
  2. Researchers who have agreed the usage / data access agreement will be given access. There is not currently a generic data sharing agreement; the dataset creators should liaise with RIS to supply a data sharing agreement if they require anything beyond the basic terms on the data request form.
  3. A custom condition - please supply

The dataset creators will need to choose who can make the decision to supply the dataset:

  • the Library Research Data Service,
  • specific people (please specify) - typically one or more of the dataset creator(s), and / or their supervisor (for PGRs). If all of these people have left the university the request will be escalated within the faculty as appropriate,
  • another arrangement – please specify.

Whatever conditions are set, the Research Data team in the Library will act as the first point of contact for any requests.

Ethics and Data Protection

Research involving human subjects requires ethical review through Faculty Ethics Committees (see Ethics policy). Guidelines from the University and funders' grant conditions cover the creation, use and storage of research data. Data related to individuals needs to be handled carefully and in accordance with the Data Protection Act 2018 (incorporating the General Data Protection Regulation). The confidentiality of participants in research should always be maintained, and the privacy of participants should be protected in any publication arising from the research, in line with current best practice, which includes using anonymised data or controlling access by restricting who can use the data. Information that identifies an individual can only be shared if consent has been given for this. Ethical reviews are likely to benefit from seeing a research data management plan that covers the data lifecycle and can demonstrate compliance with current legislation.

For further guidance contact Legal Services. See also the Information Commissioner's Office introduction to anonymisation

Sensitive and Classified Information

As well as the obvious areas of research involving details of individuals or patients, you may need to consider if there are any economic, social, security or political risks associated with the release of the data. For example, recent malaria statistics had to be anonymised to avoid the resultant maps from being used to identify the location of villages in a war zone. Sensitive research data require appropriate security measures such as regular changes of password or encryption of the data. This will also apply to commercially sensitive information or data obtained from commercial partners.

Intellectual Property Rights and Confidentiality

The University Intellectual Property Regulations govern the ownership and use of Intellectual Property Rights (including research data) generated by University staff and students.

In addition, there is often a positive obligation from our non-commercial sponsors (UKRI, Medical Charities, Government Departments, EU) to consider the protection and commercialisation of any Intellectual Property Rights arising from the research they are funding. This may require a temporary delay in the release of research data until the commercial potential of the idea is assessed and protection secured (if appropriate). Only a very small proportion of the IPR generated across the University will warrant patent protection and necessitate temporary restriction on sharing. The value of the majority of University IP will be derived through publication and widest dissemination which in turn, may create opportunities. If you think your IP may have commercial potential, advice from your Collaboration Manager in RIS should be sought at the earliest opportunity to allow sufficient time for a commercial assessment.

Confidentiality obligations will also arise from the contractual arrangements entered into for research. Industry usually is very cautious but so are government departments and public sector organisations. Whereas RIS will seek to secure terms and conditions that maximise your academic freedom (including the right to publish and to re-use the research outputs), industry sponsors will often impose confidentiality obligations and restrictions on the Intellectual Property Rights (including research data) arising from the work they fund at the University. Where consultancy terms are imposed because the research is more “applied”, it is most likely no on-going rights can be preserved. It is also worth bearing in mind that if you are working in collaboration with other universities, there may be joint ownership issues. These need to be agreed and should be covered by the proposal and collaboration agreement. Advice on the terms and conditions governing research projects should be sought from the Research Support Officer in your Faculty.

Use of third party data

If you use data owned by a third party (copyright material, software or database), you need to understand the terms under which these are obtained and the scope of use. It is necessary to obtain permission from the data owner for re-use of such material, unless conditions of re-use have been explicitly indicated, for example, with a Creative Commons licence. It is your responsibility to ensure you comply with the terms that apply. Advice on terms and conditions for the in-licensing of data or software can be sought from the Research Support Officer in your Faculty. In most instances, these are not negotiable. However, it may be possible to seek specific use terms or negotiate different licensing arrangements more appropriate to your specific requirements. It may be that in some circumstances, a commercial licence offers more freedom-to-operate than provisions for academic purposes.

You may also find that the terms of use of some data services, such as Census statistics, require you to deposit derived work with them. When depositing data in a repository you will be required to agree to a licence that asserts you have the rights to deposit that data.

Publication agreements

Some publishers require that supporting data be submitted with articles for publication; in other cases, that supporting data be deposited in a designated service or repository. Particular consideration should be given to any rights the publisher asks you to assign. While it has been common practice for publishers to seek transfer of copyright for research papers, this is not yet established for supporting data. Broadly, you should ensure that rights you are asked to assign for data do not conflict with the University regulations on IPR. It is increasingly recognised that researchers should explicitly retain the rights to reuse and share their own data and publications, through data and publication repositories, for example, and should avoid assigning rights that prevent this.

Other restrictions

There is often a time dimension to decisions on when to release data for sharing. With the emergence of networked data management services, best practice will be to store data at the point of generation, but to specify when that data might be shared, e.g. after further processing, after validation, after publication, etc. In some cases, restrictions may be set by others and apply for a set period, e.g. an embargo, after which data can be shared.

Restrictions on access to research data during and after the end of the project need to be addressed in the initial research proposal and throughout the life of the project as part of your data management plan. Some Funders may require statements justifying why data should be restricted as part of their application process. This may also need to be addressed in ethics applications.

Note that those datasets identified as being unsuitable for sharing still need to be managed and stored in accordance with the University policy. Access can be restricted by adding the appropriate metadata on the deposit form. Identification and security of sensitive data is important. Sensitive data should be flagged at the start of a project in a data management plan.

Any request for access to your research data under the Freedom of Information Act should be directed to the University FOI Officer in Legal Services.



  • Research data should be deposited in an appropriate institutional or disciplinary data repository at time of publication of the associated research output or at the end of the project.
  • The best repository to choose for your research data will be a national data centre or discipline specialist repository, because they have the expertise and resources to deal with particular types of data. See Depositing Elsewhere below for more information
  • If no subject repository for your data exists, or if the dataset is small and underpins a publication, you can deposit it in the University of Southampton Repository.

Depositing @ Soton

You can deposit small datasets (gigabytes in size) in our institutional repository, ePrints Soton, for long-term storage. If your dataset is a terabyte or over in size, contact to discuss how to deposit your data.

Data can be open or on request depending on the nature of the data. Contact for more information.

In order to deposit your data please do the following:

  1. Log into Pure:
  2. On your Pure personal overview page, click ‘add content’ on the right and select ‘dataset’. A new template will open up for you to add details of the dataset.
  3. Any fields with a red * must be completed. These are Title, People, Dataset managed by, Publisher and Date made available. Your record will not save unless you have filled them in. Please fill in other fields if you think they will be useful.
  4. In the Data availability section you can add your data files to Electronic data. All file formats are accepted (although please try to ensure your files are a type recommended for preservation) and you can add multiple files at once. If you have many files, consider zipping them together. Include a ReadMe file.
  5. Please choose an appropriate Data Licence (CC-BY or other) and Type (e.g. Dataset, Image) for all files. Click OK.
  6. If you think your data should only be available on request, please contact to discuss the options available.
  7. In Relations to other content, link your dataset to Publications, Projects or other Datasets in Pure.
  8. At the bottom of the page under Status, save the record as For Validation.
  9. Request a DOI for your dataset to include in the funder acknowledgement and data access statement in your publications. Ideally you should create a record for your dataset and request the DOI prior to submitting your manuscript. 
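Step 4 suggests zipping many files together and including a ReadMe file. A minimal sketch of that preparation, using Python's standard `zipfile` module, might look like the following. The function name `bundle_dataset` and the rule that a ReadMe is any file whose name starts with "readme" are assumptions made for illustration; Pure itself does not require this script.

```python
import zipfile
from pathlib import Path


def bundle_dataset(source_dir, archive_path):
    """Zip every file under source_dir, refusing to proceed without a ReadMe.

    Returns the list of archived paths, relative to source_dir.
    """
    source_dir = Path(source_dir)
    # Collect the file list first so the archive never includes itself.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())

    # Assumption: a ReadMe is any file whose name starts with "readme".
    if not any(p.name.lower().startswith("readme") for p in files):
        raise FileNotFoundError(f"No ReadMe file found in {source_dir}")

    members = [p.relative_to(source_dir).as_posix() for p in files]
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, member in zip(files, members):
            archive.write(path, arcname=member)
    return members
```

Storing paths relative to the dataset folder keeps the folder structure intact when the archive is unpacked elsewhere.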

There is an extensive guide on depositing data at Southampton (pdf) if you need more information on how to deposit data into Pure.

Please also see our data deposit videos at and our Creative Commons licences: An introduction guide

Depositing elsewhere

Where possible we recommend using discipline-specific data repositories, you can find one for your subject via

Some funders expect data to be deposited in specific data centres e.g. ESRC and NERC support dedicated data centres. Also consider whether any agreements with your collaborators include requirements for data deposit. 

If you have an option to deposit in a repository associated with your funder, or your publication will pay for deposit in Dryad, it is worth considering this.

If you don't have funding to pay for deposit, or your funder does not provide a repository, you can use Zenodo, hosted at CERN. Zenodo takes deposits of up to 50GB per dataset (you can have multiple datasets); for larger amounts, contact the repository. Datasets can be open or closed. There is no charge, although donations are welcomed for oversize deposits.

If you deposit your data elsewhere, please create a dataset catalogue entry in Pure linking to where the data is stored.

You may be able to publish your data in a data journal. This is a growing and fast-moving area. Some publishers are now requiring the deposit of supporting data with the article, while others require that a link to the data is provided. You will need to take this into account when considering how long you will need to retain the data, and it may influence your choice of storage location.




The data access statement should be included in the submitted manuscript, even if the identifiers have not yet been issued. The statement should be updated to include any identifiers as they become available, typically when the manuscript is accepted for publication. Storage should be in a stable location or repository that provides a persistent identifier such as a Digital Object Identifier (DOI). To request a DOI for data deposited in our institutional repository, please complete the DOI for Data form.

If the dataset is not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.

Note: Listing the corresponding author would not normally be considered sufficient to meet the requirements of funders such as UKRI.

There is no set format for the data access statement.  The following are recommendations on what to include:

Examples of data access statements

Openly available data

  • name(s) of the data repositories
  • persistent identifiers or accession numbers for the dataset.

"All data supporting this study are openly available from the University of Southampton repository at"

For example see: Squicciarini, G., Toward, M.G.R. and Thompson, D.J. (2015) Experimental procedures for testing the performance of rail dampers. Journal of Sound and Vibration, 359, 21-39. DOI: 10.1016/j.jsv.2015.07.007

Restricted access - ethical, legal, commercial

  • include justification for restriction
  • document reasons, for example,
    • the ethics approval reference number in metadata
    • collaborative agreements
    • data management plan for the project.

"Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the University of Southampton repository:"

"Bona fide researchers, subject to registration may request supporting data via University of Southampton repository"

For example see: Krishnaveni, G.V., Veena, S.R., Srinivasan, K., Osmond, C. and Fall, C.H.D. (2015) Linear Growth and Fat and Lean Tissue Gain during Childhood: Associations with Cardiometabolic and Cognitive Outcomes in Adolescent Indian Children. Plos One, 10 (11).

Secondary analysis of existing data

If your data are the result of re-using existing data,

  • the original source(s) should be credited.

"This study was a re-analysis of existing data that are publicly available from [organisation] at [web address]"

For example see: Mcdonagh, E.L., King, B.A., Bryden, H.L., Courtois, P., Szuts, Z., Baringer, M., Cunningham, S.A., Atkinson, C. and Mccarthy, G. (2015) Continuous Estimate of Atlantic Oceanic Freshwater Flux at 26.5 degrees N. Journal of Climate, 28 (22), 8888-8906.

No new data created

  • e.g. mathematical proof.


"No new data were created during this study"

If the examples above do not cover your situation contact for further advice.



As with other academic work, proper citation is essential - this acknowledges scholarship and allows the data to be located more easily.

The inclusion of a DOI (Digital Object Identifier) or a URI (Uniform Resource Identifier) can help with locating the data as well as allowing links to be made from related publications. DOIs can be assigned to individual elements or to the whole dataset. For further information about getting a DOI at Southampton go to DOI for Data, or for information about data DOIs more generally see DataCite.

Information on how to correctly cite a dataset can be found on the DataCite website, following the principles outlined by Force11 in 2014.

The recommended format for data citation usually comprises the following components:
Creator (PublicationYear). Title. Publisher. Identifier

If applicable, information about two other properties, Version and ResourceType, should also be included
Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier


Voutsina, Nikol, Chapman, Mark and Taylor, Gail (2016) Data in support of 'Characterization of the watercress (Nasturtium officinale R. Br.; Brassicaceae) transcriptome using RNASeq and identification of candidate genes for important phytonutrient traits linked to human health'. University of Southampton Dataset. doi:10.5258/SOTON/394656
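The recommended citation pattern (Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier) can be assembled programmatically. The sketch below is one possible way to do so; the function name `format_data_citation` is hypothetical, and the DOI in the usage example is a made-up placeholder rather than a real identifier.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build a citation string in the component order recommended above.

    Version and ResourceType are optional and are included only when given.
    """
    parts = [f"{creators} ({year})", title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    # Strip trailing full stops so joining never produces "..".
    citation = ". ".join(part.rstrip(".") for part in parts)
    return f"{citation}. {identifier}"


# Usage with placeholder values (the DOI is not a real identifier):
print(format_data_citation("Smith, J.", 2020, "Example dataset",
                           "Example Repository", "doi:10.1234/example",
                           version="v2", resource_type="Dataset"))
```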

Further reading

Covid-19 Research


The University, UKRI, as well as multiple funders and all leading publishers have signed Wellcome's statement on 'Sharing research data and findings relevant to the novel coronavirus (COVID-19) outbreak' which follows the WHO recommendations for sharing research in public health emergencies.

University authors who are conducting research related to Covid-19 are required to:

  • make their research paper available as a pre-print including a clear data statement
  • share interim and final research data relating to the outbreak, together with protocols and standards used to collect the data, as rapidly and widely as possible - including with public health and research communities and the WHO

This means that the interim results underlying papers should be made available, and that the final, complete dataset should also be deposited once the project is finished.

The University's recommendation is that pre-prints and data are best deposited in discipline-relevant repositories in preference to ePrints Soton, in order to maximise their exposure. The main pre-print servers for health, medicine and the bio sciences include medRxiv, SSRN and bioRxiv. ASAPbio maintain a list of reputable pre-print servers covering all disciplines. Relevant subject data repositories can be found by searching

Accepted manuscripts and catalogue records for the datasets held elsewhere should still be deposited in our own institutional repository via Pure.

Researchers should not feel concerned about pre-prints counting as prior publication. All the leading scientific publishers have signed the statement to agree "that data or preprints shared ahead of submission will not preempt its publication in these journals".

Help from other University Services

Legal Services

Collaboration Managers Team (Research & Innovation Services)

Research Support Team (Research & Innovation Services)


Depositing Data to Support Publication

Note:  There are additional steps for depositing data at the end of a project.  Please contact for further guidance.

Requesting a DOI

We can register a DOI for your dataset through DataCite - this gives a persistent link and can make it easier to cite.


For more details see our DOI for data page.


Research Support Guide