Data-sharing is encouraged by all UK Funding Councils, the University's Research Data Management Policy and, for COVID-related research, the statement on 'Sharing research data and findings relevant to the novel coronavirus (COVID-19) outbreak'. See more information on your obligations to share Covid-19 research.
Wherever possible data and data records should be made available with as few restrictions as possible. Sharing your data:
Where sharing data is a requirement, the time of the release can be linked to a number of different points in the data life-cycle. These can include
Data should be made available as openly as possible and as restricted as necessary.
Open data can be shared but not all shareable data can be open. There are some good reasons not to share data openly
Often you can still share personal or pseudonymised information ethically with other researchers if the original participants were informed. See the UK Data Archive’s Consent and Ethics section for more information and advice.
You may not want to share data immediately on initial publication because you still have further research to do on the dataset and don't want to be scooped, or because you want to make a patent application (contact RIS for help with patent applications). Funders recognise the right of researchers to have a period of privileged access to the data they have collected. Typically data would be shared in a repository which allows for embargo periods (the University's institutional repository does support embargoes). The metadata record would be publicly available but the data itself would not be released until after the embargo period ends.
You may not always deposit or store data locally. For instance, you may be working on a collaborative project where the other partner is the lead organisation. In this case you may be required to deposit data using their services and this should have been agreed at the start of any project. Seek guidance on collaborative agreements from your contact in Research & Innovation Services.
Your funder may also require that you deposit data in a particular repository. Where this is the case, you should create a record in Pure with a link to where the data is held.
Research data is only useful if the data can be read and understood.
Will you still be able to access your file in 20 years?
Where possible use text files. Data will be usable even if formatting is lost. Consider exporting to CSV, XML, JSON or free-form text. Otherwise, use file formats with openly published specifications:
For more information see the UK Data Service: Recommended File Formats for a quick overview, and the Library of Congress Recommended Formats Statement for more detailed analysis.
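As an illustration of exporting to text formats, the following is a minimal sketch in Python using only the standard library. The record fields (sample_id, temperature_c) and the function names are hypothetical and stand in for your own data; this is one possible approach, not a prescribed method.

```python
import csv
import io
import json


def to_csv_text(rows, fieldnames):
    """Serialise a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_text(rows):
    """Serialise the same records to indented, human-readable JSON."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Hypothetical example records, for illustration only.
rows = [
    {"sample_id": "S1", "temperature_c": 12.5},
    {"sample_id": "S2", "temperature_c": 13.1},
]

csv_text = to_csv_text(rows, ["sample_id", "temperature_c"])
json_text = to_json_text(rows)
```

Both outputs are plain text, so they remain readable in any text editor even if the software that produced the original data is no longer available.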
It is good practice to have a README file to accompany your dataset. A README file should be a txt file and should contain the following information as a minimum:
We have a ReadMe file template for University of Southampton datasets. These records have examples of ReadMe files:
We would recommend the following guides to writing README files:
Where you have identified that your data has long-term value or that it needs to be held for a long period of time, e.g. because of a funder requirement, you need to consider whether there are any implications for on-going access. This may include the selection of format, which needs to be durable; the software used, bespoke or otherwise, where this is required to interpret the data; and the need to give permission for the data to be migrated to new formats over time. It is likely that this requirement will be included in the agreement to deposit in an external repository. If data is generated using specifically-developed software, it may be necessary to provide a copy of the software, noting operating requirements, with the data.
See Sensitive Data: Publication for further information on direct and indirect identifiers and Sensitive Data: Anonymisation on how to remove direct identifiers from your data. We recommend that for open publication no more than two indirect identifiers are left in the dataset. More indirect identifiers can remain in the data if it is available on restricted access, to bona fide researchers only, with ethics clearance and who are bound by data sharing agreements.
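The recommendation of no more than two indirect identifiers can be checked with a short script before open publication. The sketch below is an illustration only: the column names are hypothetical, and deciding which columns count as indirect identifiers remains a judgement for the researcher, not something the code can determine.

```python
def check_indirect_identifiers(columns, indirect_identifiers, limit=2):
    """Return the indirect identifiers found in `columns` and whether
    their count is within `limit` (two, following the recommendation above)."""
    present = [name for name in columns if name in indirect_identifiers]
    return present, len(present) <= limit


# Hypothetical column names; the researcher declares which ones are
# indirect identifiers for their own dataset.
columns = ["age_band", "postcode_district", "occupation", "score"]
declared = {"age_band", "postcode_district", "occupation"}

present, within_limit = check_indirect_identifiers(columns, declared)
```

Here three declared indirect identifiers are present, so within_limit is False and the dataset would need further anonymisation or restricted access.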
Bookmark this page as https://library.soton.ac.uk/researchdata/sharingtips
We recommend that datasets have an explicit licence to allow re-use, and that Creative Commons licences are used for datasets (e.g. Creative Commons Attribution). This will make it clear how the data can be reused. If you do not add a licence to your data, we will by default add a CC-BY licence to open and embargoed data. This is an attribution licence and will ensure re-use of your data is as open as possible: “transparency, openness, verification and reproducibility are important to Open research” (UKRI 2021), which we promote at the University of Southampton.
Please see our single page guide explaining the basics of Creative Commons licences, Creative Commons (CC) licences: an introduction
See below for more information or read the DCC’s guide How to License Research Data for in depth discussion.
For software licences, see the guidance from the Software Sustainability Institute: Choosing an open-source licence.
This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licences offered. Recommended for maximum dissemination and use of licensed materials.
This licence lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This licence is often compared to “copyleft” free and open source software licences. All new works based on yours will carry the same licence, so any derivatives will also allow commercial use. This is the licence used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.
This licence allows for redistribution, commercial and non-commercial, as long as it is passed along unchanged and in whole, with credit to you.
This licence lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.
This licence lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
This licence is the most restrictive of the six main Creative Commons licences, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.
Creative Commons also provides tools that work in the “all rights granted” space of the public domain. Their CC0 tool allows licensors to waive all rights and place a work in the public domain, and their Public Domain Mark allows any web user to “mark” a work as being in the public domain.
GNU General Public Licence
This licence is usually applied to software. It allows others to copy, distribute and modify the software as long as they keep track of the changes in source files and keep modifications under General Public Licence.
This licence is usually applied to software. It gives a lot of freedom to others, including an explicit right to a patent. Any modifications to files need to be stated in a notice if the software is re-used.
All Rights Reserved
This is a restrictive licence that means the author(s) of the work (e.g. dataset or software) retains all rights over it and it cannot be re-used in any way without explicit permission.
Bookmark this page as https://library.soton.ac.uk/researchdata/licensing
The nature or source of the data you create may mean there are moral, ethical, commercial and legal reasons for not sharing or for restricting access. Please note that those datasets identified as being unsuitable for open access still need to be held, safely and securely, in line with the University policy.
Your dataset can be open or on request. It is possible to have a mixture of access levels within a dataset so that some data is openly available and some data is on request.
Datasets can be on request because:
Where the dataset is too sensitive to be openly available, there is a further choice of access conditions:
The dataset creators will need to choose who can make the decision to supply the dataset:
Whatever conditions are set, the Research Data team in the Library will act as the first point of contact for any requests.
Research involving human subjects requires ethical review through Faculty Ethics Committees (see Ethics policy). University guidelines and funders' grant conditions cover the creation, use and storage of research data. Data related to individuals needs to be handled carefully and in accordance with the Data Protection Act 2018 (incorporating the General Data Protection Regulation). The confidentiality of participants in research should always be maintained, and the privacy of participants should be protected in any publication arising from the research in line with current best practice, which includes using anonymised data or controlling access by restricting who can use the data. Information that identifies an individual can only be shared if consent has been given for this. Ethical reviews are likely to benefit from seeing a research data management plan that covers the data lifecycle and can demonstrate compliance with current legislation.
For further guidance contact Legal Services. See also the Information Commissioner's Office Guide to Anonymisation.
As well as the obvious areas of research involving details of individuals or patients, you may need to consider if there are any economic, social, security or political risks associated with the release of the data. For example, recent malaria statistics had to be anonymised to prevent the resultant maps from being used to identify the location of villages in a war zone. Sensitive research data require appropriate security measures such as regular changes of password or encryption of the data. This will also apply to commercially sensitive data obtained from commercial partners.
The University Intellectual Property Regulations govern the ownership and use of Intellectual Property Rights (including research data) generated by University staff and students.
In addition, there is often a positive obligation from our non-commercial sponsors (UKRI, Medical Charities, Government Departments, EU) to consider the protection and commercialisation of any Intellectual Property Rights arising from the research they are funding. This may require a temporary delay in the release of research data until the commercial potential of the idea is assessed and protection secured (if appropriate). Only a very small proportion of the IPR generated across the University will warrant patent protection and necessitate temporary restriction on sharing. The value of the majority of University IP will be derived through publication and widest dissemination which in turn, may create opportunities. If you think your IP may have commercial potential, advice from your Collaboration Manager in RIS should be sought at the earliest opportunity to allow sufficient time for a commercial assessment.
Confidentiality obligations will also arise from the contractual arrangements entered into for research. Industry usually is very cautious but so are government departments and public sector organisations. Whereas RIS will seek to secure terms and conditions that maximise your academic freedom (including the right to publish and to re-use the research outputs), industry sponsors will often impose confidentiality obligations and restrictions on the Intellectual Property Rights (including research data) arising from the work they fund at the University. Where consultancy terms are imposed because the research is more “applied”, it is most likely no on-going rights can be preserved. It is also worth bearing in mind that if you are working in collaboration with other universities, there may be joint ownership issues. These need to be agreed and should be covered by the proposal and collaboration agreement. Advice on the terms and conditions governing research projects should be sought from the Research Support Officer in your Faculty.
If you use data owned by a third party (copyright material, software or database), you need to understand the terms under which these are obtained and the scope of use. It is necessary to obtain permission from the data owner for re-use of such material, unless conditions of re-use have been explicitly indicated, for example, with a Creative Commons licence. It is your responsibility to ensure you comply with the terms that apply. Advice on terms and conditions for the in-licensing of data or software can be sought from the Research Support Officer in your Faculty. In most instances, these are not negotiable. However, it may be possible to seek specific use terms or negotiate different licensing arrangements more appropriate to your specific requirements. It may be that in some circumstances, a commercial licence offers more freedom-to-operate than provisions for academic purposes.
Some publishers require that supporting data be submitted with articles for publication; in other cases they require that supporting data is deposited in a designated service or repository. Particular consideration should be given to any rights the publisher asks you to assign. While it has been common practice for publishers to seek transfer of copyright for research papers, this is not yet established for supporting data. Broadly, you should ensure that rights you are asked to assign for data do not conflict with the University regulations on IPR. It is increasingly recognised that researchers should explicitly retain the rights to reuse and share their own data and publications, through data and publication repositories, for example, and should avoid assigning rights that prevent this.
There is often a time dimension to decisions on when to release data for sharing. With the emergence of networked data management services, best practice will be to store data at the point of generation, but to specify when that data might be shared, e.g. after further processing, after validation, after publication, etc. In some cases, restrictions may be set by others and apply for a set period, e.g. an embargo, after which data can be shared.
Restrictions on access to research data during and after the end of the project need to be addressed in the initial research proposal and throughout the life of the project as part of your data management plan. Some Funders may require statements justifying why data should be restricted as part of their application process. This may also need to be addressed in ethics applications.
Note that those datasets identified as being unsuitable for sharing still need to be managed and stored in accordance with the University policy. Access can be restricted by adding the appropriate metadata on the deposit form. Identification and security of sensitive data is important. Sensitive data should be flagged at the start of a project in a data management plan.
Any request for access to your research data under the Freedom of Information Act should be directed to the University FOI Officer in Legal Services.
Bookmark this page as https://library.soton.ac.uk/researchdata/restrictingaccess
You can deposit small datasets (gigabytes in size) in our institutional repository, ePrints Soton, for long-term storage. If your data is a terabyte or over in size, contact email@example.com to discuss how to deposit your data.
Data can be open or on request depending on the nature of the data. Contact firstname.lastname@example.org for more information.
In order to deposit your data please do the following:
Please also see our data deposit videos at https://library.soton.ac.uk/researchdata/datasetvideos and our Creative Commons licences: An introduction guide
You can request a DOI for your dataset to include in the funder acknowledgement and data access statement in your publications. Ideally you should create a record for your dataset and request the DOI prior to submitting your manuscript.
There is an extensive guide on depositing data at Southampton (pdf) if you need more information on how to deposit data into Pure.
Where possible we recommend using discipline-specific data repositories. You can find one for your subject via Re3data.org.
Some funders expect data to be deposited in specific data centres e.g. ESRC and NERC support dedicated data centres. Also consider whether any agreements with your collaborators include requirements for data deposit.
If you have an option to deposit in a repository associated with your funder, or your publisher will pay for deposit in Dryad, it is worthwhile considering this.
If you don't have funding to pay for deposit, or your funder does not provide a repository, you can use Zenodo, hosted at CERN. Zenodo takes deposits of up to 50GB per dataset (you can have multiple datasets); for larger amounts, contact the repository. Datasets can be open or closed. There is no charge, although donations are welcomed for oversize deposits.
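Before choosing a repository it can help to know how large your dataset folder is. The sketch below is a rough illustration: it sums file sizes under a folder and compares the total with the 50GB per-dataset figure quoted above. Treating 50GB as 50 × 10^9 bytes is an assumption, and the function names are hypothetical; check the repository's own documentation for how it measures the limit.

```python
import os

# Assumption: the 50GB figure is read as decimal gigabytes (50 x 10^9 bytes).
ZENODO_LIMIT_BYTES = 50 * 10**9


def dataset_size_bytes(folder):
    """Sum the sizes of all files under `folder`, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_within(size_bytes, limit_bytes=ZENODO_LIMIT_BYTES):
    """Return True if a dataset of `size_bytes` is within the limit."""
    return size_bytes <= limit_bytes
```

For example, fits_within(dataset_size_bytes("my_dataset")) would report whether a folder named my_dataset (a placeholder) is within the quoted limit.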
If you deposit your data elsewhere, please create a dataset catalogue entry in Pure linking to where the data is stored.
You may be able to publish your data in a data journal. This is a growing and fast moving area. Some publishers are now requiring the deposit of supporting data with the article, while others require that a link to the data is provided. You will need to take this into account when considering how long you will need to retain the data, and it may influence your choice of storage location.
Bookmark this page as https://library.soton.ac.uk/researchdata/repository
The data access statement should be included in the submitted manuscript, even if the identifiers have not yet been issued. The statement should be updated to include any identifiers as they become available, typically when the manuscript is accepted for publication. Storage should be in a stable location or repository that provides a persistent identifier such as a Digital Object Identifier (DOI). To request a DOI for data deposited in our institutional repository, please complete the DOI for Data form.
If the dataset is not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.
Note: Listing the corresponding author as the contact for data would not normally be considered sufficient to meet the requirements of funders such as the UKRI.
There is no set format for the data access statement. The following are recommendations on what to include:
Openly available data
"All data supporting this study are openly available from the University of Southampton repository at https://doi.org/10.5258/SOTON/xxxxx"
For example see: Squicciarini, G., Toward, M.G.R. and Thompson, D.J. (2015) Experimental procedures for testing the performance of rail dampers. Journal of Sound and Vibration, 359, 21-39. DOI: 10.1016/j.jsv.2015.07.007
Restricted access - ethical, legal, commercial
"Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the University of Southampton repository: https://doi.org/10.5258/SOTON/xxxxx"
"Bona fide researchers, subject to registration may request supporting data via University of Southampton repository https://doi.org/10.5258/SOTON/xxxxx"
For example see: Krishnaveni, G.V., Veena, S.R., Srinivasan, K., Osmond, C. and Fall, C.H.D. (2015) Linear Growth and Fat and Lean Tissue Gain during Childhood: Associations with Cardiometabolic and Cognitive Outcomes in Adolescent Indian Children. Plos One, 10 (11).
Secondary analysis of existing data
If your data are the result of re-using existing data:
"This study was a re-analysis of existing data that are publicly available from [organisation] at [web address]"
For example see: Mcdonagh, E.L., King, B.A., Bryden, H.L., Courtois, P., Szuts, Z., Baringer, M., Cunningham, S.A., Atkinson, C. and Mccarthy, G. (2015) Continuous Estimate of Atlantic Oceanic Freshwater Flux at 26.5 degrees N. Journal of Climate, 28 (22), 8888-8906.
No new data created
"No new data were created during this study"
If the examples above do not cover your situation contact ResearchData@soton.ac.uk for further advice.
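If you prepare statements for several papers, the recommended wordings above can be kept in one place. The following is a minimal sketch that reuses the example wordings from this page; the function name and the labels "open", "restricted", "secondary" and "none" are hypothetical, and the restricted wording covers only the ethical-concerns example.

```python
def data_access_statement(kind, doi=None, organisation=None, web_address=None):
    """Return a data access statement for one of four situations.

    kind is "open", "restricted", "secondary" or "none" (illustrative labels).
    """
    if kind == "none":
        return "No new data were created during this study"
    if kind == "secondary":
        if not (organisation and web_address):
            raise ValueError("secondary analysis needs an organisation and a web address")
        return (
            "This study was a re-analysis of existing data that are publicly "
            f"available from {organisation} at {web_address}"
        )
    if kind in ("open", "restricted"):
        if not doi:
            raise ValueError("a DOI is needed for data held in a repository")
        link = f"https://doi.org/{doi}"
        if kind == "open":
            return (
                "All data supporting this study are openly available from the "
                f"University of Southampton repository at {link}"
            )
        return (
            "Due to ethical concerns, supporting data cannot be made openly "
            "available. Further information about the data and conditions for "
            "access are available from the University of Southampton "
            f"repository: {link}"
        )
    raise ValueError(f"unknown kind: {kind!r}")
```

For example, data_access_statement("open", doi="10.5258/SOTON/xxxxx") returns the openly available wording with the placeholder DOI filled in.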
Bookmark this page as https://library.soton.ac.uk/researchdata/access-statements
As with other academic work, proper citation is essential - this acknowledges scholarship and allows the data to be located more easily.
The inclusion of a DOI (Digital Object Identifier) or a URI (Uniform Resource Identifier) can help with locating the data as well as allowing links to be made from related publications. DOIs can be assigned to individual elements, or to the whole dataset. For further information about getting a DOI at Southampton go to DOI for Data, or for information about data DOIs more generally see DataCite.
The recommended format for data citation usually comprises the following components:
Creator (PublicationYear). Title. Publisher. Identifier
If applicable, information about two other properties, Version and ResourceType, should also be included
Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier
Voutsina, Nikol, Chapman, Mark and Taylor, Gail (2016) Data in support of 'Characterization of the watercress (Nasturtium officinale R. Br.; Brassicaceae) transcriptome using RNASeq and identification of candidate genes for important phytonutrient traits linked to human health'. University of Southampton Dataset. doi:10.5258/SOTON/394656
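The citation pattern above can be assembled programmatically, for instance when generating citations for many datasets. This is a minimal sketch following the Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier order given above; the function name and the example values are hypothetical.

```python
def format_data_citation(creator, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build a citation in the order:
    Creator (PublicationYear). Title. [Version.] Publisher. [ResourceType.] Identifier
    Version and ResourceType are included only when supplied."""
    parts = [f"{creator} ({year})", title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    parts.append(identifier)
    # Strip any trailing full stop so that joining does not double it.
    return ". ".join(part.rstrip(".") for part in parts)


# Hypothetical values, for illustration only.
basic = format_data_citation(
    "Smith, J.", 2020, "Example dataset",
    "University of Southampton", "https://doi.org/10.5258/SOTON/xxxxx",
)
full = format_data_citation(
    "Smith, J.", 2020, "Example dataset",
    "University of Southampton", "https://doi.org/10.5258/SOTON/xxxxx",
    version="1.0", resource_type="Dataset",
)
```

The optional fields are skipped when absent, so the same function covers both citation patterns.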
Bookmark this page as https://library.soton.ac.uk/researchdata/datacitation
University policies related to data sharing:
Bookmark this page as https://library.soton.ac.uk/researchdata/sharingpolicies
The University, UKRI, as well as multiple funders and all leading publishers have signed Wellcome's statement on 'Sharing research data and findings relevant to the novel coronavirus (COVID-19) outbreak' which follows the WHO recommendations for sharing research in public health emergencies.
University authors who are conducting research related to Covid-19 are required to:
This means that the interim results underlying papers should be made available, and that the final, complete dataset should also be deposited once the project is finished.
The University's recommendation is that pre-prints and data are best deposited in relevant disciplinary repositories in preference to ePrints Soton in order to maximise their exposure. The main pre-print servers for health, medicine and the bio sciences include medRxiv, SSRN and bioRxiv. ASAPbio maintain a list of reputable pre-print servers covering all disciplines. Relevant subject data repositories can be found by searching Re3data.org.
Accepted manuscripts and catalogue records for the datasets held elsewhere should still be deposited in our own institutional repository via Pure.
Researchers should not feel concerned about pre-prints counting as prior publication. All the leading scientific publishers have signed the statement to agree "that data or preprints shared ahead of submission will not preempt its publication in these journals".