Research Data Management: Sharing

Guidance and support to staff, researchers and students at the University of Southampton

Sharing your data

Data-sharing is encouraged by all UK Funding Councils and the University's Research Data Management Policy. Wherever possible data and data records should be made available with as few restrictions as possible. Sharing your data:

  • can increase your citations
  • allows data to be tested and validated
  • allows data to be re-used for further research or in teaching
  • is increasingly seen by funders as being in the public interest, in line with the OECD Principles and Guidelines for Access to Research Data from Public Funding
  • reduces duplication of effort and cuts costs
  • is expected, where possible, by the University's Research Data Management Policy

Where sharing data is a requirement, the timing of the release can be linked to a number of different points in the data life-cycle. These can include:

  • the date of creation
  • any publication based on the data
  • the end of the project, or a specified period after it

For more information on how and what to share after the end of the research project, see below.

Depositing Data to Support Publication

Note:  There are additional steps for depositing data at the end of a project.  Please contact ResearchData@soton.ac.uk for further guidance.

Requesting a DOI

We can register a DOI for your dataset through DataCite. This gives the dataset a persistent link and can make it easier to cite.

For more details see our DOI for data page.

 

To share or not to share?

Sharing, but not openly

There are some good reasons not to share data openly:

  • you want to make a patent application (contact Research & Innovation Services, RIS)
  • the data contains personal data which cannot be sufficiently anonymised
  • the data contains commercially sensitive information

Often you can still share personal or pseudonymised information ethically with other researchers if the original participants gave informed consent for this. See the UK Data Archive’s Consent and Ethics section for more information and advice.

Share, but not immediately

You may not want to share data immediately on initial publication as you still have extra research to do on the dataset and don't want to be scooped. Funders recognise the right for researchers to have a period of privileged access to the data they have collected.  Typically data would be shared in a repository which allows for embargo periods (the University's institutional repository does support embargoes). The metadata record would be publicly available but the data itself would not be released until after the embargo period ends.
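The embargo behaviour described above can be sketched in a few lines. This is only an illustration of the rule "metadata is always public, data is released after the embargo ends"; the function name `visible_parts` and the dates are hypothetical and do not describe how the institutional repository is implemented.

```python
from datetime import date


def visible_parts(embargo_end, today):
    """Return which parts of a deposit are public on a given day.

    The metadata record is always public. The data files become public
    only after the embargo end date has passed (or if there is no embargo).
    """
    parts = ["metadata"]
    if embargo_end is None or today > embargo_end:
        parts.append("data")
    return parts


# Hypothetical embargo ending on 1 January 2026.
embargo = date(2026, 1, 1)
during = visible_parts(embargo, date(2025, 6, 1))
after = visible_parts(embargo, date(2026, 1, 2))
```

Here `during` contains only the metadata record, while `after` contains both the metadata and the data.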

Sharing data deposited elsewhere

You may not always deposit or store data locally. For instance, you may be working on a collaborative project where the other partner is the lead organisation. In this case you may be required to deposit data using their services and this should have been agreed at the start of any project.  Seek guidance on collaborative agreements from your contact in Research & Innovation Services.

Your funder may also require that you deposit data in a particular repository. Where this is the case, you should create a record in Pure with a link to where the data is held.

 

Sharing Data after the research

Research data is only useful if the data can be read and understood.

File Formats

Will you still be able to access your file in 20 years?

Where possible use text files, because the data will remain usable even if formatting is lost. Consider exporting to CSV, XML, JSON or free-form text. Otherwise, use file formats with openly published specifications:

  • DOCX or ODT for textual data
  • XLSX or ODS for spreadsheet data
  • SVG for figures (an open-standard vector format)
  • PDF/A for PDFs (a standardised version of PDF)
  • FLAC for audio (an open, lossless format, unlike MP3 compression)

For more information see the UK Data Service: Recommended File Formats
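As a minimal sketch of exporting tabular data to plain-text formats, the following uses only the Python standard library to produce CSV and JSON versions of the same records. The records, field names and file names are invented for illustration; adapt them to your own data.

```python
import csv
import io
import json

# Hypothetical measurements; the field names are illustrative only.
rows = [
    {"sample_id": "S001", "temperature_c": 21.5, "notes": "baseline"},
    {"sample_id": "S002", "temperature_c": 22.1, "notes": "after heating"},
]


def to_csv_text(records):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json_text(records):
    """Serialise the same records to indented, human-readable JSON."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def save_text(path, text):
    """Write text as UTF-8, the safest encoding for long-term access."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


csv_text = to_csv_text(rows)
json_text = to_json_text(rows)
```

Calling `save_text("measurements.csv", csv_text)` and `save_text("measurements.json", json_text)` would then write both versions to disk alongside the original file.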

ReadMe Files

It is good practice to have a README file to accompany your dataset. A README file should be a plain-text (.txt) file and should contain, as a minimum, the following information:

  • Name/institution/address/email information for Research Group or Principal investigator or person responsible for collecting the data
  • Date of data collection (can be a single date, or a range)
  • Information about geographic location of data collection (if applicable)
  • Licenses or restrictions placed on the data
  • Links to publications that cite or use the data
  • Method description, links or references to publications or other documentation containing experimental design or protocols used in data collection (if applicable)
  • Any relationships between the data files
  • For each filename, a short description of what data it contains
  • For tabular data, definitions of column headings and row labels, data codes (including missing data) and measurement units (or embed those in the tabular file)
  • Definitions for codes or symbols used to record missing data
  • Specialized formats or abbreviations used
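One way to keep README files consistent across datasets is to generate them from a small template. The sketch below covers several of the items listed above; the function name `build_readme`, the dictionary keys and all example values are hypothetical, and a real README should include every item that applies to your data.

```python
def build_readme(info, files):
    """Assemble a plain-text README from dataset details and file descriptions."""
    lines = [
        f"Dataset title: {info['title']}",
        f"Contact: {info['contact']}",
        f"Date of data collection: {info['collection_dates']}",
        f"Geographic location: {info.get('location', 'not applicable')}",
        f"Licence or restrictions: {info['licence']}",
        f"Related publications: {info.get('publications', 'none yet')}",
        f"Methods: {info['methods']}",
        f"Missing-data code: {info['missing_code']}",
        "",
        "Files:",
    ]
    # One line per file, giving a short description of what it contains.
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


# Invented example values, for illustration only.
example_info = {
    "title": "Example temperature measurements",
    "contact": "A. Researcher, Example Research Group, a.researcher@example.org",
    "collection_dates": "2024-01-10 to 2024-03-15",
    "licence": "CC BY",
    "methods": "See protocol.txt",
    "missing_code": "-999",
}
example_files = {
    "measurements.csv": "Temperature readings in degrees Celsius, one row per sample",
    "protocol.txt": "Description of the measurement protocol",
}

readme_text = build_readme(example_info, example_files)
```

The resulting `readme_text` can be saved as `README.txt` in the top-level folder of the dataset.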

These records have examples of ReadMe files:

We would recommend the following guides to writing README files:

Maintaining access

Where you have identified that your data has long-term value, or that it must be held for a long period (for example, because of a funder requirement), you need to consider whether there are any implications for ongoing access. These may include the choice of format, which needs to be durable; any software, bespoke or otherwise, that is required to interpret the data; and the need to give permission for the data to be migrated to new formats over time. It is likely that this requirement will be included in the agreement to deposit in an external repository. If data is generated using specifically developed software, it may be necessary to provide a copy of the software, noting its operating requirements, with the data.

At the end of the research project, research data should be deposited in an appropriate institutional or disciplinary data repository, in a timely manner and in accordance with any funding requirements. The best choice will usually be a national data centre or discipline-specific repository, because these have the expertise and resources to deal with particular types of data.

Depositing @ Soton

You can deposit small datasets (gigabytes in size) in ePrints Soton for long-term storage. If your data is a terabyte or more in size, contact ResearchData@soton.ac.uk to discuss how to deposit it.

Data can be open or on request depending on the nature of the data.

You can request a DOI for your dataset to include in the funder acknowledgement and data access statement in your publications, and we will arrange this for you. Ideally you should request the DOI before submitting your manuscript. See DOI for Data.

Depositing elsewhere

Where possible we recommend using discipline-specific data repositories; you can find one for your subject via Re3data.org.

Some funders expect data to be deposited in specific data centres e.g. ESRC and NERC support dedicated data centres. Also consider whether any agreements with your collaborators include requirements for data deposit. 

If you have the option to deposit in a repository associated with your funder, or your publisher will pay for deposit in Dryad, this is worth considering.

If you deposit your data elsewhere, please create a dataset catalogue entry in Pure linking to where the data is stored.

You may be able to publish your data in a data journal. This is a growing and fast-moving area. Some publishers now require the deposit of supporting data with the article, while others require that a link to the data is provided. You will need to take this into account when considering how long you need to retain the data, and it may influence your choice of storage location.

The nature or source of the data you create may mean there are moral, ethical, commercial or legal reasons for not sharing or for restricting access. Please note that datasets identified as unsuitable for open access must still be held safely and securely, in line with University policy.

Ethics and Data Protection

Research involving human subjects requires ethical review through Faculty Ethics Committees (see Ethics policy). University guidelines and funders' grant conditions cover the creation, use and storage of research data. Data relating to individuals must be handled carefully and in accordance with the Data Protection Act 2018 (incorporating the General Data Protection Regulation). The confidentiality of research participants should always be maintained, and their privacy should be protected in any publication arising from the research, in line with current best practice. This includes using anonymised data or controlling access by restricting who can use the data. Information that identifies an individual can only be shared if consent has been given for this. Ethical reviews are likely to benefit from seeing a research data management plan that covers the data lifecycle and can demonstrate compliance with current legislation.

For further guidance contact Legal Services. See also the Information Commissioner's Office Guide to Anonymisation

Sensitive and Classified Information

As well as the obvious areas of research involving details of individuals or patients, you may need to consider whether there are any economic, social, security or political risks associated with the release of the data. For example, recent malaria statistics had to be anonymised to prevent the resulting maps from being used to identify the location of villages in a war zone. Sensitive research data require appropriate security measures, such as regular changes of password or encryption of the data. This also applies to commercially sensitive information and to data obtained from commercial partners.

Intellectual Property Rights and Confidentiality

The University Intellectual Property Regulations govern the ownership and use of Intellectual Property Rights (including research data) generated by University staff and students.

In addition, there is often a positive obligation from our non-commercial sponsors (RCUK, Medical Charities, Government Departments, EU) to consider the protection and commercialisation of any Intellectual Property Rights arising from the research they are funding. This may require a temporary delay in the release of research data until the commercial potential of the idea is assessed and protection secured (if appropriate). Only a very small proportion of the IPR generated across the University will warrant patent protection and necessitate temporary restriction on sharing. The value of the majority of University IP will be derived through publication and widest dissemination which in turn, may create opportunities. If you think your IP may have commercial potential, advice from your Collaboration Manager in RIS should be sought at the earliest opportunity to allow sufficient time for a commercial assessment.

Confidentiality obligations will also arise from the contractual arrangements entered into for research. Industry is usually very cautious, but so are government departments and public sector organisations. Whereas RIS will seek to secure terms and conditions that maximise your academic freedom (including the right to publish and to re-use the research outputs), industry sponsors will often impose confidentiality obligations and restrictions on the Intellectual Property Rights (including research data) arising from the work they fund at the University. Where consultancy terms are imposed because the research is more “applied”, it is most likely that no ongoing rights can be preserved. It is also worth bearing in mind that if you are working in collaboration with other universities, there may be joint ownership issues. These need to be agreed and should be covered by the proposal and collaboration agreement. Advice on the terms and conditions governing research projects should be sought from the Research Support Officer in your Faculty.

Use of third party data

If you use data owned by a third party (copyright material, software or database), you need to understand the terms under which these are obtained and the scope of use. It is necessary to obtain permission from the data owner for re-use of such material, unless conditions of re-use have been explicitly indicated, for example, with a Creative Commons licence. It is your responsibility to ensure you comply with the terms that apply. Advice on terms and conditions for the in-licensing of data or software can be sought from the Research Support Officer in your Faculty. In most instances, these are not negotiable. However, it may be possible to seek specific use terms or negotiate different licensing arrangements more appropriate to your specific requirements. It may be that in some circumstances, a commercial licence offers more freedom-to-operate than provisions for academic purposes.

You may also find that the terms of use of some data services, such as Census statistics, require you to deposit derived work with them. When depositing data in a repository you will be required to agree to a licence that asserts you have the rights to deposit that data.

Publication agreements

Some publishers require that supporting data be submitted with articles for publication; others require that supporting data be deposited in a designated service or repository. Particular consideration should be given to any rights the publisher asks you to assign. While it has been common practice for publishers to seek transfer of copyright for research papers, this is not yet established for supporting data. Broadly, you should ensure that rights you are asked to assign for data do not conflict with the University regulations on IPR. It is increasingly recognised that researchers should explicitly retain the rights to reuse and share their own data and publications, through data and publication repositories, for example, and should avoid assigning rights that prevent this.

Other restrictions

There is often a time dimension to decisions on when to release data for sharing. With the emergence of networked data management services, best practice will be to store data at the point of generation, but to specify when that data might be shared, e.g. after further processing, after validation, after publication, etc. In some cases, restrictions may be set by others and apply for a set period, e.g. an embargo, after which data can be shared.

Restrictions on access to research data during and after the end of the project need to be addressed in the initial research proposal and throughout the life of the project as part of your data management plan. Some funders may require statements justifying why data should be restricted as part of their application process. This may also need to be addressed in ethics applications.

Note that those datasets identified as being unsuitable for sharing still need to be managed and stored in accordance with the University policy. Access can be restricted by adding the appropriate metadata on the deposit form. Identification and security of sensitive data is important. Sensitive data should be flagged at the start of a project in a data management plan.

Any request for access to your research data under the Freedom of Information Act should be directed to the University FOI Officer in Legal Services.

We recommend that datasets have an explicit licence to allow re-use, and that Creative Commons licences are used for datasets. See below for more information or read the DCC’s guide How to License Research Data for an in-depth discussion.

For software licences, see the guidance from the Software Sustainability Institute: Choosing an open-source licence

The Licences

Attribution
CC BY

This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licences offered. Recommended for maximum dissemination and use of licensed materials.

View Licence Deed | View Legal Code

Attribution-ShareAlike
CC BY-SA

This licence lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This licence is often compared to “copyleft” free and open source software licences. All new works based on yours will carry the same licence, so any derivatives will also allow commercial use. This is the licence used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.

View Licence Deed | View Legal Code

Attribution-NoDerivs
CC BY-ND

This licence allows for redistribution, commercial and non-commercial, as long as it is passed along unchanged and in whole, with credit to you.

View Licence Deed | View Legal Code

Attribution-NonCommercial
CC BY-NC

This licence lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.

View Licence Deed | View Legal Code

Attribution-NonCommercial-ShareAlike
CC BY-NC-SA

This licence lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.

View Licence Deed | View Legal Code

Attribution-NonCommercial-NoDerivs
CC BY-NC-ND

This licence is the most restrictive of the six main Creative Commons licences, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.

View Licence Deed | View Legal Code

CC0

Creative Commons also provides tools that work in the “all rights granted” space of the public domain. The CC0 tool allows licensors to waive all rights and place a work in the public domain, and the Public Domain Mark allows any web user to “mark” a work as being in the public domain.

GNU General Public Licence

This licence is usually applied to software. It allows others to copy, distribute and modify the software as long as they keep track of the changes in source files and keep modifications under General Public Licence.

View Legal Code

Apache v2

This licence is usually applied to software. It gives a lot of freedom to others, including an explicit grant of patent rights. If the software is re-used, any modified files must carry a notice stating that they have been changed.

View Legal Code

All Rights Reserved

This is a restrictive licence that means the author(s) of the work (e.g. dataset or software) retains all rights over it and it cannot be re-used in any way without explicit permission.

The data access statement should be included in the submitted manuscript, even if the identifiers have not yet been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication. Storage should be in a stable location or repository that provides a persistent identifier such as a Digital Object Identifier (DOI). To request a DOI for data deposited in our institutional repository, please complete the DOI for Data form.

If the data themselves are not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.

Note: Directing interested parties to the corresponding author would not normally be considered sufficient to meet the requirements of funders such as the EPSRC.

There is no set format for the data access statement.  The following are recommendations on what to include:

Examples of data access statements

Openly available data

  • name(s) of the data repositories
  • persistent identifiers or accession numbers for the dataset.

"All data supporting this study are openly available from the University of Southampton repository at https://doi.org/10.5258/SOTON/xxxxx"

For example see: Squicciarini, G., Toward, M.G.R. and Thompson, D.J. (2015) Experimental procedures for testing the performance of rail dampers. Journal of Sound and Vibration, 359, 21-39. DOI: 10.1016/j.jsv.2015.07.007

Restricted access - ethical, legal, commercial

  • include justification for restriction
  • document reasons, for example,
    • the ethics approval reference number in metadata
    • collaborative agreements
    • data management plan for the project.

"Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the University of Southampton repository: https://doi.org/10.5258/SOTON/xxxxx"

"Bona fide researchers, subject to registration, may request supporting data via the University of Southampton repository: https://doi.org/10.5258/SOTON/xxxxx"

For example see: Krishnaveni, G.V., Veena, S.R., Srinivasan, K., Osmond, C. and Fall, C.H.D. (2015) Linear Growth and Fat and Lean Tissue Gain during Childhood: Associations with Cardiometabolic and Cognitive Outcomes in Adolescent Indian Children. Plos One, 10 (11).

Secondary analysis of existing data

If your data are the result of re-using existing data,

  • the original source(s) should be credited.

"This study was a re-analysis of existing data that are publicly available from [organisation] at [web address]"

For example see: Mcdonagh, E.L., King, B.A., Bryden, H.L., Courtois, P., Szuts, Z., Baringer, M., Cunningham, S.A., Atkinson, C. and Mccarthy, G. (2015) Continuous Estimate of Atlantic Oceanic Freshwater Flux at 26.5 degrees N. Journal of Climate, 28 (22), 8888-8906.

No new data created

  • e.g. mathematical proof.

 

"No new data were created during this study"

If the examples above do not cover your situation contact ResearchData@soton.ac.uk for further advice.
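The example statements above can be treated as fill-in templates. The sketch below is one hypothetical way to do that; the category names (`open`, `restricted`, `secondary`, `none`) and the function `data_access_statement` are invented for illustration, and the `xxxxx` in the DOI is the same placeholder used in the examples, not a real identifier.

```python
# Templates taken from the example statements in this guide.
TEMPLATES = {
    "open": (
        "All data supporting this study are openly available from the "
        "University of Southampton repository at {link}"
    ),
    "restricted": (
        "Due to ethical concerns, supporting data cannot be made openly "
        "available. Further information about the data and conditions for "
        "access are available from the University of Southampton "
        "repository: {link}"
    ),
    "secondary": (
        "This study was a re-analysis of existing data that are publicly "
        "available from {organisation} at {link}"
    ),
    "none": "No new data were created during this study",
}


def data_access_statement(kind, **details):
    """Fill in the template for the given category of data access."""
    if kind not in TEMPLATES:
        raise ValueError(f"Unknown statement type: {kind!r}")
    # str.format raises KeyError if a required detail (e.g. link) is missing.
    return TEMPLATES[kind].format(**details)


open_statement = data_access_statement(
    "open", link="https://doi.org/10.5258/SOTON/xxxxx"
)
no_data_statement = data_access_statement("none")
```

Keeping the wording in one place makes it easier to update the statement once the final DOI has been issued.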
 

As with other academic work, proper citation is essential - this acknowledges scholarship and allows the data to be located more easily. Standards for data citation are still in development, but the UK Data Service has a useful basic guide (see below).

The inclusion of a DOI (Digital Object Identifier) or a URI (Uniform Resource Identifier) can help with locating the data as well as allowing links to be made from related publications.  Details about data DOIs can be found from the DataCite metadata services information.

DOIs (Digital Object Identifiers) can be assigned to individual elements, or to the whole dataset.
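As a rough sketch of how a DOI becomes a resolvable link in a citation, the helper below normalises common ways of writing a DOI into a `https://doi.org/` URL and builds a simple citation string. The function names are hypothetical, the citation pattern (creators, year, title, publisher, identifier) is an assumed basic layout rather than a required standard, and the creator and title in the usage example are invented.

```python
# Prefixes people commonly put in front of a bare DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Return a resolvable https://doi.org/ link for a DOI in any common form."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return "https://doi.org/" + doi


def format_data_citation(creators, year, title, publisher, doi):
    """Build a basic data citation ending in the DOI link."""
    return f"{creators} ({year}). {title}. {publisher}. {doi_to_url(doi)}"


# Invented creator and title; the DOI suffix is a placeholder.
citation = format_data_citation(
    "Researcher, A.",
    2024,
    "Example temperature measurements",
    "University of Southampton",
    "doi:10.5258/SOTON/xxxxx",
)
```

Whatever citation style your publisher requires, writing the DOI as a full `https://doi.org/` link lets readers follow it directly to the dataset record.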

DOIs @ Southampton

The University of Southampton via ePrints Soton is able to provide a DOI for data subject to the data meeting certain criteria [policy in development].  For further information about DOIs go to DOI for Data or contact ResearchData@soton.ac.uk.

Help from other University Services

Legal Services

Collaboration Managers Team (Research & Innovation Services)

Research Support Team (Research & Innovation Services)