Research Data Management: Approaching completion

Approaching completion

Your research data is valuable and should be preserved for future use. Early planning is crucial to ensure appropriate steps are taken. Data preservation maximizes its value by allowing for further analysis and sharing. While not all data is suitable for sharing, the trend is towards openness, with restrictions only when necessary.

Why preserve data?

Research data is an important resource in its own right and should not just be forgotten when a project finishes. Researchers invest significant time and effort in collecting, collating, cleansing and structuring data to use in their research, and this time and effort should be recognised.

University of Southampton requirements

The University of Southampton Research Data Management Policy stipulates that researchers should preserve and provide appropriate access to any research data which underpins a research output for as long as it has continuing value - but for a minimum of ten years after publication or public release of the research.

Researchers are strongly encouraged to deposit their data (along with sufficient descriptive metadata) in an appropriate repository or archive, and to ensure that there is at least a data record in the university repository, our own archive for research outputs, including datasets.

Funder requirements

Many funding bodies now require that data is preserved for a specified period (often between three and ten years) after the end of the project, and made available for reuse where this is appropriate. Certain funders may also require you to use a specific repository for storing your data. You can find out about the policies of different funders in the Funder requirements section.

Planning for preservation

The sooner you start to plan for preservation, the easier it is to incorporate into your research practices and the less time you need to spend overall.

According to the University's RDM policy, "significant" data should be kept. The creator of the dataset is usually the best person to decide what is significant. Keeping everything is not always the best solution as it can make it harder to find the material you need. Selecting data to retain involves several factors. The Digital Curation Centre provides a checklist to guide this process. Considerations include:

  • Is the data essential to your research?
  • Does it underpin published work?
  • Could it be valuable to other researchers?
  • Is it unique or historically significant?
  • Do you have the legal rights to retain and reuse it?

When thinking about potential reuse, think outside your own discipline: some data may be of interest to researchers in other disciplines, or to members of the general public, or it may be of use for educational or training purposes. For instance, historical data—like old weather logs—has proven invaluable in ways its creators could not have anticipated. Additionally, consider if the data can be replicated; experimental data may be reproducible if a comprehensive methodology is available, while time series or survey data is often irreplaceable and may be highly valuable.

Useful information:

Research data is not always digital. It can include artefacts, ice cores, physical samples, fossils, and more. You can add records to Pure describing your physical collections and detailing how they can be accessed. This allows other researchers to discover what resources the University holds. The Research Data team can even issue Digital Object Identifiers (DOIs) for non-digital objects. The "Digital" in DOI refers to the identifier being digital, not the object itself.

Secondary data is pre-existing information collected by others, and may be quantitative or qualitative. Accessing it may require permission, and you may not have the right to retain or re-share it post-project.

You should dispose of third-party data securely per the terms agreed in the original data-sharing agreement.

Generally, we only accept secondary data deposits into the repository if you have re-sharing rights and have modified the data from its original form.

On the whole it is good to preserve as much data as possible. However, there are situations in which not all data can be kept. This may be for practical reasons (for example, because the quantity of data is such that it is not feasible to store it all), or because the data contains sensitive or confidential information which needs to be deleted after a certain point. Key considerations include:

  • Honouring any commitments made to research participants (e.g. on consent forms)
  • Compliance with data protection legislation
  • Institutional expectations around ethical practice

For example, data protection legislation specifies that personal data should not be retained for longer than necessary. Thus you may opt to remove personal identifiers from a dataset at the end of the project, so that an anonymised version of the data may be preserved. It should be noted, however, that deleting obvious identifiers (names, email addresses, and so on) may not be sufficient to fully anonymise a dataset: it may still be possible to deduce someone's identity by combining other pieces of information (a postcode and a rare medical condition, for example). Additionally, some types of data, such as video recordings, are very difficult to anonymise adequately. Data creators will need to consider what can realistically be achieved without significantly reducing the value of the dataset, and then plan a suitable preservation strategy in the light of this.

The questions of which data should be preserved and of which data should be shared with a view to reuse need to be considered separately. Data which needs to be preserved but which is not suitable for sharing can be stored in a secure archive. In some cases, it may be appropriate to have multiple versions of a dataset: for example, an anonymised one which can be shared openly, and one retaining more personal information to which access is restricted. Making data available for reuse by others is covered in more detail below.

Research data is only useful if the data can be read and understood. Will you still be able to access your file in 20 years?

Where possible, use open formats which can be read by a wide range of software. Using them for your dataset deposit will increase the longevity of your data. Consider exporting your material to plain text files, CSV, XML, JSON or HTML for text, TIFF or PNG for images, and MP3 for audio files. The UK Data Archive lists recommended file formats; see also the Library's guide to file formats.
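As a minimal sketch of exporting to open formats, the Python example below writes the same small table as both CSV and JSON using only the standard library. The records, field names and file names are hypothetical placeholders, not part of any University template.

```python
import csv
import json
from pathlib import Path

# Hypothetical records standing in for a small research dataset.
records = [
    {"sample_id": "S001", "depth_m": 1.5, "temperature_c": 12.3},
    {"sample_id": "S002", "depth_m": 3.0, "temperature_c": 11.8},
]


def export_open_formats(rows, out_dir):
    """Write the same rows as CSV and JSON, both plain-text open formats."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / "data.csv"
    json_path = out_dir / "data.json"

    # CSV: one header row, then one line per record.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # JSON: keeps numeric values as numbers rather than as text.
    with json_path.open("w", encoding="utf-8") as handle:
        json.dump(rows, handle, indent=2, ensure_ascii=False)

    return csv_path, json_path
```

Both outputs can be opened in a text editor, which is a quick test of whether a file is likely to remain readable without specialist software.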

Lossy formats are those where the process of data compression involves permanent destruction of some information. This results in a smaller file size, but also means that the original version of the file cannot be perfectly reconstructed.

If possible, lossless formats should be used for preservation. Where compromises need to be made (if, for example, storing a collection of very large files would be prohibitively expensive), then careful consideration needs to be given to the impact on future reusability. Lossy compression will make a more significant difference to some types of data than others: for example, audio recordings of spoken word can often be compressed without an audible loss of quality, when the same degree of compression of music would result in noticeable degradation.

Lossless formats include RAW and PNG for images, or FLAC for audio. Some formats, including TIFF for images and WAV for audio, can use either lossy or lossless compression methods.
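The difference between lossless and lossy handling can be illustrated with a toy Python example. The byte values are hypothetical, and the "lossy" step is a deliberately crude stand-in (discarding the low bits of each value), not how any real audio or image codec works.

```python
import zlib

# Hypothetical 8-bit readings standing in for raw data.
original = bytes([12, 13, 13, 14, 200, 201, 203, 90, 91, 95])

# Lossless: zlib compression can be reversed exactly.
restored = zlib.decompress(zlib.compress(original))

# Lossy (toy example): keep only the top 4 bits of each value.
# The discarded low bits cannot be recovered afterwards.
quantised = bytes(value & 0xF0 for value in original)

print(restored == original)   # True
print(quantised == original)  # False
```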

Thought needs to be given at an early stage to the costs of preserving data so that these can be included in the funding application. The largest share of costs for data is incurred in the preparation and ingest to the selected storage service, as shown by the costing tool provided by the UK Data Archive.

It may be necessary to budget for additional time and effort to prepare data for preservation, and some data archives levy a charge for deposits. Funders will usually only pay for costs incurred during a project, so archival storage costs will have to be invoiced during the grant period, when the data is deposited, rather than charged as a rolling annual cost. Most funding bodies will cover reasonable costs; you can check with your funder what support is available.

The University's Research Data Management Policy requires that all significant Research Data should be held for a minimum of 10 years and longer where the data is actively used.

Funders also have retention requirements and some research data will also be subject to legal requirements. The Digital Curation Centre (DCC) has a summary of the requirements of the major funders. If your funder is not on this list please use the Jisc open policy finder service to check if your funder has any requirements for research data.

Keeping everything may seem appealing, but it makes finding essential material harder and has significant environmental costs. Ensure any data you choose not to retain is securely destroyed.

Try not to limit your thinking to the confines of the original research: while data may certainly be valuable to those working in a similar field to your own, it could also have applications which are harder to predict. Data can sometimes turn out to be useful to researchers in other disciplines, to members of the general public, or for educational or training purposes. Remember that much historical data was originally collected for reasons that had nothing to do with academic scholarship, but has subsequently proved to be a treasure trove for researchers.

Rather than asking which data should be shared, it may be helpful to turn the question on its head, and ask instead which data cannot be shared, with a view to sharing everything else. Projects which produce very large quantities of data may need to make practical decisions about how much it is feasible to share. Other reasons for keeping data private may include confidentiality, intellectual property issues, or plans to seek a patent. However, even if data cannot be shared openly, it is worth considering whether it could be made available in a restricted manner. If you think your data should only be available on request, see "Levels of access in the University of Southampton Repository" in the "Embargoes and restrictions" section below on this page for more details, and contact researchdata@soton.ac.uk to discuss the options available.

Choices made early on in a project may influence how easy it is to share data later, and thus it's important to plan ahead, with preservation and reuse in mind right from the start. For example, it is essential that any research involving human participants secures appropriate consent, and it is much more straightforward to do this at the point when the data is collected, rather than trying to go back and do it retrospectively. Often you can still share personal or pseudonymised information ethically with other researchers if the original participants were informed. See the UK Data Archive’s Consent and Ethics section for more information and advice.

Why share data?

Data sharing is encouraged by all UK Funding Councils and the University's Research Data Management Policy. As a general rule of thumb, it is good practice to make as much data and metadata as possible available for reuse. At a minimum, you should aim to make all data which supports research findings or conclusions available alongside metadata describing the data. However, all funders recognise there may be specific reasons for keeping the data private or on restricted access. Even when the research data cannot be openly shared, your project may well produce other materials that are well worth sharing.

Planning for sharing

The Library's Data Sharing Guide provides information on how data can be shared ethically and legally post-project.

Identifying what and how data can be shared should be considered at the beginning of your project. Both anonymising and pseudonymising your data allow sharing. Please see the library guide on Sensitive Data and GDPR. We also recommend this useful guide on how to anonymise qualitative and quantitative data from the UK Data Service.

The UK Data Archive Text Anonymisation Helper tool (downloads a zip file) can help you find disclosive information to remove or pseudonymise in qualitative data files. The tool does not anonymise or make changes to data but uses MS Word macros to find and highlight numbers and words starting with capital letters in text. Numbers and capitalised words are often disclosive, e.g. as names, companies, birth dates, addresses, educational institutions and countries.
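The same idea can be sketched in a few lines of Python. This is not the UK Data Archive tool itself, only an illustration of the approach it describes: flag numbers and capitalised words so that a person can review them. The function names and the `[[...]]` marker are assumptions made for this example.

```python
import re

# Matches runs of digits, or words that begin with a capital letter.
CANDIDATE = re.compile(r"\b(?:\d+|[A-Z][a-zA-Z]*)\b")


def list_candidates(text):
    """Return every number or capitalised word, in order of appearance."""
    return CANDIDATE.findall(text)


def highlight_candidates(text):
    """Wrap each candidate in [[...]] for manual review; nothing is removed."""
    return CANDIDATE.sub(lambda match: f"[[{match.group(0)}]]", text)


sample = "Maria moved to Leeds in 1998 and works at a bakery."
print(highlight_candidates(sample))
# [[Maria]] moved to [[Leeds]] in [[1998]] and works at a bakery.
```

Like the tool described above, this only highlights: words at the start of a sentence are flagged too, and deciding what is genuinely disclosive remains a manual judgement.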

For open datasets, we recommend that no more than two indirect identifiers be left in the dataset. More indirect identifiers can remain in the data if it is available on restricted access to bona fide researchers who have ethics clearance and are bound by a data sharing agreement.

If data contains sensitive or confidential information, this may be a barrier to sharing - especially if the wish is to share it without restrictions. For example, data which includes personal information about living identifiable individuals will need to be anonymised or pseudonymised, unless explicit consent has been obtained from the research subjects to share the non-anonymised version.

It should be noted, however, that deleting obvious identifiers (names, email addresses, and so on) may not be sufficient to fully anonymise a dataset: it may still be possible to deduce someone's identity by combining other pieces of information (a postcode and a rare medical condition, for example). Additionally, some types of data, such as video recordings, are very difficult to anonymise adequately.
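One way to look for this kind of risk is to count how many records share each combination of indirect identifiers. The Python sketch below is a simplified illustration under stated assumptions: the column names, the example rows and the `minimum_group_size` threshold are all hypothetical, and a real disclosure review involves more than a single count.

```python
from collections import Counter


def risky_combinations(rows, indirect_identifiers, minimum_group_size=2):
    """Return identifier combinations shared by fewer than minimum_group_size rows."""
    counts = Counter(
        tuple(row[name] for name in indirect_identifiers) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < minimum_group_size}


# Hypothetical records with names already removed.
rows = [
    {"postcode_area": "SO17", "condition": "asthma"},
    {"postcode_area": "SO17", "condition": "asthma"},
    {"postcode_area": "SO17", "condition": "rare condition A"},
    {"postcode_area": "SO16", "condition": "asthma"},
]

print(risky_combinations(rows, ["postcode_area", "condition"]))
# {('SO17', 'rare condition A'): 1, ('SO16', 'asthma'): 1}
```

Each combination reported here describes exactly one person, even though no direct identifier is present.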

For advice on anonymisation, you can contact the University's Information Compliance team.

Data may sometimes need to be redacted for other reasons: for example, it may deal with the location of endangered species, or may contain third party material to which someone else controls the intellectual property rights. It is helpful to indicate what sort of information has been removed and why (insofar as doing so is compatible with the reasons for which the data has been redacted), so that data reusers will be able to make sense of any gaps in the dataset. Redacting data can sometimes make it less representative: explaining the steps taken can help guard against inadvertent misinterpretation.

There are many approaches or techniques that may be used to render data less sensitive in some way, and what is appropriate may vary considerably from case to case. A balance needs to be struck between protecting participants (and the original researcher) or removing other information that could be misused, and not unduly degrading the data. If it is not possible or practical to produce a dataset that can be shared openly without significantly reducing the value of the data, it may be necessary to consider other options - for example, deposit in an archive which offers access restrictions.

Good metadata practice aids in tracking your work and allows others to discover and interpret your data effectively. Rich metadata is a key aspect of the FAIR principles for data.

Datasets may sometimes need to be tidied or otherwise edited as part of the process of creating a preservation dataset. However, it is usually quicker and easier to document data as it is created rather than attempting to fill in all the gaps at the end of the project. Documentation written during a project to describe methodology, project progress, and other aspects of research activity can often be put to new use. There is more information on describing data on our During research page.

It can be helpful to ask a colleague who has not previously worked closely with the dataset to review the documentation: it is often easier for someone with an outside perspective to spot gaps, or to identify things that need additional clarification.

Determining authorship is an important part of conducting collaborative research. Authorship disputes can often be avoided if the criteria for determining authorship and authorship order are discussed and agreed upon during the research planning stage. The University Author, Contribution and Publishing policy covers authored publicly available works, which include but are not limited to journal articles, software and datasets, and applies to any contributor publishing and disseminating research outputs.

It is possible for the creators and contributors on a dataset to be different from the authors on the related paper. For example, dataset contributors can include data collectors, data curators and others who were not involved in the later analysis of the data and writing of the research paper.

In the DataCite metadata schema, the creators of a dataset are defined as "The main researchers involved in producing the data [...] in priority order."

In addition to the creators of a dataset, you can add contributors who have played a significant role in the development of the dataset. The current version of the DataCite schema lists the following contributor types:

  • Contact Person
  • Data Collector
  • Data Curator
  • Data Manager
  • Distributor
  • Editor
  • Hosting Institution
  • Producer
  • Project Leader
  • Project Manager
  • Project Member
  • Registration Agency
  • Registration Authority
  • Related Person
  • Researcher
  • Research Group
  • Rights Holder
  • Sponsor
  • Supervisor
  • Work Package Leader
  • Other
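To show how creators and contributors sit side by side in a metadata record, here is a minimal Python sketch. The people are invented placeholders, and the field names (`creators`, `contributors`, `name`, `contributorType`) loosely follow DataCite's JSON representation; treat them as assumptions and check the schema documentation before relying on them. In the schema, the contributor types listed above are written without spaces, which the helper reproduces.

```python
import json

# The contributor types listed above.
CONTRIBUTOR_TYPES = {
    "Contact Person", "Data Collector", "Data Curator", "Data Manager",
    "Distributor", "Editor", "Hosting Institution", "Producer",
    "Project Leader", "Project Manager", "Project Member",
    "Registration Agency", "Registration Authority", "Related Person",
    "Researcher", "Research Group", "Rights Holder", "Sponsor",
    "Supervisor", "Work Package Leader", "Other",
}


def contributor(name, role):
    """Build one contributor entry, rejecting roles that are not in the list."""
    if role not in CONTRIBUTOR_TYPES:
        raise ValueError(f"Unknown contributor type: {role}")
    return {"name": name, "contributorType": role.replace(" ", "")}


record = {
    # Creators are listed in priority order.
    "creators": [{"name": "Example, Alex"}, {"name": "Sample, Sam"}],
    "contributors": [
        contributor("Placeholder, Pat", "Data Collector"),
        contributor("Placeholder, Robin", "Data Curator"),
    ],
}

print(json.dumps(record, indent=2))
```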

You should include a README file with your dataset to provide key details. We have a basic README template you can use. At a minimum, the README should include:

  • Information about when, where, and by whom the dataset was created, and for what purpose
  • A description of the dataset
  • Details of methods used
  • Details of what has been done to the data - for example, has it been cleansed, edited, restructured, or otherwise manipulated, and if so, how?
  • Explanations of any acronyms, coding, or jargon
  • Units of measurement
  • Annotation of any anomalies (or apparent anomalies) where the reason for these is known
  • Any other notes which will help aid proper interpretation
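The minimum contents above can be turned into a starting skeleton. The Python sketch below is not the Library's README template; the section headings are paraphrased from the list, and the plain-text layout is an assumption made for this example.

```python
# Section headings paraphrased from the minimum README contents listed above.
README_SECTIONS = [
    "Creation (when, where, by whom, and for what purpose)",
    "Description of the dataset",
    "Methods",
    "Processing (cleansing, editing, restructuring, other manipulation)",
    "Acronyms, codes and jargon",
    "Units of measurement",
    "Known anomalies",
    "Other notes to aid interpretation",
]


def readme_skeleton(title):
    """Return a plain-text README outline with a TODO under each section."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section, "-" * len(section), "TODO", ""]
    return "\n".join(lines)


print(readme_skeleton("Example dataset"))
```

Creating the skeleton at the start of a project and filling it in as you go is usually easier than writing the whole file at the point of deposit.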

How to share & where to deposit

Shareable research data should be FAIR: findable, accessible, interoperable and reusable. Make it FAIR by depositing your data in an open format in an appropriate institutional or disciplinary data repository, with a persistent identifier such as a Digital Object Identifier (DOI), at the time of publication of the associated research output or at the end of the project. Ensure the data has a clear licence and good descriptive metadata. Include a data access statement in any related publications.

The best repository to choose for your research data will be a national data centre or discipline specialist repository, because they have the expertise and resources to deal with particular types of data. If no subject repository for your data exists, or if the dataset is small and underpins a publication, you can deposit it in the University of Southampton Repository.

There is no set format for the data access statement so we have listed examples below for you to use.

You can deposit small datasets (gigabytes in size) in our institutional repository, ePrints Soton, for long-term storage. If your dataset is a terabyte or over in size, contact researchdata@soton.ac.uk to discuss how to deposit your data.

Data can be open or on request depending on the nature of the data. Contact researchdata@soton.ac.uk for more information.

In order to deposit your data please do the following:

  1. Log into Pure: https://pure.soton.ac.uk/
  2. On your Pure personal overview page, click ‘add content’ on the right and select ‘dataset’. A new template will open up for you to add details of the dataset.
  3. Any fields with a red * must be completed. These are Title, People, Dataset managed by, Publisher and Date made available. Your record will not save unless you have filled them in. Please fill in other fields if you think they will be useful.
  4. In the Data availability section you can add your data files to Electronic data. All file formats are accepted (although please try to ensure your files are of a type recommended for preservation) and you can add multiple files at once. If you have many files, consider zipping them together. Include a ReadMe file.
  5. Please choose an appropriate Data Licence (CC-BY or other) and Type (e.g. Dataset, Image) for all files. Click OK.
  6. If you think your data should only be available on request, please contact researchdata@soton.ac.uk to discuss the options available.
  7. In Relations to other content, link your dataset to Publications, Projects or other Datasets in Pure.
  8. At the bottom of the page under Status, save the record as For Validation.
  9. Request a DOI for your dataset to include in the funder acknowledgement and data access statement in your publications. Ideally you should create a record for your dataset and request the DOI prior to submitting your manuscript.
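If you have many files, the zipping step can be scripted. The following Python sketch bundles a folder into a single zip archive and refuses to continue if no ReadMe file is present. The function name and the rule that a ReadMe file name starts with "readme" are assumptions made for this example; Pure itself does not require this script.

```python
import zipfile
from pathlib import Path


def bundle_for_deposit(source_dir, archive_path):
    """Zip every file under source_dir; refuse if no ReadMe file is present."""
    source_dir = Path(source_dir)
    # Collect the file list before the archive is created.
    files = sorted(path for path in source_dir.rglob("*") if path.is_file())
    if not any(path.name.lower().startswith("readme") for path in files):
        raise FileNotFoundError("Add a ReadMe file before bundling.")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to the dataset folder, not absolute paths.
            archive.write(path, arcname=path.relative_to(source_dir).as_posix())

    return [path.relative_to(source_dir).as_posix() for path in files]
```

Zip compression is lossless, so the files are restored exactly when the archive is unpacked.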

There is an extensive guide on depositing data at Southampton (pdf) if you need more information on how to deposit data into Pure.

Please also see our data deposit videos at the Support page and our Creative Commons licences: An introduction guide

If you deposit your data elsewhere, please create a dataset catalogue entry in Pure linking to where the data is stored.

Where possible, we recommend using discipline-specific data repositories; you can find one for your subject via Re3data.org.

Some funders expect data to be deposited in specific data centres e.g. ESRC and NERC support dedicated data centres. Also consider whether any agreements with your collaborators include requirements for data deposit.

If you have the option to deposit in a repository associated with your funder, or if your publication will pay for deposit in Dryad, it is worth considering this.

If you don't have funding to pay for deposit, or your funder does not provide a repository, you can use Zenodo, hosted at CERN. Zenodo takes deposits of up to 50GB per dataset (you can have multiple datasets); for larger amounts, contact the repository. Datasets can be open or closed. There is no charge, although donations are welcomed for oversize deposits.

You may be able to publish your data in a data journal. This is a growing and fast moving area. Some publishers are now requiring the deposit of supporting data with the article, while others require that a link to the data is provided. You will need to take this into account when considering how long you will need to retain the data, and it may influence your choice of storage location.

Small datasets (up to 16 GB in size) can be deposited in ePrints Soton via Pure for long-term storage. Please contact researchdata@soton.ac.uk about depositing large datasets (16GB or greater), or consider an external multidisciplinary repository such as Zenodo, hosted at CERN. Zenodo takes deposits of up to 50GB per dataset (you can have multiple datasets); for larger amounts, contact the repository. Datasets can be open or closed.

NB. Please create a dataset entry in Pure which includes a link to where the data is stored if you choose to deposit in an external repository
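The 16 GB and 50 GB thresholds quoted above can be written as a simple check. This Python sketch is illustrative only: it assumes decimal gigabytes (the guidance does not say which definition applies), the function names are hypothetical, and the returned text paraphrases the guidance rather than quoting any official rule.

```python
from pathlib import Path

GB = 10**9  # Assumption: decimal gigabytes.
PURE_LIMIT = 16 * GB    # "16GB or greater" counts as a large dataset
ZENODO_LIMIT = 50 * GB  # Zenodo's quoted per-dataset limit


def directory_size(path):
    """Total size in bytes of all files under path."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def suggest_route(total_bytes):
    """Suggest a deposit route for a dataset of the given size."""
    if total_bytes < PURE_LIMIT:
        return "Deposit in ePrints Soton via Pure"
    if total_bytes <= ZENODO_LIMIT:
        return "Contact researchdata@soton.ac.uk, or consider Zenodo"
    return "Contact researchdata@soton.ac.uk or the external repository"


print(suggest_route(2 * GB))   # Deposit in ePrints Soton via Pure
print(suggest_route(20 * GB))  # Contact researchdata@soton.ac.uk, or consider Zenodo
```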

We recommend that datasets have an explicit licence to allow re-use, and that Creative Commons licences are used for datasets. This will make it clear how the data can be reused (e.g. Creative Commons Attribution). If you do not add a licence to your data, we will add a CC-BY licence to open and embargoed data by default. This is an attribution licence and will ensure re-use of your data is as open as possible: “transparency, openness, verification and reproducibility are important to Open research” (UKRI 2021), which we promote at the University of Southampton.

Please see our single page guide explaining the basics of Creative Commons licences, Creative Commons (CC) licences: an introduction

See below for more information, or read the DCC’s guide How to License Research Data for an in-depth discussion.

For software licences, see the guidance from the Software Sustainability Institute: Choosing an open-source licence.

 

The Licences:

Attribution
CC BY

This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licences offered. Recommended for maximum dissemination and use of licensed materials.

View Licence Deed | View Legal Code

Attribution-ShareAlike
CC BY-SA

This licence lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This licence is often compared to “copyleft” free and open source software licences. All new works based on yours will carry the same licence, so any derivatives will also allow commercial use. This is the licence used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.

View Licence Deed | View Legal Code

Attribution-NoDerivs
CC BY-ND

This licence allows for redistribution, commercial and non-commercial, as long as it is passed along unchanged and in whole, with credit to you.

View Licence Deed | View Legal Code

Attribution-NonCommercial
CC BY-NC

This licence lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.

View Licence Deed | View Legal Code

Attribution-NonCommercial-ShareAlike
CC BY-NC-SA

This licence lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.

View Licence Deed | View Legal Code

Attribution-NonCommercial-NoDerivs
CC BY-NC-ND

This licence is the most restrictive of the six main Creative Commons licences, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.

View Licence Deed | View Legal Code

CC0

Creative Commons also provides tools that work in the “all rights granted” space of the public domain. Their CC0 tool allows licensors to waive all rights and place a work in the public domain, and their Public Domain Mark allows any web user to “mark” a work as being in the public domain.

GNU General Public Licence

This licence is usually applied to software. It allows others to copy, distribute and modify the software as long as they keep track of the changes in source files and keep modifications under General Public Licence.

View Legal Code

Apache v2

This licence is usually applied to software. It gives a lot of freedom to others, including an explicit grant of patent rights. Any modification to files needs to be included in a notice if the software is re-used.

View Legal Code

All Rights Reserved

This is a restrictive licence that means the author(s) of the work (e.g. dataset or software) retains all rights over it and it cannot be re-used in any way without explicit permission.

The following are examples and recommendations on what to include. If these examples do not cover your situation, contact ResearchData@soton.ac.uk for further advice.

To request a DOI for data deposited in our institutional repository, please complete the DOI for Data form.

Recommendations and Examples

Openly available data

  • What to include: name(s) of the data repositories, and persistent identifiers or accession numbers for the dataset.
  • Example statement: "All data supporting this study are openly available from the University of Southampton repository at https://doi.org/10.5258/SOTON/xxxxx"
  • Example publication: Squicciarini, G., Toward, M.G.R. and Thompson, D.J. (2015) Experimental procedures for testing the performance of rail dampers. Journal of Sound and Vibration, 359, 21-39. DOI: 10.1016/j.jsv.2015.07.007

Restricted access (ethical, legal, commercial)

  • What to include: justification for the restriction. Document reasons, for example: the ethics approval reference number in metadata, the collaborative agreements, or the data management plan for the project.
  • Example statement: "Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the University of Southampton repository: https://doi.org/10.5258/SOTON/xxxxx"
  • Example statement: "Bona fide researchers, subject to registration, may request supporting data via University of Southampton repository https://doi.org/10.5258/SOTON/xxxxx"
  • Example publication: Krishnaveni, G.V., Veena, S.R., Srinivasan, K., Osmond, C. and Fall, C.H.D. (2015) Linear Growth and Fat and Lean Tissue Gain during Childhood: Associations with Cardiometabolic and Cognitive Outcomes in Adolescent Indian Children. PLoS ONE, 10 (11).

Secondary analysis of existing data

  • What to include: if your data are the result of re-using existing data, the original source(s) should be credited.
  • Example statement: "This study was a re-analysis of existing data that are publicly available from [organisation] at [web address]"
  • Example publication: McDonagh, E.L., King, B.A., Bryden, H.L., Courtois, P., Szuts, Z., Baringer, M., Cunningham, S.A., Atkinson, C. and McCarthy, G. (2015) Continuous Estimate of Atlantic Oceanic Freshwater Flux at 26.5 degrees N. Journal of Climate, 28 (22), 8888-8906.

No new data was created (such as a mathematical proof)

  • Example statement: "No new data were created during this study"

An ORCID iD is a persistent digital identifier, used worldwide, that you own and control, and that distinguishes you from every other researcher, even if you move institution or change name. ORCID iDs look like a credit card number: 0000-0001-8414-9272.

You can connect your iD with your professional information - affiliations, grants, publications, peer review, and more. You can use your iD to share your information with other systems, ensuring you get recognition for all your contributions, saving you time and hassle, and reducing the risk of errors.

You can register for an ORCID iD, or connect your existing ORCID iD with the University, by using Pure. See the guide on Your ORCID iD and Pure for more details.

Your ORCID record is owned and managed solely by you, not the University.
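An ORCID iD is not an arbitrary string of digits: as ORCID's own documentation describes, the final character is a check character calculated from the first fifteen digits using the ISO 7064 MOD 11-2 algorithm. That detail is not covered elsewhere in this guide, so treat the Python sketch below as an illustrative aside for catching typing errors. It does not confirm that an iD is registered or who it belongs to, and the function names are hypothetical.

```python
import re

# Four groups of four characters; the last character may be a digit or X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(first_fifteen_digits):
    """Compute the ISO 7064 MOD 11-2 check character."""
    total = 0
    for digit in first_fifteen_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the layout and the check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0001-8414-9272"))  # True
print(is_valid_orcid("0000-0001-8414-9273"))  # False
```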

As with other academic work, proper citation is essential - this acknowledges scholarship and allows the data to be located more easily.

The inclusion of a DOI (Digital Object Identifier) or a URI (Uniform Resource Identifier) can help with locating the data as well as allowing links to be made from related publications. DOIs can be assigned to individual elements, or to the whole dataset. For further information about getting a DOI at Southampton, go to DOI for Data; for information about data DOIs more generally, see DataCite.

Information on how to correctly cite a dataset can be found on the DataCite website, following the principles outlined by Force11 in 2014.

The recommended format for data citation usually comprises the following components:
Creator (PublicationYear). Title. Publisher. Identifier

If applicable, information about two other properties, Version and ResourceType, should also be included:
Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier

Example:

Voutsina, Nikol, Chapman, Mark and Taylor, Gail (2016) Data in support of 'Characterization of the watercress (Nasturtium officinale R. Br.; Brassicaceae) transcriptome using RNASeq and identification of candidate genes for important phytonutrient traits linked to human health'. University of Southampton Dataset. doi:10.5258/SOTON/394656
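The recommended pattern can be expressed as a small formatting function. This Python sketch follows the component order given above, with Version and ResourceType included only when supplied. The function name and the example values are hypothetical, and the punctuation is one reasonable reading of the pattern rather than a required style.

```python
def format_data_citation(creator, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build: Creator (PublicationYear). Title. [Version.] Publisher. [ResourceType.] Identifier"""
    parts = [f"{creator} ({year}).", f"{title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    parts.append(identifier)
    return " ".join(parts)


print(format_data_citation(
    creator="Example, A. and Sample, B.",
    year=2016,
    title="Example dataset title",
    publisher="University of Southampton",
    identifier="https://doi.org/10.5258/SOTON/xxxxx",
))
# Example, A. and Sample, B. (2016). Example dataset title. University of Southampton. https://doi.org/10.5258/SOTON/xxxxx
```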

Further reading

Embargoes and restrictions

Not all data is suitable for sharing openly. However, even if data cannot be shared openly, or shared immediately, this does not automatically mean it cannot be shared at all.

Data for open publication should also not contain two or more indirect identifiers (listed below), as the combination can lead to re-identification through a process called 'triangulation'. You should remove or modify one or more of the indirect identifiers until the risk of re-identification is negligible. If you are unsure or require more advice, please contact researchdata@soton.ac.uk. Indirect identifiers include:

  • Place/location of treatment, education, service use
  • Name of professional or business/service responsible for healthcare, education, service
  • Gender
  • Rare disease, condition, experience, treatment, or other characteristic
  • Risky behaviours (e.g. illicit drug use)
  • Place of birth
  • Socioeconomic data, such as occupation or place of work, income, or education level
  • Household and family composition
  • Body measures (e.g. height, weight)
  • Multiple pregnancies
  • Ethnicity
  • Year of birth or age
  • Verbatim responses or transcripts
  • Dates of sensitive events
  • Small sample sizes i.e. when the number of subjects with a certain characteristic is small

List courtesy of University of Bristol (2023).
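One rough way to spot where triangulation might be possible is to count how many records share each combination of indirect identifier values and look at the combinations held by only a few people. The sketch below illustrates the idea with made-up records. The column names and the threshold of 3 are assumptions for illustration, not University guidance, and an empty result does not by itself show that the risk is negligible.

```python
from collections import Counter


def small_groups(records, indirect_identifiers, threshold):
    """Return value combinations shared by fewer than `threshold` records."""
    counts = Counter(
        tuple(record[name] for name in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up records; the column names are hypothetical.
records = [
    {"year_of_birth": 1980, "gender": "F", "occupation": "nurse"},
    {"year_of_birth": 1980, "gender": "F", "occupation": "nurse"},
    {"year_of_birth": 1980, "gender": "F", "occupation": "nurse"},
    {"year_of_birth": 1975, "gender": "M", "occupation": "pilot"},
]

risky = small_groups(records, ["year_of_birth", "gender", "occupation"], threshold=3)
print(risky)  # {(1975, 'M', 'pilot'): 1}
```

Combinations flagged in this way are candidates for removing or modifying one of the identifiers, for example by replacing year of birth with an age band.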

Publication vs On Request Access

The more that anonymised data is aggregated and non-linkable, the more feasible it is to publish it. However, this may remove valuable information from the data. Pseudonymised data is often valuable to researchers because of the granularity it affords, but it carries a higher risk of re-identification. Instead of making this data openly available, it may be preferable to release it, on request, to other bona fide researchers under non-disclosure data sharing agreements. This allows more data to be disclosed than is possible with wider or public disclosure. Information security controls still need to be in place and managed.

There are a number of legitimate reasons for not making data publicly available. Some of these have to do with the nature of the data itself (e.g. the data is confidential or otherwise sensitive), whereas others result from the nature of the research process (e.g. researchers may still be working on their primary analysis, or may be intending to seek a patent).

If you intend to share confidential data, you may need a data sharing agreement to be in place. Consult the Contracts team [internal site] in Research and Innovation Services (RIS) for more information on this.

If you are planning a patent application, it is important to avoid prior disclosure. Before sharing, seek advice from the Intellectual Property and Commercialisation team [internal site] in Research and Innovation Services (RIS).

Any request for access to your research data under the Freedom of Information Act should be directed to the University FOI Officer in Legal Services.

Note that those datasets identified as being unsuitable for sharing still need to be managed and stored in accordance with the University policy. Access can be restricted by adding the appropriate metadata on the deposit form. Identification and security of sensitive data is important. Sensitive data should be flagged at the start of a project in a data management plan.

Your dataset can be open or on request. It is possible to have a mixture of access levels within a dataset so that some data is openly available and some data is on request.

Datasets can be on request because:

  • the creator wants to know who is using the dataset (for impact),
  • the dataset is too sensitive to be openly available for ethical or commercial reasons,
  • the dataset is too large for Pure/ePrints.

Where the dataset is too sensitive to be openly available, there is a further choice of access conditions:

  • Bona fide researchers, subject to registration and ethical approval, may request supporting data from the University of Southampton repository.
  • Researchers who have agreed the usage / data access agreement will be given access. There is not currently a generic data sharing agreement, so the dataset creators should liaise with RIS to supply a data sharing agreement if they require anything beyond the basic terms on the data request form.
  • A custom condition - please supply

The dataset creators will need to choose who can make the decision to supply the dataset:

  • the Library Research Data Service,
  • specific people (please specify) - typically one or more of the dataset creator(s), and / or their supervisor (for PGRs). If all of these people have left the university the request will be escalated within the faculty as appropriate,
  • another arrangement – please specify.

Whatever conditions are set, the Research Data team in the Library will act as the first point of contact for any requests.

Funders recognise that researchers are entitled to a period of privileged access during which they can work on the data before making it available to others. However, if this period extends after the formal end of a project - because researchers are waiting for publications to appear, for example - the point at which the data becomes shareable may occur when researchers have already moved on to the next research question or changed institution; this makes sharing significantly less likely to happen.

An easy solution to this problem is to deposit a copy of the data with an archive which allows data to be placed under a fixed term embargo, such as the university's institutional repository. This typically means that a metadata record will be available for the data, but the data itself will not be downloadable until the embargo has expired.

Another advantage of this approach is that the data is citable even before it is publicly available, and thus can be referenced in a research paper's data access statement.

The length of embargo that is deemed appropriate varies between disciplines and between funding bodies. For example, the BBSRC assumes that a period of 12 months after the end of project funding is sufficient. Funders frequently stipulate that data should be made available as soon as possible; some specify a particular time frame (which can sometimes be shorter than researchers would like it to be), though there may be room for negotiation if there are good reasons to delay.
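When planning a fixed-term embargo, it can help to work out the release date in advance. The sketch below adds a number of months to a project end date and reports whether the embargo has expired. The dates and the 12-month period are illustrative assumptions; the actual period depends on your funder and discipline.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_status(project_end: date, embargo_months: int, today: date):
    """Return the release date and whether the embargo has expired by `today`."""
    release = add_months(project_end, embargo_months)
    return release, today >= release


# Hypothetical project ending 31 March 2024 with a 12-month embargo.
release, expired = embargo_status(date(2024, 3, 31), 12, today=date(2024, 12, 1))
print(release, expired)  # 2025-03-31 False
```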

At the end of your project

It is important that, as well as planning what data to keep after the project and where to keep it, you also plan what will be destroyed and how.

On Your Computer

  • University Windows build computers: deleting from My Documents will delete from the server (deleted items may end up in the Recycle Bin, which must then be emptied).
  • University Windows build computers: for locally stored data on a desktop or C drive, data will be deleted and moved to the Recycle Bin, which must then be emptied.
  • Network drive/research filestore: deleting files removes them with no copy retained in the Recycle Bin.
  • Mac laptops/desktops: deleted items are moved to Trash, which must be emptied. The "Secure Empty Trash" utility is available from the "Finder" menu.

N.B. When data is stored on an iSolutions server or on SharePoint/OneDrive/Teams, it will still exist on backups for some time (usually 90 days) after deletion.

In Your Email

To delete emails: empty the mailbox and then purge. Outlook/Exchange has a facility to allow deleted emails to be recovered up to 30 days after they have been deleted. There is also a facility within Outlook to purge deleted emails altogether so that they cannot be recovered.

Physical Equipment

To enable the secure disposal of electronically held data, it is not sufficient to simply delete the folder or file(s) from your PC or other device, because the data remains on the media but the space previously allocated is now available to be overwritten at some point in the future. Even reformatting the storage is not guaranteed to make the data unavailable. There are several methods to ensure that electronic data held on magnetic and solid-state media is secure from unauthorised access.

  • A secure wipe should be carried out. This process can take some time to complete.
    • DBAN is a popular and free disk-wiping utility and is available for Macintosh, Linux and Windows PCs
    • On Macintosh (10.3 or greater) the "Secure Empty Trash" utility is available from the "Finder" menu
    • Digital voice recorders/tape cassettes/video cassettes: follow the manufacturer’s instructions to carry out a hard reset of the device
  • Physical destruction of the disk

    The University operates the WEEE (Waste Electrical and Electronic Equipment) regulation to ensure equipment is disposed of safely and securely. The University retains an external company that specialises in destroying magnetic and solid state media and will provide a "Certificate of Assurance". To arrange for the disposal of any electronic equipment please complete the IT Disposal request form.

CDs/DVDs

It is recommended that optical media such as DVDs and CDs are physically destroyed. A simple method is to use a suitable shredder. Note that not all shredders are capable of destroying optical media; please check the suitability of your shredder before using this method.

Printed Materials

It is important that any data identified as sensitive and/or confidential that is not to be retained, whether for legal, ethical or other reasons, is destroyed carefully. The University Estates and Facilities provide a service for the removal of confidential waste. Requests for the removal of confidential waste should be made via Planon on SUSSED.

If your data is highly sensitive you should seek advice from within your Faculty/Research/Academic group to confirm that the confidential waste service is appropriate for your material. If you are shredding your sensitive material you should use cross-cut shredders with a minimum standard of DIN 4.

The UK Data Service has more information about Data disposal (UK Data Archive).

When you leave the university, you will lose access to many systems (including Pure) the day after your employment ends, and must therefore ensure that any unregistered or unvalidated DOI dataset records are completed before then. It is your responsibility to agree with your Head of Department or nominated members of departmental staff where your data will be stored and who will have access to it after you leave the University. Your funder or contract requirements may require you to leave a copy of the data in the care of your department to ensure legal or other regulatory compliance. University IT (iSolutions) provides information on storing your files.

If you have an ORCID iD (a persistent digital identifier that is unique to you), your publication list will travel with you through your ORCID profile if you leave the university. If you have added your ORCID iD to your profile in Pure, people will be able to contact you and find your latest research from older papers and datasets that they find in our institutional repository (see our guide on ORCID iDs and Pure for more information).

Staff must return any University property they have before they leave. Depending on local practices, this may be returned to their line manager or an administrator in their area. Please see the Leaving the University page.