LibGuides@Southampton: Research Data Management: Planning

Planning

A robust Research Data Management Plan (DMP) is required to demonstrate and ensure good research practice and procedures. It helps with the protection of Intellectual Property Rights (IPR) and with the proper recording, maintenance, storage and security of Research Data, which in turn supports compliance with relevant legislation and regulations regarding data usage and rights in relation to data. It can also help ensure that common law confidentiality obligations and appropriate access to Research Data are maintained.

Why do I need one?

A basic DMP is required by the University's Research Data Management policy and is recommended in the Concordat on Open Research Data. A DMP is a condition of UKRI funding and is likely to be mandated by other funding bodies, Government and institutions in the near future.
Even if your project funder does not require planning, it may be useful to write a DMP because time spent reflecting on roles and options at the start can save time later and provide additional benefits, for example:

If key staff move on, along with their considerable body of knowledge, it can help ensure that things stay on track. When new researchers join the project team, they can get up to speed quickly.
If you get queries about your data, which could include a Freedom of Information (FOI) request, a DMP can help demonstrate that ethical and confidentiality issues have been considered appropriately along with effective processes for data security, retention and sharing.

A DMP will bring most benefit if it is referred to and updated throughout the project and viewed as an integral part of the research process.

Easy-to-remember link for this section: https://library.soton.ac.uk/researchdata/whyplan

How to write a DMP?

A Data Management Plan (DMP) is a document that describes:

the volume and type(s) of data generated
how data will be organised and adequately documented
where data will be stored and backed up during the project
how data is ethically and legally compliant
how data will be preserved and made available for others to reuse under FAIR principles (where appropriate) in the long term
who will be responsible for looking after your data during and after the project and what resources will be needed

Top 10 checklist

If your funder requires you to write a data management plan (DMP), follow that funder's current advice (see below). If your funder has no specific requirements or your research is unfunded, this checklist is for you.

1. What data will be created?

How much data will be generated?
Where will they be stored and backed up securely?
How much will this cost?
What types or formats of data will be created?
Does your project involve sensitive data, e.g. working with human participants, commercial contracts or non-disclosure agreements?

2. Who will create the data?

Who will own the rights to that data?
Will any existing data sets from other providers need to be used during the project?
Do they have licensing terms or provisions?

3. Roles and Responsibilities

Are there any specific data management roles and responsibilities in addition to those outlined by relevant policies?

4. Software and Services required

Is there a need for specialist software or services to generate, manage or visualise the data?

5. Naming and describing your data

How will you name your files and describe your data so you and others can easily locate/re-use the data?
Is there a discipline/community standard?
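
As an illustration of the naming question above, the following is a minimal sketch of one possible file-naming convention. The pattern `project_description_YYYYMMDD_vNN.ext` and the function name `build_filename` are assumptions made for this example, not a University or community standard; if your discipline has its own convention, use that instead.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, ext: str) -> str:
    """Build a name following the assumed pattern project_description_YYYYMMDD_vNN.ext."""

    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lower().lstrip(".")
    # ISO-style dates (YYYYMMDD) and zero-padded versions sort correctly by name
    return f"{slug(project)}_{slug(description)}_{created:%Y%m%d}_v{version:02d}.{extension}"


# Hypothetical project and file details, for illustration only
print(build_filename("SoilStudy", "Site A moisture", date(2024, 3, 5), 2, ".CSV"))
# prints: soilstudy_site-a-moisture_20240305_v02.csv
```

Generating names from a single function, rather than typing them by hand, is one way to keep names consistent across a project team.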

6. Data Sharing with Collaborators

How will the other collaborators (including external partners) use the data during the course of the project and for what purpose?
What access will they need and how will this be facilitated, with secure access/data transfer where required?

7. Storage - short & long term

Which data need to be stored in the short term?
Which are identified as significant, with reference to relevant policies, and are to be stored in the longer term to meet any retention requirements?
If this is not definitive at the start, how will this be decided during the course of the project?
What is the best archive format for long term use?
Where are significant data best archived?

8. Dissemination

Will the data be published or made available?
Who will use the data and will they need additional information to re-use the data effectively?
Are there any funder requirements to share data?

9. Restrictions to Sharing

Will any of the data be confidential or should it be kept confidential initially e.g. because of commercial application or ethical considerations?

10. Permissions to share

Will you need written consent from any participants to share/re-use the data beyond the project?
Will you need to obtain permission to publish re-used data?

For a more detailed checklist, see the Digital Curation Centre (DCC) DMP checklist.

Easy-to-remember link for this section: https://library.soton.ac.uk/researchdata/topten

Examples

The DCC has a list of “real life” DMPs at http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples

We also have some Southampton DMPs which we can share on request. Contact researchdata@soton.ac.uk

You can also download the generic Southampton DMP information (Word file, sign in required).

Easy-to-remember link for this section: https://library.soton.ac.uk/researchdata/dmpexamples

University policies related to data management

University policies related to data management and sharing:

Easy-to-remember link for this section: https://library.soton.ac.uk/researchdata/sharingpolicies

Get help

If you are writing a data management plan in support of a grant application, the Research Data team can review your plan before you submit your application. We would prefer to have at least two weeks' notice before the deadline. Email us at researchdata@soton.ac.uk

DMPonline is a platform for creating data management plans and provides access to templates for all the major UK and EU funders.

DMP guidance for PGRs can be found on the library's thesis guide

Guidance from funders

Most major research funders require some form of documentation at the application stage, to explain how research data will be managed.

Funder requirements should be followed alongside the existing University Research Data Management Policy.

UKRI

UK Research and Innovation (UKRI) expects research data arising from its funding to be made as open as possible and as restricted as necessary. Although each Council has developed its own specific policies and requirements to take account of the disciplines involved, there is a common expectation that researchers will:

Plan for their data collection, often through the submission of an initial data management plan with their proposal.
Manage their data throughout the project, keeping the necessary metadata and documentation, and establishing good data handling procedures.
Deposit their research data in a suitable repository and where possible make at least the record and metadata discoverable.
Share the data at the appropriate time, subject to any restrictions required for confidential, commercial, sensitive or personal data.
Write a data access statement. Publications should include a statement on the availability of, or restrictions on access to, the data (data access statement) as well as the acknowledgement of funding and grant details:
- All UKRI, formerly RCUK, funded research must have a data statement and reference the grant number as part of the UKRI Open Access Policy
- This was reiterated by both the EPSRC (Research Data Expectation, March 2022) and the ESRC (Principle 3 of their Research Data Policy, May 2018)
- Other funders have similar requirements.

Other funders

For ‘at a glance’ summaries, see the DCC’s overview of funders’ data policies, and also their individual funder guides, linked directly below:
The University of Bristol curates a list of funders’ requirements, and provides pdf guides on funders’ requirements.

Templates

The DCC keeps a broad list of funder template DMPs including guidance and sample plans.

Easy-to-remember link for this section: https://library.soton.ac.uk/researchdata/funderguidance

Costing

Thought needs to be given at an early stage to the costs of preserving data, so that these can be included in the funding application.

More cost information

The largest share of data costs is incurred in preparing the data and ingesting them into the selected storage service, as shown by the costing tool provided by the UK Data Archive.
It may be necessary to budget for additional time and effort to prepare data for preservation, and some data archives levy a charge for deposits. Funders will usually pay only for costs incurred during a project, so archival storage costs will have to be invoiced during the grant period, when the data are deposited, rather than charged as a rolling annual cost.
Most funding bodies will cover reasonable costs; check with your funder what support is available.
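
To make the point about one-off invoicing concrete, here is a rough sketch of how an up-front archiving cost might be estimated for a funding application. The function name `one_off_archive_cost`, its parameters and every figure in the example are placeholders invented for illustration; they are not University, funder or archive charges, so substitute real quotes from your chosen service.

```python
def one_off_archive_cost(volume_tb: float, price_per_tb_per_year: float,
                         retention_years: int, prep_hours: float,
                         hourly_rate: float, deposit_fee: float = 0.0) -> dict:
    """Estimate a single up-front charge covering the whole retention period."""
    # Storage for the full retention period, charged once at deposit
    storage = volume_tb * price_per_tb_per_year * retention_years
    # Staff time to clean, document and convert data before deposit
    preparation = prep_hours * hourly_rate
    return {
        "storage": round(storage, 2),
        "preparation": round(preparation, 2),
        "deposit_fee": round(deposit_fee, 2),
        "total": round(storage + preparation + deposit_fee, 2),
    }


# Placeholder figures only: 2 TB kept for 10 years, 40 hours of preparation
estimate = one_off_archive_cost(volume_tb=2, price_per_tb_per_year=100,
                                retention_years=10, prep_hours=40,
                                hourly_rate=30, deposit_fee=250)
print(estimate["total"])  # prints: 3450
```

Keeping preparation time as a separate line in the estimate reflects the observation above that preparation and ingest tend to account for the largest share of costs.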

Easy-to-remember link for this section: https://library.soton.ac.uk/researchdata/costs

Making data FAIR

The FAIR Data Principles were first defined in a 2016 article in Scientific Data. They are designed to promote:

Findability
Accessibility
Interoperability
Reusability

Findable

Having machine-readable metadata allows for the discovery of datasets and services.

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for the automatic discovery of datasets and services, so this is an essential component of the FAIRification process.

F1. (Meta)data are assigned a globally unique and persistent identifier

F2. Data are described with rich metadata (defined by R1 below)

F3. Metadata clearly and explicitly include the identifier of the data they describe

F4. (Meta)data are registered or indexed in a searchable resource
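
As a rough illustration of what machine-readable metadata can look like, the sketch below builds a small dataset record as JSON and checks that some commonly expected fields are present. The field names, the loose mapping to individual FAIR principles and the helper `missing_fields` are assumptions made for this example, not a formal metadata schema; repositories normally define their own schema (and the identifier shown is a placeholder, not a real DOI).

```python
import json

# Assumed field names, each loosely mapped to the principle it supports
EXPECTED_FIELDS = {
    "identifier": "F1/F3: persistent identifier of the data described",
    "title": "F2: rich descriptive metadata",
    "description": "F2: rich descriptive metadata",
    "licence": "R1.1: clear data usage licence",
    "provenance": "R1.2: how and by whom the data were produced",
    "access": "Accessible: how users can obtain or request the data",
}


def missing_fields(record: dict) -> list[str]:
    """Return the expected fields that are absent or empty in the record."""
    return [name for name in EXPECTED_FIELDS if not record.get(name)]


# Hypothetical dataset record, for illustration only
record = {
    "identifier": "https://doi.org/10.0000/placeholder",
    "title": "Soil moisture readings, Site A",
    "description": "Hourly soil moisture measurements from one field site.",
    "licence": "CC BY 4.0",
    "provenance": "Collected by the project team using in-ground sensors.",
    "access": "Available on request from the project contact.",
}

print(json.dumps(record, indent=2))  # the machine-readable form
print(missing_fields(record))        # prints: []
```

A record like this can remain available and indexed even when the data themselves are restricted or no longer held.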

Accessible

After finding data, users must know how data can be accessed. Metadata must be accessible even when data is no longer available.

Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

A1.1 The protocol is open, free, and universally implementable

A1.2 The protocol allows for an authentication and authorisation procedure, where necessary

A2. Metadata are accessible, even when the data are no longer available

Interoperable

Ensuring that the data can communicate and exchange with other data, applications, or workflows for processing, analysing and storing.

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (Meta)data use vocabularies that follow FAIR principles

I3. (Meta)data include qualified references to other (meta)data

Reusable

Having metadata and data well described so that they can be replicated or combined in various settings.

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (Meta)data are released with a clear and accessible data usage license

R1.2. (Meta)data are associated with detailed provenance

R1.3. (Meta)data meet domain-relevant community standards

FAIR principles are increasingly important as we use computational support to find and deal with data. The principles advocate using rich metadata, persistent identifiers (such as DOIs), licences and, where they exist, shared community standards. The full FAIR principles are listed above.

Datasets can still be FAIR even if they are not openly accessible. If they are more findable because of rich metadata and it is clear how potential users can request and access the dataset, then even a sensitive dataset which cannot be made openly available can achieve a high degree of FAIRness.

The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).

The FAIR principles are further defined on the GO FAIR website.

Data privacy and GDPR

A data protection impact assessment (DPIA) is a process to help identify and minimise the data protection risks of a project.

You must do a DPIA for certain listed types of processing, or any other processing that is likely to result in a high risk to individuals' interests. It is also good practice to complete a DPIA for any other major project which will require the processing of personal data.

Policies

Under the Data Protection Act 2018, a DPIA (the new term for a Privacy Impact Assessment) is compulsory for any project that is likely to be 'high risk' to the rights and freedoms of individuals. The GDPR does not define what high risk is; however, examples include 'large-scale' processing, so it is likely that a DPIA will be required for some research projects.

Sensitive data

Even sensitive research data can often be shared legally and ethically by using informed consent, anonymisation and controlled access. To be able to do this, it is important to consider potential data sharing and re-use scenarios well before the ethics process and data collection. Be explicit in your consent forms and participant information sheet (PIS) about your plans to make data available, who will be able to access the data, and how the data would be accessed and potentially re-used.

Process

You should complete an Initial Data Protection Review (IDPR) (serviceline form) and you may also need to undertake a full Data Protection Impact Assessment. You can find guidance on this process on the Information Governance & Data Protection SharePoint site.