Research Data Management: Data Plan for your PhD

Guidance and support to staff, researchers and students at the University of Southampton

DMP for your PhD research

All first-year postgraduate researchers should complete a data management plan for their research and include it as part of their first progression review.

A data management plan (DMP) is a living document that helps you consider how you will organise your data, files, research notes and other supporting documentation throughout the project. The aim is to help you find these easily, keep them safe and have sufficient documentation to be able to re-use them throughout your research and beyond.

 

All researchers will have data. Data can be broadly defined as 'Material intended for analysis'.  This covers many forms and formats, and is not just about digital data.

For example, 

Art History - high resolution reproductions of photographs, notebook describing context

English literature - research notes on text, textual analysis

Engineering - experimental measurements on the physical properties of liquid metals

The University also has a definition for “Research Data” in its Research Data Management Policy that you should consider.

A PhD DMP template and guidance on how to complete your Data Management Plan is now available (see below). Contact us if you need further information or have feedback via researchdata@soton.ac.uk

Key Documents

The template below has been provided to assist you in writing your data management plan. Not all sections will be relevant, but you should consider each section carefully.

When the time comes to deposit your data, follow the advice in our Thesis Data Deposit guide

Creating your DMP

What are data management plans?
A data management plan is a document that describes:

  • What data will be created
  • What policies will apply to the data  
  • Who will own and have access to the data
  • What data management practices will be used 
  • What facilities and equipment will be required 
  • Who will be responsible for each of these activities

Your data management plan should be written specifically for the research that you will be doing.  Our template is a guide to help you identify the key areas that you need to consider, but not all sections will apply to everyone.  You may need to seek further guidance from your supervisor, colleagues in your department or other sources on best practice in your discipline.  We provide some details of guidance available in our training section and on our general research data management pages.

Each of the tabs looks at the different topics that can be included in a data management plan.  You can move through the tabs in any order.

Describing your Project

At the start of your data management plan (DMP) it is useful to include some basic information about the research you are planning to do.  This may already exist in other documents in more detail, but for the purposes of the DMP try to summarise in as few sentences as possible.

What policies will apply?

It is important that you think about who is funding your research and whether there are any requirements that you need to meet. Are you funded by a UK Research Council? What policies do they have on research data (see Funder Guidance)? What do our University Research Data Management policy and Code of Conduct for Research state is required?

Does the type of data you will be creating, using or collecting mean that you have to meet certain legal conditions? Will you be collecting any form of personal data (see ICO Personal Data Definition) or special category data (see ICO Special Category definition), or is it commercially sensitive? For example, if you are involved in population health and clinical studies, the minimum retention period for research data and records could be 20-25 years for certain types of data - see the MRC Retention framework for research data and records for further details.

Do you need Ethics Approval?

Anyone who is dealing with human subjects or cultural heritage (see University policies) will need to obtain ethics approval, and this must be done before collecting any data. Your DMP should inform what you say in your ethics application about how you will collect, store and re-use your data. It is important that your DMP and your ethics application are in agreement and that you provide your participants with the correct information. Once you receive your ethics approval, review your data management plan and update it as necessary.

Reviewing your Data Management Plan

A DMP should be a living document and should be updated as your research develops.  It should be reviewed on a regular basis and good practice would encourage that the dates of review are included in the plan itself.  Use of a version table in any document can be helpful.

What data will be created?

In your data management plan you need to provide some detail about the material you will be collecting to support your research. This should cover how you will collect notes, supporting documentation and bibliographic management as well as your primary data. Will all your data be held electronically or will you need to maintain a print notebook to collect your observations?

Are you using Secondary Data?

Not everyone has to collect their own data; it may already have been collected and made available. This data is known as secondary data. Some secondary data are freely available, but other data are released with terms and conditions that you need to meet. In some cases this may influence where you can store and analyse the data. You need to be aware of this as you plan the work you intend to do.

How are you collecting or creating your data?

How you collect or gather the material for your research will influence what you need to do to manage it. The way you do this may alter as your research progresses and you should update your plan as required. Will you be collecting data by observing, note-taking in an archive, carrying out experiments or a mixture of these?

How much data are you likely to have?

Knowing how much data you might create is important as it will dictate where you can store your data and whether you need to ask for additional storage from iSolutions. It is unlikely that you can say exactly what volume of data you might create, but you will have an idea of individual file sizes. If you will be working with Word and Excel documents and a reference management software library then you are likely to be dealing with megabytes or gigabytes of data. If you will be collecting high resolution images then you may end up needing to store terabytes. Estimate as early as possible and, if you think you may need additional space, discuss this with your supervisor.
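
A rough estimate can be worked out from file counts and typical file sizes. The sketch below is one way to do this in Python; the file categories, counts, sizes, number of copies and growth margin are illustrative assumptions, not University figures, and should be replaced with your own.

```python
def estimate_storage_gb(file_counts_and_sizes_mb, copies=2, growth_margin=0.25):
    """Estimate total storage in gigabytes.

    file_counts_and_sizes_mb maps a label to (number_of_files, average_size_mb).
    copies counts the working copy plus any separate back-up copies.
    growth_margin adds headroom for files you have not anticipated.
    """
    total_mb = sum(count * size_mb for count, size_mb in file_counts_and_sizes_mb.values())
    total_mb *= copies
    total_mb *= 1 + growth_margin
    return total_mb / 1024  # convert megabytes to gigabytes (binary units)


# Hypothetical project: mostly documents, plus a large set of images.
planned = {
    "word_documents": (200, 0.5),
    "spreadsheets": (50, 2),
    "high_res_images": (4000, 80),
}
print(f"Estimated storage: {estimate_storage_gb(planned):.1f} GB")
```

In this made-up example the images dominate the total, which is typical: a single large file type usually decides whether you need to ask for additional storage.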

What formats will you be using?

A crucial factor in being able to share data is that it is in an open format or collected using disciplinary standard software that allows export to open formats. Consider how open the format of your data will be when selecting the software, instruments and word processing packages that you use. See the Data formats section in Introducing Research Data Part III for points to consider.

Who will own the data?

If you have been sponsored by a research council, government, industry or commercial body, the agreement you signed may cover ownership of the data that you create. Being aware of this early is useful as it will influence what you are able to do when you come to writing papers, sharing your data and depositing it when you finish. It may also affect where you can store your data.

How will you make your data findable?

Using standards to capture the essential metadata is a good way to help create data that will be easy to find.  It will also make preparing for deposit in the future more straightforward.  The Research Data Alliance has a helpful list of disciplinary metadata and use case examples.  You can make reference to these in your plan once you know what will be most appropriate to use.

Where will you store the data during your PhD?

Where you store your data will depend on things such as the type and size of data you are collecting. Certain types of data, such as personal, special category data (formerly referred to as sensitive data) or commercially confidential data, need to be stored more securely than others. This type of data generally needs to be stored on University network drives that have additional protection and not on personal computers or cloud storage (for example, Office 365, OneDrive). Where you are collecting less sensitive data your choice of storage is wider. All storage should be in a location with good back-up procedures in place. Consult iSolutions guidance and knowledge base for further information.

How will you name your files and folders?

It can be helpful to think about creating a procedure for how you will name your files. This is a basic step where it is useful to consider how easy it will be to interpret the name in the future. Abbreviations can be good, but ask yourself how someone else might understand the file name should you need to share it with them. What would make it easy to know what each file contains? While it is possible to have quite long file names, this can cause problems when you zip files.
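
If you create many files, a small script can apply your naming procedure consistently. The sketch below shows one possible convention (project, short description and ISO date, separated by underscores); the pattern, the 60-character limit and the function name `build_file_name` are assumptions for illustration, and your discipline may already have its own standard.

```python
import re
from datetime import date


def build_file_name(project, description, created, extension, max_length=60):
    """Build a name such as 'liqmetal_viscosity-run-3_2024-03-15.csv'."""
    def clean(text):
        # Lower-case the text and replace runs of anything other than
        # letters or digits with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    name = f"{clean(project)}_{clean(description)}_{created.isoformat()}.{extension.lstrip('.')}"
    if len(name) > max_length:
        # Long names can cause problems when files are zipped or moved.
        raise ValueError(f"File name is {len(name)} characters; limit is {max_length}")
    return name


print(build_file_name("LiqMetal", "Viscosity run 3", date(2024, 3, 15), "csv"))
```

Putting the date in YYYY-MM-DD form means that files sort chronologically when listed by name.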

How will you tell one version of a file from another?

How will you be able to tell whether you are dealing with the latest version of a file? How will you manage major versus minor changes? What if you want to return to an earlier version? Use the data management plan to investigate what would be the optimum method for you and establish a good procedure from the beginning. Generally the use of 'draft', 'latest' or 'final' should be avoided. Instead consider using the date (YYYY-MM-DD) or a version number, for example, v.1.0, where the whole number increases with major changes and the decimal with minor ones. Adding a version table at the end of a document can also be helpful.
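
The major and minor numbering described above can be applied mechanically. The following sketch assumes file names that end in a suffix of the form `_v1.0` before a single extension; that suffix style and the function name `bump_version` are illustrative choices, not a University standard.

```python
import re

# Matches '_v<major>.<minor>' immediately before the final file extension.
VERSION_PATTERN = re.compile(r"_v(\d+)\.(\d+)(?=\.[^.]+$)")


def bump_version(file_name, major=False):
    """Return file_name with its '_vMAJOR.MINOR' suffix increased by one step."""
    match = VERSION_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"No version suffix like '_v1.0' found in {file_name!r}")
    major_number, minor_number = int(match.group(1)), int(match.group(2))
    if major:
        # A major change resets the minor number.
        major_number, minor_number = major_number + 1, 0
    else:
        minor_number += 1
    new_suffix = f"_v{major_number}.{minor_number}"
    return file_name[:match.start()] + new_suffix + file_name[match.end():]


print(bump_version("interview-notes_v1.0.docx"))
print(bump_version("interview-notes_v1.4.docx", major=True))
```

Saving each change under a new name, rather than overwriting, is what lets you return to an earlier version later.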

How can you share your data?

Making data accessible is not about doing something at the end of the project; it needs to be planned for from the beginning. During your research you are likely to have colleagues or collaborators who will need to be able to access the data - how will you do this? Will you need a collaborative space and if so what can you use? Does it need to be in a protected location with restricted access due to the type of data you are using? Establishing good procedures on documentation, metadata collection, file-naming and using disciplinary standards will assist you throughout your research, as well as helping at the end.

How do you handle personal, sensitive or commercially confidential data?

If the data you are collecting contains personal, special category data (formerly referred to as sensitive data) or commercially confidential data then sharing or transferring the files needs to be carried out in a way that does not make the data vulnerable. Data should be anonymised or pseudonymised as early as possible after collection. Seek disciplinary guidance prior to collection.
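
As an illustration of what pseudonymisation can look like in practice, the sketch below replaces an identifier with a code derived from a keyed hash (HMAC-SHA256 from the Python standard library). This is only one possible technique and is an assumption on our part, not a University-mandated method: the data remain personal data, the secret key must be stored securely and separately from the data, and whether the approach is acceptable depends on your ethics approval and disciplinary guidance.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Return a stable pseudonym for identifier using a keyed hash.

    secret_key must be bytes. Anyone holding the key can recompute the
    pseudonym for a known identifier, so keep it apart from the data.
    """
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256)
    # Shortened for readability; a shorter code is more likely to collide.
    return "P-" + digest.hexdigest()[:12]


# Hypothetical key for demonstration only; generate and store a real one securely.
demo_key = b"replace-with-a-securely-stored-key"
print(pseudonymise("Participant 001", demo_key))
```

The same identifier always produces the same code, so records about one participant can still be linked across files without the name appearing in them.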

The medium of transfer must be secure and where necessary encryption should be used. You may want to consider one of the following:

There may be other software available and you should check if there is a standard in your discipline. 

Transferring data via USB or external drives is not recommended, but where required these should be encrypted. Avoid using email to send files and instead use our University Dropoff service. This offers transfer of files up to 50GB, and your files can be encrypted by ticking "Encrypt every file" when creating a new drop-off - see 'How secure is Dropoff'.

What data do you need to keep and what do you need to destroy?

Not all the data from a project needs to be kept and the data you collect should be reviewed regularly.  The Digital Curation Centre (2014) guide  'Five steps to decide what data to keep: a checklist for appraising research data v.1' may help you to decide what to retain. It is important that you retain or discard data in line with your ethics approval.

You also need to consider what data needs to be destroyed, how you will mark the data for destruction and when this needs to happen. Destroying paper-based records is relatively easy through our confidential waste system. Destroying digital data is harder, as it may need to be done in a way that prevents the data from being forensically recovered. Guidance on destroying your data is available, or contact iSolutions for advice.

Why do you need to consider the long-term storage now?

At the end of your PhD you will be encouraged to share your data as openly as possible, and as closed as necessary. To do this safely, consider what you need to do to enable your data to be accessible in the future. Knowing the best place to store your data may inform what you need to plan for in its creation or collection. Are you aware of any disciplinary data repositories that hold similar data? Examples are:

Investigate what requirements these repositories have on formats, documentation etc. and incorporate these into your plan. Otherwise you should plan to deposit in the University Institutional Repository.

There are currently no costs for depositing most datasets in our Institutional Repository unless the data requires specialist archive storage or is in excess of 1TB. External repositories may have charges for depositing data.

Who will be creating the archive?

Generally, as a PhD researcher, the job of drawing together your data into a dataset ready for deposit will fall to you. It is not the responsibility of your supervisor, although they may be able to advise on what needs to be done. If you are part of a larger project there may be someone designated to curate the project data. For further assistance contact researchdata@soton.ac.uk

How long should the data be kept?

This will depend on a number of factors. Your funder may have a policy that requires the data to be held for a minimum of 10 years from last use. If you are working in certain medical areas the data may need to be held for 25 years. There may be some restrictions on how long you can retain personal data under the Data Protection Act 2018 (GDPR). Significant data that has been given a persistent identifier (DOI) will be kept permanently.

What documentation or additional information needs to accompany the data?

Keeping a record of what changes you have made, when data was collected, where data was collected from, observations and definitions of what has been collected is crucial to allowing data to be used safely and with integrity. How do you plan to do this? How will you make sure that you can match up your notes with the files they refer to? Some programming languages such as Python and R allow you to make notes in the files about what you are doing, which is really helpful. Where this is not an option you will need to develop your own method to make sure that processes applied to the data are recorded and available for you to refer back to later. Creating a register of your files by type using an Excel spreadsheet may be worth considering, but it should be manageable and, importantly, kept up-to-date.
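
A file register does not have to be typed by hand. The sketch below walks a folder and writes one row per file to a CSV file, which can be opened in Excel; the column names and the function name `write_register` are assumptions for illustration. Note that it rewrites the register each time it runs, so any notes you add by hand should be kept in a separate copy or merged back in.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_register(data_folder, register_path):
    """Write one CSV row per file found under data_folder."""
    data_folder = Path(data_folder)
    register_path = Path(register_path)
    with open(register_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "type", "size_bytes", "modified_utc", "sha256", "notes"])
        for path in sorted(data_folder.rglob("*")):
            # Skip folders, and skip the register itself if it lives in the data folder.
            if not path.is_file() or path.resolve() == register_path.resolve():
                continue
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            writer.writerow([
                path.relative_to(data_folder).as_posix(),
                path.suffix.lower(),
                stat.st_size,
                modified.isoformat(timespec="seconds"),
                sha256_of(path),
                "",  # notes: what the file contains and how it was produced
            ])
```

For example, `write_register("my_project/data", "my_project/register.csv")` (hypothetical paths) would list every file under the data folder. The checksum column lets you confirm later that a file has not changed since it was registered.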

For data to be reusable it requires data provenance. Data provenance documents where a piece of data comes from and the process and methodology by which it was produced. It is important for confirming the authenticity of data, enabling trust, credibility and reproducibility. This is becoming increasingly important, especially in the eScience community where research is data intensive and often involves complex data transformations and procedures.

What restrictions will need to apply?

Not all data can be made openly available. Some data may only be shared once a data sharing agreement has been signed, while other data may not be suitable for sharing. Funding councils encourage all data to be as open as possible and as closed as necessary. Where will your data fit with this? What agreements do you need to be able to share your data?

When can data be made available?

Data can be deposited in our Institutional Repository and kept as an 'entry in progress' until it is ready for publication. 

Not all data needs to be made immediately available at the end of your PhD.  It is possible to add an embargo to give yourself some additional time to find funding to continue your work and re-use your own data.  See Regulations on embargoes.

However, it is not always necessary for you to wait until the end of your PhD before depositing data.  If you write a conference or journal paper it is likely that you will be asked to make the underpinning data available.

How will you keep your data safe?

What would happen if your files became corrupted or your laptop was stolen? Would you be able to restore them? What would happen if someone was able to access your data without your knowledge or approval? If you are holding personal or special category data (formerly referred to as sensitive data) and these became public, this would be a data breach with potentially serious consequences.

Case Study

Dr Fitzgerald: Loss of seven years of Ebola research

Consider carefully the impact to you and your research if these were to happen and what procedures you may need to put into place to reduce the risk of these happening.

How will you back up your data?

Good housekeeping of your data is important and this includes doing regular back-ups of your data. University storage is backed up regularly but it is important to have your own 'back up' folders, kept separately from your working files. Back-ups should be done as regularly as required. This can be defined by how much lost work you are prepared to repeat. You may need to back up daily, weekly or monthly depending on the nature of your research.

As well as establishing a process for backing up your files, you should check the process of restoring your files.  You will need to check that the files restore correctly.  Having good documentation on what your files contain, what transformations or analysis has been carried out will be invaluable for this process.
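
One way to check that a back-up would restore correctly is to compare checksums of the working files with those of the copies. The sketch below uses only the Python standard library; the folder layout and the function names `back_up` and `verify_backup` are assumptions for illustration, and it does not replace the back-ups made by University storage.

```python
import hashlib
import shutil
from pathlib import Path


def checksum(path):
    """Return the SHA-256 checksum of a file (reads the whole file into memory)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(working_folder, backup_folder):
    """Copy working_folder to backup_folder, which must not already exist."""
    shutil.copytree(working_folder, backup_folder)


def verify_backup(working_folder, backup_folder):
    """Return relative paths that are missing from, or differ in, the back-up."""
    working_folder, backup_folder = Path(working_folder), Path(backup_folder)
    problems = []
    for source in sorted(working_folder.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(working_folder)
        copy = backup_folder / relative
        if not copy.is_file() or checksum(source) != checksum(copy):
            problems.append(relative.as_posix())
    return problems
```

An empty list from `verify_backup` means every working file has an identical copy in the back-up. Using a new, dated back-up folder each time (for example one ending in YYYY-MM-DD) keeps earlier copies available if a problem is only noticed later.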

How can you safely destroy data?

Destroying data, especially personal, special category data (formerly referred to as sensitive data) or commercially confidential data, is not as straightforward as just deleting the file. Further action is required, otherwise the data could be recovered. Please read our guidance on destruction of data and GDPR regulations.

An important part of research data management is that your plan is implemented and part of your everyday good research practice.  The plan should be a living document and reflect your practice.  You may find that some parts become redundant or that there is a better way to carry out a process so your plan should be updated. As a PhD researcher it is likely that you will be the person responsible for implementing the plan.  If your research is part of a wider research project there may be someone in the team who has been given the role and you should discuss your data management plan with them.

What next?

Having written your plan, consider what actions you need to take in order to carry it out. What further information do you need to find? Investigate what training or briefing sessions are available via Gradbook. If you want to enhance your data analysis skills, check out material on Lynda.com.

Training

Courses offered:

Data Management Plan: Why Plan? 45 minute briefing. Book via Gradbook. A Panopto recording of this course is also available.

Research Data Management: What you need to know from the start. 45 minute briefing. Book via Gradbook.

Research Data Management Workshop. 180 minute workshop. Book via Gradbook.

Managing your research notes and/or data (Arts and Humanities). 120 minute workshop. Book via Gradbook.

PhD Data Management Plans: Introduction for Supervisors. 45 minute briefing. Book via Staffbook.
