Research Data Management: Sensitive Data & Data Protection (GDPR)

Guidance and support to staff, researchers and students at the University of Southampton

Sensitive Research Data

Sensitive research data usually includes one or more of the following:

  • the involvement of human subjects, particularly where the research involves sensitive personal data such as health records;
  • the involvement of commercial collaborators, particularly where the data could be construed as competitive intelligence;
  • working under the terms of a non-disclosure agreement.

If you are working with sensitive data, you need to take extra precautions to ensure the data can only be viewed by those with permission to do so. These may include encryption or other special measures when storing, transferring and disposing of data. Please see our Briefing document on what to consider when sharing sensitive data.

Whilst adopting a proportionate, risk-based approach, you need to consider the entire lifecycle of the research information, from creation to destruction. Minimum controls for keeping highly restricted information secure include user access controls, encryption, identifying and guaranteeing the location of the information, and legitimate sharing under appropriate contracts.

The key actions to reduce your risk are:

  • Raw data and all files containing contact details for individuals (such as consent forms) must only be stored on University servers, within the University network
  • Password protect and/or encrypt files containing raw data, contact details and/or lookup keys (you must also look after the passwords for those files carefully).
  • If you are holding data locally on a laptop (for example during collection) the data must be encrypted and the laptop should be a University build laptop.
  • When sharing data with collaborators, do not share the raw data. Do not use cloud-based services. Do not share data with collaborators outside the University unless you know that a data sharing agreement is in place.
  • When moving data, do not email files; instead use SafeSend or create a SharePoint site for you and your collaborators.

What the Data Protection Act (GDPR) means for Research Data

You left your laptop on a train or a bag on the bus. Your laptop had the DNA profiles from the participants in your research project, or the bag was full of consent forms. Accidents happen, but the penalties increased substantially when the EU General Data Protection Regulation (GDPR) was adopted into British law as the Data Protection Act (2018) in May 2018.

DPA (2018) covers all forms of personal data including genomic and some anonymised data.

Working with Sensitive Data

Using the University's Research Filestore

It is possible to restrict access to folders on the University's research filestore, so that only certain individuals or groups are allowed to view and edit the contents. A typical configuration for project folders is to allow access only to members of the project team, but it is also possible to set up folders within the project folder that are restricted to fewer users. For more information contact the IT team via serviceline@soton.ac.uk 

Using the University's OneDrive

You can share data stored on your Office 365 OneDrive for Business. However, for data which contains direct and indirect personal identifiers, we recommend you use the Research Filestore. See the iSolutions website for more information about Office 365.

Using external storage providers

While external services such as Dropbox, Google Drive and OneDrive are convenient, they do not comply fully with the University's data policies due to the following issues:

  • data may be stored in jurisdictions which do not provide the same level of privacy and data protection as the European Economic Area;
  • they do not interact well with existing University storage services;
  • they do not provide sufficient guarantee of continued availability;
  • extra precautions must be taken in order to ensure more than one person at the University has access to the data, in case of researchers leaving the University.

External cloud-based solutions should therefore be avoided for sensitive data. If you are considering using external storage providers nevertheless, perhaps because of conditions imposed by external collaborators, you must only consider those which will allow you to take the following security measures:

 

Anonymised data

Anonymisation is the complete and irreversible removal of any information that could lead to an individual being identified, either from the removed information itself or this information combined with other data held by the University. Once data is truly anonymised and individuals are no longer identifiable, the data will not fall within the scope of the DPA (2018) and GDPR and it becomes easier to use.

Full anonymisation is the process of removing personal identifiers, both direct and indirect. An individual may be directly identified from their name, address, postcode, telephone number, photograph or image, or some other unique personal characteristic (direct identifiers). An individual may be indirectly identifiable when certain information is linked together with other sources of information, including their place of work, job title, salary, their postcode or even the fact that they have a particular diagnosis or condition (indirect identifiers).

Full anonymisation is often difficult to attain. In most cases the information can only be partially anonymised and therefore will still be subject to data protection legislation. If you can't fully anonymise the information, it is still good practice to partially anonymise it as this limits the ability to identify people.

Much of what may have been considered anonymised data 20 years ago would now be defined as pseudonymised data, because of the increased scope for data linking, where two or more data sources can be combined to re-identify individuals.

Partial anonymisation and pseudonymised data

Full anonymisation is often difficult to attain and, for research, often not desirable. In most cases the information can only be partially anonymised or pseudonymised and therefore will still be subject to data protection legislation. Pseudonymisation is defined within the GDPR as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person” (Article 4(5)).

Unlike full anonymisation, pseudonymisation will not exempt researchers from the DPA (2018) altogether. It does, however, help the University meet its data protection obligations, particularly the principles of ‘data minimisation’ and ‘storage limitation’ (Articles 5(1)(c) and 5(1)(e)), and processing for research purposes for which ‘appropriate safeguards’ are required.
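
As a minimal sketch of the separation that pseudonymisation relies on, the Python example below replaces a direct identifier with a random study ID and returns the lookup key as a separate object, which would need to be stored apart from the research data (for example in a more tightly restricted folder). The field names (`name`, `score`), the `P-` prefix and the function name are assumptions made for illustration; this is not a University tool.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace `id_field` with a random study ID.

    Returns (pseudonymised_records, lookup_key). The lookup key is the
    'additional information' and must be kept separately from the data.
    """
    lookup_key = {}   # original identifier -> study ID
    used_ids = set()
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in lookup_key:
            # Random IDs carry no information about the person,
            # unlike initials or a hash of the name.
            study_id = "P-" + secrets.token_hex(4)
            while study_id in used_ids:  # regenerate on the rare collision
                study_id = "P-" + secrets.token_hex(4)
            used_ids.add(study_id)
            lookup_key[identifier] = study_id
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["study_id"] = lookup_key[identifier]
        pseudonymised.append(cleaned)
    return pseudonymised, lookup_key


if __name__ == "__main__":
    raw = [
        {"name": "Alice Example", "score": 12},
        {"name": "Bob Example", "score": 7},
        {"name": "Alice Example", "score": 15},
    ]
    data, key = pseudonymise(raw)
    print(data)  # working copy for analysis
    print(key)   # store encrypted, away from `data`
```

The same person always receives the same study ID, so repeated measurements can still be linked during analysis without the name appearing in the working dataset.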

Where 'de-identified' or pseudonymised data is in use, there is a residual risk of re-identification; the motivated intruder test can be used to assess the likelihood of this. Once assessed, a decision can be made on whether further steps to de-identify the data are necessary. By applying this test and documenting the decisions, the study will have evidence that the risk of disclosure has been properly considered; this may be a requirement if the study is audited. 

Motivated Intruder Test

This involves considering whether an ‘intruder’ would be able to achieve re-identification if motivated to attempt this.

The ‘motivated intruder’ is taken to be a person who starts without any prior knowledge but who wishes to identify the individual from whose personal data the anonymised data has been derived. This test is meant to assess whether the motivated intruder would be successful.

In practice this includes but is not limited to:

  • Using the edited or full Electoral Register to try to link anonymised data to someone’s identity
  • Using social media to try to link anonymised data to a user’s profile
  • Conducting an internet search to use combinations of data, such as date of birth and postcode, to identify an individual.

The ‘motivated intruder’ is not assumed to have any specialist knowledge such as computer hacking skills, or to have access to specialist equipment or to resort to criminality such as burglary, to gain access to data that is kept securely.
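
One part of this test can be approximated in code. The sketch below, which assumes made-up field names (`dob`, `postcode`, `name`) and a small list standing in for a public source such as the Electoral Register, reports which "anonymised" records match exactly one person in the public source on a combination of fields. It illustrates the linkage idea only and is not a substitute for a documented assessment.

```python
from collections import defaultdict


def unique_matches(anonymised, public, link_fields=("dob", "postcode")):
    """Return (record, name) pairs where a record links to exactly one person."""
    # Index the public source by the combination of linking fields.
    index = defaultdict(list)
    for person in public:
        index[tuple(person[f] for f in link_fields)].append(person)

    reidentified = []
    for record in anonymised:
        candidates = index.get(tuple(record[f] for f in link_fields), [])
        if len(candidates) == 1:
            # A single candidate means the combination is identifying.
            reidentified.append((record, candidates[0]["name"]))
    return reidentified


if __name__ == "__main__":
    public = [
        {"name": "A. Person", "dob": "1980-01-02", "postcode": "AB1 2CD"},
        {"name": "B. Person", "dob": "1975-05-06", "postcode": "AB1 2CD"},
        {"name": "C. Person", "dob": "1975-05-06", "postcode": "AB1 2CD"},
    ]
    anonymised = [
        {"dob": "1980-01-02", "postcode": "AB1 2CD", "diagnosis": "X"},
        {"dob": "1975-05-06", "postcode": "AB1 2CD", "diagnosis": "Y"},
    ]
    for record, name in unique_matches(anonymised, public):
        print(name, "can be linked to", record)
```

Records that come back from a check like this are candidates for further de-identification, for example by replacing date of birth with year of birth or truncating the postcode.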

Further Information

The UK Anonymisation Network (UKAN)

UKAN Anonymisation Decision-making Framework

UK Data Archive: Anonymisation guide

UK Data Archive (2021) Webinar: How to anonymise qualitative and quantitative data

Information Commissioner’s Office (2012). Anonymisation: managing data protection risk code of practice.

University of Bristol (2020) Sharing research data concerning human participants, version 2

UK Data Archive text anonymisation helper tool (downloads a zip file): this tool can help you find disclosive information to remove or pseudonymise in qualitative data files. The tool does not anonymise or make changes to data, but uses MS Word macros to find and highlight numbers and words starting with capital letters in text. Numbers and capitalised words are often disclosive, e.g. as names, companies, birth dates, addresses, educational institutions and countries.
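
The approach the tool takes can be sketched in a few lines of Python. This is not the UK Data Archive tool itself (which uses MS Word macros); it is a rough, assumed equivalent that marks numbers and capitalised words for a person to review. The `[[...]]` marker and the function name are made up for illustration, and the pattern only recognises unaccented capital letters A to Z.

```python
import re

# A token starting with a digit, or a word starting with a capital letter.
CANDIDATE = re.compile(r"\b(?:\d\w*|[A-Z][\w'-]*)\b")


def highlight_candidates(text):
    """Wrap possibly disclosive tokens in [[ ]] without altering them."""
    return CANDIDATE.sub(lambda match: "[[" + match.group(0) + "]]", text)


if __name__ == "__main__":
    sample = "I moved to Leeds in 1999 with Dr Smith."
    print(highlight_candidates(sample))
```

As with the original tool, every sentence-initial word is flagged too, so the output is a prompt for manual review rather than a list of confirmed identifiers.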

 

Sharing data during research

If you want to share data with external collaborators, even if they are part of the same research project, you must have a data sharing agreement in place. Contact riscontracts@soton.ac.uk for more information. When you share the data with others, they will be data processors, but the University will still be the data controller and therefore responsible for how the data is used.

Extra precautions need to be taken when transferring sensitive data between collaborators:

  • SharePoint provides a safe way to share documents collaboratively with other researchers both internal and external to the University.
  • Collaborators can be given a University computing account as part of their visitor status, subject to the completion of the necessary agreements. Through this account, they could be given permissions to transfer data directly into certain folders on the Research Filestore.
  • Use SafeSend in preference to email. The service allows you to easily move files of up to 50 GB in and out of the University. All files are encrypted during transfer across the network. All files uploaded and temporarily stored on SafeSend are held on equipment owned and operated at the University's own Data Centre. SafeSend is not a cloud service; everything is stored on equipment directly owned by the University and managed by its own IT staff. Access to data is strictly controlled by the University; all accesses to data on SafeSend are logged and can be checked if you are ever concerned that a third party might have gained access to your data. Files are automatically deleted from SafeSend 32 days after you upload them. No backups are taken of the uploaded data (it is only a transitory stopping point), so once an uploaded file has been deleted there is no way of recovering it.

 

Publishing Data

Data that is to be published should have all direct identifiers removed. These include:

  • Name
  • Initials
  • Address, including full or partial postal code
  • Spatial location (e.g. latitude and longitude units with enough precision to potentially locate the subject)
  • Telephone or fax numbers or contact information
  • Email addresses
  • Vehicle identifiers
  • Medical device identifiers
  • Web or internet protocol addresses
  • Biometric data
  • Facial photograph or comparable image
  • Un-anonymised audio or video recordings
  • Names of relatives
  • Dates relating to an individual (e.g. date-of-birth)
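
A minimal sketch of this step in Python, assuming tabular data held as a list of dictionaries. The column names in `DIRECT_IDENTIFIERS` are hypothetical and would need to match your own dataset; the list above, not this code, defines what counts as a direct identifier. Rounding coordinates to one decimal place (very roughly 10 km of latitude) is likewise an assumed example, and whether that is coarse enough depends on the study.

```python
# Hypothetical column names; adapt to the dataset being published.
DIRECT_IDENTIFIERS = {
    "name", "initials", "address", "postcode", "phone",
    "email", "ip_address", "date_of_birth",
}


def strip_direct_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows without any direct-identifier columns."""
    return [
        {key: value for key, value in row.items() if key not in identifiers}
        for row in rows
    ]


def coarsen_coordinate(value, decimals=1):
    """Round a latitude or longitude to reduce its spatial precision."""
    return round(value, decimals)


if __name__ == "__main__":
    rows = [{"name": "Alice Example", "email": "a@example.org",
             "lat": 50.9351, "lon": -1.3958, "score": 12}]
    published = strip_direct_identifiers(rows)
    for row in published:
        row["lat"] = coarsen_coordinate(row["lat"])
        row["lon"] = coarsen_coordinate(row["lon"])
    print(published)
```

The original rows are left untouched, so the raw data can stay on restricted storage while only the stripped copy is prepared for deposit.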

Data for open publication should also not have two or more indirect identifiers (listed below), as that can lead to re-identification through a process called 'triangulation'. You should remove or modify one or more of the indirect identifiers until the risk of re-identification is negligible. If you are unsure or require more advice, please contact researchdata@soton.ac.uk. Indirect identifiers include:

  • Place/location of treatment, education, service use
  • Name of professional or business/service responsible for healthcare, education, service
  • Gender
  • Rare disease, condition, experience, treatment, or other characteristic
  • Risky behaviours (e.g. illicit drug use)
  • Place of birth
  • Socioeconomic data, such as occupation or place of work, income, or education level
  • Household and family composition
  • Body measures (e.g. height, weight)
  • Multiple pregnancies
  • Ethnicity
  • Year of birth or age
  • Verbatim responses or transcripts
  • Dates of sensitive events
  • Small sample sizes i.e. when the number of subjects with a certain characteristic is small

(List courtesy of University of Bristol (2023), Sharing Data Concerning Human Participants guide)
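
The triangulation risk can be checked roughly by counting how many records share each combination of indirect identifiers. The sketch below flags combinations held by fewer than `k` records; the default threshold of 5, the field names and the function name are assumptions for illustration, not University policy.

```python
from collections import Counter


def small_groups(rows, indirect_fields, k=5):
    """Return {combination: count} for combinations shared by fewer than k rows."""
    counts = Counter(
        tuple(row[field] for field in indirect_fields) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    rows = [
        {"gender": "F", "year_of_birth": 1980, "occupation": "teacher"},
        {"gender": "F", "year_of_birth": 1980, "occupation": "teacher"},
        {"gender": "M", "year_of_birth": 1975, "occupation": "pilot"},
    ]
    risky = small_groups(rows, ["gender", "year_of_birth", "occupation"], k=2)
    print(risky)  # combinations that appear only once in this toy dataset
```

Any combination that is returned is a candidate for removing or modifying one of its identifiers, for example by banding year of birth into ranges, before re-running the check.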

Publication vs On Request Access

The more that anonymised data is aggregated and non-linkable, the more possible it is to publish it. However, this may remove valuable information from the data. Pseudonymised data is often valuable to researchers because of the granularity it affords, but it carries a higher risk of re-identification. Instead of making this data openly available, it may be preferable to release the data, on request, to other bona fide researchers using non-disclosure data sharing agreements. This allows more data to be disclosed than is possible with wider or public disclosure. Information security controls still need to be in place and managed. For more information contact researchdata@soton.ac.uk

Many of the techniques for dealing with sensitive data involve some form of encryption. Encryption obfuscates the data so that only those with the correct decryption key or password are able to read them. The strength of encryption refers to how difficult it would be for an attacker to decrypt the data without knowing the key in advance, and this depends on both the method and the key used.

The tool you use for encryption should inform you of the method it will use and may give you a choice. The Information Commissioner's Office currently recommends using the AES-128 or AES-256 encryption methods, of which the latter is stronger.

Whenever setting the key to be used by an encryption method, be sure to use a strong password. You must keep the key safe, as if it is lost the data will be unrecoverable, and conversely if it is leaked the encryption will cease to offer protection.
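
As a hedged illustration of why the password matters, the sketch below uses Python's standard library to generate a random passphrase and to derive a 256-bit key (the key size AES-256 expects) from it with PBKDF2. The iteration count, salt length and passphrase alphabet are assumptions chosen for the example. In practice the encryption tool you use will normally handle key derivation for you, and this code does not itself encrypt anything.

```python
import hashlib
import secrets
import string


def generate_passphrase(length=20):
    """Return a random passphrase drawn from letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def derive_key(passphrase, salt, iterations=600_000):
    """Derive a 32-byte (256-bit) key from a passphrase using PBKDF2-HMAC-SHA256.

    `iterations` is an assumed value; more iterations make guessing slower.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


if __name__ == "__main__":
    passphrase = generate_passphrase()
    salt = secrets.token_bytes(16)  # stored alongside the encrypted file
    key = derive_key(passphrase, salt)
    print(len(key) * 8, "bit key derived")
```

The same passphrase and salt always give the same key, which is why losing the passphrase makes the data unrecoverable and leaking it removes the protection.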

For more information about encryption contact InfoSec via serviceline@soton.ac.uk

See the Research Data Management: Destruction webpage for more information on how to securely destroy electronic and printed data.

A data protection impact assessment (DPIA) is a process to help identify and minimise the data protection risks of a project.

You must do a DPIA for certain listed types of processing, or any other processing that is likely to result in a high risk to individuals’ interests.

It is also good practice to complete a DPIA for any other major project which will require the processing of personal data.

Under the Data Protection Act 2018, a DPIA (the new term for a Privacy Impact Assessment) is compulsory for any project that is likely to be 'high risk' to the rights and freedoms of individuals. The GDPR does not define 'high risk', but examples include 'large-scale' processing, so a DPIA is likely to be required for some research projects.

Even sensitive research data can often be shared legally and ethically by using informed consent, anonymisation and controlled access. To be able to do this, it is important to consider potential data sharing and re-use scenarios well before the ethics process and data collection. Be explicit in your consent forms and participant information sheets (PIS) about your plans to make data available, who will be able to access the data, and how the data would be accessed and potentially re-used.

You should complete an Initial Data Protection Review (serviceline form) and you may also need to undertake a full Data Protection Impact Assessment. You can find guidance on this process on the Information Governance & Data Protection SharePoint site.

Other GDPR Resources

The University has guidance and resources for staff to help them understand their and the University's responsibilities under GDPR.

UKRI has published guidance for researchers:

The Information Commissioner's Office (ICO) has guidance on how GDPR is being interpreted in the UK:

Data Protection Impact Assessments (DPIAs)

The University has a process to deal with DPIAs for both research and administrative work within the University. Templates for the DPIA and further information can be found on the Information Governance intranet site. Please direct any DPIA queries to DPIA@soton.ac.uk

Requesting a DOI

We can register a DOI for your dataset through DataCite - this gives a persistent link and can make it easier to cite.

For more details see our DOI for data page.

 

Credits

Thanks to the Universities of Bath, Manchester, UCL, Edinburgh and Bristol whose webpages informed our content.