Sensitive research data usually includes one or more of the following:
If you are working with sensitive data, you need to take extra precautions to ensure the data can only be viewed by those with permission to do so. These may include encryption or other special measures when storing, transferring and disposing of data. Please see our Briefing document on what to consider when sharing sensitive data.
Whilst adopting a proportionate, risk-based approach, the entire lifecycle of the research information needs to be considered, from creation to destruction. Minimum controls for keeping highly restricted information secure include user access controls, encryption, identifying and guaranteeing the location of the information, and legitimate sharing backed by appropriate contracts.
The key actions to reduce your risk are:
You left your laptop on a train or a bag on the bus. Your laptop had the DNA profiles from the participants in your research project, or the bag was full of consent forms. Accidents happen, but the penalties increased substantially when the EU General Data Protection Regulation (GDPR) was adopted into British law as the Data Protection Act (2018) in May 2018.
DPA (2018) covers all forms of personal data including genomic and some anonymised data.
It is possible to restrict access to folders on the University's research filestore, so that only certain individuals or groups are allowed to view and edit the contents. A typical configuration for project folders is to allow access only to members of the project team, but it is also possible to set up folders within the project folder that are restricted to fewer users. For more information, contact the IT team via serviceline@soton.ac.uk.
You can share data stored on your Office 365 OneDrive for Business; however, for data that contains direct and indirect personal identifiers, we recommend you use the Research Filestore. See the iSolutions website for more information about Office 365.
While external services such as Dropbox, Google Drive and OneDrive are convenient, they do not comply fully with the University's data policies due to the following issues:
External cloud-based solutions should therefore be avoided for sensitive data. If you are considering using external storage providers nevertheless, perhaps because of conditions imposed by external collaborators, you must only consider those which will allow you to take the following security measures:
Anonymisation is the complete and irreversible removal of any information that could lead to an individual being identified, either from the removed information itself or this information combined with other data held by the University. Once data is truly anonymised and individuals are no longer identifiable, the data will not fall within the scope of the DPA (2018) and GDPR and it becomes easier to use.
Full anonymisation is the process of removing personal identifiers, both direct and indirect. An individual may be directly identified from their name, address, postcode, telephone number, photograph or image, or some other unique personal characteristic (direct identifiers). An individual may be indirectly identifiable when certain information is linked together with other sources of information, including their place of work, job title, salary, their postcode or even the fact that they have a particular diagnosis or condition (indirect identifiers).
Full anonymisation is often difficult to attain. In most cases the information can only be partially anonymised and therefore will still be subject to data protection legislation. If you can't fully anonymise the information, it is still good practice to partially anonymise it as this limits the ability to identify people.
Much of what may have been considered anonymised data 20 years ago would now be defined as pseudonymised data, because data linking, where two or more data sources are combined, has made it much easier to re-identify individuals.
Full anonymisation is often difficult to attain and, for research, often not desirable. In most cases the information can only be partially anonymised or pseudonymised and therefore will still be subject to data protection legislation. Pseudonymisation is defined within the GDPR as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information, as long as such additional information is kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable individual” (Article 4(5)).
Unlike full anonymisation, pseudonymisation will not exempt researchers from the DPA (2018) altogether. It does, however, help the University meet its data protection obligations, particularly the principles of ‘data minimisation’ and ‘storage limitation’ (Articles 5(1)(c) and 5(1)(e)), and the requirement for ‘appropriate safeguards’ when processing for research purposes.
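As a rough illustration of the definition above, the following Python sketch replaces a direct identifier with a keyed pseudonym. It uses HMAC-SHA256 from the standard library. The function name `pseudonymise`, the field names and the sample records are all hypothetical, and the secret key stands in for the "additional information" that has to be kept separately from the data. It is one possible approach under those assumptions, not a method prescribed by the University.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a stable pseudonym for an identifier using keyed HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability (an assumption);
    # keep the full digest if collisions are a concern.
    return digest.hexdigest()[:16]


# Hypothetical key: generate it once and store it separately from the data.
key = secrets.token_bytes(32)

# Hypothetical records containing a direct identifier (the name).
records = [
    {"name": "Alice Example", "diagnosis": "A"},
    {"name": "Bob Example", "diagnosis": "B"},
    {"name": "Alice Example", "diagnosis": "C"},
]

# Replace the name with a pseudonym and drop the original field.
pseudonymised = [
    {"participant_id": pseudonymise(r["name"], key), "diagnosis": r["diagnosis"]}
    for r in records
]
```

The same participant always receives the same pseudonym, so their records can still be linked. Anyone holding the key can recompute the mapping, which is why data treated this way remain personal data.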
Where 'de-identified' or pseudonymised data is in use, there is a residual risk of re-identification; the motivated intruder test can be used to assess the likelihood of this. Once assessed, a decision can be made on whether further steps to de-identify the data are necessary. By applying this test and documenting the decisions, the study will have evidence that the risk of disclosure has been properly considered; this may be a requirement if the study is audited.
The motivated intruder test involves considering whether an ‘intruder’ would be able to achieve re-identification if motivated to attempt this.
The ‘motivated intruder’ is taken to be a person who starts without any prior knowledge but who wishes to identify the individual from whose personal data the anonymised data has been derived. This test is meant to assess whether the motivated intruder would be successful.
In practice this includes but is not limited to:
The ‘motivated intruder’ is not assumed to have any specialist knowledge such as computer hacking skills, to have access to specialist equipment, or to resort to criminality such as burglary to gain access to data that is kept securely.
The UK Anonymisation Network (UKAN)
UKAN Anonymisation Decision-making Framework
UK Data Archive: Anonymisation guide
UK Data Archive (2021) Webinar: How to anonymise qualitative and quantitative data
University of Bristol (2020) Sharing research data concerning human participants, version 2
UK Data Archive text anonymisation helper tool (downloads a zip file): this tool can help you find disclosive information to remove or pseudonymise in qualitative data files. The tool does not anonymise or make changes to data, but uses MS Word macros to find and highlight numbers and words starting with capital letters in text. Numbers and capitalised words are often disclosive, e.g. as names, companies, birth dates, addresses, educational institutions and countries.
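The idea behind the helper tool (flagging numbers and capitalised words for a human to review) can be sketched in a few lines of Python. This is not the UK Data Archive tool or its macros; the regular expression and the function names `flag_candidates` and `highlight` are hypothetical and only approximate the behaviour described above.

```python
import re

# Whole numbers, or words that start with a capital letter.
CANDIDATE = re.compile(r"\b(?:\d+|[A-Z][A-Za-z'-]*)\b")


def flag_candidates(text: str) -> list[str]:
    """List the numbers and capitalised words found in the text."""
    return [match.group(0) for match in CANDIDATE.finditer(text)]


def highlight(text: str) -> str:
    """Wrap each candidate in double square brackets; nothing is removed."""
    return CANDIDATE.sub(lambda match: f"[[{match.group(0)}]]", text)


# Hypothetical interview excerpt.
sample = "I met Dr Patel at Southampton General in 1998 when I was 34."
flagged = flag_candidates(sample)
marked_up = highlight(sample)
```

Like the tool it imitates, this sketch only marks candidates. Sentence-initial words and the pronoun "I" are flagged too, so a person still has to decide what is actually disclosive.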
If you want to share data with external collaborators, even if they are part of the same research project, you must have a data sharing agreement in place. Contact riscontracts@soton.ac.uk for more information. When you share the data with others, they will be data processors, but the University will still be the data controller and therefore responsible for how the data is used.
Extra precautions need to be taken when transferring sensitive data between collaborators:
Data that is to be published should have all direct identifiers removed. These include:
Data for open publication should also not contain two or more indirect identifiers (listed below), as that can lead to re-identification through a process called 'triangulation'. You should remove or modify one or more of the indirect identifiers until the risk of re-identification is negligible. If you are unsure or require more advice, please contact researchdata@soton.ac.uk. Indirect identifiers include:
(List courtesy of University of Bristol (2023), Sharing Data Concerning Human Participants guide)
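One way to look for triangulation risk before publication is to count how many records share each combination of indirect identifiers; combinations held by very few people are the ones most likely to single someone out. The sketch below is a minimal illustration of that check. The function name `small_groups`, the field names, the sample rows and the threshold are all hypothetical; the guidance above does not specify a numerical threshold.

```python
from collections import Counter


def small_groups(rows, indirect_fields, threshold=5):
    """Return combinations of indirect identifiers shared by fewer than
    `threshold` rows, with the number of rows in each."""
    counts = Counter(tuple(row[field] for field in indirect_fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Hypothetical dataset with three indirect identifiers per participant.
rows = [
    {"job_title": "Nurse", "region": "South East", "age_band": "30-39"},
    {"job_title": "Nurse", "region": "South East", "age_band": "30-39"},
    {"job_title": "Surgeon", "region": "South West", "age_band": "60-69"},
]

# Threshold of 2 is chosen only to suit this tiny example.
risky = small_groups(rows, ["job_title", "region", "age_band"], threshold=2)
```

Here the only combination flagged is the single surgeon record. A flagged combination is a prompt to remove or modify one of the fields involved, then run the check again.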
The more that anonymised data is aggregated and non-linkable, the more possible it is to publish it. However, this may remove valuable information from the data. Pseudonymised data is often valuable to researchers because of the granularity it affords, but carries a higher risk of re-identification. Instead of making this data openly available, it may be preferable to release the data, on request, to other bona fide researchers using non-disclosure data sharing agreements. This allows more data to be disclosed than is possible with wider or public disclosure. Information security controls still need to be in place and managed. For more information, contact researchdata@soton.ac.uk.
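To make the trade-off between granularity and risk concrete, the sketch below coarsens two indirect identifiers: exact ages become ten-year bands and full postcodes are cut down to their outward code. The function names, the band width and the assumption that postcodes are written with a space between the two parts are all hypothetical choices for illustration, not recommended settings.

```python
def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def outward_code(postcode: str) -> str:
    """Keep only the first part of a postcode written with a space."""
    return postcode.strip().upper().split()[0]


# Hypothetical record before and after coarsening.
record = {"age": 34, "postcode": "so17 1bj"}
coarsened = {
    "age_band": age_band(record["age"]),
    "postcode_area": outward_code(record["postcode"]),
}
```

The coarsened record is less useful for fine-grained analysis but describes many more people, which is the effect aggregation is meant to achieve.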
Many of the techniques for dealing with sensitive data involve some form of encryption. Encryption obfuscates the data so that only those with the correct decryption key or password are able to read them. The strength of encryption refers to how difficult it would be for an attacker to decrypt the data without knowing the key in advance, and this depends on both the method and the key used.
The tool you use for encryption should inform you of the method it will use and may give you a choice. The Information Commissioner's Office currently recommends using the AES-128 or AES-256 encryption methods, of which the latter is stronger.
Whenever you set the key to be used by an encryption method, be sure to use a strong password. You must keep the key safe: if it is lost, the data will be unrecoverable, and if it is leaked, the encryption will cease to offer protection.
For more information about encryption, contact InfoSec via serviceline@soton.ac.uk.
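Encryption tools that accept a password typically turn it into the actual key with a key derivation function. The sketch below shows that step only, using PBKDF2-HMAC-SHA256 from the Python standard library to produce a 32-byte key of the size AES-256 expects. It does not perform any encryption, and the function name `derive_key`, the iteration count, the salt length and the example passphrase are assumptions made for illustration.

```python
import hashlib
import secrets


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (256-bit) key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random but not secret; it is stored alongside the encrypted data
# so that the same key can be derived again from the password.
salt = secrets.token_bytes(16)

# Hypothetical passphrase, shown only to make the example runnable.
key = derive_key("a long passphrase made of several unrelated words", salt)
```

The derived key is only as hard to guess as the password behind it, which is why a long, unpredictable password matters even when the encryption method itself is strong.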
See the Research Data Management: Destruction webpage for more information on how to securely destroy electronic and printed data.
A data protection impact assessment (DPIA) is a process to help identify and minimise the data protection risks of a project.
You must do a DPIA for certain listed types of processing, or any other processing that is likely to result in a high risk to individuals’ interests.
It is also good practice to complete a DPIA for any other major project which will require the processing of personal data.
Under the Data Protection Act 2018, a DPIA (the new term for a Privacy Impact Assessment) is compulsory for any project that is likely to be 'high risk' to the rights and freedoms of individuals. The GDPR does not define what high risk is; however, examples include 'large-scale' processing, so it is likely that a DPIA will be required for some research projects.
Even sensitive research data can often be shared legally and ethically by using informed consent, anonymisation and controlled access. To be able to do this, it is important to consider potential data sharing and re-use scenarios well before the ethics process and data collection. Be explicit in your consent forms and participant information sheet (PIS) about your plans to make data available, who will be able to access the data, and how the data would be accessed and potentially re-used.
You should complete an Initial Data Protection Review (serviceline form) and you may also need to undertake a full Data Protection Impact Assessment. You can find guidance on this process on the Information Governance & Data Protection SharePoint site.
The University has guidance and resources for staff to help them understand their and the University's responsibilities under GDPR.
UKRI has published guidance for researchers:
The Information Commissioner's Office (ICO) has guidance on how GDPR is being interpreted in the UK:
The University has a process to deal with DPIAs for both research and administrative work within the University. Templates for the DPIA and further information can be found on the Information Governance intranet site. Please direct any DPIA queries to DPIA@soton.ac.uk.
We can register a DOI for your dataset through DataCite - this gives a persistent link and can make it easier to cite.
For more details see our DOI for data page.
Thanks to the Universities of Bath, Manchester, UCL, Edinburgh and Bristol whose webpages informed our content.