
Research Data Management: Working

Guidance and support to staff, researchers and students at the University of Southampton

The joy of well-managed data

Good research data management doesn’t end with a Data Management Plan, nor does it begin only when you want to share your data. Thinking about how to organise your data throughout your research project will help you comply with funder requirements and make your research more efficient. Data management is an essential building block for good research. It will help you collect the data that underpins your work in a way that allows it to be used with confidence, both now and in the future. Basic principles, such as file naming and version control, will help you locate and understand your data. Ultimately, well-managed data can add to the credit you receive and to the impact of your research, alongside your other research outputs.

See further information if you are working with sensitive data (data that is security-sensitive or commercially sensitive, or that contains information on living human subjects).

 

Working with Data

By describing and documenting your data, you will be able to:

  • return to data created earlier in a project and be reminded of what work or processes have been applied to the data. 
  • revise or review the data should you need to do so
  • extend your original work at a later date
  • allow others to add to your work rather than repeating it. 

How will my research data be used? This will depend on the type of data and on any requirements or restrictions placed on you by funders or by ethical or commercial considerations. UK funding councils and others increasingly require details of how and where data will be shared, while acknowledging that some limitations may need to be imposed for reasons of commercial interest or confidentiality.

Three kinds of data reuse

1. Author consultation and reuse: the data must be meaningfully named and located so that you, the originator of the data, can find and use it on any future occasion.

2. Non-author consultation: for other researchers to access your work, the metadata must be consistent and discoverable, and assigned according to international standards where these exist, for example, Dublin Core or Data Documentation Initiative. Allowing others to see your work gives credit to you, your research team, and your institution.

3. Non-author reuse: the most open form of reuse, enabling other researchers to replicate, develop or enhance your data in their own research. This is increasingly required by funders and means that the data must be completely and consistently described. For example, the OECD requires publicly funded data to be openly available to the scientific community.

See UKRI's Common Principles on Research Data.

For further guidance see our section on Funder Expectations.

 

Metadata are a subset of core data documentation. They provide standardised, structured information that explains:

  • the origin
  • purpose
  • time references
  • geographic location
  • creator
  • access conditions
  • terms of use

of a data collection (UK Data Archive).

The detail and range of the metadata for any research file depend in part on its subject, format and intended reuse:

  • Creating metadata for the various elements of a project, and for the project as a whole, is essential: there must be evidence that the project data is both findable and usable.
  • The simplest form of metadata is assigned through meaningful file names and through the document properties and tagging options in programs such as Word and Excel.
  • At the file level, metadata must include a comprehensive description that enables replication; what this involves varies between disciplines and file types.
  • At the resource level, linked files that form part of a complete project require an additional level of metadata. A general overview is available from the Archaeology Data Service.
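As a minimal sketch of file-level metadata, the Python below writes a JSON "sidecar" file next to a data file, recording the elements listed in the UK Data Archive definition above (origin, purpose, time references, geographic location, creator, access conditions and terms of use). The sidecar convention, the `.metadata.json` suffix, the field names and the function name `write_sidecar` are all illustrative assumptions, not a University or UK Data Archive standard. For deposit, you would normally map these fields onto a recognised metadata scheme.

```python
import json
from pathlib import Path

# The elements listed in the UK Data Archive definition quoted above.
# Field names are hypothetical; rename them to suit your scheme.
REQUIRED_FIELDS = (
    "origin", "purpose", "time_references", "geographic_location",
    "creator", "access_conditions", "terms_of_use",
)


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata for data_file to '<file name>.metadata.json' beside it.

    The sidecar naming convention is a hypothetical example, not a standard.
    Raises ValueError if any of REQUIRED_FIELDS is missing or empty.
    """
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    # Record which file the metadata describes, then the metadata itself.
    record = {"file": data_file.name, **metadata}
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar` on `ProjID_survey_v01.csv` with all seven fields filled in would create `ProjID_survey_v01.csv.metadata.json` in the same folder. Refusing to write an incomplete record is one simple way of making sure every file is described before it is shared.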

Why do you need metadata?

Creating metadata is good research practice and enables you to keep track of your own work. Depositing your metadata with your data will also enable others to discover and understand your data.

For further resources, see Useful Links below.

 

Useful Links

Bookmark

Bookmark this page as: https://library.soton.ac.uk/researchdata/description

Files are commonly saved within a folder structure. Consider whether one large flat folder for all your files or a hierarchical tree structure would be more appropriate for your work or project. A complex structure can encourage the use of shorter, less meaningful file names that depend on that structure for their meaning. When the folder structure is removed, for example when you provide your data to a collaborator, such file names may have little or no meaning. To avoid this, use names that make sense in your working environment and contain:

  1. Something meaningful to you (such as what you are doing with the file)
  2. Something meaningful to someone else (such as an experiment number or project name)

Develop a file-naming system that works for your project, use it consistently, and make sure it is part of the assigned metadata. The UK Data Archive has a useful guide: Format your data.
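A naming system is easiest to apply consistently if names are generated rather than typed by hand. The Python sketch below assumes one possible convention, `ProjectID_description_YYYY-MM-DD_vNN.ext`; the pattern and the function name `make_filename` are illustrative, not a University standard. It joins the parts with underscores, uses an ISO 8601 date (the same form as `HealthTest-2008-04-06` in the versioning guidance), zero-pads the version number, and strips spaces and special characters from the free-text parts.

```python
import re
from datetime import date


def _clean(part: str) -> str:
    """Turn runs of whitespace into hyphens and drop any other special characters."""
    part = re.sub(r"\s+", "-", part.strip())
    return re.sub(r"[^A-Za-z0-9-]", "", part)


def make_filename(project_id: str, description: str, when: date,
                  version: int, ext: str) -> str:
    """Build a name such as 'ProjID_survey_2024-03-01_v02.csv'.

    Hypothetical convention: underscores separate the parts, the date is
    ISO 8601 (YYYY-MM-DD) and the version is zero-padded to two digits.
    """
    parts = [_clean(project_id), _clean(description), when.isoformat(), f"v{version:02d}"]
    return "_".join(parts) + "." + _clean(ext.lstrip("."))
```

With a date and a two-digit version in every name, keep the project ID and description short if you also want to stay within the 32-character limit suggested in the tips below.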

Ideally, all data items related to a project, with their associated metadata, should be grouped and deposited with a summary of contents and relationships, itself in an appropriate open format. You may want to consider using a database or spreadsheet to track data. Data analysis software such as NVivo can also be used to describe and document data.
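If you track your files in a spreadsheet, a script can produce the starting inventory for you. The Python sketch below walks a project folder and writes one CSV row per file (relative path, size and last-modified time), which you can then open in Excel and annotate in the blank description column. The function name `write_manifest`, the column names and the choice of CSV are assumptions for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column names for the manifest spreadsheet.
FIELDS = ["relative_path", "size_bytes", "modified_utc", "description"]


def write_manifest(folder: Path, manifest: Path) -> int:
    """List every file under folder in a CSV manifest and return the file count.

    Write the manifest outside folder; otherwise a previously written
    manifest would be listed as one of the files. The 'description'
    column is left blank for you to fill in.
    """
    files = sorted(
        (p for p in folder.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(folder).as_posix(),
    )
    rows = []
    for path in files:
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        rows.append({
            "relative_path": path.relative_to(folder).as_posix(),
            "size_bytes": info.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            "description": "",
        })
    with manifest.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```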

Tips for filenames

  • Limit file names to 32 characters.
    Example (32 characters before the extension): 32CharactersLooksExactlyLikeThis.csv

  • Don't use special characters or spaces.
    NO: name&date@location.txt
    NO: name-datelocation.txt
    NO: name.date.v1.2.txt
    YES: name_date_location.txt

  • Use versioning.
    NO: ProjID_latest.txt
    YES: ProjID_v02.txt

  • Use leading zeros in sequential numbering to allow for multi-digit versions.
    For a sequence of 1-10: 01-10
    For a sequence of 1-100: 001-100
    NO: ProjID_1.csv
    YES: ProjID_01.csv

  • Don't use generic data file names that may conflict when moved from one location to another.
    NO: MyData.csv
    YES: ProjID_date.csv
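The tips above can be checked automatically before files are shared or deposited. The Python sketch below is one possible reading of them: it applies the 32-character limit to the name without its extension (as in the example), allows only letters, digits, underscores and hyphens, flags a few generic names and vague version words, and flags a single-digit sequence number at the end of the name. The word lists, the exact rules and the function name `check_filename` are assumptions, so adjust them to your own naming system.

```python
import re

MAX_LENGTH = 32  # applied to the name without its extension
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")
GENERIC_NAMES = {"mydata", "data", "untitled", "final", "new"}  # illustrative
VAGUE_VERSION_WORDS = ("latest", "final", "new", "old")  # illustrative
UNPADDED_NUMBER = re.compile(r"(?:^|_)v?\d$")  # e.g. '_1' or '_v2' at the end


def check_filename(filename: str) -> list[str]:
    """Return a list of problems with filename; an empty list means it passes."""
    stem, dot, _ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem = filename
    problems = []
    if len(stem) > MAX_LENGTH:
        problems.append(f"name longer than {MAX_LENGTH} characters")
    if not ALLOWED.match(stem):
        problems.append("contains spaces or special characters")
    if stem.lower() in GENERIC_NAMES:
        problems.append("generic name may clash with other files")
    if any(word in stem.lower().split("_") for word in VAGUE_VERSION_WORDS):
        problems.append("use a version number instead of words like 'latest'")
    if UNPADDED_NUMBER.search(stem):
        problems.append("pad sequence numbers with leading zeros, e.g. 01")
    return problems
```

Zero-padding matters because computers sort names character by character: in a plain alphabetical sort, `ProjID_10.csv` comes before `ProjID_2.csv`, whereas `ProjID_02.csv` correctly sorts before `ProjID_10.csv`.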

Links

Bookmark

Bookmark this page as: https://library.soton.ac.uk/researchdata/filenaming

Software solutions

Developers use version control (also called revision control) software to maintain current and historical versions of files such as source code, web pages and documentation, but it can be used for any kind of digital file.

Git is a free and open source distributed version control system.  The University runs the University of Southampton Git Service.  You can use it to keep private versions of your data and files within the University.

The University also supports: 

Contact Serviceline for further details.

Including version information in filenames

This can be done in any of the following ways:

  • the date recorded in the file name or within the file, for example HealthTest-2008-04-06
  • a file history, version control table or notes included within a file, where versions, dates, authors and details of changes to the file are recorded
  • version numbering in the file name, for example HealthTest-00-02 or HealthTest_v2
    Filename               Description
    LiteratureReview_1.0   Original document
    LiteratureReview_1.1   Minor revisions made
    LiteratureReview_1.2   Further minor revisions
    LiteratureReview_2.0   Substantive changes
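Version numbers of the MAJOR.MINOR form shown in the table should be compared as numbers, not as text: alphabetically, `LiteratureReview_1.10` sorts before `LiteratureReview_1.2`. The Python sketch below assumes names exactly of the form `Stem_MAJOR.MINOR` with no file extension (as in the table); it works out the next version name for a minor or substantive change and provides a numeric sort key. The function names are hypothetical.

```python
import re

# Matches names such as 'LiteratureReview_1.2' (stem, major, minor).
VERSIONED_NAME = re.compile(r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)$")


def _parse(name: str) -> tuple[str, int, int]:
    match = VERSIONED_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Stem_MAJOR.MINOR name: {name!r}")
    return match["stem"], int(match["major"]), int(match["minor"])


def next_version(name: str, substantive: bool = False) -> str:
    """Return the next name in the series.

    Minor revisions increase MINOR (1.1 -> 1.2); substantive changes
    increase MAJOR and reset MINOR to 0 (1.2 -> 2.0).
    """
    stem, major, minor = _parse(name)
    if substantive:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{stem}_{major}.{minor}"


def version_key(name: str) -> tuple[int, int]:
    """Sort key that compares versions numerically, so 1.10 follows 1.2."""
    _, major, minor = _parse(name)
    return major, minor
```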

Further information

See Version control and authenticity - UK Data Service

 

Bookmark

Bookmark this page as: https://library.soton.ac.uk/researchdata/versioning

Data security covers not just data containing personal information but also data that may be commercially or otherwise ethically sensitive.

Take particular care when working with data away from campus. Please see the iSolutions advice on working safely while not on campus.

At the University, the Information Security team in iSolutions can provide guidance and help on data security. They can be contacted via serviceline@soton.ac.uk.

See also the Sensitive Data & Data Protection (GDPR) pages.

Encryption

Many of the techniques for dealing with sensitive data involve some form of encryption. Encryption transforms the data so that only those with the correct decryption key or password are able to read it. The strength of encryption refers to how difficult it would be for an attacker to decrypt the data without knowing the key, and this depends on both the method and the key used.

The tool you use for encryption should inform you of the method it will use and may give you a choice. The Information Commissioner's Office currently recommends using the AES-128 or AES-256 encryption methods, of which the latter is stronger.

Whenever you set the key to be used by an encryption method, use a strong password. You must keep the key safe: if it is lost, the data will be unrecoverable, and if it is leaked, the encryption will no longer offer any protection.
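As a sketch of the points above, the Python below encrypts data with AES-256 (in GCM mode, which also detects tampering) and derives the 256-bit key from a password using scrypt. It assumes the third-party `cryptography` package is installed; the scrypt cost parameters, the function names and the layout of the output (salt, then nonce, then ciphertext) are illustrative choices, not InfoSec guidance. It illustrates the principles rather than replacing the encryption tools InfoSec can advise on.

```python
# Requires the third-party 'cryptography' package (pip install cryptography).
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

SALT_BYTES = 16
NONCE_BYTES = 12  # 96-bit nonce, the usual size for AES-GCM


def _derive_key(password: str, salt: bytes) -> bytes:
    # scrypt turns the password into a 32-byte (256-bit) AES key.
    # The cost parameters here are illustrative assumptions.
    kdf = Scrypt(salt=salt, length=32, n=2**14, r=8, p=1)
    return kdf.derive(password.encode("utf-8"))


def encrypt(data: bytes, password: str) -> bytes:
    salt = os.urandom(SALT_BYTES)
    nonce = os.urandom(NONCE_BYTES)  # must never be reused with the same key
    key = _derive_key(password, salt)
    ciphertext = AESGCM(key).encrypt(nonce, data, None)
    # Salt and nonce are not secret; store them alongside the ciphertext.
    return salt + nonce + ciphertext


def decrypt(blob: bytes, password: str) -> bytes:
    salt = blob[:SALT_BYTES]
    nonce = blob[SALT_BYTES:SALT_BYTES + NONCE_BYTES]
    ciphertext = blob[SALT_BYTES + NONCE_BYTES:]
    key = _derive_key(password, salt)
    # Raises cryptography.exceptions.InvalidTag if the password is wrong
    # or the data has been altered.
    return AESGCM(key).decrypt(nonce, ciphertext, None)
```

Because the salt and nonce are stored with the ciphertext, only the password is needed to decrypt; if the password is lost, the data cannot be recovered.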

For more information about encryption, contact InfoSec via serviceline@soton.ac.uk.

Bookmark

Bookmark this page as https://library.soton.ac.uk/researchdata/security

Sharing data during research

If you want to share data with external collaborators, even if they are part of the same research project, you must have a data sharing agreement in place. Contact riscontracts@soton.ac.uk for more information. When you share the data with others, they will be data processors, but the University will still be the data controller and therefore responsible for how the data is used.

Extra precautions need to be taken when transferring sensitive data between collaborators:

  • SharePoint provides a safe way to share documents collaboratively with other researchers both internal and external to the University.
  • Collaborators can be given a University computing account as part of their visitor status, subject to the completion of the necessary agreements. Through this account, they could be given permissions to transfer data directly into certain folders on the Research Filestore.
  • Use SafeSend in preference to email. The service allows you to move files of up to 50 GB in and out of the University, and all files are encrypted while they are transferred across the network. Files uploaded to SafeSend are stored temporarily on equipment owned and operated in the University's own Data Centre: SafeSend is not a cloud service, and everything is stored on equipment directly owned by the University and managed by its own IT staff. Access to the data is tightly controlled by the University, and all accesses to data on SafeSend are logged, so they can be checked if you are ever concerned that a third party might have gained access to your data. Files are automatically deleted from SafeSend 32 days after you upload them.

Bookmark

Bookmark this page as: https://library.soton.ac.uk/researchdata/researchdata/sharing-during-research

The metadata describing your data supports findability, citation and reuse. Rich metadata provides important context for interpreting your data and makes it easier for machines to conduct automated analysis. Follow standard metadata schemes, either general ones such as Dublin Core or discipline-specific ones. The Digital Curation Centre has an excellent disciplinary metadata directory; see also the RDA Metadata Directory and the FAIRsharing portal of data standards.
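To illustrate a general scheme, the Python sketch below serialises a simple Dublin Core record as XML using only the standard library. The element names and namespace (`http://purl.org/dc/elements/1.1/`) are those of the Dublin Core Metadata Element Set; the enclosing `<metadata>` element and the function name `dublin_core_xml` are assumptions, since repositories and harvesting protocols each define their own wrapper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_xml(fields: dict) -> str:
    """Serialise {element: value or list of values} as Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Dublin Core elements are repeatable, e.g. several creators.
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = item
    return ET.tostring(root, encoding="unicode")
```

Passing a list repeats the element, which is how a dataset with several creators or subjects is described in Dublin Core.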

Other useful resources:

Bookmark

Bookmark this page as: https://library.soton.ac.uk/researchdata/disciplines 

Bookmark

Bookmark this page as: https://library.soton.ac.uk/researchdata/guides

Requesting a DOI

We can register a DOI for your dataset through DataCite; this gives your dataset a persistent link and can make it easier to cite.


For more details see our DOI for data page.

 

Research Support Guide

Acknowledgements

Parts of this guide on Working with Research Data are based on MIT Libraries Data Management (https://libraries.mit.edu/data-management), CC BY, and on OpenAIRE, CC BY.