Library Guides: Research Data Management at Charles Sturt: Collect and Create

Collect or create your data

What type of data are you collecting or creating?

Consider how you will collect, document and describe the data so that it can be used later. If your data is sensitive, collecting, storing and sharing it will have extra requirements.

Documentation and metadata

Providing documentation and metadata means others can find your data (metadata) and make sense of it (documentation).

Documentation is contextual information about your data that you are likely to produce during the course of your research, and it will help anyone else reuse your data. Keep documentation alongside your research data, securely stored and backed up regularly.

Metadata, unlike documentation, is standardised data about your data. It supports data preservation (for example, if you deposit your data in a data repository), discovery for sharing, and data citation.

Documentation of your data

Data documentation provides context for your data and ensures that the data can be understood in the long term. Here is a list of what you should consider storing:

Document all this information in a spreadsheet or README.txt file and store it alongside your dataset.

File naming and version control

There is no point in collecting data that you then can't find! Create a strategy for file naming and a folder structure.
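
As one illustration of a naming strategy, the sketch below builds file names that sort by date and carry a version number. The pattern (date, project, description, version) and the function name build_filename are assumptions made for this example, not a Charles Sturt requirement; adapt them to whatever convention your project agrees on.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a sortable name of the form YYYYMMDD_project_description_vNN.ext (hypothetical pattern)."""

    def slug(text: str) -> str:
        # Lowercase the text and replace runs of non-alphanumeric characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # Zero-padded version numbers (v01, v02, ...) keep files in order when sorted by name.
    return f"{when:%Y%m%d}_{slug(project)}_{slug(description)}_v{version:02d}.{ext.lstrip('.')}"


print(build_filename("Soil Survey", "pH readings (site A)", date(2024, 3, 5), 2, "csv"))
# prints: 20240305_soil-survey_ph-readings-site-a_v02.csv
```

Putting the date first in year-month-day order means an alphabetical listing is also a chronological one, and avoiding spaces and special characters keeps the names portable across operating systems.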

Metadata

Metadata is often described as data about data.

Think of it as the keywords in an article and a way of "selling" your data to others. It helps others find your data and decide whether they need it, so it is important for ensuring the findability, reuse and citation of your work.

Metadata is usually structured using recognised standards or schemas such as Data Documentation Initiative (DDI).

Store the metadata within the data (e.g. in file properties) or in separate databases (e.g. XML) or files (README.txt).

See the ARDC Metadata Guide for more details and see the table below outlining the elements you can use to describe your data.

Metadata schemas

Many disciplines have a specific way of structuring metadata - these specific structures are called schemas. A schema will list the information you'll need to include about your data and how that information should be structured. Below are a few examples of schemas:

Discipline	Metadata standard
General	Dublin Core (DC); Metadata Object Description Schema (MODS); Metadata Encoding and Transmission Standard (METS)
Arts	Categories for the Description of Works of Art (CDWA); Visual Resources Association (VRA Core)
Biology	Darwin Core
Ecology	Ecological Metadata Language (EML)
Geographic	ISO 19115-1:2014 - Geographic information - Metadata
Social sciences	Data Documentation Initiative (DDI)
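
To give a sense of what a schema looks like in practice, here is a minimal sketch that writes a few Dublin Core elements as XML using Python's standard library. The element names (title, creator, date, rights) and the namespace come from the Dublin Core element set; the wrapper element name "metadata", the function name dublin_core_record and all of the values are assumptions made for illustration. A repository will usually specify its own required wrapper and fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Return an XML string with one dc:<name> element per entry in fields."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; replace with your own dataset's details.
record = dublin_core_record({
    "title": "Example soil pH survey",
    "creator": "Example Researcher",
    "date": "2024-03-05",
    "rights": "CC BY 4.0",
})
print(record)
```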

Metadata example

You can use these metadata elements in a README.txt:

Identifier	Unique alpha-numeric identifier used to identify the data (such as DOI)
Date	Any key dates associated with the data, including project start and end dates
Title	Name of the research project or dataset
Version	Information on the relevant version(s) of the dataset
Creator(s)	Names, contact details and identifiers (such as ORCID) for all organisations and/or persons who collected and created the data
Source	Citations for any data obtained or derived from other sources, including the creator, the year, the title of the dataset, identifier and access information
Location	Relevant geographic information, including cities, regions, states, countries or coordinates
Keywords	Keywords or phrases describing the data; this could also include relevant Field of Research codes
Methodology	Information on how the data was created, including specific software or equipment (with model or version numbers), formulae, algorithms or methodologies
Processing	Information on how the data has been transformed, altered or processed including quality assurance/control measures
Technical details	All relevant technical information including a list of all the files that make up the dataset with extensions and relevant file formats and structures, an explanation of any codes or abbreviations used in the file names, a list of all variables in the data files, as well as the names and version numbers of all software packages required to use, view, or analyse the data
Rights	Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
Access	How and where the data can be accessed
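
The elements in the table above can be turned into a README.txt template with a short script. The sketch below is one possible approach: the "Element: value" layout, the NOT PROVIDED placeholder, the function name render_readme and the example values are all assumptions for illustration, not a prescribed format.

```python
# Element names taken from the table above, in the same order.
ELEMENTS = [
    "Identifier", "Date", "Title", "Version", "Creator(s)", "Source",
    "Location", "Keywords", "Methodology", "Processing",
    "Technical details", "Rights", "Access",
]


def render_readme(metadata: dict[str, str]) -> str:
    """Render one 'Element: value' line per element, flagging any that are missing."""
    lines = []
    for element in ELEMENTS:
        # An empty or absent value is flagged so that gaps are easy to spot later.
        value = metadata.get(element, "").strip() or "NOT PROVIDED"
        lines.append(f"{element}: {value}")
    return "\n".join(lines) + "\n"


# Illustrative values only.
example = {
    "Title": "Example soil pH survey",
    "Date": "2024-03-05",
    "Creator(s)": "Example Researcher",
    "Keywords": "soil; pH; survey",
}
text = render_readme(example)
print(text)
```

To save the result next to your dataset, write the returned string to a file, for example with pathlib.Path("README.txt").write_text(text, encoding="utf-8").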