Data Management Plans
Introduction
Research data are all data that have been collected, generated or observed during a research process aimed at obtaining scientific results.
Research data are:
- raw data (which were obtained directly as a result of a research tool),
- processed data (compiled).
Examples of research data:
- experimental notes, logbooks
- laboratory protocols, procedure descriptions
- methodological descriptions
- samples
- artefacts, objects
- textual documents
- questionnaires, surveys
- audio or video recordings
- photographs, images
- database content (images, texts, audio and video recordings)
- software (scripts, input files)
- results of computer simulations
- mathematical models and algorithms
Open research data – data produced in the course of research and used in scientific work, to which any user has free and unrestricted access. These data can be legally used, modified and shared.
Some data may be archived in a closed model, due to:
- commercialisation of research results, e.g. applying for patent protection for an invention
- national security
- protection of personal data
- copyright restrictions
Dataset – a structured set of data, made available in a given repository, that relates to a given topic and is provided with metadata describing its content.
Metadata of research data
Appropriate preparation, organisation and description of the data will enable its efficient retrieval.
Data should be provided with metadata in such a way that the recipient knows what kind of data it is, how it was produced and under what conditions it can be used.
There is no single universally applicable metadata description standard for research data, so it is a good idea to familiarise yourself with the metadata description standards used by the repository where you intend to deposit your data.
The following fields may appear in the metadata description standards that you will need to complete:
- title
- source
- creators (persons or bodies holding copyrights on research data)
- date of production
- format
- language
- information on openness (including licence and possible embargo)
- related project
- related publication, etc.
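As a rough illustration, the fields listed above could be collected in a simple machine-readable record and checked for completeness before deposit. This is only a sketch: the field names, the sample values and the `missing_fields` helper are hypothetical and do not follow any particular repository's schema.

```python
import json

# Hypothetical field names modelled on the list above; a real repository
# defines its own schema, so check its documentation before depositing.
REQUIRED_FIELDS = [
    "title", "source", "creators", "date_of_production",
    "format", "language", "openness", "related_project",
    "related_publication",
]

def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]

# Illustrative values only.
record = {
    "title": "Example dataset",
    "source": "Laboratory measurements",
    "creators": ["Jane Example"],
    "date_of_production": "2024-01-15",
    "format": "text/csv",
    "language": "en",
    "openness": {"licence": "CC-BY-4.0", "embargo_until": None},
    "related_project": "",
    "related_publication": "",
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

A check of this kind is useful mainly as a reminder; the repository's own deposit form remains the authoritative list of required fields.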
These tools can help in choosing the most suitable standard:
File formats
Data should be deposited in such a way as to ensure its long-term readability and accessibility. When sharing research data, consideration should be given to:
- the software with which they will be readable
- the sustainability of the chosen file formats.
Publicly available file formats should be used. For this purpose, it is advisable to use uncompressed file formats that do not require commercial software and use standard encoding (ASCII, Unicode).
In some cases, the migration of data to an open format may result in the loss or distortion of some data/metadata. It is then acceptable to deposit data in closed formats.
If the data is readable by commercial tools, but ones that are commonly used in the discipline, then it is also acceptable to deposit such data.
Before preparing datasets, check that the repository allows the data to be deposited in the format of your choice.
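The format and encoding recommendations above can be partly checked automatically before deposit. The sketch below is one possible approach, not a repository requirement: the allow-list of extensions and the helper names are assumptions chosen for illustration, and UTF-8 is used as the Unicode encoding (plain ASCII text is also valid UTF-8).

```python
from pathlib import Path

# Hypothetical allow-list; adjust it to the formats your repository accepts.
OPEN_EXTENSIONS = {".csv", ".txt", ".xml", ".json", ".odt"}
TEXT_EXTENSIONS = {".csv", ".txt", ".xml", ".json"}

def has_open_extension(path):
    """True if the file extension is on the (assumed) allow-list."""
    return Path(path).suffix.lower() in OPEN_EXTENSIONS

def is_utf8(data):
    """True if the bytes decode as UTF-8 (plain ASCII is valid UTF-8)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def check_file(path):
    """Report format and encoding findings for one file."""
    p = Path(path)
    report = {"open_extension": has_open_extension(p)}
    # Only text-like formats are checked for encoding.
    if p.suffix.lower() in TEXT_EXTENSIONS:
        report["utf8"] = is_utf8(p.read_bytes())
    return report
```

Calling `check_file` on each file in a dataset folder gives a quick overview of which files may need converting before upload.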
Sharing research data
Data should be as open as possible and as closed as necessary. To help researchers prepare and share data appropriately, the FAIR principles have been developed, according to which data should be:
Findable – easy to find; the dataset must be provided with metadata such that it is searchable by the relevant tools available in the repository
Accessible – available (at least at the metadata level) to anyone with Internet access
- availability in FAIR does not mean open access without restriction; it means that the exact conditions under which data are made available and reusable are specified through metadata
The following open licences are worth using:
- Creative Commons licenses
- GNU free software licenses
- Open Data Commons database licenses
Metadata should be available even if the dataset has been moved or deleted.
Interoperable – data must be described to an appropriate standard and using a correct methodology; they should also be deposited in formats that allow them to be read and processed
Reusable – this means that the description or the datasets themselves should contain information on the origin of the data, together with the entire methodology of data extraction; the possibility of re-use also requires that the licence under which the data have been shared and can be processed is indicated.
Repositories
Research data should be collected and made available in institutional, national or international repositories.
When selecting a repository, the following should be taken into consideration:
- Under what conditions will the data be stored?
- How will the data be secured?
- Does the repository support a discipline-specific standard for metadata description?
- Does the repository ensure the assignment of an identifier, e.g. DOI, to datasets (this translates into better retrieval of data)?
- Is it possible to link the dataset to authors using identifiers, e.g. ORCID?
- Are other researchers in the discipline using the same repository?
- The cost of depositing data (check whether your chosen repository applies an additional fee, the so-called Data Processing Charge, or whether depositing data is free of charge)
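Identifiers such as DOI and ORCID have a defined structure, so a basic sanity check is possible before they are entered into metadata. The sketch below is illustrative only: the helper names are hypothetical, the DOI pattern is a heuristic shape check that does not prove the DOI resolves, and the ORCID check verifies only the final check character (ISO 7064 MOD 11-2).

```python
import re

# Heuristic shape check only; it does not prove that the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(value):
    """True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(value))

def orcid_checksum_ok(orcid):
    """Validate the final check character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected
```

Neither check replaces looking the identifier up; they only catch typing mistakes early.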
When choosing a repository, it is also worth consulting the Registry of Research Data Repositories. This is a global registry of research data repositories from all scientific disciplines.
Some of the most popular research data repositories today are:
- WUT Data Repository –
- RepOD – Repository for Open Data – a national repository created as part of the Open Science Platform. It allows depositing so-called small data. Use of the service is free of charge.
- Zenodo – an OpenAIRE project, supporting the open access and open data movement in Europe. The repository has been developed with EU funding. It complies with the FAIR principles. There is a limit of 50 GB per dataset.
Data Management Plan (DMP)
A DMP is a document required for grant applications under Horizon 2020 and its successor programme, Horizon Europe, and in National Science Centre (NCN) competitions.
The plan should describe what data applicants will use in the course of their work:
- How will the data be produced (e.g. will it be self-generated or purchased, etc.)?
- Who will have the rights to the data?
- Will the data be made available to other users and under what conditions?
- Where will the data be stored?
- How will the data be described?
Please take a look at the tabs:
- DMP in NCN programmes
- DMP in the HORIZON programme
The Reference and Bibliometric Analysis Department advises WUT staff on the correctness of research Data Management Plans prepared for submission. Please contact us by email:
- Magdalena Maciąg
- Monika Gajewska
Creating a DMP is a complex process and each plan is different, so the following free tools can be useful:
- DMPTool (US) – contains examples of DMPs. It allows you to prepare DMP templates tailored to the requirements of US funders
- DMPonline (UK) – a tool to facilitate work with DMPs, allowing the creation of templates
- DSW - Data Stewardship Wizard – a tool to facilitate work with DMPs, allowing the creation of templates
- The Digital Curation Centre – a service from a UK institution specialising in research data management. Among other things, it provides ready-made data management plans, guides, guidelines and information on metadata
DMP in NCN programmes
A Data Management Plan (DMP) is prepared at the grant application stage:
- A template is available for the plan to be submitted
- The template is divided into 6 parts - each part contains a set of questions. The individual fields can be up to 1,000 characters including spaces (except for point 2.1 - where the limit of 2,000 characters applies)
- The plan is subject to an expert evaluation at the final report assessment stage. The evaluation will consist in comparing the plan in the application with its implementation
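Because the template fields have fixed character limits, it can help to check draft answers before pasting them into the form. The following sketch uses the limits stated above (1,000 characters per field, 2,000 for point 2.1); the `over_limit` helper and the sample answers are hypothetical, and the way the application form itself counts characters such as line breaks is an assumption.

```python
# Limits stated above: 1,000 characters per field, 2,000 for point 2.1.
DEFAULT_LIMIT = 1000
LIMITS = {"2.1": 2000}

def over_limit(answers):
    """Map each over-long field to the number of excess characters."""
    excess = {}
    for point, text in answers.items():
        limit = LIMITS.get(point, DEFAULT_LIMIT)
        if len(text) > limit:  # len() counts spaces as characters
            excess[point] = len(text) - limit
    return excess

# Illustrative drafts only.
answers = {"1.1": "x" * 1005, "2.1": "y" * 1500, "3": "short answer"}
print(over_limit(answers))
```

An empty result means every draft answer fits within its limit.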
The DMP may be subject to change during the course of the project:
- It is recommended to update the Data Management Plan during the project duration
- There is no need to inform NCN about changes to the DMP
- The final report should describe the actual state of data in the project - as of the project completion date. It may be different than initially planned
- The DMP should be prepared in English (except in the MINIATURA call)
The National Science Centre template consists of the following elements:
1.1 Data description and collection or re-use of existing data
In this section you should answer the following questions:
- How will new data be collected or produced and/or how will existing data be reused?
- How will the data be controlled and documented?
- How will the files be organised and their different versions managed?
See example
The hydrodechlorination of 1,2-dichloroethane (1,2-DCE) will be carried out at atmospheric pressure in a glass flow reactor equipped with a fritted disk to hold the catalyst charge. Prior to reaction, the catalyst will be reduced in flowing hydrogen (30 cm3/min), ramping the temperature from 20 to 600 °C (at ∼15 °C/min) and holding it at 600 °C for 1 h. All reactions will be followed by gas chromatography (HP 5890 series II with FID, 5% Fluorcol/Carbopack B column (10 ft) from Supelco). The results of the GC analysis will be processed using HP ChemStation software. XRD studies of Ni–Ru/SiO2 catalysts at various stages of their biography (after calcination, after reduction, and after hydrodechlorination) are also planned, to furnish useful information. XRD experiments will be performed on a standard Siemens D5000 diffractometer using Ni-filtered CuKα radiation. These experiments will be carried out in an external laboratory. The data set will consist of XML files and a description of the methodology (if versioning occurs during the research, all versions will be available in the set).
1.2 What data (for example the kinds, formats, and volumes) will be collected or produced?
In this section you should answer the following questions about the planned format and volume of data:
- What type of data will it be (e.g. documents, spreadsheets, audio files, videos, databases, source code)?
- What format and volume will the data have? (Any file format may be used, but the most important thing is to choose one that provides universal access and openness. Open and standard formats should be considered first.)
- Will the data be encoded for storage and, if so, how?
See example
Results from ChemStation will be exported into XML files. The data set will be up to 10 GB in size. The data does not require additional encoding. The results of the XRD analysis will be presented as pictures and graphs in jpg format.
2. Documentation and data quality
To be specified:
- The type of metadata used to make it easier for users to find the data
- Is the data machine-readable?
- What international standards or schemes (e.g. Dublin Core, DDI) will be used to structure metadata?
See example 1
Data and their associated metadata will be deposited in a public repository, Zenodo. The data will be stored with a "readme" file and clear folder structure and filename descriptions. Metadata are retrievable by their identifier using a standardized communication protocol. The repository is registered in the Directory of Open Access Repositories (OpenDOAR). Zenodo registers DOIs (via DataCite) for all deposited records. In Zenodo, metadata meets one of the broadest cross-domain standards available - DataCite's Metadata Schema. The following additional article level fields are supported: journal title/volume/issue/pages, conference title/acronym/dates/place/website, book publisher/place/ISBN/title/pages, alternate persistent identifiers. Results from ChemStation will be exported into XML files, which is a machine-readable format. Results of the XRD analysis will not be machine-readable because of their specific nature (jpg format).
See example 2
The data will be stored in the Dryad repository. The data will be stored with a "readme" file and clear folder structure and filename descriptions. The "readme" files will be prepared in a consistent and descriptive manner following the repository rules. Dryad welcomes the submission of data in multiple formats to enable various reuse scenarios. The default metadata entry form is based on fields from the metadata schema of the DOI issuing agency, DataCite. The recommended minimum content for metadata in this repository is: title of the dataset, institution name, address, email information for the principal investigator (or the person responsible for collecting the data), associate or co-investigators, contact person for questions, date of data collection (can be a single date, or a range), information about geographic location of data collection, keywords used to describe the data topic, language information, information about funding sources that supported the collection of the data. Results will be exported into an XML file, which is a machine-readable format. Results of the XRD analysis will be uploaded in jpg format.
3. Storage and backup during the research process
To be specified:
- How will the data be stored during the research process?
- What are the back-up procedures?
- Where and how will the data be stored and who will have access to it?
- How will data security and protection of sensitive data be taken care of during the research?
- Would any other, additional documentation (e.g. any information on procedures, etc.) be necessary to reuse the data?
- Will backups be made and how? Indicate how often they will be made, by whom, on what media and where they will be stored
See example
Only researchers involved in the survey will have access to the collected data. Access to the software will be protected with login data. File access privileges are defined on a per-user basis for data files, methods, sequences, and results. The data will be stored on well-protected laptops (with up-to-date firewalls and virus/trojan protection) and on university servers. Loss of data will be prevented by making regular backups (every month). The backups will be stored in the secure faculty and/or institute storage that is in place for this specific purpose, to minimize the risk of unauthorized access.
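A monthly backup routine like the one described in the example can be scripted so that each copy is also verified. The sketch below is one possible way to do this with the Python standard library; the folder naming scheme (`backup_YYYY-MM`) and the function names are assumptions made for illustration, and access control to the backup location has to be handled separately.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source_dir, backup_root, stamp=None):
    """Copy source_dir into a dated folder and verify every copied file."""
    stamp = stamp or date.today().strftime("%Y-%m")  # one folder per month
    source = Path(source_dir)
    target = Path(backup_root) / f"backup_{stamp}"
    shutil.copytree(source, target, dirs_exist_ok=True)
    for original in source.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source)
            if sha256_of(original) != sha256_of(copy):
                raise IOError(f"Checksum mismatch: {copy}")
    return target
```

Comparing checksums after copying catches silent corruption, which a plain copy would not reveal until the backup is needed.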
4.1 Issues related to the processing of personal data
To be specified:
- Has there been any processing of personal data?
- How will compliance with personal data and data security legislation be ensured?
See example
The survey does not involve the use of any personal data.
4.2 Legal requirements, codes of conduct
To be specified:
- Who will be the owner of the data?
- Which licenses will be applied to the data?
- Will there be restrictions on re-use of data?
- Do you need to seek copyright clearance before sharing data?
See example 1
XRD studies of Ni–Ru/SiO2 catalysts are planned to be carried out in an external laboratory. All necessary agreements (including copyright transfer) will be signed to ensure legal rights to use the data in this project. The data set will be open to all users under the Creative Commons Attribution 4.0 International (CC-BY-4.0) license. More information about the licence can be found on the European Commission webpage. Specifically, the research data will validate the results presented in the published scientific paper.
See example 2
The survey team will have all rights to the data. All necessary agreements will be signed before depositing the data in the repository. The data will be released to the public domain. All files submitted to Dryad must abide by the terms of the Creative Commons Zero (CC0 1.0).
5.1 Data sharing and long-term preservation
When and how will project data be shared?
- Are there possible restrictions to data sharing or embargo reasons?
- Are there any barriers and constraints to making the research data fully or partially accessible (e.g. from the publisher of the article)?
- Does data sharing require the consent of project participants?
See example
The data will be made available at the same time as the paper is published. We do not foresee any limitations or obstacles preventing full or partial data disclosure. Sharing the data will not require any additional consent from the project participants.
5.2 Data selection:
To be specified:
- What procedure will be used to select data to be preserved?
- What repository will you use? Does this repository comply with the FAIR Data Principles?
See example 1
The data to be deposited will be selected so that other scientists can repeat the survey. They will support and illustrate all information from the journal article that will be published. Data and their associated metadata will be deposited in a public repository, Zenodo. This repository meets all FAIR requirements.
See example 2
The data and their associated metadata will be deposited in the Dryad repository. All datasets in Dryad are indexed by the Clarivate Data Citation Index, Scopus, and Google Dataset Search. Each dataset is given a unique Digital Object Identifier (DOI). This repository meets all FAIR requirements. Dryad has a team of curators who check every submission to ensure the validity of files and metadata. After depositing the data in Dryad, the bibliographic description of the data will also be deposited in the Institutional Repository (with a link to the Dryad repository and the data's DOI). The deposited data will be selected so that other scientists can repeat or fully understand the survey.
5.3 Software tools needed to access the data
To be specified:
- Will potential users need specific tools to access and (re)use the data?
See example
Open data formats (xml or jpg) are planned to be used during the survey. Users will not need any specific software to read the data.
5.4 Unique identifiers (e.g. DOI)
To be specified:
- How will the application of a unique and persistent identifier (such as a Digital Object Identifier (DOI)) to each dataset be ensured?
See example 1
The data will be deposited in Zenodo. Every upload to Zenodo is assigned a Digital Object Identifier (DOI).
See example 2
The data will be deposited in the Institutional Repository. This repository does not assign DOIs to data sets. However, the metadata scheme used there has a field for one. To meet all FAIR requirements, the DOI costs will be covered from the project budget. The DOI will be entered in the repository when the datasets and metadata are uploaded.
6.1 Data management responsibilities
To be specified:
- Who will be responsible for data management (i.e. who will be the data steward)?
See example 1
The Project Manager will take over the main responsibilities related to the data management process. The Institute does not provide additional staff to support research data management throughout the duration of the project. The main tasks of the Project Manager in this area will include: taking care of data quality, creating data processing procedures in the laboratory, naming files, creating backups and their appropriate storage, long-term data archiving in the repository and taking care of this data after the end of the project (adding new versions). The Project Manager will have an additional person to help: the Assistant Project Manager. He will also be involved in depositing data and sharing the project results. He will be responsible for periodically checking that the backups are made on an ongoing basis and properly stored. In addition, the Project Manager is obliged to transfer the data to the other members of the project when any of the team members leaves the team. Both the Project Manager and the Assistant will be responsible for checking that the DMP is being followed.
See example 2
The person responsible for the data management process is the supervisor of the whole project; the same person will be responsible for DMP implementation, updating, and verification. An additional member of the staff will also be assigned to make backups and data corrections during the project. The main tasks will include: taking care of data quality, creating data processing procedures in the laboratory, creating backups and their appropriate storage, long-term data archiving in the repository, and taking care of this data after the end of the project (adding new versions).
6.2 What resources will be dedicated to managing the data and ensuring that the data is FAIR?
To be specified:
- What are the costs of ensuring FAIR standards in the project and how will they be covered?
See example 1
The data will be deposited in a free repository. If necessary, devices for data backup will be purchased (2 external 4 TB drives; the cost of one drive is PLN 420).
See example 2
The main costs for ensuring FAIR principles in the project are for the deposited data in Dryad Repository. Dryad charges excess storage fees for data totaling over 50GB. For data packages in excess of 50GB, submitters will be charged $50 for each additional 10GB, or part thereof. (Submissions between 50 and 60GB = $50, between 60 and 70GB = $100, and so on).
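The fee rule quoted in this example can be expressed as a short calculation, which is handy when estimating the budget for a planned dataset size. The sketch below simply encodes the figures given in the example above (50 GB free, then $50 per started 10 GB); the function name is hypothetical and the repository's current pricing should be checked before budgeting.

```python
import math

FREE_GB = 50           # storage included without an excess fee
STEP_GB = 10           # the fee is charged per started block of this size
FEE_PER_STEP_USD = 50  # figure taken from the example above

def excess_storage_fee(size_gb):
    """Excess storage fee in USD for a data package of size_gb gigabytes."""
    if size_gb <= FREE_GB:
        return 0
    steps = math.ceil((size_gb - FREE_GB) / STEP_GB)
    return steps * FEE_PER_STEP_USD
```

Rounding up with `math.ceil` reflects the "or part thereof" wording: a package of 61 GB is charged for two additional blocks.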
DMP in the HORIZON programme
The first version of the DMP should be submitted once the project has been approved and funding has started (within the first 6 months of the start of the project):
- A template of the plan to be submitted is available - the template is a set of questions to be answered
- The DMP needs to be updated on an ongoing basis - if significant changes occur (e.g. decision to file a patent, need to make research confidential, etc.)
- Costs related to open access to research data under Horizon 2020 are eligible for reimbursement during the course of the project, under the terms of the H2020 grant agreement, in particular Article 6 and Article 6.2.D.3
- The plan should be prepared in English
The DMP should include the following elements to address specific questions:
Data Summary
- What is the purpose of data collection/generation?
- What types and formats of data will be collected/generated?
- What is the origin of the data (will we generate it ourselves or use existing data)?
- Who will have the rights to the data - is it necessary to conclude agreements on the use and dissemination of the data?
- What is the expected size of the data?
- What will be the methodology of data extraction?
- Will the data be produced once or continuously?
- Will the data need to be versioned?
See example
An online survey system will be used to collect data from the respondents. The data will be interpreted and used for writing a scientific paper. The survey will be conducted using the LimeSurvey tool. LimeSurvey is free software. It will be downloaded and installed on the University server to ensure data safety. Access to the software will be protected with login data. Only researchers involved in the survey will have access to the collected data. The survey will be conducted anonymously. We do not plan to collect any personal data that would need to be protected. The researchers will have all legal rights to the survey output. The data will be stored on the University server in the LimeSurvey software until the end of the survey. When the survey has been completed, all data will be exported to a .csv (comma-delimited) file. All questions from the questionnaire will be transferred into a text file (.odt). The survey methodology will be written and saved in text format. The data set will consist of those three files. The data set will be up to 50 GB in size. As the data volume is relatively low, no new specialist processing or storage will be required.
FAIR data (findable, accessible, interoperable and reusable)
- To what extent will the data comply with FAIR principles (findable, accessible, interoperable and reusable)?
- With which metadata description format will the datasets be described?
- Will they be assigned a unique identifier, e.g. DOI?
- Will the data be placed in a publicly accessible repository / service and made available to other users? If so, under what licence?
- Will reading the data require additional software (how and with which tools will it be readable)?
- How long will the data be available in the repository?
- Do the publishers of the journal in which the research results will be described require research data to be included with the article?
- For more on the FAIR principles, see Sharing research data
See example
The data and their associated metadata will be deposited in a public repository, Zenodo. This repository meets all FAIR requirements. (Meta)data are assigned a globally unique and persistent identifier. A DOI is issued to every published record on Zenodo. Zenodo's metadata is compliant with DataCite's Metadata Schema minimum and recommended terms, with a few additional enrichments. Metadata of each record is indexed and searchable directly in Zenodo's search engine immediately after publishing. Metadata of each record is sent to DataCite servers during DOI registration and indexed there. Metadata for individual records as well as record collections are harvestable using the OAI-PMH protocol by the record identifier and the collection name. Metadata is also retrievable through the public REST API. Data and metadata will be retained for the lifetime of the repository. This is currently the lifetime of the host laboratory CERN, which currently has an experimental program defined for the next 20 years at least. Metadata are stored in high-availability database servers at CERN, which are separate from the data itself. Metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation. The data set will be open to all users under the Creative Commons Attribution 4.0 International (CC-BY-4.0) license. More information about the licence can be found on the European Commission webpage. Sharing of the data will follow the principle "as open as possible, as closed as necessary". Specifically, the research data will validate the results presented in a published scientific paper. The data will be available at the same time as the paper is published.
Allocation of resources
- What are the costs associated with meeting the FAIR standard?
- What will be the costs associated with storing and sharing the data (will the data be shared in a paid service and what will be the cost)?
- Who will be responsible for data management in the project?
See example
In Zenodo, the content may be uploaded free of charge. The person responsible for managing the data and creating metadata compatible with the standard used in Zenodo Repository is the supervisor of the whole project, namely, Prof. Marcin Sosnowski.
Data security
- What rules will be in place to ensure data security, including data recovery (in case of data loss)?
- If sensitive data will be collected during the project, how will it be secured?
- Will the data require additional processing to ensure its anonymity?
- Does the repository / data storage facility meet basic security principles?
See example
To store the data securely, the project will use an architecture based on cloud services. The services used provide the functionality needed for secure storage and data security. The questionnaire will be available to respondents for 3 months. Each week, it will be sent to another group of respondents. Each week, a copy of the data will be created, exported to a .csv file and then uploaded to the cloud. In the event of an unusual situation or data loss, the .csv file can be uploaded to LimeSurvey. Access to the LimeSurvey tool will also be protected with login data (username and password). Only researchers will have access to this tool.
Ethical aspects
- Are there any ethical or legal issues that may affect data sharing?
- If personal questionnaires were used, was respondents' consent to the sharing and long-term storage of data taken into account?
See example
The questionnaire used in this survey will contain an obligatory clause with information about the purpose of the survey. All respondents will be informed that their answers will be interpreted and used for gathering more knowledge about social behavior. Respondents will also be assured that their personal data will not be collected or shared.
Other issues
- Do you use specific data management procedures (e.g. national / project / university guidelines)? If yes, please specify which procedures you use.