Composed by: Pedro Mendes (RDA/EOSC Future Ambassador for Catalysis), Daniel Costa
Contributors: Allyson Lister
Comments requested: Please note that this is a new Discipline page, and it is open for comments from the RDA Community. To add your input please use the comments section below.
Downloadable disciplinary info sheet: Chemistry
What is chemistry data?
Chemistry studies the composition, structure, and properties of substances and the transformations that they undergo [1]. Thus, all data generated throughout such a study can be considered chemistry data. For more specific terminology questions, NFDI4Chem provides a terminology service, and a list of all terminologies tagged with Chemistry and registered with FAIRsharing (an output of the RDA FAIRsharing WG) is also available.
For a machine-interpretable definition, an ontology should be employed. A review of current ontologies for chemistry can be found here.
Where is chemistry data shared?
In addition to general-purpose repositories/databases, like Zenodo [2], there are specific ones for chemistry. Some sub-domains, like analytical chemistry and computational chemistry, have mature repositories, and sharing via these specific repositories is therefore highly recommended. FAIRsharing provides a searchable registry of Chemistry databases as part of its larger registry of databases, standards and policies across all subject areas. Re3data provides a searchable database of repositories, while NFDI4Chem offers guidance on choosing “the right repository” in chemistry.
How is chemistry data shared (e.g. standards, guidelines, trusted examples)?
For a quick start, check the knowledge database of NFDI4Chem. You can also search the FAIRsharing registry, in particular its ecosystem of Chemistry standards. While full standardisation is still far off, some standards have been developed within the community. For instance, for chemical reactions, the Unified Data Model standard format can be used.
When no standards are available, generic guidelines based on the FAIR principles [3] should be applied (check the RDA ones listed below; also see here [4]). The underlying principle for chemistry is to report the full raw data of experimental or computational results together with the key determining variables. IUPAC is also running a series of projects that should issue chemistry-specific guidelines in the next couple of years, and it is also taking part in the WorldFAIR project.
Additionally, a variety of trusted examples are provided by NFDI4Chem and the Spotlight section can also serve as inspiration.
What are typical data and file formats for chemistry data?
Most chemistry data is best reported in tabular formats, but there are many specific formats for specific kinds of data in chemistry. For instance, molecular structures can be represented in the InChI [5] or SMILES [6] formats.
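As a rough illustration of what these identifiers look like in practice, the sketch below stores three well-known molecules with their SMILES and standard InChI strings and reads the chemical formula out of the first InChI layer. The `Compound` class and its `formula` method are hypothetical names introduced only for this example; real workflows would normally rely on a cheminformatics toolkit rather than string handling.

```python
from dataclasses import dataclass

# Standard InChI strings start with this prefix; the first layer after it
# holds the chemical formula.
STANDARD_INCHI_PREFIX = "InChI=1S/"


@dataclass(frozen=True)
class Compound:
    """Hypothetical record pairing a name with two structure identifiers."""
    name: str
    smiles: str
    inchi: str

    def formula(self) -> str:
        """Return the formula layer of a standard InChI string."""
        if not self.inchi.startswith(STANDARD_INCHI_PREFIX):
            raise ValueError(f"not a standard InChI: {self.inchi!r}")
        layers = self.inchi[len(STANDARD_INCHI_PREFIX):].split("/")
        return layers[0]


compounds = [
    Compound("water", "O", "InChI=1S/H2O/h1H2"),
    Compound("methane", "C", "InChI=1S/CH4/h1H4"),
    Compound("ethanol", "CCO", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
]

formulas = {c.name: c.formula() for c in compounds}
print(formulas)
```

The same molecule can have several valid SMILES strings, whereas the standard InChI is designed to be a single canonical string, which is one reason the two are often stored side by side.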
For data types and formats in chemistry, check DataCC and NFDI4Chem lists.
Tips on the best file formats for a given generic data format are provided by OpenAIRE and the 5-star Open Data classification. Formats should be accessible and interoperable, e.g. ideally CSV or JSON for tabular data.
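To make the CSV/JSON recommendation concrete, here is a minimal sketch using only the Python standard library. It writes a small table to both formats and checks that the data survives a round trip. The column names (`name`, `smiles`, `molar_mass_g_per_mol`) and the helper functions are assumptions made for illustration, not a community standard; the molar masses are rounded values.

```python
import csv
import io
import json

# Illustrative column names; putting the unit in the header keeps the
# table self-describing.
FIELDS = ["name", "smiles", "molar_mass_g_per_mol"]

rows = [
    {"name": "water", "smiles": "O", "molar_mass_g_per_mol": 18.02},
    {"name": "methane", "smiles": "C", "molar_mass_g_per_mol": 16.04},
    {"name": "ethanol", "smiles": "CCO", "molar_mass_g_per_mol": 46.07},
]


def to_csv(records):
    """Serialise the records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into records, restoring the numeric column."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "name": record["name"],
            "smiles": record["smiles"],
            "molar_mass_g_per_mol": float(record["molar_mass_g_per_mol"]),
        }
        for record in reader
    ]


csv_text = to_csv(rows)
json_text = json.dumps(rows, indent=2)

# Both formats should reproduce the original table.
assert from_csv(csv_text) == rows
assert json.loads(json_text) == rows
print(csv_text)
```

CSV stores every value as text, so the reader has to restore numeric types explicitly, while JSON preserves the distinction between numbers and strings. Which one fits better depends on the tools downstream.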
The most closely related disciplines are chiefly biology and materials science. Both are contiguous to chemistry topic-wise, and biology in particular is several steps ahead [4,7] in terms of open science practices, so it can serve as inspiration.
RDA Groups active in this discipline
RDA Groups in this discipline that are no longer active
If your Working Group or Interest Group may be of relevance to those working in Chemistry, please email enquiries[at]rd-alliance.org to have your group added to this page.
Highlighted RDA Outputs
Are you an expert in Chemistry resources who believes you can contribute to FAIRsharing? If so, please consider joining the community champions.
Online machine-actionable tool developed by OpenAIRE to facilitate Research Data Management (RDM) activities concerning the implementation of Data Management Plans (DMPs).
EUDAT metadata indexing service that provides a discovery portal which allows users to find data collections within an international and inter-disciplinary scope.
OpenAIRE Mining Service: Service that performs text mining (entity resolution) on the metadata and the text of publications and extracts information on: links to projects/grants and funders; data citations or links to scientific database entries (e.g. links to entries in PDB - Protein Data Bank); document classification according to several taxonomies; software citations; author affiliations; references; document similarity.
Join the Chemistry Research Data Interest Group (CRDIG) at RDA.
Developers and curators for FAIR data in chemistry can be found at FAIRsharing as members of the Community Champions programme.
1. Merriam-Webster dictionary, https://www.merriam-webster.com/dictionary/chemistry, accessed on 11/01/2023.
2. Zenodo in FAIRsharing: https://doi.org/10.25504/FAIRsharing.wy4egf
3. FAIR Principles in FAIRsharing: https://doi.org/10.25504/FAIRsharing.WWI10U
4. P. S. F. Mendes, S. Siradze, L. Pirro, J. W. Thybaut, ChemCatChem 2021, 13, 836.
5. InChI in FAIRsharing: https://doi.org/10.25504/FAIRsharing.ddk9t9
6. SMILES in FAIRsharing: https://doi.org/10.25504/FAIRsharing.qv4b3c
7. S. Herres-Pawlis, J. C. Liermann, O. Koepler, Research Data in Chemistry – Results of the first NFDI4Chem Community Survey. Z. Anorg. Allg. Chem. 2020, 646, 1748.
Chemical research data is fundamentally the most important product that chemists create as it guides future research and allows us to understand how chemistry works, what we can do with it and how it affects our lives. Chemistry, as a discipline, has long focused on its own specific needs in analyzing and communicating chemical data, especially the representation and identification of chemical structures. Today’s chemical informaticians (cheminformaticians) have pushed technology to make storing and searching chemical information easier and more standardized for increasing amounts of data.
The time has come, however, to take a step back and look at how chemical informatics meets three needs: dealing with large amounts of heterogeneous data that are stored in different ways, transmitting chemical information to the many other disciplines that need it, and representing it semantically in digital form. Looking to the efforts of other disciplines, and understanding and appreciating the importance and impact of generalized data technologies and RDA outcomes, will strengthen chemistry and help move it toward open data in a way that is interoperable and forward-thinking.
The Chemistry Research Data Interest Group (CRDIG) is focused on mechanisms by which we can improve chemical informatics and highlight its importance in the global data economy, specifically:
- Bringing together important stakeholders in open chemical data (e.g., the American Chemical Society - Division of Chemical Information, the International Union of Pure and Applied Chemistry (IUPAC), and others)
- Bridging the chemical informatics and RDA communities to help appreciate and understand what each has to offer
- Developing both RDA Working Group and IUPAC project proposals for important domain activities such as:
  - Establishing new or revised metadata, ontology, chemical structure, or data format standards
  - Characterizing the different chemical information types, identifying the critical points in the data life-cycle, and mapping gaps in interoperability