Merge branch 'localisation' of https://github.com/NFDI4Chem/knowledge_base into localisation
jliermann committed Nov 12, 2024
2 parents a4914eb + 2f3a664 commit ab60d47
Showing 35 changed files with 1,087 additions and 529 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/localisation.yml
@@ -15,10 +15,10 @@ jobs:
if: github.ref == 'refs/heads/localisation' || github.event_name == 'workflow_dispatch'
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v4

- name: Set up Node.js
uses: actions/setup-node@v2
uses: actions/setup-node@v4
with:
node-version: "lts/*"

3 changes: 3 additions & 0 deletions .gitignore
@@ -5,6 +5,9 @@
fonts
# Localisation files
/i18n
# API data
elnData.json
elnDataPharm.json
# Generated files
.docusaurus
.cache-loader
2 changes: 1 addition & 1 deletion docs/00_intro/10_fair.mdx
@@ -92,7 +92,7 @@ Especially when looking at metadata, effective and efficient machine readability

### I2. (meta)data use vocabularies that follow FAIR principles

The applied vocabularies or ontologies should be well-documented and resolvable using a PID. For instance, CHMO mentioned [above](#i1-metadata-use-a-formal-accessible-shared-and-broadly-applicable-language-for-knowledge-representation) uses a [persistent URL (PURL)](http://www.purlz.org/home), resolvable using a standard web browser through `http`, while the [documentation](https://github.com/rsc-ontologies/rsc-cmo) is publicly available on Github.
The applied vocabularies or ontologies should be well-documented and resolvable using a PID. For instance, CHMO mentioned [above](#i1-metadata-use-a-formal-accessible-shared-and-broadly-applicable-language-for-knowledge-representation) uses a [persistent URL (PURL)](https://en.wikipedia.org/wiki/Persistent_uniform_resource_locator), resolvable using a standard web browser through `http`, while the [documentation](https://github.com/rsc-ontologies/rsc-cmo) is publicly available on Github.

### I3. (meta)data include qualified references to other (meta)data

89 changes: 28 additions & 61 deletions docs/10_domains/10_analytical_chemistry.mdx
@@ -4,84 +4,51 @@ nfdi4chem-id: dac
slug: "/analytical_chemistry"
---

import Methods from '@site/src/components/Methods.js';
import {LbeChip} from '@site/src/components/lbe/LbeElements.js';
import { LbeChip } from "@site/src/components/lbe/LbeElements.js";
import ElnFinder from "@site/src/components/eln/ElnFinder.js";

<LbeChip title="analytical chemistry" /><LbeChip title="chemometric" />
<LbeChip title="analytical chemistry" />
<LbeChip title="chemometric" />

## Introduction

:::info Summary:
Although analytical chemistry is one of the oldest branches of chemistry, it continues to evolve. New methods and technologies are constantly being developed. Providing the tools and techniques needed to identify and quantify the chemical constituents of a sample, analytical chemistry is a cornerstone of both academia and industry. It is essential for a wide range of applications, from environmental monitoring to drug discovery.

Analytical chemistry is one of the oldest scientific disciplines and an interdisciplinary science, combining methods of physical, inorganic and organic chemistry. Analytical chemistry aims to acquire, process, and evaluate signals to qualify and quantify the composition and to unravel the structure of matter. The analytical chemist applies classical (wet) chemistry and instrumental methods for separation, identification (qualification) and quantification. The discipline is related to many research fields in life, environmental, earth, and engineering sciences, such as metabolomics, medicine, and geochemistry. <br/>
A typical workflow begins with the conceptualisation of the research question, and the planning of experiments, methods, and surveys to evaluate the hypotheses. Surveys are utilised in life, environmental, and earth sciences to perform experiments and/or to obtain samples required to support the research (e.g., laboratory and field campaigns, cohort studies). Experiments are conducted and samples are processed applying existing or newly established methods along with recording of accompanying metadata. Analytical chemistry applies already in the experimental or sampling stage as conditions need to be controlled or metadata has to be acquired (e.g., pH, temperature, colour). Once the product of the experiment or sample processing is obtained, it is analysed with suitable direct or combined methods for identification and quantification. Processing and interpretation of the acquired research data and [metadata](/docs/metadata) support the answering of the research question and decisions for further experiments, research, and measures.
:::
Despite the diversity of analytical methods, a common denominator is the large amount of data generated by instrumental methods. This data must be processed and interpreted to extract meaningful information. This makes analytical chemistry a challenging field for research data management.

## Type of experiments for chemical analysis
## Data Types

### Sampling
Unlike some other areas of chemistry, most research data in analytical chemistry are generated by instrumental methods. In addition, the size and complexity of the data can vary greatly depending on the method used.

- Collecting of materials for analysis
- Transport and storage without alteration of samples
- Small-scale experiments to develop / optimise approaches and equipment
- Upscaling of methods to obtain sufficient material for a comprehensive analysis
Techniques like mass spectrometry, chromatography, and spectroscopy generate complex data consisting of raw data files, metadata, and processed data. Raw data files are often proprietary and require specialised software to open and interpret. Metadata is crucial for understanding experimental conditions and parameters. Processed data can range from simple peak lists to complex multivariate models.

### Sample processing
Some open data formats are available for specific data types, such as mass spectrometry data in the [mzML format](https://www.psidev.info/mzML). The [JCAMP-DX format](https://iupac.org/what-we-do/digital-standards/jcamp-dx/) is used for optical spectroscopy data. This format is also suitable for NMR spectroscopy data, but with some major limitations. For chromatography or combined chromatography-mass spectrometry data the situation is more complex as many vendors have their own proprietary formats.

- Preparation of the sample for analysis:
- Direct methods: no further treatment (e.g., pH, RFA, MALDI, direct infusion)
- Single methods:
- Dissolving
- Extraction
- Pulverising
- Combined methods:
- Extraction and enrichment (e.g., solid-phase extraction, aqua regia digestion, volatilisation of solvents)
- Separation of interfering compounds (e.g., chromatography, precipitation)
- Chemical transforming in measurable form (e.g., complexing, derivatisation)
- Small-scale experiments for screening / optimization of separation conditions and upscaling
## ELNs and Other Tools

### Determination and evaluation
- Product characterisation with feasible methods (e.g., NMR spectroscopy, mass spectrometry, IR spectroscopy, UV/vis spectroscopy, elemental analysis)
- to identify analytes (targeted and non-targeted)
- to assess the constitution of mixtures
- to quantify analytes
General chemistry ELNs can typically be used for analytical chemistry data and may be well suited to your research topic. However, there are also specialised tools that are tailored to the needs of analytical chemists. These tools often include features for managing instrument data, processing raw data files and visualising results. They may also include tools for chemometric analysis.

## Planning of experiments
<ElnFinder subDisc="Analytical chemistry" />

- New and reused analytical methods and research ideas are derived from previous work of the own research group, scientific literature, datasets published in [repositories](/docs/repositories), and requirements of public calls for research, development and demonstration projects.
- Experimental design follows a logical order to achieve a specific goal, such as analytical selectivity and sensitivity, or in the case of a non-targeted analysis (e.g., in metabolomics), a coverage of a broad physical-chemical domain of analytes.
- Planning is concluded by adding the experimental details. All [metadata](/docs/metadata) is [documented](/docs/data_documentation) in an [ELN](/docs/eln) (e.g., [Chemotion ELN](https://www.chemotion.net/chemotionsaurus/index.html)) including references.
## Publishing Data

## Documentation of experiments
Data from analytical chemistry can be published on several platforms, depending on the research subject and data type.

- Documentation of research data and metadata is carried out digitally using an ELN.
- Experimental conditions (e.g., solvents, temperature, duration, pressure) are noted in the ELN and if available a laboratory information system.
- Observations and results of analytical methods with no digital output (i.e. no data files) are added manually to the ELN entry of the experiment, which may include temperatures, or the pH (with metadata where applicable).
- Obtained data from analytical instruments (e.g., NMR, MS, or IR data) are uploaded to the Chemotion ELN in open file formats and directly attached to the respective ELN experiment entry including instrumental setup metadata.
- In case instrumental metadata is not convertible to open format without information loss, conditions need to be documented in the ELN.
- Metadata related to the obtained data, such as mass, volumes, or solvent of measurement, have to be provided according to [metadata standards](/docs/format_standards).
If the analytical data have a more supporting role in a larger study it may be appropriate to publish the data in a general data repository. If the research focuses more on the analytical method itself, it may be more appropriate to publish the data in a specialised repository.

## Data producing methods
General data repositories include [Zenodo](https://zenodo.org/) or [RADAR4Chem](https://radar.products.fiz-karlsruhe.de/de/radarabout/radar4chem). For analytical data in context with synthetic chemistry data, [Chemotion Repository](https://chemotion-repository.de/) might also be a suitable option.

- Data can be collected during the experiment or after the experiment by analysing the obtained product.
- Manually determined data: Experimental observations, mass, volumes, pH, etc.
- Digital data are obtained with analytical instruments. An overview of file extensions, file sizes, and converters for several analytical methods is given in the table below.
- Raw data files in proprietary file formats should be saved alongside interoperable open file formats by using converters or the analytical device software. If no specific open format is currently available, export as .txt or .csv is recommended. Please be aware that metadata included in the header of .txt or .csv files may not follow a defined (open) format and metadata should be additionally also added into the ELN.
For method-specific data, several specialised repositories are available. A few examples include:

<Methods defaultProfile={"synthetic"} />
- [MassBank EU](https://massbank.eu/)
A field-specific ecosystem of databases and tools for mass spectrometry reference spectra.
- [MetaboLights](https://www.ebi.ac.uk/metabolights/)
A repository for metabolomic studies.
- [nmrXiv](https://nmrxiv.org/)
A repository for NMR data.

:::note *This table will be continuously updated with new recommendations on interoperable open file formats.
:::
This list is not exhaustive, and there may be other repositories that are more suitable for your data.

## Data analysis
## Challenges

- Research data can be processed, analysed and compared (also to data of other experiments) within the [Chemotion ELN](https://www.chemotion.net/chemotionsaurus/index.html).
- Optionally, preprocessing of digital data with software of analytical device before data are transferred to the Chemotion ELN (cf. data producing methods).
- A detailed view, evaluation and interpretation of results is carried out with the Chemotion ELN features.


## Publishing research data

- In addition to a research article in a scientific journal, the underlying research data are [published](/docs/data_publishing) in a [repository](/docs/repositories) and linked to the article to realise research data management according to the [FAIR data principles](/docs/fair) ([Best practice examples](/docs/best_practice)).
- Data publications in repositories include raw and processed data for reuse.
- The use of the [Chemotion ELN](https://www.chemotion.net/chemotionsaurus/index.html) enables a direct transfer of research data and the respective metadata to the [Chemotion Repository](https://www.chemotion-repository.net/welcome). Subsequently, these data are automatically shared with other repositories, e.g. [PubChem](https://pubchem.ncbi.nlm.nih.gov/). For the publication of research data in other discipline-specific repositories, such as the [MassBank](https://massbank.eu/MassBank/) for reference mass spectra, data have to be exported from the Chemotion ELN and submitted to the respective database.
- A [persistent identifier](/docs/pid) (e.g., DOI) is generated for a dataset by a repository (e.g., [DataCite](https://datacite.org/) for the Chemotion Repository), which is given in the journal article or corresponding supporting information to link the data publication with the manuscript.
The biggest challenge in managing analytical chemistry data is the diversity of the field. Different methods generate different types of data, and the data can vary greatly in size and complexity. As mentioned above, the large number of different vendors and proprietary data formats is a major barrier to data sharing and reuse.