NL2RI

Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data

Installation

Use the package and environment management system conda to install the conda environment necessary to run the code in this directory from the gpo_environment.yml file.

conda env create --file environment.yml

This will create a conda environment called retention_indices with the required packages installed for you.

Example

import joblib
import pandas as pd
from predict import *

# load model
model_descriptors_to_ri = joblib.load('models/descs_to_RI_40_model.sav')

# load leverage matrix to check for applicability domain
leverage_matrix = pd.read_csv('leverage_matrices/leverage_mat_train40', index_col=0)

# load some example data
data = pd.read_csv('dataset/small_norman.csv')

# predict retention indices and leverages
ri, leverages = predict_ri_from_descriptor(data, model_desc_to_RI, leverage_matrix)

Likewise, retention indices can be predicted from mass spectrometry data as follows:

import joblib
import pandas as pd
from predict import *

# Load model
model_desc_to_RI = joblib.load('models/nl_to_ri_4220_model.sav')
leverage_matrix = pd.read_csv('leverage_matrices/leverage_mat_train_NL_100k_4220feats.csv', index_col=0)

# Load data
data_nl = pd.read_csv('dataset/small_amide_nls.csv', index_col=0)

# Predict retention indices and leverages
ri, leverages = predict_ri_from_descriptor(data_nl, model_desc_to_RI, leverage_matrix)

In order to get going and start predicting retention indices from mass spectrometry data yourself, all that is left to do is to convert your mass spectra into neutral losses.

# Example parent mass and fragment masses
# Note: all masses should be rounded to two digits
parent_mass = 456.75
fragments = np.array([151.21, 18.83, 25.80, 441.75, parent_mass])

# Borrow columns from other dataset
cols = data_nl.columns[6:]
neutral_loss = parent_mass - fragments

# initiate vector with zeroes
nl_vec = np.zeros(len(cols))

parent_mass_indice = parent_mass * 100
neutral_loss_indices = neutral_loss * 100

# everything higher than isomass, should be set to -1
nl_vec[int(parent_mass_indice):] = -1

# where there is a fragment, place 1.
nl_vec[neutral_loss_indices.astype(int)] = 1

# Add some more info (MONOISOMASS is required)
df_test['NAME'] = 'example_molecule'
df_test['RI'] = 42
df_test['MONOISOMASS'] = parent_mass

# Write to DataFrama
df_test = pd.DataFrame(nl_vec, index=cols).T

Citing

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00699-8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

NL2RI

Installation

Example

Citing

Files

README.md

Latest commit

History

README.md

File metadata and controls

NL2RI

Installation

Example

Citing