Skip to content

Latest commit

 

History

History
97 lines (64 loc) · 3.07 KB

README.md

File metadata and controls

97 lines (64 loc) · 3.07 KB

NL2RI

Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data

License: MIT Open In Colab

Installation

Use the package and environment management system conda to install the conda environment necessary to run the code in this directory from the gpo_environment.yml file.

conda env create --file environment.yml

This will create a conda environment called retention_indices with the required packages installed for you.

Example

import joblib
import pandas as pd
from predict import *

# load model
model_descriptors_to_ri = joblib.load('models/descs_to_RI_40_model.sav')

# load leverage matrix to check for applicability domain
leverage_matrix = pd.read_csv('leverage_matrices/leverage_mat_train40', index_col=0)

# load some example data
data = pd.read_csv('dataset/small_norman.csv')

# predict retention indices and leverages
ri, leverages = predict_ri_from_descriptor(data, model_desc_to_RI, leverage_matrix)

Likewise, retention indices can be predicted from mass spectrometry data as follows:

import joblib
import pandas as pd
from predict import *

# Load model
model_desc_to_RI = joblib.load('models/nl_to_ri_4220_model.sav')
leverage_matrix = pd.read_csv('leverage_matrices/leverage_mat_train_NL_100k_4220feats.csv', index_col=0)

# Load data
data_nl = pd.read_csv('dataset/small_amide_nls.csv', index_col=0)

# Predict retention indices and leverages
ri, leverages = predict_ri_from_descriptor(data_nl, model_desc_to_RI, leverage_matrix)

In order to get going and start predicting retention indices from mass spectrometry data yourself, all that is left to do is to convert your mass spectra into neutral losses.

# Example parent mass and fragment masses
# Note: all masses should be rounded to two digits
parent_mass = 456.75
fragments = np.array([151.21, 18.83, 25.80, 441.75, parent_mass])

# Borrow columns from other dataset
cols = data_nl.columns[6:]
neutral_loss = parent_mass - fragments

# initiate vector with zeroes
nl_vec = np.zeros(len(cols))

parent_mass_indice = parent_mass * 100
neutral_loss_indices = neutral_loss * 100

# everything higher than isomass, should be set to -1
nl_vec[int(parent_mass_indice):] = -1

# where there is a fragment, place 1.
nl_vec[neutral_loss_indices.astype(int)] = 1

# Add some more info (MONOISOMASS is required)
df_test['NAME'] = 'example_molecule'
df_test['RI'] = 42
df_test['MONOISOMASS'] = parent_mass

# Write to DataFrama
df_test = pd.DataFrame(nl_vec, index=cols).T

Citing

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00699-8