Hydrocarbons are the principal components of fuels and are typically analyzed using gas or liquid chromatography-mass spectrometry (GC-MS, LC-MS)1. Although these techniques are effective for separating, detecting and identifying impurities in complex mixtures, applying them to long-chain hydrocarbons is not straightforward:
Characterizing the impurities in a sample is difficult without a reference mass spectral library, which is essential for identifying each component in a mixture of long-chain hydrocarbons.
This impedes the prediction of relevant properties for these mixtures. Multiple experiments are then required for analysis, which makes the process cumbersome.
To address these challenges, a client sought the assistance of our scientists to develop an AI-based approach to identify the various components present in a mixture of hydrocarbons, overcoming the limitations associated with traditional analytical methods.
Syngene scientists used a combination of cheminformatics and deep learning to:
Predict the mass spectrum of long-chain hydrocarbons
Create a reference library for mass spectra analysis
Predict possible impurities or contaminants generated during the synthesis of long-chain hydrocarbons from the mass spectral data of the sample
Predict physical properties of the compounds, such as temperature-dependent viscosity and density
Traditional AI/ML (artificial intelligence/machine learning) models for property prediction are ineffective for impurities generated during the synthesis of long-chain hydrocarbons, because these impurities differ from the parent product by only a few carbons or by side-chain positions. Fingerprint-based representations of compounds fail to capture such minor differences in chemical structure.
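The limitation described above can be illustrated with a deliberately simplified sketch. It is not the fingerprint or the model used in this work: the helper names (`skeleton`, `degree_fingerprint`, `neighborhood_signature`) are hypothetical, and the "fingerprint" here is only a count of how many carbon neighbors each carbon has. Under that assumption, two positional isomers receive the same count-based descriptor, while a descriptor that also looks at each carbon's neighbors (the basic idea behind graph-based models) tells them apart.

```python
from collections import Counter


def skeleton(bonds):
    """Build an adjacency map for a carbon skeleton from (i, j) carbon-carbon bonds."""
    adj = {}
    for i, j in bonds:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


def degree_fingerprint(adj):
    """Coarse, count-based descriptor: number of carbons with 1, 2, 3 or 4 carbon neighbors."""
    return Counter(len(neighbors) for neighbors in adj.values())


def neighborhood_signature(adj):
    """One round of neighbor aggregation: each carbon is described by its own
    degree together with the sorted degrees of its neighbors."""
    degree = {atom: len(neighbors) for atom, neighbors in adj.items()}
    return Counter(
        (degree[atom], tuple(sorted(degree[n] for n in adj[atom])))
        for atom in adj
    )


# Illustrative positional isomers: a six-carbon chain (atoms 1-6) with a
# methyl branch (atom 7) on carbon 2 or on carbon 3.
two_methylhexane = skeleton([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)])
three_methylhexane = skeleton([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (3, 7)])

same_fingerprint = degree_fingerprint(two_methylhexane) == degree_fingerprint(three_methylhexane)
same_signature = neighborhood_signature(two_methylhexane) == neighborhood_signature(three_methylhexane)

print("Count-based fingerprints identical:", same_fingerprint)
print("Neighborhood signatures identical:", same_signature)
```

Production fingerprints are considerably richer than this toy count, but the sketch shows why a representation that propagates information along bonds is better suited to impurities that differ only in branch position.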
Syngene used a graph embedding model, together with experimental conditions such as ionization energy and precursor type, to train a deep learning model that predicts spectra with reasonable accuracy (see the figure). The model was used to create a reference mass spectral library for long-chain hydrocarbons, which can be used to deconvolute2 sample mixtures. This was combined with models that predict physical properties, including density and viscosity, over a wide temperature range to characterize the sample.
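As a rough illustration of how a reference library can be used to resolve a mixture, the sketch below estimates how much each reference spectrum contributes to a measured mixed spectrum. It is an assumption-laden simplification, not the method used in this work: the spectra are made-up intensity vectors over four m/z bins, the function name `unmix` is hypothetical, and the fitting technique (non-negative least squares solved by multiplicative updates) is a generic choice that the source does not specify.

```python
def unmix(library, mixture, iterations=500):
    """Estimate non-negative weights w so that sum_k w[k] * library[k] approximates mixture.

    Multiplicative updates keep every weight non-negative, provided all
    intensities in the library and the mixture are non-negative.
    """
    n = len(library)
    # Pairwise dot products between the reference spectra
    gram = [
        [sum(a * b for a, b in zip(library[i], library[j])) for j in range(n)]
        for i in range(n)
    ]
    # Dot product of each reference spectrum with the measured mixture
    proj = [sum(a * b for a, b in zip(library[i], mixture)) for i in range(n)]

    w = [1.0 / n] * n  # start from equal contributions
    for _ in range(iterations):
        denom = [sum(gram[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = [w[i] * proj[i] / denom[i] if denom[i] > 0 else 0.0 for i in range(n)]
    return w


# Hypothetical reference spectra (relative intensities in four m/z bins)
reference = {
    "compound A": [1.0, 0.5, 0.0, 0.2],
    "compound B": [0.0, 0.4, 1.0, 0.1],
}

# Synthetic mixture: 70% compound A and 30% compound B
mixture = [
    0.7 * a + 0.3 * b
    for a, b in zip(reference["compound A"], reference["compound B"])
]

weights = unmix(list(reference.values()), mixture)
for name, weight in zip(reference, weights):
    print(f"{name}: {weight:.3f}")
```

On this synthetic input the estimated weights should come out close to 0.7 and 0.3. Real chromatographic data would also require handling of noise, retention time, and compounds missing from the library.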
1 A mass spectrometer is used to detect unknown compounds through molecular weight determination, to quantify known compounds, and to determine the structure and chemical properties of molecules.
The method reliably identifies the long-chain hydrocarbons present in a sample mixture and predicts their physical properties. This helps scientists decide how these compounds can be used in various product formulations. The reference library of spectra is helping analytical scientists make mass spectrometry more effective for long-chain hydrocarbons. This work was enabled by a customized Syn.AI™ workflow.
Figure: Detection of impurities and properties of hydrocarbon mixtures using Artificial Intelligence. From left to right: Chemical structure, Graph embedding model, Neural network, predicted mass spectra
2 Deconvolution in mass spectrometry is the process of computationally separating co-eluting components and creating a pure spectrum for each component.