TY - JOUR
T1 - Prediction of Secondary Structure Content of Proteins Using Raman Spectroscopy and Self-Organizing Maps
AU - Corujo, Marco Pinto
AU - Michal, Pavel
AU - Ang, Dale
AU - Vivian, Lindo
AU - Chmel, Nikola
AU - Rodger, Alison
PY - 2025/4
Y1 - 2025/4
N2 - Proteins are biomolecules with characteristic three-dimensional (3D) arrangements that enable them to perform different vital functions. In the last 20 years, there has been a growing interest in biopharmaceutical proteins, especially antibodies, due to their therapeutic application. The functionality of a protein depends on the preservation of its native form, which under certain stressing conditions can undergo changes at different structural levels that cause it to lose its activity. Although mass spectrometry is a powerful technique for primary structure determination, it often fails to give information at higher order levels. Like infrared (IR) spectra, Raman spectra are well known to contain bands (especially the amide I band from 1625-1725 cm-1) that correlate with secondary structure (SS) content. However, unlike circular dichroism (CD), the most well-established technique for SS analysis, Raman spectroscopy allows a much wider range of optical density, making possible the analysis of highly concentrated samples with no prior dilution. Moreover, water is a weak Raman scatterer below 3000 cm-1, which gives Raman an advantage over IR for the analysis of complex aqueous pharmaceutical samples, because in IR the signal from water dominates the amide I region. The most traditional procedure to extract information on SS content is band-fitting. However, in most cases, we found the method to be ambiguous, limited by spectral noise, and subject to the judgment of the analyst. A self-organizing map (SOM) is a type of self-learning algorithm that organizes data in a two-dimensional (2D) space based on spectral similarity and class, with no bias from the analyst and very little effect from noise. In this work, a set of spectra of proteins with known SS content was collected in both the solid and aqueous states with back-scatter Raman spectroscopy and used to train a SOM algorithm for SS prediction.
The results were compared with those from partial least squares (PLS) regression, band-fitting, and X-ray data in the literature. The prediction errors obtained with SOM were comparable to those from PLS and far from those obtained by band-fitting, showing Raman-SOM to be a viable alternative to the aforementioned methods.
AB - Proteins are biomolecules with characteristic three-dimensional (3D) arrangements that enable them to perform different vital functions. In the last 20 years, there has been a growing interest in biopharmaceutical proteins, especially antibodies, due to their therapeutic application. The functionality of a protein depends on the preservation of its native form, which under certain stressing conditions can undergo changes at different structural levels that cause it to lose its activity. Although mass spectrometry is a powerful technique for primary structure determination, it often fails to give information at higher order levels. Like infrared (IR) spectra, Raman spectra are well known to contain bands (especially the amide I band from 1625-1725 cm-1) that correlate with secondary structure (SS) content. However, unlike circular dichroism (CD), the most well-established technique for SS analysis, Raman spectroscopy allows a much wider range of optical density, making possible the analysis of highly concentrated samples with no prior dilution. Moreover, water is a weak Raman scatterer below 3000 cm-1, which gives Raman an advantage over IR for the analysis of complex aqueous pharmaceutical samples, because in IR the signal from water dominates the amide I region. The most traditional procedure to extract information on SS content is band-fitting. However, in most cases, we found the method to be ambiguous, limited by spectral noise, and subject to the judgment of the analyst. A self-organizing map (SOM) is a type of self-learning algorithm that organizes data in a two-dimensional (2D) space based on spectral similarity and class, with no bias from the analyst and very little effect from noise. In this work, a set of spectra of proteins with known SS content was collected in both the solid and aqueous states with back-scatter Raman spectroscopy and used to train a SOM algorithm for SS prediction.
The results were compared with those from partial least squares (PLS) regression, band-fitting, and X-ray data in the literature. The prediction errors obtained with SOM were comparable to those from PLS and far from those obtained by band-fitting, showing Raman-SOM to be a viable alternative to the aforementioned methods.
KW - Raman spectroscopy
KW - Secondary structure of proteins
KW - Circular dichroism spectroscopy
KW - Infrared spectroscopy
KW - Self-organizing maps
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=anu_research_portal_plus2&SrcAuth=WosAPI&KeyUT=WOS:001469386200001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1177/00037028251335051
DO - 10.1177/00037028251335051
M3 - Article
C2 - 40241566
SN - 0003-7028
JO - Applied Spectroscopy
JF - Applied Spectroscopy
ER -