Introduction
Changes that occur in a sample over time or in response to an experimental treatment (a perturbation) are common targets of spectral analysis. These changes are often probed using Raman spectroscopy, an information-rich optical technique especially suitable for use with biological samples.1–7 Reported examples include the following: in human embryonic stem cells undergoing differentiation, protein content increases relative to that of nucleic acids;8 in Chinese hamster ovary cells, lipid content increases in late stages of apoptosis;9 in red blood cell concentrates stored in transfusion bags, oxyhemoglobin and lactate levels increase with storage time while glucose levels decrease;10–12 and in freeze-dried tissue, the hydrolysis of peptide bonds is observed at high temperature.13
A well-established technique for examining the synchronous (in-phase) and asynchronous (out-of-phase) changes between different peaks in Raman spectra over the course of a perturbation is two-dimensional correlation spectroscopy (2D-COS).14 2D-COS is based on the analysis and two-dimensional representation of covariances and correlation coefficients that reveal correlated synchronous changes between peaks, while the application of the Hilbert–Noda transform permits an examination of asynchronous changes.14,15 2D-COS is used to understand the relationship between peaks,12 to identify the types of macromolecule that contribute to peaks,16,17 to assess the correspondence between Raman peak changes and bioanalytical measurements,12,18 to gain insight into structural changes in molecules,19 and to analyze kinetic processes.20 However, peaks whose synchronous or asynchronous changes are highly correlated may still not show the same qualitative profile or trend across the perturbation.
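The synchronous and asynchronous maps described above can be sketched in a few lines. The following is a minimal illustration, assuming the standard formulation in which the synchronous map is the covariance of the mean-centered spectra and the asynchronous map is obtained by inserting the Hilbert–Noda matrix; the function names and the toy data are hypothetical and are not taken from the authors' implementation.

```python
import math

def mean_center(spectra):
    """Subtract the mean spectrum; rows are perturbation steps, columns are channels."""
    m, n = len(spectra), len(spectra[0])
    means = [sum(row[c] for row in spectra) / m for c in range(n)]
    return [[row[c] - means[c] for c in range(n)] for row in spectra]

def hilbert_noda(m):
    """Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (k - j)) elsewhere."""
    return [[0.0 if j == k else 1.0 / (math.pi * (k - j)) for k in range(m)]
            for j in range(m)]

def two_d_cos(spectra):
    """Return (synchronous, asynchronous) maps as nested lists (channels x channels)."""
    y = mean_center(spectra)
    m, n = len(y), len(y[0])
    noda = hilbert_noda(m)
    # Transform each channel profile along the perturbation axis.
    ny = [[sum(noda[j][k] * y[k][c] for k in range(m)) for c in range(n)]
          for j in range(m)]
    sync = [[sum(y[t][a] * y[t][b] for t in range(m)) / (m - 1) for b in range(n)]
            for a in range(n)]
    asyn = [[sum(y[t][a] * ny[t][b] for t in range(m)) / (m - 1) for b in range(n)]
            for a in range(n)]
    return sync, asyn

# Toy data (assumed): channels 0 and 1 rise together, channel 2 falls.
spectra = [[t, 2 * t, 10 - t] for t in range(5)]
sync, asyn = two_d_cos(spectra)
```

In this toy case the synchronous cross-peak between channels 0 and 1 is positive and that between channels 0 and 2 is negative, while the asynchronous map is close to zero because all three profiles change in phase or in exact antiphase.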
Due to their highly complex composition, the interpretation of Raman spectra from cells can be particularly challenging, a situation mitigated by the fact that many peaks repeatedly represent each of the few major macromolecules (lipids, proteins, nucleic acids, and carbohydrates21,22) that make up more than 85% of the dry weight of cells. 23 The peaks pertaining to each type of macromolecule can, in general, be expected to vary together. In particular, peaks that vary together should exhibit highly similar profiles and clustering techniques can be used to group them. Though complex changes should generally be expected, focusing on the limited number of major macromolecules provides a starting point from which subtler changes can be unraveled and, in conjunction with 2D-COS, the interpretation of Raman spectra can be improved.
We use synthetic “spectra”, each with eight peaks and simple profiles as described further below, to explain the principles and application of the method. We then apply the method to experimentally obtained Raman spectra. First, we use a relatively uncomplicated model system of polystyrene beads submerged in a NaClO4 solution to assess and demonstrate its utility for the analysis of spectroscopic data. 31 By scanning from the center across the edge of a polystyrene bead cluster, the polystyrene peak intensities would decrease while those of the perchlorate ion would increase, thus producing contrary profiles. We then extend the application to the more challenging case of spectra obtained from mammalian cells fixed with different percentages of methanol. Because methanol is a fixative used for cells that coagulates proteins and also dissolves lipids from cell membranes, 32 varying the percentage of methanol used for cell fixation would induce different profiles in the macromolecules of cells.
Methods
(a) Surface plot of eight synthetic peaks (P1 to P8) that change in a correlated, anticorrelated, or independent manner. (b) Profiles of individual peaks.
(a) Image of the polystyrene perchlorate model system showing the line scan performed from left to right. (b) Raman spectra from the model system; the spectra were not normalized. (c) Constant sum normalized Raman spectra from Jurkat cells fixed with different percentages of methanol.
Four groups of approximately 2 × 10^6 cells were collected and centrifuged for fixation with different volumes (12.5, 25, 37.5, or 50 μL) of methanol; the smaller volumes were made up to 50 μL with water so that fixation was performed in 25, 50, 75, or 100% methanol. The supernatant was removed, and the cells were washed once with saline. The washed cell pellets were then resuspended in one of the given methanol solutions and incubated at −20 °C for 20 min. The cell/methanol suspension was then pipetted onto 12.5 mm diameter glass-encapsulated gold mirrors (ThorLabs, US) and allowed to air-dry in a biosafety cabinet with no further manipulation. The fixed cell samples were then stored at 4 °C until Raman spectra were collected.
Results and Discussion
2D-COS and 2D-CMS performed on synthetic data. (a) The 2D-COV and (b) 2D-COR for the synthetic data. (c) Using 
Performing six-cluster
The sparsity of the 2D-CMS that is focused on identifying clusters of synchronously changing peaks contrasts sharply with the synchronous 2D-COS maps. The 2D-COV in Fig. 3a and the 2D-COR in Fig. 3b include numerous auto- and cross-peaks with various degrees of mutual covariance and correlation between peaks. An important advantage of the 2D-CMS is that constant profiles, as shown by the Cluster 3 profile in Fig. 3c, are clustered and represented in a 2D-CMS, for example, green squares in Fig. 3d. They are not present in the corresponding 2D-COR as shown by the green circles because a constant profile does not have a defined standard deviation and the correlation coefficients between constant profiles cannot be determined. Indeed, the correlation coefficient between any profile and a constant profile cannot be determined as evidenced by the empty rows and columns along channels 900 and 1500 in Fig. 3b. Being able to include constant profiles in the segmentation or grouping of peaks, provided that baseline profiles are treated separately, aids in analysis by permitting unchanging peaks to be distinguished from others. However, including constant profiles actually reduces the 2D-CMS matrix sparsity.
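The point about constant profiles can be illustrated with a short sketch. It assumes a plain Pearson correlation and, purely for illustration, a Euclidean distance between mean-centered profiles as the similarity measure; the paper's actual clustering metric is not reproduced here, and all names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; NaN when either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # division by a zero standard deviation is undefined
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def centered_distance(x, y):
    """Euclidean distance between mean-centered profiles (illustrative choice)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(x, y)))

rising = [1.0, 2.0, 3.0, 4.0]
constant_a = [5.0, 5.0, 5.0, 5.0]
constant_b = [2.0, 2.0, 2.0, 2.0]

r_defined = pearson(rising, [2.0, 4.0, 6.0, 8.0])   # well defined
r_undefined = pearson(rising, constant_a)           # undefined (NaN)
r_const_pair = pearson(constant_a, constant_b)      # undefined (NaN)
d_const_pair = centered_distance(constant_a, constant_b)
```

Under these assumptions the two constant profiles have no defined correlation coefficient, yet their distance is zero, so a distance-based clustering step can still place them in the same group.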
Another difference underlying the sparsity effect is that profiles may be relatively highly correlated yet belong to different clusters, because a clustering procedure can more readily capture the differing characteristic features of their profiles (something that correlation coefficients are not designed to do). Thus, Clusters 3 and 5 (green and yellow, respectively) each have two autopeaks (i.e., there are two green peaks and two yellow ones on the diagonal), signifying two members in each cluster. On the other hand, though strong correlation coefficients are observed for the profiles at channels 300 and 700 (highlighted by the magenta circles in Fig. 3b), they belong to different clusters, namely Clusters 6 (magenta) and 2 (red) in Fig. 3c. Consequently, there are no corresponding “cross-clusters” (i.e., entries indicating joint cluster membership) for these profiles in Fig. 3d (magenta circles). Relatedly, unlike 2D-COS maps and principal components in PCA, clusters cannot contain anticorrelated elements and there are no negative clusters. These effects contribute to the general sparsity observed in 2D-CMS.
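How autopeaks and cross-clusters arise from cluster labels can be sketched as follows. This is a simplified illustration that starts from labels already assigned by some clustering step (not shown); the function name is hypothetical, and the channel positions 500 and 1300 together with their cluster numbers are assumptions made only to complete the example.

```python
def cluster_membership_map(labels):
    """Entry (i, j) holds the shared cluster label, or 0 when the clusters differ."""
    n = len(labels)
    return [[labels[i] if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

# Eight synthetic peaks. Labels for channels 100, 300, 700, 900, 1100, and 1500
# follow the text; those for channels 500 and 1300 are assumed.
channels = [100, 300, 500, 700, 900, 1100, 1300, 1500]
labels = [5, 6, 1, 2, 3, 5, 4, 3]

cms = cluster_membership_map(labels)

# Cross-cluster between channels 100 and 1100 (both Cluster 5).
cross_100_1100 = cms[channels.index(100)][channels.index(1100)]
# No cross-cluster between channels 300 and 700 (Clusters 6 and 2),
# however strongly their profiles may be correlated.
cross_300_700 = cms[channels.index(300)][channels.index(700)]
```

Every entry of such a map is either zero or a positive cluster label, which is why it is sparser than a covariance or correlation map and contains no negative features.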
The asynchronous results of performing a six-cluster
2D-COS and 2D-CMS performed on a polystyrene perchlorate model system. (a) The 2D-COV and (b) 2D-COR maps for the model data. The major autopeaks for polystyrene and perchlorate are shown with arrows and a cross-peak between polystyrene and perchlorate is indicated by a blue or cyan arrow. The 
For the 2D-CMS processing, peaks below the LOD threshold were set to zero, thus very small peaks and baseline levels were constant. Clustering using three clusters produced the profile centroids in Fig. 4c. The sinusoidal changes in the centroid profiles for polystyrene and perchlorate are due to the spherical nature of the beads and how they were packed together. The 2D-CMS map in Fig. 4d distinctly shows peaks pertaining to perchlorate (green rectangles), polystyrene (black rectangles) and baseline levels with small peaks (red rectangles). In contrast to 2D-COS, there are no cross-clusters between any of the above three groups. Where cross-clusters do occur, they occur between Raman bands pertaining to the same group (e.g., only between polystyrene peaks). In 2D-COS, negative cross-peaks reveal that peaks change in an opposite manner, but in 2D-CMS such information must be determined from an inspection of the centroids.
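The thresholding step mentioned at the start of this paragraph can be sketched as below. The element-wise rule, the function names, and the numerical values are assumptions for illustration; the text does not specify whether the threshold was applied per intensity value or per peak.

```python
def apply_lod(spectra, lod):
    """Set every intensity below the limit of detection (LOD) to zero."""
    return [[value if value >= lod else 0.0 for value in spectrum]
            for spectrum in spectra]

def profile(spectra, channel):
    """Intensity of one channel across all spectra (the perturbation axis)."""
    return [spectrum[channel] for spectrum in spectra]

# Hypothetical spectra: channel 1 carries a real peak, channels 0 and 2 only noise.
spectra = [
    [0.2, 9.0, 0.4],
    [0.3, 6.0, 0.1],
    [0.1, 3.0, 0.5],
]
thresholded = apply_lod(spectra, lod=1.0)
noise_profile = profile(thresholded, 0)   # constant (all zeros) after thresholding
peak_profile = profile(thresholded, 1)    # unchanged
```

After thresholding, the noise-only channels share an identical constant profile and can therefore be grouped together rather than being scattered across clusters by random fluctuations.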
We next performed 2D-COS and 2D-CMS on a more complex data set. This set contained spectra from fixed Jurkat cells. The results for an instructive subset of the entire measured spectral window, the 600 cm−1 to 900 cm−1 spectral region, are presented in Fig. 5 and the complementary section is shown in Fig. S2 (Supplemental Material). The 2D-COV and 2D-COR for this data set are presented in Figs. 5a and 5b, respectively. It is easier to identify related weak and strong peaks using the 2D-COR than the 2D-COV because the profiles are normalized to their standard deviations. However, this can make the 2D-COR map more complex. Thus, we discuss, only for orientation, the 2D-COV autopeaks and cross-peaks by proceeding along the
2D-COS and 2D-CMS performed on Jurkat cell Raman spectra showing the 600–900 cm−1 spectral region; the complementary region is shown in Fig. S2 (Supplemental Material). (a) The 2D-COV and (b) 2D-COR for the spectra. The 
Peak 7 is a composite nucleic acid peak and its autopeak is prominent on the diagonal where the left arrow would intersect with the Peak 7 arrow. Peak 7 covaries positively (warm colors) with the protein Peaks 1 and 2 at 621 cm−1 and 643 cm−1, respectively, the 668 cm−1 nucleic acid Peak 3, the Peak 5 adenine part of the composite 720 cm−1 peak, and the tryptophan protein Peak 6 at 757 cm−1. Note that, without consulting band assignments, it is not possible to tell from 2D-COV which autopeaks and cross-peaks belong to the same macromolecules. Because methanol leaches lipids from cells but coagulates proteins 32 and compacts nucleic acid conformations, 51 cross-peaks related to lipids will covary inversely with those of proteins and nucleic acids. This is shown for the 714 cm−1 Peak 4 phosphatidylcholine part of the composite 720 cm−1 peak. Consequently, only contrary variations afford here some discrimination between macromolecules.
The 2D-CMS results for the cell spectra are presented in Figs. 5c and 5d. The PDD profiles consisted of four groups of spectra fixed with 25, 50, 75, or 100% methanol. They were
(a) The cluster-segmented mean spectrum of the Figs. 3c and 3d clustered and identically color-coded synthetic data shows which two peaks belong to one of two clusters (yellow, green) and which peaks belong to each of the remaining clusters. (b) The cluster-segmented mean spectrum of the Figs. 4c and 4d clustered and identically color-coded model Raman spectra shows peaks and peak segments of which the wavenumbers belong to perchlorate and polystyrene. The spectrum of perchlorate superimposed in blue on the green cluster-segmented spectrum of perchlorate demonstrates that clustering succeeded in identifying all the perchlorate peaks. (c) The cluster-segmented mean spectrum of the Figs. 5c and 5d clustered and identically color-coded mammalian cell Raman spectra shows peaks and peak segments of which the wavenumbers belong to the same cluster. The inset detail of the 600 to 900 cm−1 spectral region shows peaks belonging to the proteins (black), nucleic acids (red), and lipids (green) clusters. More details are discussed in the main text.
High correlations are also evident between proteins and nucleic acids, for example, between the 782 cm−1 composite nucleic acid peak and the 854 cm−1 composite protein peak, indicated by the reddish square within the upper dotted circle in Fig. 5b. However, no corresponding cross-cluster is observed in Fig. 5d (upper dotted circle) because the clusters are exclusive: a profile cannot belong to two clusters simultaneously and hence a nucleic acid–protein cross-cluster cannot exist. Conversely, unlike in 2D-COR, opposing or anticorrelated changes between peaks cannot be determined directly from 2D-CMS because such peaks will simply be assigned to different clusters. Instead, opposing changes must be inferred from the profiles of their centroids in Fig. 5c. The anticorrelated change between declining lipid intensities (due to methanol-provoked lipid leaching)32 and increasing nucleic acid intensities (possibly from nucleic acid precipitation)51 can be determined from the 2D-COR cross-peak between the 717 cm−1 phosphatidylcholine peak and the 782 cm−1 composite nucleic acid peak (solid circle in Fig. 5b) but not from the 2D-CMS (solid circle in Fig. 5d), as there is no cross-cluster.
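Inferring opposing changes from centroid profiles, as described above, can be sketched as follows. The sketch assumes that a centroid is the mean profile of a cluster's members and uses a Pearson correlation between centroids as one possible way to quantify opposition; the profile values, labels, and function names are hypothetical.

```python
import math

def centroids(profiles, labels):
    """Mean profile of each cluster, keyed by cluster label."""
    groups = {}
    for prof, label in zip(profiles, labels):
        groups.setdefault(label, []).append(prof)
    return {label: [sum(column) / len(members) for column in zip(*members)]
            for label, members in groups.items()}

def pearson(x, y):
    """Pearson correlation of two non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical profiles over four fixation levels (25, 50, 75, 100% methanol).
profiles = [
    [4.0, 3.0, 2.0, 1.0],   # lipid-like, declining
    [8.0, 6.5, 4.0, 2.5],   # lipid-like, declining
    [1.0, 2.0, 2.5, 4.0],   # nucleic-acid-like, increasing
    [2.0, 3.5, 5.0, 8.0],   # nucleic-acid-like, increasing
]
labels = ["lipid", "lipid", "nucleic acid", "nucleic acid"]

cluster_centroids = centroids(profiles, labels)
r_centroids = pearson(cluster_centroids["lipid"], cluster_centroids["nucleic acid"])
```

A strongly negative correlation between two centroids indicates that the corresponding clusters change in opposite directions, information that the membership map by itself does not convey.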
Overall, as for the 2D-CMS of synthetic data in Fig. 3d and model system data in Fig. 4d, the discrete nature of the 2D-CMS of measured Raman spectra permits a differential assessment of relationships between different Raman peaks that makes it complementary to 2D-COS while displaying greater sparsity than the corresponding 2D-COV and 2D-COR. The more complex example, indicated by the two bottom circles above 782 cm−1 and 827 cm−1 on the
The cluster-segmented spectrum in Fig. 6a represents the same data (clusters and their color coding) as in Figs. 3c and 3d. Presenting the clustered information this way makes it easy to see which peaks belong to each profile cluster.18 Thus, the peaks at channels 100 and 1100 belong to the same cluster (yellow), as can be seen from the 100 × 1100 and 1100 × 100 cross-clusters in Fig. 3d, because they have the same cluster profile, shown in Fig. 3c. A similar argument applies to the green peaks at channels 900 and 1500, while the other color-coded peaks each belong to separate clusters with different Fig. 3c profiles.
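A cluster-segmented spectrum of this kind can be sketched by splitting the mean spectrum into contiguous runs of channels that share a cluster label, each of which would then be drawn in its cluster color. The function name, the data layout, and the values below are assumptions for illustration only.

```python
from itertools import groupby

def segment_spectrum(channels, mean_spectrum, labels):
    """Split a mean spectrum into contiguous runs of equal cluster label."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append({
            "label": label,
            "channels": channels[start:start + length],
            "intensities": mean_spectrum[start:start + length],
        })
        start += length
    return segments

# Hypothetical channels, mean intensities, and per-channel cluster labels.
channels = [98, 99, 100, 101, 102, 898, 899, 900, 901, 902]
mean_spectrum = [0.1, 0.6, 1.0, 0.6, 0.1, 0.2, 0.7, 1.2, 0.7, 0.2]
labels = ["yellow"] * 5 + ["green"] * 5

segments = segment_spectrum(channels, mean_spectrum, labels)
```

Each segment carries its own label, so a composite peak whose channels fall into different clusters appears as adjacent segments with different colors.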
The model system contained only polystyrene and perchlorate, and their peak intensities changed in opposite ways; thus, their peaks belong to different clusters. Furthermore, except for two peaks near 600 cm−1, their peaks do not overlap. Consequently, their cluster-segmented spectra in Fig. 6b almost match the complete spectra of polystyrene and perchlorate. This is illustrated by superimposing a scaled perchlorate spectrum on the cluster-segmented spectrum of the perchlorate ion.
In Fig. 6c we show with the same cluster colors the cluster-segmented mean spectrum of the Figs. 5c and 5d clustered profiles. All the wavenumbers belonging to the same cluster, hence the most qualitatively similar Fig. 5c profiles, are shown by the color of their cluster. The inset shows an expansion of the 600 to 900 cm−1 fingerprint spectral region that contains peaks from lipids (e.g., phosphatidylcholine at 717 cm−1), nucleic acids (e.g., composite peak at 782 and adenine at 725 cm−1), and proteins (e.g., composite peaks at 854, 827, 643, and 621 cm−1); thus, we assigned the green cluster to lipid peaks, the red cluster to nucleic acid peaks and the black cluster to protein peaks (see also Fig. 2c). Extending these assignments to the remainder of the spectrum showed fairly consistent labeling of macromolecular peaks. Other lipid peaks were labeled around 536 cm−1 (cholesterol ester), 1299 cm−1 (CH2 deformation in lipids), and 1446 cm−1 (various CH2 modes in lipids). Nucleic acid peaks were also labeled around 1099 cm−1 (phosphodioxy modes in nucleic acids) and 1573 cm−1 (ring breathing modes in nucleic acid bases). Additional protein peaks were identified around 758 and 887 cm−1 (tryptophan), 1003 and 1031 cm−1 (phenylalanine), and 1233 cm−1 (protein amide III modes).
Not all peaks were labeled in a clear manner. The Raman peak ∼720 cm−1 in cell and tissue spectra is a composite band due to the fusing of overlapping peaks from phosphatidylcholine (717 cm−1) and adenine (725 cm−1) and they were correctly clustered as shown by the partitioning of the peak into green (lipid) and red (nucleic acid) segments. Similar peak segmentations occur for other peaks consisting of overlapping bands from different macromolecules and these might be complicated. For example, both protein and nucleic acid bands occur near 667 cm−1 and the cluster-segmented spectrum for this region shows a central black protein peak flanked by two red nucleic acid segments. It is unclear whether the central black peak with red flanking segments should be interpreted as being due to a protein peak with distinct nucleic acid moieties on either side or whether a more intense and narrow protein peak is superimposed on a weaker but broader composite nucleic acid peak. Thus, a distinction must be made on the basis of prior information. Related to this, the reddish 2D-COR square in the lowest dashed circle in Fig. 5b seems to be partitioned into the corresponding two red cluster (nucleic acid) segments in the lowest dashed circle in Fig. 5d. The central part between these red cluster segments is missing but present as a black square in the adjacent dotted circle. Though the precise interpretation of the fragmented 667 cm−1 peak is unclear, this example demonstrates an important difference. It shows how the Fig. 5d 2D-CMS lowest circles provide a complementary interpretation of the same ones in the Fig. 5b 2D-COR by virtue of identifying peaks that are related due to having highly similar profiles as opposed to peaks that are related by virtue of being highly correlated.
Applying 2D-CMS might identify clusters with a large number of peaks. These peaks provide robust and unambiguous perturbation responses and so might be particularly useful in further analysis and interpretation. Clusters with few recognizable peaks might be of little use due to a somewhat random grouping of profiles degraded by overlapping neighbors, low abundance, or other effects. Thus, we have not been able to identify a carbohydrate or “other” cluster above.
Second, individual lipids, proteins, or nucleic acids could change independently in response to a perturbation. Though the exact cause is as yet uncertain, this is hinted at by the ∼495 cm−1 DNA peak, which was assigned to a different cluster (blue) from the cluster (red) to which the other nucleic acid peaks in Fig. 5d and Fig. S2d (Supplemental Material) were assigned. This creates both problems and opportunities, as mentioned above. Third, overlapping peaks that change in unrelated ways might sufficiently distort the profiles of associated wavenumbers to cause misclassification. It is possible that the 495 cm−1 DNA peak discussed above was assigned to Cluster 4 because it was affected by changes in the partly overlapping glycogen band around 485 cm−1.53 Like many issues that arise in the case of overlapping peaks, this problem might not be tractable without enhancing the resolution of spectra.54 A final difficulty relates to the selection and back addition of constant profiles in a manner that effectively segregates them from the constant profiles of baseline wavenumbers. Though the smoothing of spectra can suppress noise,55 residual noise will be present in such profiles, and this will be accentuated through division by their standard deviations. One possibility might be to use the square root relationship between Poisson noise and signal intensity.55 For example, to be considered a constant profile, the mean profile intensity would have to exceed the LOD, its noise distribution would have to be approximately Gaussian, and the noise level (i.e., the profile standard deviation) would have to be less than the square root of the mean peak intensity.
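The suggested criterion for a constant profile can be sketched as a simple test. This is only an illustration of the two numerical conditions stated above, assuming intensities are expressed in photon counts so that the square root relationship applies; the check for an approximately Gaussian noise distribution is deliberately omitted, and the function name and values are hypothetical.

```python
import math
import statistics

def is_constant_profile(profile, lod):
    """Mean must exceed the LOD and the standard deviation must stay below sqrt(mean).

    The Gaussianity check mentioned in the text is not implemented here.
    """
    mean_intensity = statistics.mean(profile)
    if mean_intensity <= lod:
        return False  # treated as baseline rather than as a constant peak
    noise_level = statistics.stdev(profile)
    return noise_level < math.sqrt(mean_intensity)

steady_peak = [100, 102, 98, 101, 99]     # fluctuates within Poisson expectations
changing_peak = [100, 150, 60, 120, 70]   # varies far more than sqrt(mean)
baseline = [2.0, 2.1, 1.9, 2.0, 2.0]      # below the LOD

steady_is_constant = is_constant_profile(steady_peak, lod=10)
changing_is_constant = is_constant_profile(changing_peak, lod=10)
baseline_is_constant = is_constant_profile(baseline, lod=10)
```

With these values only the first profile qualifies as constant; the second varies too strongly and the third does not rise above the assumed LOD.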
Conclusion
The interpretation of two-dimensional correlation spectra from many types of biomedical, pharmaceutical, or microbiological samples is often not a straightforward task due to their high complexity.56
With 2D-CMS, we used
Supplemental Material
sj-pdf-1-asp-10.1177_00037028221133851 - Supplemental material for Two-Dimensional Clustering of Spectral Changes for the Interpretation of Raman Hyperspectra
Supplemental Material, sj-pdf-1-asp-10.1177_00037028221133851 for Two-Dimensional Clustering of Spectral Changes for the Interpretation of Raman Hyperspectra by H G. Schulze, Shreyas Rangan, Martha Z. Vardaki, Michael W. Blades, Robin F. B. Turner, and James M. Piret in Applied Spectroscopy