Abstract
Keywords
Introduction
The widespread and often indiscriminate use of synthetic pesticides in modern agriculture poses significant public health risks due to residue contamination.1–3 Organochlorine pesticides (OCPs), in particular, have been linked to a range of health issues, including reproductive, immune, and neurological damage, as well as an increased risk of cancer.4,5 Although many developed countries have banned OCPs, they remain in use in many developing countries.6,7 This ongoing use highlights an urgent need for methods capable of rapidly screening fresh produce and environmental samples to ensure safety and regulatory compliance, especially in areas where these pesticides are still applied. Conventional methods for pesticide residue analysis, such as gas chromatography–mass spectrometry and high-performance liquid chromatography, are highly sensitive and accurate but are also labor-intensive and time-consuming, and they require complex sample preparation.8,9 In contrast, vibrational spectroscopy techniques such as near-infrared (NIR) 10 and Raman spectroscopy 11 have emerged as promising alternatives for pesticide residue analysis.12–15 These techniques offer several advantages, including rapid analysis, portability, low operational costs, and the ability to conduct direct, non-destructive testing on samples, making them particularly suitable for targeted screening of large volumes of fresh produce.
Identifying unique vibrational spectral markers (also referred to as fingerprints) that indicate specific pesticide residues is often complicated by matrix effects, spectral overlap, and noise. To address this, multivariate data analysis techniques such as partial least squares discriminant analysis,16,17 multivariate curve resolution,18,19 and analysis of variance 20 have been applied. Although these methods assist in resolving vibrational spectra, overlapping bands and noise continue to present interpretability challenges. Another approach involves modeling theoretical vibrational spectra using density functional theory (DFT) calculations, which have been widely used to simulate the Raman spectra of pesticides.21–25 However, discrepancies between theoretical and experimental spectra, often caused by variations in molecular conformation and interactions, limit the broader applicability of DFT in fingerprint identification. Additionally, the high computational cost of DFT, coupled with the need for costly analytical-grade standards, further restricts its widespread application.
Two-dimensional correlation spectroscopy (2D-COS) is a powerful technique used to analyze system responses to external perturbations.26–32 It has been effectively applied across various spectroscopic methods, including infrared (IR), NIR, Raman, nuclear magnetic resonance, fluorescence, and ultraviolet–visible, providing enhanced resolution of overlapping spectral features.29,33 This capability makes 2D-COS particularly useful for detecting subtle spectral changes and identifying correlations that are often obscured in conventional Raman spectra. In this study, we explore, for the first time, the potential of 2D-COS to identify unique Raman fingerprints indicative of pesticide residues, specifically chlorothalonil (an OCP), across various vegetable matrices, including kale, pigweed, tomato, and bell pepper. We then apply principal component analysis (PCA) and support vector machines (SVMs) to develop classification models for targeted detection. Previous studies using Raman spectroscopy for chlorothalonil detection have relied on surface-enhanced Raman spectroscopy (SERS), focusing on a single Raman fingerprint and one sample matrix.34,35 Although SERS provides high sensitivity, it is relatively costly compared to conventional Raman combined with 2D-COS due to the high cost of nanoparticle synthesis and sample preparation, making it less feasible for routine, field-based applications, especially in resource-limited settings.
Experimental
Materials and Methods
Sample Preparation, Data Acquisition, and Pre-Processing
Two standard solutions of chlorothalonil pesticide were prepared at different concentrations: (i) a control solution containing only deionized water and (ii) a spiked solution with a concentration of 2 g/L of chlorothalonil, prepared by diluting a commercially available formulation (480 g/L chlorothalonil) with deionized water as recommended by the label instructions. Fresh, organically grown bell peppers (
Raman spectra were collected using a 785 nm excitation source (EZ Raman Spectrometer, Enwave Optronics). The spectrometer was calibrated using a polystyrene standard. To ensure reproducibility and account for matrix inhomogeneity, 20 spectra were collected from five different locations on each vegetable sample. The laser spot size was calculated to be 4.35 µm based on the 100 µm fiber core diameter and a numerical aperture of 0.22, providing a balance between spatial resolution and signal intensity. Optimal acquisition parameters, i.e., 200 mW laser power and a 5 s exposure time, were selected to maximize signal intensity while minimizing noise. The fresh vegetable samples showed no visible signs of damage during measurements. Spectra were limited to the 300−2500 cm–1 range and vector normalized between 0 and 1 to standardize intensity values for comparison across samples.
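The quoted spot size and the pre-processing step can be checked with a short Python sketch. The 4.35 µm value is numerically consistent with the diffraction-limited (Airy disk) estimate 1.22λ/NA at λ = 785 nm and NA = 0.22; whether this exact expression was used is an assumption. The normalization is interpreted here as min-max scaling to the 0 to 1 range, which is also an assumption, and the function names and toy intensities are hypothetical.

```python
def airy_spot_diameter_um(wavelength_nm, numerical_aperture):
    """Diffraction-limited spot diameter, d = 1.22 * lambda / NA, in micrometres."""
    return 1.22 * (wavelength_nm / 1000.0) / numerical_aperture


def crop_and_scale(wavenumbers, intensities, lo=300.0, hi=2500.0):
    """Keep points with lo <= wavenumber <= hi, then min-max scale to [0, 1]."""
    kept = [(w, y) for w, y in zip(wavenumbers, intensities) if lo <= w <= hi]
    if not kept:
        raise ValueError("no points inside the requested range")
    ys = [y for _, y in kept]
    y_min, y_max = min(ys), max(ys)
    span = y_max - y_min
    if span == 0:
        raise ValueError("flat spectrum cannot be scaled")
    return [w for w, _ in kept], [(y - y_min) / span for y in ys]


# Spot size for the 785 nm source and NA = 0.22 quoted in the text.
spot = airy_spot_diameter_um(785.0, 0.22)

# Toy spectrum (made-up numbers) that extends beyond the analysis window.
wn = [200.0, 300.0, 1000.0, 2238.0, 2500.0, 2600.0]
inten = [5.0, 10.0, 30.0, 20.0, 12.0, 50.0]
wn_kept, scaled = crop_and_scale(wn, inten)
```

Scaling after cropping matters: the out-of-window point at 2600 cm–1 would otherwise set the maximum and compress every band of interest.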
Two-Dimensional Correlation Spectroscopy (2D-COS) Analysis and Machine Learning Model Development
For 2D-COS, a mean spectrum for each class of the Raman spectra served as a reference for the Fourier transform correlations. As a result, four distinct correlation spectra were obtained from the four different vegetable matrices. The 2D-COS Raman spectra were generated by plotting 40 equally spaced contour levels with intensity values ranging from 8 to 19, resulting in a clean contour plot of the correlation spectrum. This approach highlights the most significant changes while excluding very low or very high intensities that may not be of interest or might clutter the plot. Following this, PCA 36 was conducted on the identified fingerprint bands (regions of high perturbations) from the 2D-COS analysis to reduce dimensionality and visualize the data. Subsequently, the principal components accounting for the most significant cumulative variance were extracted (two for each case) and used to develop a linear SVM 37 model to classify the control and pesticide-spiked samples. All analyses were carried out in open-source R v.3.6.3 using Corr2D for the 2D-COS analysis,33 ChemoSpec for PCA,38 and e1071 for SVM.39 Additionally, we compare the performance of 2D-COS analysis with difference spectroscopy, a technique that subtracts spectra to remove common features, thereby highlighting subtle changes.40
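The synchronous correlation behind these maps can be sketched in a few lines. In Noda's generalized formulation, dynamic spectra are obtained by subtracting a reference spectrum, and the synchronous intensity is the direct sum Φ(ν1, ν2) = 1/(m − 1) Σ ỹ_j(ν1) ỹ_j(ν2) over the m spectra, which for the synchronous spectrum is equivalent to the Fourier-transform route. The Python sketch below is illustrative only: it is not the authors' R workflow, it takes the mean of the input series as the reference (an assumption about the reference choice), and the data and function name are made up.

```python
def synchronous_2dcos(spectra):
    """Synchronous 2D correlation matrix for a list of equal-length spectra.

    spectra[j][i] is the intensity of spectrum j at wavenumber index i.
    """
    m = len(spectra)
    if m < 2:
        raise ValueError("need at least two spectra")
    n = len(spectra[0])
    # Reference spectrum: mean over the perturbation series.
    ref = [sum(s[i] for s in spectra) / m for i in range(n)]
    # Dynamic spectra: deviation of each spectrum from the reference.
    dyn = [[s[i] - ref[i] for i in range(n)] for s in spectra]
    return [
        [sum(d[a] * d[b] for d in dyn) / (m - 1) for b in range(n)]
        for a in range(n)
    ]


# Toy series: indices 1 and 3 grow together, indices 0 and 2 stay constant.
series = [
    [1.0, 0.0, 2.0, 0.0],
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 2.0, 2.0, 4.0],
]
phi = synchronous_2dcos(series)
autopeaks = [phi[i][i] for i in range(4)]
```

Bands that do not change across the series give zero autopeaks, while bands that change in the same direction give a positive cross peak (here between indices 1 and 3).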
Results and Discussion
Raman Spectral Analysis of Vegetable Samples Highlighting Bio-Molecular and Chlorothalonil-Specific Vibrational Peaks
Figure 1 presents the representative average Raman scattering spectra of four vegetable samples (kale, pigweed, tomato, and bell pepper). Despite being a weak Raman scatterer, the vegetable matrix predominantly absorbs light at low wavenumber regions (below 1500 cm–1), resulting in a high concentration of peaks in that range. The spectra of the control samples showed characteristic peaks corresponding to the bio-molecules commonly found in plant cell walls. Specific bands around 746, 914, 1047, 1268, 1326, 1385, and 1438 cm–1 indicated the presence of pectin, cellulose, phenylpropanoids, and aliphatic chains. In addition, peaks at 1000, 1153, 1186, 1213, and 1523 cm–1 suggested the presence of carotenoids, the colored pigments in the samples. The peak at 1690 cm–1 corresponded to the carbonyl group (C=O) stretching vibration, likely from carboxylic acids. The spectra of the spiked samples exhibited all the peaks observed in the control samples, in addition to extra bands characteristic of the spiking agent, chlorothalonil (C8N2Cl4). Peaks at 326 and 384 cm–1 confirmed C−Cl stretching. The 464 and 1252 cm–1 bands indicated C–C stretching, potentially from the aromatic ring structure within the pesticide. Although the C–C stretching vibration of the carotenoids overlapped with the C–C stretching of the pesticide at 1523 cm–1, the distinct peak at 2238 cm–1, corresponding to the C≡N stretching vibration, definitively identified chlorothalonil in the sample. A summary of the vibrational bands and their assignments for the vegetable plants is listed in Table I.

Representative average Raman spectra of the spiked and control samples (kale, pigweed, tomato, and bell pepper).
Main Raman vibrational bands identified and their corresponding assignments.
Identifying Chlorothalonil Fingerprints and Spectral Variations in Raman Spectra
Difference Spectroscopy Results
Figure 2 presents the difference spectroscopy results, highlighting spectral variations between spiked and control samples of tomato, bell pepper, pigweed, and kale. Significant changes are observed at wavenumbers 342, 394, 445, 470, 531, 584, 611, 722, 750, 913, 1041, 1053, 1146, 1168, 1194, 1270, 1333, 1371, 1516, 1534, 1547, and 2238 cm–1, corresponding to vibrational modes of chlorothalonil as well as natural components of the vegetable matrices. Despite these observed differences, distinguishing chlorothalonil-specific peaks remains challenging due to overlap with the vegetable matrix and noise amplification. For instance, the C–Cl stretch at 394 cm–1 and the C≡N stretch at 2238 cm–1 serve as clear markers of chlorothalonil, yet peaks at 722, 913, and 1270 cm–1 overlap with vibrational modes from carotenoids, cellulose, and phenylpropanoids. This overlap, particularly within the 500–1500 cm–1 range, complicates the clear identification of pesticide signals amid the vegetable components. Furthermore, noise amplification during the subtraction process introduces baseline distortions, especially in the 500–1700 cm–1 region, further reducing the clarity of the spectral data.
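Difference spectroscopy as used here amounts to subtracting the mean control spectrum from the mean spiked spectrum. A minimal Python sketch follows, with made-up intensities on three wavenumber points and hypothetical function names; it is not the authors' implementation.

```python
from statistics import mean


def mean_spectrum(spectra):
    """Point-by-point mean of a list of equal-length spectra."""
    return [mean(col) for col in zip(*spectra)]


def difference_spectrum(spiked, control):
    """Mean spiked spectrum minus mean control spectrum."""
    return [s - c for s, c in zip(mean_spectrum(spiked), mean_spectrum(control))]


# Toy data: only the last point carries a pesticide band; the first two
# points are matrix bands with measurement scatter in both classes.
control = [[1.0, 2.0, 0.0], [1.2, 2.2, 0.0], [0.8, 1.8, 0.0]]
spiked = [[1.0, 2.1, 0.9], [1.1, 1.9, 1.1], [0.9, 2.0, 1.0]]

diff = difference_spectrum(spiked, control)
# Index of the largest absolute difference.
strongest = max(range(len(diff)), key=lambda i: abs(diff[i]))
```

In this toy case the matrix bands cancel almost exactly. With real spectra the scatter of both class means survives the subtraction, which is the noise amplification described above.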

Difference Raman spectra of four vegetable samples (tomato, bell pepper, pigweed, and kale). The spectra represent the difference between spiked and control samples, highlighting variations across the spectral range of 300–2500 cm–1.
Enhanced Resolution of Chlorothalonil Fingerprints Using 2D-COS
To overcome the limitations of difference spectroscopy, 2D-COS was applied to enhance spectral resolution and resolve overlapping bands in the Raman spectra. The synchronous 2D-COS results, shown in Figures 3a–3d, reveal significant autopeaks at 385 cm–1 (C–Cl stretching), 1263 cm–1 (C–C stretching), 1548 cm–1 (C–C stretching), and 2238 cm–1 (C≡N stretching). These well-resolved peaks across all tested vegetable samples (tomato, bell pepper, pigweed, and kale) provide strong evidence of chlorothalonil contamination. Positive cross-peak correlations indicate molecular interactions between the pesticide and the vegetable matrix, further validating these assignments. The C≡N stretching peak at 2238 cm–1, a recognized marker for chlorothalonil, stands out as a key indicator of its presence. Additionally, vibrational modes such as the C–Cl stretch at 385 cm–1 and the C–C stretch at 1263 cm–1 are clearly distinguished and highly identifiable. The key fingerprint regions identified for chlorothalonil detection through 2D-COS are summarized in Table II.
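Autopeaks lie on the diagonal of the synchronous map. When the mean spectrum is the reference, the diagonal equals the sample variance of the intensity at each wavenumber, so candidate autopeaks can be located as local maxima of that variance. The Python sketch below uses made-up spectra and a hypothetical relative threshold (`min_fraction`); it illustrates the idea and is not the peak-picking procedure used in the study.

```python
from statistics import variance


def autopeak_positions(wavenumbers, spectra, min_fraction=0.2):
    """Wavenumbers where the synchronous diagonal has a local maximum.

    The diagonal is computed as the sample variance of the intensity at
    each wavenumber across all spectra (mean spectrum as reference).
    """
    diag = [variance(col) for col in zip(*spectra)]
    threshold = min_fraction * max(diag)
    peaks = []
    for i in range(1, len(diag) - 1):
        if diag[i] >= threshold and diag[i] > diag[i - 1] and diag[i] > diag[i + 1]:
            peaks.append(wavenumbers[i])
    return peaks


# Toy grid around two bands; two control-like and two spiked-like spectra.
wn = [380.0, 385.0, 390.0, 2230.0, 2238.0, 2246.0]
spectra = [
    [0.1, 0.1, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.1, 0.0, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.1, 0.9, 0.1],
    [0.2, 0.6, 0.2, 0.1, 0.9, 0.1],
]
peaks = autopeak_positions(wn, spectra)
```

The threshold plays the same role as restricting the contour levels in the plots: weak variations are ignored so that only the dominant autopeaks remain.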

Synchronous 2D Raman correlation spectra comparing spiked and control vegetable samples across the wavenumber range of 300 to 2500 cm–1. The spectra are presented for four different vegetables: (a) tomato, (b) kale, (c) bell pepper, and (d) pigweed. The reference Raman spectra for the spiked and control samples are displayed along the top and left margins, respectively. Positive correlations are indicated in red, and the dashed blue diagonal line highlights the autopeaks common to chlorothalonil at 385, 1263, 1548, and 2238 cm–1. The 2D Raman correlation spectra were plotted using only positive correlations, focusing on high-intensity data to emphasize the most significant features while excluding low or extreme intensities that could clutter the visualization.
Identified fingerprint regions of chlorothalonil from 2D-COS with the highest intensity.
The Raman spectrum of pure commercial-grade chlorothalonil, shown in Figure 4, validates the fingerprints identified by 2D-COS. Major peaks at 387, 1261, 1546, and 2239 cm–1 confirm chlorothalonil-related vibrations, verifying these regions as reliable markers for pesticide detection in vegetable samples.

Raman spectrum of the pure spiking solution of commercial-grade chlorothalonil, showing major peaks at 387, 1261, 1546, and 2239 cm–1. These peaks validate that the fingerprints elucidated by 2D-COS are attributable to the pesticide, corroborating the regions of interest identified in the vegetable samples.
Principal Component Analysis (PCA)-SVM Results for Classifying Spiked and Control Samples
Principal Component Analysis (PCA)-SVM Results for the Full Spectral Range
To establish a baseline for comparison, PCA was first applied to the full Raman spectrum (300–2500 cm–1) to assess whether analyzing the entire spectral range could effectively distinguish between spiked and control samples. The results, however, indicated a significant overlap between the spiked and control samples. The PCA score plot (Figure 5a) showed that the first two principal components, PC1 and PC2, explained 43% and 34% of the variance, respectively, accounting for a total variance of 77%. This relatively low explained variance, combined with the dispersed distribution of points, suggested that the full spectrum contains too much irrelevant or overlapping information to allow clear differentiation between the spiked and control groups. The complexity of the vegetable matrix and the overlapping spectral features from both chlorothalonil and the natural vegetable components contributed to this issue. When the full-spectrum PCA results (two principal components that explained the highest cumulative variance) were fed into the SVM model, the classification accuracy was just 71.5%, with a kappa (κ) statistic of 0.43, indicating a high chance of misclassification (Figure 5b). This outcome reflects the limitations of using the entire spectral range, as the noise and spectral overlap reduced the model's ability to reliably distinguish between spiked and control samples.
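The κ statistic corrects the observed accuracy p_o for the agreement p_e expected by chance: κ = (p_o − p_e)/(1 − p_e). The Python sketch below computes it from a confusion matrix. The counts are hypothetical, chosen only to give 71.5% accuracy with equally sized classes; they are not the study's data. With balanced classes p_e = 0.5, so an accuracy of 71.5% corresponds to κ = 0.43, matching the reported pair of values.

```python
def cohen_kappa(confusion):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the marginal totals.
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts: 100 control and 100 spiked spectra, 143 classified correctly.
accuracy, kappa = cohen_kappa([[72, 28], [29, 71]])

# A perfect classifier gives accuracy 1 and kappa 1.
perfect_accuracy, perfect_kappa = cohen_kappa([[100, 0], [0, 100]])
```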

(a) Principal component analysis (PCA) of the whole spectra region (300−2500 cm–1) shows no separation of the spiked and control samples. (b) Scatter plots showing the linear PCA-SVM classification results for the whole region of the spectra (300−2500 cm–1). The model had a poor classification accuracy of 71.5% and a κ statistic of 0.43. The gray areas indicate the regions where the SVM model predicts the samples to be spiked, while the white areas indicate the regions where the model predicts the control samples.
Principal Component Analysis (PCA)-SVM Results Using Identified Fingerprints
Given the poor performance of the full spectral range analysis, we focused the PCA on specific fingerprint regions known to be relevant to chlorothalonil: 354−414 cm–1 (C–Cl stretching), 1260−1286 cm–1 (C–C stretching), 1540−1570 cm–1 (C–C stretching), and 2250−2265 cm–1 (C≡N stretching). These regions correspond to vibrational modes of the pesticide and were expected to improve classification accuracy. The results confirmed this expectation, with PCA applied to these regions yielding distinct clusters for spiked and control samples. The cumulative variance explained by the first two principal components was higher for each targeted region than for the full spectrum (77%): 88%, 96.7%, 92%, and 94.8%, respectively (Figures 6a–6d). The clear separation observed in the PCA plots, regardless of vegetable matrix, underscored the ability of these regions to differentiate between spiked and control samples.
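Restricting PCA to a fingerprint window can be sketched as follows. This is a Python/NumPy illustration on synthetic spectra, not the ChemoSpec workflow used in the study: the band shape, noise level, sample counts, and function names are all assumptions made for the example. PCA is computed from the singular value decomposition of the mean-centred window.

```python
import numpy as np


def select_window(wavenumbers, spectra, lo, hi):
    """Columns of `spectra` whose wavenumber lies in [lo, hi]."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return spectra[:, mask]


def pca_scores(X, n_components=2):
    """Scores and explained-variance ratios via SVD of mean-centred data."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)
    return U[:, :n_components] * S[:n_components], ratio[:n_components]


rng = np.random.default_rng(0)
wn = np.arange(300.0, 2501.0, 2.0)
# Synthetic spectra (not measured data): 10 control and 10 spiked,
# the spiked ones carrying a Gaussian band centred at 385 cm-1.
noise = rng.normal(0.0, 0.01, size=(20, wn.size))
band = np.exp(-0.5 * ((wn - 385.0) / 8.0) ** 2)
labels = np.array([0] * 10 + [1] * 10)
spectra = noise + labels[:, None] * band

window = select_window(wn, spectra, 354.0, 414.0)
scores, ratio = pca_scores(window)
```

Because the window contains little besides the pesticide band, almost all of its variance is the spiked-versus-control contrast, so PC1 alone separates the two groups in this synthetic case.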

Score plots of the PCA for the regions from 2D-COS. (a) 354–414 cm–1, (b) 1260–1286 cm–1, (c) 1540–1570 cm–1, and (d) 2250–2265 cm–1. The plots for all four chlorothalonil fingerprints unequivocally demonstrate the ability to discriminate between control and spiked samples along PC1. The method's effectiveness is underscored by the clear separation of all the samples into two distinct clusters, control (green) and spiked (red), irrespective of the different vegetable matrices.
Furthermore, applying the first two principal components from these fingerprint regions to the SVM model resulted in perfect classification accuracy, achieving 100% with a κ statistic of 1 (Figures 7a–7d). This result highlights the importance of focusing on specific spectral regions that are directly related to chlorothalonil, rather than relying on the entire spectrum, which contains unnecessary and overlapping information. By concentrating on these fingerprint regions, the PCA-SVM model achieved robust and reliable classification, making it a superior method for detecting chlorothalonil residues in vegetables. The consistent detection of C–Cl vibrations (354−414 cm–1) and C≡N stretching (2238 cm–1) across all vegetable matrices positions these as unique markers for chlorothalonil contamination, as they are naturally absent in the vegetable matrix. In contrast, C–C bonds in the 1260−1286 cm–1 and 1540−1570 cm–1 regions overlap with natural vibrations from the vegetables, but the overall approach still allows for successful differentiation.

Scatter plots showing the linear PCA-SVM classification results for different spectral regions: (a) 354–414 cm–1, (b) 1260–1286 cm–1, (c) 1540–1570 cm–1, and (d) 2250–2265 cm–1. Each region achieved a classification accuracy of 100% and a κ coefficient of 1. The gray areas indicate the regions where the SVM model predicts the samples to be spiked, while the white areas indicate the regions where the model predicts the control samples.
Conclusion
This study demonstrates the utility of 2D-COS applied to Raman data, in combination with PCA and SVM, for detecting chlorothalonil residues in various vegetable matrices. While difference spectroscopy highlighted general spectral variations between spiked and control samples, it was insufficient for resolving overlapping signals, especially in complex matrices. 2D-COS offered a more advanced approach, enhancing spectral resolution and enabling the identification of critical C–Cl and C≡N bond vibrations, which served as reliable fingerprints for chlorothalonil across all the vegetable sample types. The perfect classification accuracy achieved by focusing on these fingerprint regions, as opposed to the full spectral range, highlights the value of targeting key vibrational bands. Although this study used higher pesticide concentrations as a proof of concept, the findings lay a solid groundwork for future research at lower concentrations, closer to regulatory maximum residue limits. This methodology shows substantial promise for field-based pesticide screening, offering a rapid, non-invasive, and accurate approach to food safety. Future studies should validate these findings under real-world conditions and examine their effectiveness at detecting lower pesticide residues, moving toward practical, routine food safety monitoring.
Footnotes
Acknowledgments
The authors acknowledge the Center for High-Performance Computing (CHPC), South Africa, for providing computational resources to this research project.
Author Contributions
C. N. Ndung’u: Conceptualization, methodology, investigation, data curation, formal analysis, writing original draft, writing review, and editing. K. A. Kaduki: Conceptualization, funding acquisition, supervision, writing review, and editing. M. I. Kaniu: Conceptualization, supervision, resources, software, methodology, writing review, and editing. L. W. Kiruri: Conceptualization, supervision, methodology, writing review, and editing.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Declaration of Generative Artificial Intelligence (AI) and AI-Assisted Technologies in the Writing Process
During the preparation of this manuscript, the authors utilized an LLM for copy editing and language refinement. The AI tool was employed solely to enhance text clarity and fluency. All content, including citations and references, was independently generated and verified by the authors prior to AI usage. The authors take full responsibility for the accuracy and integrity of the manuscript. The limitations of AI, including potential biases and errors, were acknowledged, and steps were taken to mitigate these in the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Swedish International Development Cooperation Agency (SIDA) through the International Science Programme (ISP), Uppsala University (grant numbers IPPS KEN:04 and IPPS AFR:04/AFSIN), and by the Kenya Education Network through SIG grant number KENET/CMMS/2022.

