Abstract
Introduction
Prompt detection of seizures and intervention can prevent injury and improve outcomes. 1 Caregivers can provide first aid and administer seizure rescue medication, reducing the risks of status epilepticus 2 and complications. Accurate reporting of seizures and frequency aids in treatment, risk assessment, and potential prevention of sudden unexplained death in epilepsy (SUDEP). 3 Most cases of SUDEP go unwitnessed and occur at night, 4 so timely detection of seizures is crucial for intervention and the prevention of SUDEP.
Interest in video- and audio-based seizure detection systems is growing. These systems use algorithms to analyze recorded signals to detect real-time seizures. The algorithms extract, track, and classify the signal to determine the likelihood of a seizure. Automated techniques offer advantages over traditional methods, especially in home environments, promoting independence for individuals with epilepsy. Additionally, these systems may reduce the need for continuous supervision and expensive, time-consuming continuous EEG monitoring in hospitals.
Methods
A preliminary search of “video seizure detection” in PubMed, Medline, and the Cumulative Index to Nursing and Allied Health Literature was conducted to identify relevant keywords. Two independent reviewers evaluated the resulting abstracts and identified the manuscript by Karayiannis et al 5 as one of the earliest on automated video seizure detection. Thus, we restricted the final search strategy to publications from 2006 onward. Several articles discussed automated audio-based seizure detection methods, leading us to include “audio” in our final search strategy. Keywords searched in the titles and abstracts were combined using Boolean logic with appropriate controlled vocabulary terms from relevant articles. This strategy was translated for other databases: Embase, ClinicalTrials.gov, and Web of Science. Google Scholar and MedNars were also searched to identify unpublished data and gray literature. Searches were run from 2006 to June 27, 2022, with no language filters. Validated human filters were applied where available.
After importing all abstracts and citations into EndNote, we deduplicated them using the method by Bramer et al. 6 This set was then imported into Covidence systematic review software™. 7 Two independent reviewers screened titles and abstracts against the following inclusion criteria: audio- and/or video-based detection of any seizure type, patients with a diagnosis of epilepsy, and articles written in English. Exclusion criteria were animal studies, analyses solely of psychogenic nonepileptic seizures, other scoping or systematic reviews, and opinion or commentary articles. Selected citations underwent full-text assessment by the same reviewers against the inclusion and exclusion criteria. A third independent reviewer resolved all conflicts. Data from selected studies were extracted by 2 independent reviewers using Covidence. Figure 1 details the study design from database searches to final article selection. Additionally, the guidelines for evaluating seizure detection devices put forth by Beniczky and Ryvlin were followed. 8 For the detailed methodology used to classify articles into phases 0 to 4, as proposed by Beniczky and Ryvlin, see Figure 2.

Figure 1. Methods for database searches and database sorting, including inclusion and exclusion criteria.

Figure 2. Phase 0 to 4 classification.
Results
Search Results and Phase Classification
Our final deduplicated search yielded 4487 unique abstracts. After applying inclusion and exclusion criteria, 82 full-text articles were chosen for further evaluation. Thirty-five articles, including one addendum, were then selected for extraction. A total of 34 articles underwent data extraction, and the addendum was used to update the data extracted from the original article. One study utilized audio-based seizure detection, 9 while the remaining 33 studies were video-based seizure detection. Table 1 contains pertinent information on general study data, seizure detection methodology, and performance metrics. Many studies reported multiple values for sensitivity, specificity, and accuracy. Due to this heterogeneity, the default reporting metric in Table 1 is the mean value for these variables. When mean values were unavailable, the highest reported values were noted in Table 1.
Table 1. Main Study Results.a
Abbreviations: CNN, convolutional neural network; FFNN, feedforward neural network; GTC, generalized tonic–clonic; LSTM, long short-term memory; NR, not reported; QNN, quantum neural network; RBFNN, radial basis function neural network.
a Please contact the authors for a comprehensive evaluation of video resolution, frames per second, total recording time, night recording data availability, deficiency time, false alarm rate, safety, detection latency, and user experience. Additional information regarding relevant excluded studies may also be obtained by contacting the authors.
Ten studies analyzed myoclonic seizures, 6 analyzed generalized tonic–clonic (GTC) seizures, and 15 analyzed clonic seizures. Some also assessed absence seizures and other minor seizures. The majority analyzed multiple seizure types. Only 13 of 34 studies reported recording durations, which ranged from 20 minutes 20 to 2965 days. 40 Thirteen studies recorded data at night. Each study was reviewed and classified by phase. However, many studies exhibited characteristics of multiple phases, which complicated classification. To address this, two reviewers (EWP and TS) independently classified each study and then resolved discrepancies.
Detection Metrics for Phase 0 and Phase 1 Studies
Data on sensitivity, specificity, and accuracy are unavailable for the 3 phase 0 studies. Only 5 of 14 phase 1 studies reported sensitivity data. Of those, the highest sensitivity (100%) was achieved by a study that used an optical-flow-based algorithm to detect clonic seizures. 22 The lowest reported sensitivity (77%) was from a study that used Harris corner detection and optical flow to detect myoclonic jerks. 14 Three studies reported specificity data. The highest was 93.3% from a study that used Gaussian mixture models, a coarse-to-fine paradigm, graph-cut-based segmentation, and domain knowledge to detect motor and hypermotor seizures. 16 The lowest reported specificity (89%) was from a study that used an algorithm consisting of the cumulative mean normalized difference function and the normalized autocorrelation function to detect clonic seizures. 19 Of the 6 phase 1 studies that provided data on accuracy, the highest value (100%) was given by a study that used a Radon transform-based technique on dual-tree complex wavelet transform (DT-CWT) coefficients. 15 The lowest reported accuracy (69.8%) was provided by a study that used a leave-one-patient-out cross-validation approach. 13
Detection Metrics for Phase 2 Studies
Eleven of 15 phase 2 studies reported data on sensitivity. Of those, the highest reported sensitivity was 100% from a study using optical flow for generalized clonic, GTC, long generalized tonic, and hyperkinetic seizures. 30 The lowest reported sensitivity for a phase 2 study was 70%, from the specific pose analysis subanalysis of a study using a region-based convolutional neural network, long short-term memory (LSTM), and leave-one-subject-out cross-validation. 27 Of the 8 studies that reported on specificity, the highest value was 97.7% using a cosine radial basis function neural network (RBFNN). 5 The lowest specificity for a phase 2 study (60%) came from the same study that had the lowest sensitivity, again in its pose analysis subanalysis. 27 The highest reported accuracy was 95.2% from a study using an LSTM approach. 41 The lowest reported accuracy was 50.9% from the same study, this time using a leave-one-subject-out cross-validation approach. 41
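Leave-one-subject-out cross-validation, mentioned above, holds out every recording from one patient at a time so that no patient contributes to both training and testing. The following is a minimal sketch of how such splits can be generated. The function name `leave_one_subject_out` and the clip-to-patient labels are hypothetical and are not taken from any of the reviewed studies.

```python
from collections.abc import Iterator, Sequence


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) once per subject."""
    for held_out in sorted(set(subject_ids)):
        # Every sample from the held-out subject goes to the test fold.
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        # All samples from the remaining subjects form the training fold.
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Hypothetical assignment of six video clips to three patients (illustration only).
clips = ["p1", "p1", "p2", "p3", "p3", "p3"]
for subject, train_idx, test_idx in leave_one_subject_out(clips):
    print(subject, "train:", train_idx, "test:", test_idx)
```

Because each test fold contains only an unseen patient, this scheme tends to give a stricter estimate of generalization than a random split of clips.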
Detection Metrics for Phase 3 and 4 Studies
Two phase 3 studies reported on sensitivity. The highest reported sensitivity (100%) was from a hybrid video/audio-based epilepsy monitoring system utilizing optical flow, specifically for detecting clonic and GTC seizures. 38 The lowest reported sensitivity (0%) was from the same study when they analyzed “subtle seizures.” 38 Only one of the two phase 3 studies reported on specificity and accuracy, with the highest reported values of 78% and 76.3%, respectively. 39 The single phase 4 study did not report any performance metrics. 40
Discussion
Summary
This scoping review started with 4487 unique abstracts and ultimately included 34 articles for data extraction. The review focused on audio- and video-based seizure detection methods, with the following key findings: the highest reported sensitivity was 100% in 3 studies using optical flow analysis. 22,30,38 Specificity ranged up to 97.7% in one study that used RBFNN. 5 Accuracy was reported by 29% of studies, with the highest at 100% using a Radon transform-based technique. 15
Sensitivity
Among the studies that provided data on sensitivity, the highest reported values were 100%. 22,30,38 All 3 studies employed optical flow analysis, which captures movement patterns associated with seizures and likely contributed to the high sensitivity. In one study, which used a hybrid of audio and video data, a computer vision-based algorithm detected seizure epochs. 38 It is important to note that all 3 of these studies included a relatively small number of seizures: 36, 22 48, 38 and 50. 30 These smaller sample sizes may limit the representativeness and statistical power of the results compared to the study with the highest reported number of seizures analyzed (2767 seizures). 40 Although the larger study did not report sensitivity, specificity, or accuracy data, it provides a stronger foundation for drawing conclusions. Interestingly, the same study that used an audio-video hybrid detection system and reported 100% sensitivity for seizure epochs also reported the 2 lowest sensitivities of 0% and 27% for subtle seizures and epileptic spasms, respectively. 38 This underscores the importance of both the algorithm employed and the dataset’s size and diversity. Future studies could benefit from categorizing results by seizure type for a more equitable assessment, and potentially from combining separate approaches.
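To make the effect of sample size concrete, the sketch below computes a 95% Wilson score interval for a sensitivity of 100% at the three seizure counts quoted above. It rests on assumptions that the source studies do not confirm: that sensitivity was computed per seizure, that every one of the n seizures was detected, and that detections are independent. The Wilson interval is our illustrative choice, not a method used by the reviewed studies.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Seizure counts from the three studies reporting 100% sensitivity.
for n in (36, 48, 50):
    low, high = wilson_interval(n, n)  # assumes all n seizures were detected
    print(f"{n}/{n} detected: 95% interval {low:.3f} to {high:.3f}")
```

Under these assumptions, the lower bound for each of the three counts sits noticeably below 100%, which is one way to express the limited statistical power noted above.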
Specificity
A study that utilized RBFNN and included 80 myoclonic and focal clonic seizures exhibited the highest specificity of 97.7%, supporting a nonoptical flow-based approach. 5 Data length, particularly with true negative data, greatly affects specificity estimation. Studies with sufficient nighttime data offer more true negative instances, leading to more stable specificity estimates and increased result reliability. Conversely, shorter recordings with limited true negatives may yield unreliable estimates due to sensitivity to minor variations. Proper representation of true negative data is essential for achieving a balanced assessment of a model’s performance.
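The sensitivity of a specificity estimate to minor variations can be illustrated with simple arithmetic. The sketch below compares how much one additional false alarm changes specificity when few versus many seizure-free segments are available. The segment counts are hypothetical and chosen only for illustration.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity = TN / (TN + FP)."""
    negatives = true_negatives + false_positives
    if negatives == 0:
        raise ValueError("no seizure-free segments to evaluate")
    return true_negatives / negatives


# Hypothetical numbers of seizure-free segments (illustration only).
for negatives in (20, 2000):
    one_alarm = specificity(negatives - 1, 1)
    two_alarms = specificity(negatives - 2, 2)
    print(
        f"{negatives} negatives: {one_alarm:.4f} with 1 false alarm, "
        f"{two_alarms:.4f} with 2 (change {one_alarm - two_alarms:.4f})"
    )
```

With few negatives, a single extra false alarm moves the estimate by several percentage points, whereas with many negatives the same event barely changes it.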
Accuracy
Few studies reported accuracy data. Among them, a study with a Radon transform-based approach on DT-CWT coefficients 15 exhibited the highest accuracy. However, accuracy is not always a viable metric because of the strong data imbalance, that is, many more interictal periods than seizures. In such scenarios, accuracy may be misleading because a model can score highly simply by predicting the majority class, in this case, interictal. Additionally, this study included only 5 seizures in its dataset, further limiting the interpretation of this result.
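The imbalance problem can be shown with a small worked example. The sketch below scores a trivial detector that labels every segment as interictal. The segment counts are hypothetical and do not come from any reviewed study.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one seizure and one interictal segment")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical recording: 10 seizure segments among 1000 segments in total.
# A detector that always predicts "interictal" never raises an alarm,
# so it misses all 10 seizures (fn=10) and produces no false positives.
always_interictal = confusion_metrics(tp=0, fp=0, tn=990, fn=10)
for name, value in always_interictal.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case, accuracy is high even though no seizure is detected, which is why sensitivity and specificity are needed alongside accuracy.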
Challenges
Challenges in the review process, such as search strategy, database selection, selection bias of articles, reviewer bias, and the two-reviewer approach, were managed as carefully as possible to ensure the integrity and robustness of the research. One of our inclusion criteria was that the full text be written in English, which may have caused us to miss pertinent articles in other languages. The first round of selection was based on titles and abstracts to limit the number of full-text reviews. This strategy could likewise have excluded relevant studies because of the limited information initially available to the reviewers. Furthermore, it was not feasible to search every available database, so a selection bias exists among the 7 databases we searched. To reduce reviewer bias, structured training, calibration exercises, and regular discussions were conducted. While the two-reviewer approach created a higher risk of selection bias, it also ensured a rigorous evaluation of each study, promoting consensus and reducing individual subjectivity. In split decisions, a third reviewer was involved to reach consensus, ensuring transparency and minimizing bias.
Data collection was hindered by the varied reporting methods used in the studies. For example, we aimed to collect data as mean values. However, not all studies reported means; some instead reported only the highest value for specific performance metrics. This further limits the interpretation of results because such studies may have reported outlier statistics. Without true mean values and standard deviations, data interpretation is difficult for any study.
Sample sizes varied widely across studies, with datasets often tailored to specific patient populations or controlled conditions. Interpretation of outcomes was influenced by the type of seizure detected, with absence seizures posing notable challenges compared to visually evident GTC seizures. Some studies focused on specific seizure types or detected a combination, while others did not specify seizure types. One study specializing in absence seizure detection achieved an accuracy of 99.9%. 24 However, the lack of sensitivity and specificity data restricted interpretation of this result.
Additionally, the lack of a standardized evaluation protocol makes directly comparing the results across studies challenging. Future research should address these limitations by conducting large-scale studies with consistent statistical approaches for comparison. There is also a need for standardized datasets for covalidation and to exchange algorithms to identify the best-performing approaches.
Conclusion
This review showcases the potential of video- and audio-based seizure detection, emphasizing diverse high-performing methodologies. However, limited sample sizes and a lack of standardized evaluation protocols indicate the necessity for more research. Consistent reporting metrics are vital for evaluating these systems. Improvements in testing and algorithms will advance noninvasive seizure detection, benefiting epilepsy management. Integrating advanced machine learning and AI techniques can enhance accuracy and reliability and pave the way for personalized treatment strategies. In the spirit of “veni, vidi, vici,” automated, AI-powered video detection of seizures is here to stay and is slowly but surely cementing its position as a feasible tool for clinical seizure detection and classification.
