Abstract
Introduction
Heart rate variability (HRV) measures variation in the time interval between adjacent heartbeats and has become a widely used biomarker of autonomic nervous system (ANS) balance. 1 High variability in heart rate is associated with efficient adaptation of the ANS to physical and psychological challenges (sometimes called stress), whereas low variability in heart rate may indicate insufficient ANS adaptation to challenge and illness. 2 Resting HRV, as quantified by the root mean square of successive differences (RMSSD), ranges from less than 20 ms to over 70 ms in healthy adults. 3 However, this variability is highly individual and can be influenced by several factors, including age, sex, physical fitness, and general health status. HRV parameters in both the time and frequency domains have been associated with cardiovascular morbidity and mortality,4,5 diabetes, 6 depression,7,8 stress levels,8,9 pain,10,11 sleep quality, 12 and athletic performance.13–15 Therefore, HRV measures have the potential to indicate imbalances in the ANS that may also affect an individual's overall health status and function.
Heart rate variability is measured using electrocardiogram (ECG) or photoplethysmography (PPG) applied to the chest and/or limbs. 1 While multi-lead ECG is considered the gold standard for assessment, this method does not provide for the continuous and routine collection of HRV data. 16
Wearable devices utilizing ECG to measure HRV can be generally placed into three categories: wired chest monitors, clothing garments, and chest/shoulder/arm straps. Multiple systematic reviews suggest that wired chest monitors (e.g., Bittium FarosTM, Firstbeat Bodyguard 2, and Actiheart) demonstrate the best reliability and validity compared to multi-lead ECG, making these devices a suitable alternative to the gold standard.17–19 The Firstbeat Bodyguard 2 in particular demonstrated a very strong correlation (r > 0.95,
Multiple studies have evaluated the reliability and validity of using rings or smart watches equipped with PPG to assess HRV compared to the gold standard of ECG. Apple, Samsung, and Garmin smart watches demonstrate acceptable accuracy for specific resting or nocturnal time-domain measures of HRV, but not for measures recorded during moderate to intense activity.22,23 In another laboratory-based study designed to validate six wearable devices (i.e., Apple Watch S6, Garmin Forerunner 245 Music, Polar Vantage V, Oura Ring Generation 2, WHOOP 3.0 and Somfit) for assessing HRV against the gold-standard electrocardiogram, the authors found mixed results (ICCs ranging from 0.24 to 0.99). 24 All six devices underestimated HRV (RMSSD), with absolute bias ranging from 0.7 to 33.1 ms. These findings suggest that PPG recordings of HRV may be useful, but only when assessed in resting or nocturnal states due to the susceptibility of PPG to motion noise.25,26
Fitbit is a smart watch utilizing PPG that can provide a measure of nocturnal HRV and is one of the most popular brands among consumers and researchers.27–29 Despite its widespread use, only one study has investigated the clinimetrics of any Fitbit device for measuring HRV. Hermans et al. assessed the validity of the Fitbit Charge 4, using the Polar H10 chest strap as the reference standard, for measuring HRV over seven consecutive days (24 h per day) in patients with chronic obstructive pulmonary disease (COPD) and age- and sex-matched healthy controls. 30 The study reported a strong correlation between HRV measures from the Fitbit Charge 4 and the reference device in COPD patients (r = 0.83), and a moderate correlation in healthy controls (r = 0.64). 27 A key strength of this study was the inclusion of a matched healthy control group. A potential limitation of this study is that the Polar H10 is a relatively new model that has not been as rigorously evaluated for concurrent validity 31 and requires the wearer to simultaneously don a chest strap and wristwatch to collect data.
The resting RMSSD measure reported by Fitbit is one of the most widely used time domain measures of HRV, and is best at describing the short-term beat-to-beat variations reflective of parasympathetic activity. 32 The ability of Fitbit to track variations in parasympathetic activity could be particularly useful in patients with pain, who experience increases in heart rate, blood pressure, and respiratory rate when pain is elevated. 33 While evidence generally supports the collection of HRV at night and in a supine position using ECG,34,35 definitive clinimetric studies are lacking on the ability of Fitbit to provide reliable and valid measures of nocturnal HRV, particularly in a sample of patients with pain. Therefore, the purpose of this investigation was to compare nocturnal HRV measures obtained from the Fitbit Versa 4 to a validated reference standard measure obtained from the Firstbeat Bodyguard 3 in a sample of adults with and without shoulder pain.
Methods
Participant population
Institutional Review Board approval (STU-2021-0495) was acquired prior to the inception of this study, and all experiments were conducted in accordance with the Helsinki Declaration. 36 All participants were fully informed of the experimental procedures and risks, and gave their written consent before any testing was conducted. Each participant was screened for eligibility via email or phone call. Participants were recruited as part of a larger study with inclusion criteria of: 1) 18 years of age or older and 2) scheduled to undergo a primary shoulder arthroplasty. Exclusion criteria were: 1) presence of central or peripheral neurological conditions impairing sensation, 2) non-English speaking, and 3) pregnant or may become pregnant during the duration of the study. Healthy controls were also recruited as part of the larger study (no history of shoulder pain or dysfunction in the preceding 36 months). Between July 2023 and January 2024, eight participants (seven with shoulder pain and one healthy control) were prospectively recruited and agreed to participate in the study. We performed a separate correlation analysis excluding the one healthy control participant, which yielded a similar result. Therefore, we included the control participant in the final analysis.
Measurements
Data were collected using one of three Fitbit Versa 4 (Google LLC, Mountain View, California, USA) watches and one of three Firstbeat Bodyguard 3 (Firstbeat Analytics, Jyväskylä, Finland) devices checked out to the participant. Participants were also given Firstbeat ECG electrodes and a Fitbit power charger. In addition to a comprehensive handout detailing operating instructions for both wearables, participants were given detailed verbal instructions on proper wearing techniques and wearing schedules. Participants were instructed to wear both devices concurrently for a minimum of one hour prior to bedtime, throughout the night, and for at least one hour upon waking, over the course of at least three nights. Nocturnal HRV was measured simultaneously on both devices for a total of 26 nights across 8 participants.
In this study, we focused on determining the agreement of RMSSD values reported by the Fitbit Versa 4 with those reported by Firstbeat Bodyguard 3. RMSSD is calculated by first determining the time intervals between consecutive heartbeats in milliseconds. Next, the differences between successive intervals are squared and averaged, and finally, the square root of that average is taken to yield the RMSSD value. 32 Fitbit provides to the user a single measure of nocturnal HRV calculated as the median of 5-min RMSSD recordings collected during sleep. Nocturnal RMSSD is the only time-domain HRV metric produced and displayed by the device. Among the time-domain measures of HRV, RMSSD has the advantage of computational simplicity and reasonable accuracy for short and ultrashort recordings. 37 As a result, it is a preferred measure for the short-term recordings provided by wearable sensors.
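The RMSSD calculation and the nightly median described above can be illustrated with a minimal sketch. The function names `rmssd` and `nightly_rmssd` are hypothetical, and the sketch assumes each 5-min epoch is already available as a list of RR intervals in milliseconds; it is not the proprietary Fitbit computation.

```python
import math
from statistics import median


def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one before it
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # Square the differences, average them, then take the square root
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def nightly_rmssd(epochs):
    """Median of per-epoch RMSSD values; each epoch is a list of RR intervals (ms)."""
    return median([rmssd(epoch) for epoch in epochs])
```

For example, `rmssd([800, 810, 790, 805])` uses the successive differences 10, -20, and 15 ms, not the raw intervals themselves.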
Data were stored on the individual devices until they were returned via mail to the investigators, at which point Fitbit data were uploaded to the Fitbit cloud and accessed through Fitabase, and Firstbeat data were downloaded to a CSV file. ECG signals were sampled at 256 samples per second, allowing for high temporal resolution and thus accurate HRV computation.38,39 The sampled ECG signals were transferred to Kubios HRV Software (version 4.1.0, Kubios Oy, Kuopio, Finland) to extract HRV (expressed as RMSSD in ms). 40 Kubios provides flexible preprocessing options for extracting RMSSD values from ECG data to enhance heartbeat detection. The Kubios automatic beat detection algorithm identifies artifacts based on differences between successive RR intervals using an adaptive (time-varying) threshold. 41 Ectopic and misplaced beats are corrected via interpolation, while abnormally long or short RR intervals are adjusted by removing superfluous R-waves and recalculating intervals. A threshold-based correction also evaluates each RR interval against a median-filtered local average, where values deviating beyond a user-defined threshold are excluded. In this study, we applied the “Medium” setting, corresponding to a 0.25 s threshold. Additionally, Kubios applies detrending using the smoothness priors method to mitigate slow, nonstationary trends in heart rate, with a default cutoff frequency of 0.049 Hz, which was used in our analysis. 42 The detrending helps avoid errors in HRV analysis due to slow nonstationary trends that may be present in the mean heart rate. This detrending approach has the effect of a time-varying filtering where the cut-off frequency can be adjusted by changing the level of a smoothing parameter. 42
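The threshold-based correction described above can be sketched as follows. This is an illustration of the general idea only, not the Kubios implementation: the function names and the 11-beat window used for the local median are assumptions, while the 0.25 s threshold is the "Medium" setting reported in the text.

```python
from statistics import median


def flag_artifacts(rr_s, threshold_s=0.25, window=11):
    """Mark RR intervals (seconds) that deviate from a local median
    by more than threshold_s. The window length is an assumed value."""
    half = window // 2
    flags = []
    for i, rr in enumerate(rr_s):
        # Local neighbourhood, truncated at the ends of the recording
        lo, hi = max(0, i - half), min(len(rr_s), i + half + 1)
        local = median(rr_s[lo:hi])
        flags.append(abs(rr - local) > threshold_s)
    return flags


def drop_artifacts(rr_s, threshold_s=0.25, window=11):
    """Return the RR series with flagged intervals excluded."""
    flags = flag_artifacts(rr_s, threshold_s, window)
    return [rr for rr, bad in zip(rr_s, flags) if not bad]
```

In this sketch a single 1.60 s interval among intervals near 0.80 s would be flagged, because it lies more than 0.25 s from the local median.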
The details of the preprocessing or any filtering of the optically detected heartbeats by Fitbit are not publicly available. Available documentation indicates that RMSSD is calculated from the longest sleep period over the past 24 h, and only sleep periods greater than 3 h are considered. As noted previously, Fitbit computes RMSSD values for 5-min intervals during sleep, but only when its proprietary algorithm deems the heart rate data acceptable. This is evidenced by occasional missing RMSSD values during sleep periods. Rather than relying on Fitbit's internal selection criteria, our analysis includes only the 5-min epochs for which RMSSD values are reported, with Firstbeat data aligned to these same intervals.
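The epoch alignment described above can be sketched as follows, assuming each device's output has already been reduced to a mapping from 5-min epoch start time to RMSSD. The function name `align_epochs`, the dictionary layout, and the use of `None` for a missing Fitbit value are assumptions made for illustration.

```python
def align_epochs(fitbit, firstbeat):
    """Pair RMSSD values for epochs in which Fitbit reported a value.

    fitbit, firstbeat: dicts mapping epoch start time -> RMSSD (ms) or None.
    Returns a list of (time, fitbit_rmssd, firstbeat_rmssd) tuples.
    """
    pairs = []
    for t in sorted(fitbit):
        fitbit_value = fitbit[t]
        reference_value = firstbeat.get(t)
        # Skip epochs Fitbit did not report or the reference does not cover
        if fitbit_value is None or reference_value is None:
            continue
        pairs.append((t, fitbit_value, reference_value))
    return pairs
```

Only the epochs present in both series enter the comparison, so missing Fitbit epochs reduce the number of pairs rather than being imputed.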
Statistical analysis
Normality was examined for continuous data by the Shapiro-Wilk test, and normal data were expressed as the mean ± standard deviation. Non-normal data were expressed as median with first and third quartiles (Q1 and Q3). A Wilcoxon signed-rank test was performed to compare the median RMSSD of the reference standard Firstbeat Bodyguard 3 with RMSSD measured by Fitbit Versa 4. Spearman correlation analysis was conducted to test the correlation between Fitbit and Firstbeat for the median RMSSD of each night. A linear mixed-effects model (LMM) with log transformation was used to investigate the difference between Firstbeat and Fitbit RMSSD 5-min epochs, accounting for the correlation among repeated measurements from the same subject. In an effort to address movement artifacts or variability in device fit across participants, we removed outliers from the final analysis that exceeded a threshold of ±3 standard deviations from the mean. We selected this threshold since values exceeding the mean by more than three standard deviations were supraphysiological and unlikely to represent genuine physiological fluctuations in heart rate. Thirty-two, or 1.51%, data points were ultimately removed from our final analysis.
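The ±3 standard deviation outlier rule can be sketched as follows. The text does not state whether the rule was applied per device, per participant, or to the pooled data, so this sketch assumes it is applied to each device's pooled 5-min RMSSD values and that a pair is dropped when either value is flagged. The function names are hypothetical, and the analysis reported here was run in SAS, not Python.

```python
from statistics import mean, stdev


def within_k_sd(values, k=3.0):
    """True for each value lying within k sample SDs of the mean."""
    m, s = mean(values), stdev(values)
    return [abs(v - m) <= k * s for v in values]


def remove_outlier_pairs(fitbit, firstbeat, k=3.0):
    """Keep paired epochs only when both devices' values pass the k-SD rule."""
    keep_fitbit = within_k_sd(fitbit, k)
    keep_firstbeat = within_k_sd(firstbeat, k)
    return [
        (a, b)
        for a, b, ok_a, ok_b in zip(fitbit, firstbeat, keep_fitbit, keep_firstbeat)
        if ok_a and ok_b
    ]
```

Note that with very few observations no value can exceed three sample standard deviations, so the rule only has an effect on reasonably large sets of epochs.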
Finally, to assess the agreement between the Fitbit and the Firstbeat median RMSSD, we performed a Bland-Altman analysis accounting for multiple observations per participant.43,44 We calculated the mean of the difference between the Fitbit and Firstbeat-derived RMSSD values (bias), the standard deviation (SD) of the differences, and the 95% confidence interval of the agreement. 45 We also calculated the mean absolute difference between the Fitbit and Firstbeat-derived RMSSD values (absolute bias). For all tests, a p-value less than 0.05 was considered statistically significant. All the data analyses were conducted using SAS 9.4 (SAS Institute Inc., Cary, NC).
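The agreement statistics named above can be illustrated with a simplified sketch. It computes the bias, the SD of the differences, the mean absolute difference, and limits of agreement as bias ± 1.96 SD, which is assumed to correspond to the interval of agreement described in the text. Unlike the analysis reported here, it treats all observations as independent and does not account for multiple observations per participant; the function name `bland_altman` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(test, ref):
    """Simplified Bland-Altman statistics for paired measurements.

    Assumes independent pairs; repeated measures are NOT modelled here.
    """
    diffs = [t - r for t, r in zip(test, ref)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "mean_abs_diff": mean([abs(d) for d in diffs]),
    }
```

A positive bias in this sketch means the first device reads higher on average than the reference.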
Results
Detailed participant demographics and clinical characteristics can be found in Table 1. The median RMSSD from Fitbit was 21.660 ms [15.101, 30.671] and the median RMSSD from Firstbeat was 20.558 ms [12.741, 30.710], resulting in an absolute bias of 1.102 ms. There was no significant difference between Fitbit and Firstbeat median RMSSD values,

Scatter plot of the 5-min median RMSSD values recorded during sleep. Each data point represents a 5-min epoch of median RMSSD measured throughout each night from all participants.

Scatter plot of the Fitbit median RMSSD and Firstbeat median RMSSD for each night. Each data point represents the median RMSSD of an entire night of sleep for a participant.

Bland-Altman plot showing the agreement between Fitbit and Firstbeat for measuring nocturnal HRV.
Demographic and clinical characteristics.
Values are mean ± one standard deviation unless otherwise noted.
Abbreviations: BMI – Body Mass Index; RMSSD – Root mean square of successive differences; Q1 – First quartile; Q3 – Third quartile.
Discussion
The primary objective of this study was to compare nocturnal HRV measures obtained from the Fitbit Versa 4 and the Firstbeat Bodyguard 3. The results from this study demonstrated no significant difference and a strong correlation between devices for median nocturnal RMSSD values, as well as acceptable agreement based on the Bland-Altman plot. This finding is critical since the Fitbit user only sees the median nocturnal RMSSD value. As seen in Figure 1, after removal of the outliers, the 5-min epochs from the Fitbit Versa 4 appeared to be more susceptible to extreme values. One possible explanation for the extreme values in the Fitbit data is the placement on the participant's wrist, as opposed to being securely fastened to the chest like the Firstbeat device. This difference in positioning makes the Fitbit more susceptible to movement artifacts and may lead to decreased contact with the skin. Another explanation could be variability in the tightness of the Fitbit watch strap, potentially resulting in inconsistent contact with the skin.
The results of the present study are consistent with previous research conducted on older adults with chronic obstructive pulmonary disease (COPD), which found a strong correlation (r = 0.83) between HRV measures from a Fitbit Charge 4 and the reference device, Polar H10. 30 The authors concluded that the Fitbit Charge 4 significantly underestimated the average median HRV by 3 ms in patients with COPD compared to 7 ms in healthy controls. They hypothesized that the enhanced accuracy and stronger correlation observed in patients with COPD, compared to healthy controls, may be attributed to the accelerated aging and skin thinning associated with COPD, which could facilitate a more efficient reflection of light during photoplethysmographic measurements. Results from our analysis suggested that the Fitbit Versa 4 overestimated median HRV on average by 0.76 ms in patients with shoulder pain, which is smaller in magnitude but opposite in direction to the previous study. Although skin thinning is unlikely to have affected our results since no patients in the present study had COPD, it is possible that subjects in both studies shared a mechanism of autonomic nervous system dysfunction. This suggestion is supported by subjects in both studies reporting the presence of chronic conditions and RMSSD values in the low 20s.
Although medication effects, inflammation, and sleep disturbance are all likely to influence HRV, our results suggest that Fitbit is still able to provide accurate median nocturnal RMSSD values compared to a reference standard in those with suspected autonomic dysfunction. RMSSD is an established measure of parasympathetic nerve activity that is useful for tracking autonomic nervous system responses. 46 Jarczok et al., in a proof of concept study involving 9550 working adults from 19 study sites, have shown that reduced RMSSD can reflect elevated risk across a range of established cardiovascular risk factors, namely, hyperglycemia, hyperlipidemia, and inflammation. 47 Also, Schuman et al. 48 have reported that baseline RMSSD predicted symptom improvement in veterans suffering from post-traumatic stress disorder undergoing HRV biofeedback. Most recently, a study has shown the efficacy of RMSSD in tracking physiological stress. 49 These studies provide support for the use of clinical devices to track RMSSD in patients with autonomic dysfunction in a variety of conditions, and in a meaningful way, to use this data to impact patient prognosis or management decisions.
To the best of our knowledge, research evaluating the agreement of Fitbit to a reference device for measuring HRV is limited to the present study and a recent study by Hermans et al. 30 For other metrics, systematic reviews have reported that specific Fitbit models showed high inter-device reliability for steps, distance, energy expenditure, and sleep, and acceptable validity for measures of steps and time in bed or asleep.50,51 Fitbit devices tend to overestimate daily step counts in cohorts with better functional exercise tolerance (e.g., healthy adults50–52 or cancer survivors 53 ), but not in those with lower exercise tolerance (e.g., patients living with Parkinson's Disease 54 or COPD 55 ). In two studies validating Fitbit against reference measures for heart rate (HR), one found that the Fitbit significantly underestimated HR on average in a cohort of healthy adolescents, 56 while another found no significant difference in healthy adults. 57 Further research is needed to investigate the accuracy of step counts, HR, and HRV in patients with comorbidities affecting the autonomic nervous system, as accuracy could differ from that seen in healthy controls. Additionally, it is important to highlight that sex differences exist in the autonomic control of the heart, as reflected by heart rate variability measures. 58 Future studies should account for age, sex, and the presence of comorbidities affecting the autonomic nervous system when investigating the agreement and reliability of wearable devices for measuring HRV.
The present study is not without limitations. Our sample size was small (n = 8); however, our analysis included data over 26 nights, and post-hoc power analysis indicated sufficient power (1 − β = 0.84). The comparisons between the two devices are limited to HRV and cannot be generalized to other metrics in the absence of comparative data. Additionally, our study population consisted primarily of older adults with symptomatic shoulder pathology, limiting the generalizability of our findings. Further, because we did not sync the Fitbit watch to the cloud until the devices were returned to us, there were some technical issues related to data extraction from the Fitbit devices. For example, if the device at any point was fully drained of battery charge, data for an entire night of sleep were lost. To address this potential for missing data, we instructed participants to fully charge the device prior to returning it via first-class mail. Finally, we used a validated medical-grade device (Firstbeat) as a reference standard for the Fitbit; however, even these devices are susceptible to measurement error outside of laboratory settings.
Conclusion
To our knowledge, this is the first study to examine both the correlation and agreement between the consumer-grade Fitbit Versa 4 wearable and a validated reference standard device for measuring nocturnal HRV. Our findings suggest that the Fitbit Versa 4 demonstrates acceptable agreement with Firstbeat Bodyguard 3 in a population of adults with shoulder pain. Future studies should explore the feasibility and acceptability of using physiological data collected from the Fitbit Versa 4 to inform management decisions in real-world settings.
