Abstract
Introduction
There is a substantial body of evidence demonstrating that physical activity reduces morbidity and mortality risk.1–5 This has led government organisations like the UK National Health Service (NHS) to advise the public on methods for engaging in physical activity. 6 In its guidance, the NHS suggests that participating in ‘moderate’ intensity (or above) physical activity leads to health benefits and advises people on how they can gauge ‘moderate’ intensity using subjective measures. The NHS suggests that moderate intensity “…will raise your heart rate, and make you breathe faster and feel warmer. One way to tell if you're working at a moderate intensity level is if you can still talk, but not sing”. 6 Because these recommendations are public-facing, it is important that they are examined for accuracy through scientific investigation.
As an alternative to these subjective methods of gauging ‘moderate’ intensity, the ACSM's guidelines for exercise testing and prescription 7 highlight a range of objective measures of exercise intensity, including both relative (% of heart rate reserve [%HRR], % of maximal heart rate [%HRmax], % of maximal oxygen consumption [%VO2max], % of oxygen consumption reserve [%VO2R]) and absolute (metabolic equivalents [METs]) methods. Relative intensity is described as the exercise intensity relative to the user's physiological capacity 8 and absolute intensity as the energy required to perform an activity. 8 However, there is a large disparity between relative and absolute measures of physical activity intensity.8–10 For example, Warner et al. 10 conducted a systematic review with Bayesian meta-regression of secondary data to examine the agreement between relative and absolute intensity during walking and reported agreement in only 43% of trials. Moreover, a large variation in %HRR at a given MET was reported. For example, at 3 METs, which is considered the lower bound of moderate intensity, the model predicted that %HRR was 33%. This is lower than the 40%HRR considered as the lower bound of moderate intensity when using %HRR. 7 However, the 95% credible interval ranged from 18%HRR to 57%HRR, which spans both ‘light’ and ‘moderate’ intensity classifications and is very close to the lower bound of ‘vigorous’ intensity (60%HRR). As such, there is considerable heterogeneity in relative intensity at a fixed value of absolute intensity.
Cadence, which is the number of steps taken per minute, is another measure that can be used to guide walking intensity. Although some researchers suggest that a walking cadence ≥ 100 steps.min−1 is an appropriate heuristic value that will allow most people to reach absolute moderate intensity (3 METs),11–14 recent evidence suggests this simplistic ‘one-size-fits-all’ approach may be misguided.15–17 Recent research using relative intensity (%VO2R) has highlighted the impact that both cardiorespiratory fitness and health status have on the cadence required to reach moderate and vigorous intensity. 15 Cardiorespiratory fitness plays an integral role in the walking cadence required to reach moderate intensity, which suggests that an individualised prescription of exercise intensity would be a more appropriate measure of physical activity intensity than an absolute measure like METs.
As relative measures of physical activity use percentage thresholds of physiological parameters, they are likely to be more individualised and thus more accurate on an inter-individual basis compared to METs. 10 However, the main limitation of relative measures and prescription of individualised physical activity has been one of accessibility, or the lack of it, in a real-world setting. Relative intensity measures routinely collected in laboratory settings, such as oxygen consumption or ventilatory threshold, are currently not feasible outside of a laboratory, mostly due to the inability of wearable devices to collect these measures. 18 However, heart rate measurement, and more specifically HRR, is now possible using a wearable device. The main advantage of using HRR is that it is highly associated with oxygen consumption reserve, which is a criterion measure of exercise intensity. 19 With the ever-developing wearables market, 21 the measurement of heart rate is rapidly becoming mainstream. Moreover, the accuracy of heart rate measurement at typical walking speeds is very good.22,23 There are, however, some limitations that have been reported with using a wearable device to measure physical activity, with previous research indicating that wearable devices may overestimate physical activity intensity. 24 Yet this effect is usually due to the accelerometer in wearable devices using an external measure of absolute physical activity intensity, which is therefore not individualised to the user. Abt et al. 15 reported that the exercise intensity at which the native Apple Watch Activity app recognised that the user was ‘exercising’ was ∼30% VO2R, suggesting that this device was not accurate at guiding the user to exercise at criterion moderate intensity (40% VO2R). The underestimation of moderate relative intensity would lead to an overestimation of moderate-to-vigorous physical activity (MVPA) minutes throughout the day. This is important, because Apple suggests that meeting the default 30 minutes of ‘exercise’ per day (as measured by the Apple Watch) is associated with significant health benefits. 25
Although the measurement of heart rate via a wrist-worn wearable device now offers a viable option for measuring exercise intensity for millions of people, 21 it would be preferable to use a relative measure of intensity, such as HRR, to do so. Heart rate reserve has been reported to be comparable to VO2R, 19 and therefore offers an accurate, yet feasible, method for measuring relative intensity at a population level. 19 As the development of wearable devices grows, it is important for researchers, exercise professionals, and consumers to understand how relative, individualised measures of MVPA can be used in everyday exercise monitoring, prescription, and goal setting.
Therefore, the main aim of this study was to build on the results of Abt et al. 26 to examine whether a relative measure of exercise intensity, %HRR, can measure criterion exercise intensity more accurately compared to the native Apple Watch Activity app. We did this through the creation of a bespoke Apple Watch app (herein called the ‘bespoke MVPA app’) which measures %HRR during exercise.
Methods
Study preregistration
The study design, including a statistical analysis plan, was preregistered prior to data collection, and can be found at https://osf.io/j39fa. STROBE guidelines were adhered to throughout the reporting of this observational study. 27 All raw data and coding script used to analyse the data can be found at https://osf.io/3uejs/files/osfstorage.
Ethics
Ethics approval was obtained from the Faculty of Health Sciences Research Ethics Committee at the University of Hull, East Riding of Yorkshire, United Kingdom (REF – FHS230) prior to recruitment of participants.
Study design
This was an observational study, with participants attending the laboratory on two occasions. The first visit included medical screening, obtaining written informed consent, resting measures, and measurement of maximal oxygen consumption and heart rate. The second visit was the main experimental trial.
Participants
Participants were healthy adults aged between 18 and 65 years (see Table 1 for demographic data) who provided written informed consent to participate in the study. Participants were excluded if they: 1) were classified as moderate or high-risk according to the ACSM risk classification criteria, 28 2) were unable to walk on a motorised treadmill, 3) had a BMI ≥ 40, 4) were currently prescribed medication that altered their heart rate response to exercise (e.g., beta-blockers), or 5) had an injury or disability that altered their gait. Participants were recruited from the University and local community through written promotional material and personal communication.
Proportion of female and male participants, together with the mean (SD) participant demographic data.
Sample size
As outlined in the study preregistration, a minimum sample size of 50 participants was required, with the final sample size determined by Bayesian sequential testing until a particular level of precision was attained or, if this was not achieved, by a stopping rule of 12 months from the beginning of data collection, as the time allocated for data collection was limited to this period. Bayesian statistical methods are particularly suited to sequential estimation, where data arrive one sample at a time and the posterior distribution is updated at each time point. 29 Data collection stopped when the width of the 95% highest density interval (HDI) was ≤ 0.6 × the standard deviation. Using this criterion, a total of 74 participants completed all trials.
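The precision-based stopping rule can be sketched in a few lines. This is a minimal illustration rather than the preregistered analysis script: it assumes the posterior draws are available as a plain sequence of numbers, takes the HDI to be the narrowest interval containing 95% of the draws, and leaves the choice of which standard deviation to pass in to the caller. The function names `hdi` and `precision_reached` are hypothetical.

```python
import math


def hdi(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the draws."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k consecutive draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def precision_reached(samples, sd, factor=0.6, mass=0.95):
    """True when the HDI width is at most `factor` times `sd`."""
    low, high = hdi(samples, mass)
    return (high - low) <= factor * sd
```

In a sequential design, a check of this kind would be rerun after each new participant's data had been added and the posterior re-estimated.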
Setting
Participants were recruited from May 2021 for a 12-month period. All laboratory trials were completed on campus at the University of Hull.
Visit 1
On initial visit to the laboratory, participants completed the following protocols in the sequence highlighted below.
Preliminary measures
On entry to the laboratory, participants’ nude body mass was measured to the nearest 0.1 kg using digital scales (WB-100MA Mark 3, Tanita Corporation, Tokyo, Japan). A wall-mounted stadiometer (Holtain Ltd, Dyfed, Wales, UK) was used to measure stretch stature, 30 and leg length was measured from the anterior superior iliac spine to the lateral malleolus. 31
Resting oxygen consumption and heart rate
Criterion resting VO2 and heart rate were measured in a temperature-controlled laboratory. Participants lay in a supine position for 30 minutes with a cushion placed under their head. All other laboratory activity ceased, the lights were turned off, and participants were instructed to relax but not to go to sleep (closing of eyes was permitted). Breath-by-breath oxygen consumption (Cortex Metalyser 3B, GmbH, Germany) and heart rate were recorded continuously. Prior to the commencement of testing, the Cortex Metalyser was calibrated using a 3-point calibration: room air, a known gas concentration of oxygen and carbon dioxide, and volume calibrated using a 3-litre syringe in line with manufacturer guidelines.
The initial 10 minutes and final five minutes of oxygen consumption and heart rate data were discarded to allow for habituation, and expectation effects, respectively. The remaining 15 minutes of oxygen consumption and heart rate data were used for the analysis of resting oxygen consumption and heart rate. During minutes 15-20, the bespoke MVPA app recorded a resting heart rate value using its built-in resting heart rate protocol.
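The trimming step described above amounts to keeping minutes 11 to 25 of the 30-minute recording. The sketch below shows one way to express it, assuming the recording has already been reduced to one value per minute; the function name and the per-minute input format are assumptions made for illustration, since the breath-by-breath data would first need to be averaged.

```python
def resting_value(per_minute, discard_start=10, discard_end=5):
    """Mean of the retained window of a resting recording.

    `per_minute` holds one value per minute (e.g. heart rate or VO2).
    The first `discard_start` and last `discard_end` minutes are dropped.
    """
    if len(per_minute) <= discard_start + discard_end:
        raise ValueError("recording is too short for the discard windows")
    kept = per_minute[discard_start:len(per_minute) - discard_end]
    return sum(kept) / len(kept)
```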
Maximal oxygen consumption
Maximal oxygen consumption was measured during a graded exercise test completed on a motorised treadmill (h/p/cosmos, Pulsar, Nussdorf-Traunstein, Germany), using a step protocol commencing at 3 km.h−1 and a 1% gradient. Treadmill speed was increased by 0.5 km.h−1 every 30 seconds until running economy degraded to the point that the participant's posture and technique were affected. At this point, the incline was increased by 0.5% every 30 seconds until volitional exhaustion. Oxygen consumption was measured continuously using the same system described for the resting oxygen consumption protocol. Maximal oxygen consumption criteria were: 1) a plateau in VO2 despite an increase in workload, 2) an RER of 1.05, and 3) a minimum category ratio RPE of 9. 32
Visit 2
Participants completed the experimental trial outlined below on their second visit to the laboratory.
Experimental trial
Prior to completion of the main exercise trial, participants were instructed to avoid vigorous exercise for 24 hours, maintain their normal diet, and avoid caffeinated drinks for three hours. The main trial involved participants completing 5-minute exercise bouts on a motorised treadmill starting at a walking speed of 3.5 km.h−1 and a gradient of 1% while wearing a Series 5 Apple Watch with the native Apple Watch Activity app and bespoke MVPA app installed. Each subsequent bout increased speed by 0.5 km.h−1 until three minutes of the 5-minute bout were recorded as ≥ 40% HRR by the bespoke MVPA app and all five minutes were recorded as ‘exercise’ by the native Apple Watch Activity app. During each 5-minute exercise period, participants were told to maintain their normal walking gait and were not permitted to hold onto the handrails. Oxygen consumption (Cortex Metalyser 3B, GmbH, Germany) and heart rate, via ECG chest strap (Polar T31, Polar Electro, OY, Finland), were recorded. Between each walking bout, participants had five minutes of seated rest. Once the treadmill bout had terminated and the treadmill belt was stationary, a chair was placed on the treadmill and participants were instructed to sit, motionless, with their hands placed on the treadmill handrail in order to ensure no movement contributed to the measurement of ‘exercise’ on the native Apple Watch Activity app. Five minutes of seated rest were provided to ensure the native Apple Watch Activity app had adequate time to update the green ‘exercise’ ring in line with previously reported methods. 26 The green ‘exercise’ ring is the Apple Watch's visual monitoring tool that displays daily minutes of completed exercise in the native Activity app.
Development of the bespoke MVPA app
The bespoke MVPA Apple Watch app was designed to quantify MVPA and to overcome the barriers and limitations of the native Apple Watch Activity app. The bespoke MVPA app was built in Core app C and Unity language, while the Apple Watch connectivity to Apple iPhone was built in Swift and Objective C. Previous studies have indicated that the native Apple Watch Activity app underestimates the intensity required to achieve MVPA by ∼25% when compared to directly measured %VO2R.22,26 The bespoke MVPA app has been designed to overcome this limitation, using %HRR to quantify moderate intensity. The bespoke MVPA app uses %HRR because it has been reported to be similar to %VO2R, 19 and %HRR can be calculated easily by the Apple Watch using data already available from HealthKit (resting HR and estimated maximal heart rate). A fifth generation Apple Watch (Series 5) running watchOS 7.1.1, or later update, was used to determine moderate intensity exercise. The bespoke MVPA app used the ACSM definition of moderate intensity, defined as 40% to 59% of HRR or VO2R. 28 As VO2R data are not available through the Watch photoplethysmography (PPG) sensor, HRR was used to substitute for this, such that moderate intensity physical activity was defined as 40–59% HRR. The equation used to calculate moderate intensity within the bespoke MVPA app was:
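The standard heart rate reserve (Karvonen) relationship expresses a heart rate as a percentage of the span between resting and maximal heart rate: %HRR = 100 × (HR − RHR) / (HRmax − RHR). The sketch below implements that relationship on the assumption that it is the form used; the function names are illustrative and are not taken from the app's source code.

```python
def percent_hrr(hr, rhr, hr_max):
    """Express a heart rate (beats/min) as a percentage of heart rate reserve."""
    reserve = hr_max - rhr
    if reserve <= 0:
        raise ValueError("hr_max must be greater than rhr")
    return 100.0 * (hr - rhr) / reserve


def target_hr(fraction, rhr, hr_max):
    """Heart rate (beats/min) at a given fraction of heart rate reserve."""
    return rhr + fraction * (hr_max - rhr)


def is_moderate_or_above(hr, rhr, hr_max, threshold=40.0):
    """True when the heart rate is at or above the %HRR threshold."""
    return percent_hrr(hr, rhr, hr_max) >= threshold
```

For a hypothetical user with a resting heart rate of 60 beats.min−1 and a maximal heart rate of 200 beats.min−1, `target_hr(0.40, 60, 200)` gives a moderate intensity threshold of about 116 beats.min−1.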
MVPA activity log
Every second spent at an intensity at or above 40% HRR counted towards the accumulation of an MVPA minute. The accumulated total was displayed on the Apple Watch, via the bespoke MVPA app interface and on the companion MVPA iPhone app.
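The per-second accumulation can be illustrated as follows. This is a sketch under the assumption that one %HRR value is available for each second of activity; the sampling behaviour of the Watch sensor and the names used here are not taken from the app itself.

```python
def mvpa_minutes(percent_hrr_per_second, threshold=40.0):
    """Accumulated MVPA minutes from a sequence of per-second %HRR values."""
    seconds = sum(1 for p in percent_hrr_per_second if p >= threshold)
    return seconds / 60.0


def bout_meets_criterion(percent_hrr_per_second, min_minutes=3.0, threshold=40.0):
    """True when at least `min_minutes` of the bout were at or above threshold."""
    return mvpa_minutes(percent_hrr_per_second, threshold) >= min_minutes
```

The second helper mirrors the trial criterion of at least three minutes of a 5-minute bout being recorded at or above 40% HRR.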
Statistical analysis
Statistical models
Data from the native Apple Watch Activity app were compared to data from the bespoke MVPA app, and to the criterion measure of intensity, %VO2R. Data from the stage at which at least three minutes of the 5-minute bout met the moderate intensity criterion (40-59% HRR), as measured by the MVPA app, were compared with data from the stage at which all five minutes of the 5-minute bout had been classified as ‘exercise’ by the native Apple Watch Activity app. %VO2R and %HRR were analysed separately, and different groups of models were produced for these two dependent variables. For %HRR, posterior distributions were estimated for both the bespoke MVPA app and the native Apple Watch Activity app, and these distributions were compared to a 40% HRR criterion value, with heart rate measured via ECG chest strap, as well as to each other. In the comparisons between the apps and the 40% HRR criterion, a Region of Practical Equivalence (ROPE) was used. This ROPE meant that, for practical purposes, values from 37% to 43% were considered equivalent to 40% HRR. The posterior distributions represented the %HRR at the stage identified by the MVPA app and the %HRR at the stage when all five minutes of the 5-minute bout had been classified as ‘exercise’ by the native Apple Watch Activity app. These data were modelled using a series of Bayesian regression models (see Models fitted subsection below for further details). In all %HRR estimates, the %HRR measured by the ECG chest strap was considered the response variable.
For %VO2R, posterior distributions were estimated for both the MVPA app and the native Apple Watch Activity app and these distributions were compared to a 40% VO2R criterion value, as well as to each other. In the comparisons between apps and metabolic cart, a ROPE was used. As with % HRR, the ROPE was set from 37% to 43%. The posterior distributions for the apps and metabolic cart were modelled using a series of Bayesian regression models (see Models fitted subsection below) and then compared to determine the best model in terms of out-of-sample prediction accuracy and data fit (see Comparison Methods subsection below). In all %VO2R estimates, the %VO2R measured by the metabolic cart was considered the response variable.
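A ROPE comparison of this kind reduces to asking how much of a posterior distribution falls below, inside, or above the 37% to 43% band. The sketch below assumes the posterior is represented by a list of draws; the function name and return format are illustrative and do not reproduce the study's analysis code.

```python
def rope_summary(samples, low=37.0, high=43.0):
    """Proportion of posterior draws below, inside, and above the ROPE."""
    n = len(samples)
    if n == 0:
        raise ValueError("no posterior draws supplied")
    below = sum(1 for s in samples if s < low)
    inside = sum(1 for s in samples if low <= s <= high)
    above = n - below - inside
    return {"below": below / n, "inside": inside / n, "above": above / n}
```

A posterior lying entirely below the band would return `{"below": 1.0, "inside": 0.0, "above": 0.0}`, which corresponds to a distribution wholly outside the ROPE.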
Models fitted
The models estimating MVPA (40% HRR and 40% VO2R) were fitted with different response distributions (Gaussian, Skew Normal, and Student's t-distribution) and different prior distributions: 1) a general weakly informative prior, 2) a measurement constrained prior that distributes probability over possible values constrained by the measurement, and 3) strongly informative priors, informed by previous studies. Each of these models included: 1) fixed effect models, 2) random intercept models, where intercepts for each individual were allowed to vary, and 3) random slope models, where both intercepts and slopes for each individual were allowed to vary. This provided a pool of models for each analysis, which were compared (see Comparison methods subsection below). The basic structure of the fixed effects, random intercept and random slope models is detailed below. These were adjusted to accommodate the different response distributions and priors outlined above.
Fixed effects model:
Comparison methods
All the models produced were compared using three methods in the following order of priority: Leave-One-Out cross-validation (LOO), Bayes Factor, and Bayesian R2. Firstly, the LOO information criterion (LOOIC) was used to determine the relative predictive performance of the models in terms of pointwise out-of-sample prediction accuracy using log-likelihoods from posterior simulations of parameter values. 33 A model was considered better if it produced a LOOIC difference greater than twice its corresponding standard error. Where a comparison of models using LOOIC did not achieve this level of difference, the second model comparison method was used, in which Bayes Factors (BF) quantified the support for one model over another. Where a particular model achieved a BF10 greater than 10, relative to other models, it was considered the better model. Where neither LOOIC nor BF could determine which model was better, the model with the highest Bayesian R2 was selected. The Bayesian R2 is seen as a data-based estimate of the proportion of variance explained for new data. 34
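The three-step priority order can be written as a small decision function for a pair of models. This is a sketch of the logic only: the argument names are hypothetical, lower LOOIC is taken to indicate better predictive performance, `bf_ab` is assumed to be the Bayes Factor for model A over model B, and treating a value below 1/10 as support for B is an assumption that mirrors the stated threshold of 10.

```python
def prefer_model(looic_a, looic_b, se_diff, bf_ab, r2_a, r2_b):
    """Pick 'A' or 'B' using LOOIC, then Bayes Factor, then Bayesian R2."""
    diff = looic_a - looic_b
    # Step 1: LOOIC difference must exceed twice its standard error.
    if abs(diff) > 2 * se_diff:
        return "A" if diff < 0 else "B"
    # Step 2: Bayes Factor greater than 10 in either direction.
    if bf_ab > 10:
        return "A"
    if bf_ab < 1 / 10:
        return "B"
    # Step 3: fall back to the higher Bayesian R2.
    return "A" if r2_a >= r2_b else "B"
```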
Results
Seventy-seven participants were initially recruited and consented to taking part in the study. Seventy-four participants (n = 3 non-participation) completed all trials. Three participants did not complete all the trials because of lack of time (n = 2) and family bereavement (n = 1).
Table 2 shows the mean (SD) for maximal oxygen consumption and resting oxygen consumption as measured using an online gas analysis system, chest strap ECG resting heart rate, 7-day RHR recorded via the Apple Watch, 5-min MVPA app reading taken during the resting VO2 protocol, recorded HRmax taken from the VO2max protocol, and age-predicted HRmax using the Gellish formula. 35
Mean (SD) maximal oxygen consumption, resting oxygen consumption, resting heart rate (RHR) across three different methods, and maximal heart rate directly measured and estimated.
Figure 1 illustrates the posterior distributions for resting heart rate from the three protocols used to record resting heart rate. Specifically, the mean effect (ME) and credible interval (CI) for each posterior distribution were: 7-day protocol ME: 60 CI: 58 to 62 beats.min−1, 5-min protocol ME: 63 CI: 61 to 65 beats.min−1, and 30-minute protocol ME: 63 CI: 61 to 65 beats.min−1.

Posterior distributions for the measurement of resting heart rate using 30-minute and 5-minute supine laboratory measurements, and the 7-day mean from the Apple Watch.
Figure 2 illustrates the posterior distributions for maximum heart rate from the two predictive methods, Gellish et al. 35 and 220 – age, compared to criterion HRmax measured during the maximal oxygen consumption test. Specifically, the mean effect (ME) and credible interval (CI) for each posterior distribution were: Gellish ME: 185 CI: 182 to 187 beats.min−1, 220 – age ME: 189 CI: 187 to 191 beats.min−1 and maximal oxygen consumption test ME: 186 CI: 183 to 188 beats.min−1.

Posterior distributions for measured HRmax and predicted HRmax from the Gellish (206.9-[0.67*age]) and traditional age-prediction (220-age) formulae.
Mean (SD) %HRR, %VO2R and treadmill speed required to advance the Apple Watch green ‘exercise’ ring by five minutes and to record three minutes of moderate physical activity (MPA) on the MVPA app (reported as the inflection point) are presented in Table 3.
Mean (SD) %VO2R, %HRR and treadmill speed at the inflection point for the native Apple Watch Activity app and MVPA app.
Mean (SD) %HRR, %VO2R, treadmill speed and cadence required to meet 3 METs (absolute moderate intensity) are presented in Table 4 (METs calculated using the hybrid method, where measured oxygen consumption is divided by 3.5 mL.kg−1.min−1).
Mean (SD) %VO2R, %HRR, treadmill speed and cadence at 3 METs.
Table 5 displays the posterior distribution mean effect and credible intervals for the 3 MET inflection point for %HRR, %VO2R, treadmill speed and cadence.
Mean effect (95% CI) for the 3 MET inflection point for %HRR, %VO2R, treadmill speed and cadence.
Figure 3 illustrates the oxygen consumption reserve (%VO2R) at the inflection point, at which the native Apple Watch Activity app had recorded all five minutes as ‘exercise’ and the bespoke MVPA app had recorded a minimum of three minutes of the 5-minute bout as criterion moderate intensity. Specifically, the mean effect (ME) and credible interval (CI) for the native Apple Watch Activity app were: ME: 33% CI: 31% to 36% VO2R, and for the bespoke MVPA app: ME: 43% CI: 40% to 44% VO2R. The Bayesian R2 value for the bespoke MVPA app as a data-based estimate of the proportion of variance at 40% VO2R was 0.75. For the native Apple Watch Activity app, Bayesian R2 was 0.066. When the native Apple Watch Activity app's green ‘exercise’ ring recorded minutes of ‘exercise’, 0% of participants were above 40% VO2R, compared to 95% of participants when using the bespoke MVPA app.
Figure 4 illustrates the %HRR at the inflection point, at which the native Apple Watch Activity app had recorded all five minutes as ‘exercise’ and the bespoke MVPA app had recorded a minimum of three minutes of the 5-minute bout as criterion moderate intensity. Specifically, the mean effect (ME) and credible interval (CI) for the native Apple Watch Activity app were: ME: 33% CI: 31% to 35% HRR, and for the bespoke MVPA app: ME: 44% CI: 43% to 45% HRR. The Bayesian R2 value for the bespoke MVPA app as a data-based estimate of the proportion of variance at 40% HRR was 0.64. For the native Apple Watch Activity app, Bayesian R2 was 0.015.

Posterior distributions for oxygen consumption reserve (%VO2R) at the inflection point for the native Apple Watch Activity app and bespoke MVPA app.

Posterior distributions for %HRR at the inflection point for the native Apple Watch Activity app and bespoke MVPA app. Heart rate is from the ECG chest strap.
Discussion
The aim of this study was to examine whether %HRR measures criterion ‘moderate’ exercise intensity (40-59% HRR) more accurately than the native Apple Watch ‘Activity’ app when both are compared to a criterion measure, % oxygen consumption reserve (%VO2R). The main finding of this study is that using %HRR more accurately measures moderate intensity than the Apple Watch Activity app (green ‘exercise’ ring) when compared to the %VO2R criterion. The mean (95% CI) %VO2R for the bespoke MVPA app and native Apple Watch Activity app were 43% (40% to 44%) and 33% (31% to 36%), respectively. These data show that at the point both apps (bespoke MVPA and native Activity) indicated that participants were exercising at moderate intensity, only the bespoke MVPA app achieved this criterion, with the native Apple Watch Activity app intensity being well below the 40% threshold. Moreover, the entire posterior distribution for the native Activity app was outside the pre-registered ROPE, and the 95% credible interval for the bespoke MVPA app of 40% to 44% suggests that our estimations of these population effects are very precise. The size of the effect can also be described by the fact that, when using the native ‘Activity’ app, 0% of participants were above 40% VO2R when the Activity app's green ‘exercise’ ring advanced, compared to 95% of participants being above 40% VO2R when using the bespoke MVPA app.
Given these results, %HRR may be the most accurate and accessible individualised measure of exercise intensity that can be incorporated into a wrist-worn wearable device. The popularity of wearable devices, 21 and the ability to scale individualised measurement of moderate (and vigorous) exercise intensity at a population level, should not be underestimated. The posterior distributions comparing the bespoke MVPA app with the Apple Watch native Activity app highlight that using %HRR is likely to measure physical activity intensity of moderate or above (40% VO2R) more accurately compared to the Apple Watch native Activity app, which was unlikely to measure intensity accurately compared to criterion intensity %VO2R. These findings are in line with previous investigations,15,17,26 highlighting the need for individualisation of exercise monitoring using relative intensity, as physical activity intensity is influenced by cardiorespiratory fitness status. For example, in the current study the minimum treadmill speed required to elicit moderate physical activity (MPA) for an individual (40% HRR) by the bespoke MVPA app was 3.5 km.h−1 while the fastest treadmill speed required to reach MPA for an individual was 8.0 km.h−1. This highlights the individualised nature of the intensity spectrum and how participants’ cardiorespiratory fitness plays an integral role in the speed required to elicit MPA. Wearable technology manufacturers may wish to consider incorporating relative measures of physical activity intensity into their exercise monitoring.
Accurate measurement of RHR is important if relative measures of intensity are to be used, and trusted, in wearable devices. Although a 12-lead ECG measurement of RHR is seen as the gold standard, 36 there are obvious practical implications for using this method outside of laboratory conditions. The resting heart rate data measured in the current study demonstrate that, in comparison to the criterion 30-minute supine recorded RHR, the 5-minute protocol embedded in the bespoke MVPA app overestimates mean RHR by ∼1 beats.min−1, while the mean 7-day Apple Watch estimate seems to underestimate RHR by ∼3 beats.min−1. These results have implications for real-world app usability and subsequent physical activity monitoring via wearable technology. First, it is unlikely that wearable device users would be willing to maintain a supine position for a period (either 30 minutes or 5 minutes) while heart rate is recorded. Second, although the underestimation of RHR using the 7-day Apple Watch data may result in slightly different %HRR measurements, this effect might be trivial in practice. For example, the data displayed in Figure 4 showing the posterior distribution of %HRR for the bespoke MVPA app are based on the use of the 7-day RHR data from the Apple Watch, and yet this distribution closely matches that displayed for %VO2R (Figure 3). Moreover, the difference in heart rate at 40% HRR between an RHR of 60 and 63 for a person with an HRmax of 200 is 2 beats.min−1. So, the use of background RHR as recorded automatically by the Apple Watch might represent a ‘sweet spot’ balance between convenience and accuracy.
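The 2 beats.min−1 figure follows from the standard heart rate reserve relationship, assumed here to be the form used, applied to the two resting values with HRmax fixed at 200 beats.min−1:

```latex
\begin{align*}
\mathrm{HR}_{40\%} &= \mathrm{RHR} + 0.40\,(\mathrm{HR_{max}} - \mathrm{RHR}) \\
\mathrm{RHR} = 60: \quad \mathrm{HR}_{40\%} &= 60 + 0.40\,(200 - 60) = 116.0 \\
\mathrm{RHR} = 63: \quad \mathrm{HR}_{40\%} &= 63 + 0.40\,(200 - 63) = 117.8 \\
\Delta \mathrm{HR}_{40\%} &= 117.8 - 116.0 = 1.8 \approx 2\ \mathrm{beats.min^{-1}}
\end{align*}
```

More generally, under this relationship a shift of d beats.min−1 in RHR moves the 40% threshold by 0.6 × d, so the threshold is less sensitive to RHR error than the RHR estimate itself.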
While to the best of the authors’ knowledge there are no current studies investigating the validity of recording RHR via the Apple Watch compared to ECG at rest, a recent investigation 37 recorded resting heart rate via an Apple Watch and ECG during 3 minutes of seated rest prior to a maximal oxygen consumption protocol. The mean absolute percentage error between Apple Watch and ECG was 1.7%. The Apple Watch's usability in measuring health and physical activity may have a wider impact than the prescription of exercise intensity alone. For example, Greiwe and Nyenhuis 38 suggest that telemonitoring of health measures (such as RHR) may transform the way health is measured and diagnosed. Nevertheless, additional research into the validity and accuracy of resting heart rate monitoring with wearable devices might be required, especially if wearable device manufacturers start to incorporate relative measures of exercise intensity that are dependent on these values.
The posterior distributions of HRmax (Figure 2) indicate that the Gellish formula 35 was the most accurate at predicting HRmax (∼ 1 beats.min−1 difference) while the 220 - age calculation overestimated HRmax (∼ 4 beats.min−1 difference) compared to the criterion measured during a maximal oxygen consumption test. These findings are in line with previous studies39–41 reporting that 220 - age did not accurately predict HRmax. In order to prescribe and monitor exercise intensity more accurately, an upper (HRmax) and lower (RHR) bound of heart rate is required to quantify intensity thresholds using %HRR.42,43 These findings have important implications in a real-world setting. The likelihood of individuals performing maximal oxygen consumption testing is very low, and therefore predictive measures are often used to calculate maximal values. 44 Based on the findings of this study, those predictions should use the Gellish et al. 35 formula as it aligns more closely with the HRmax recorded during the maximal oxygen consumption test performed in this study. Using other predictions may lead to inaccurate exercise intensity prescription.
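The two age-based predictions compared here are simple linear formulae, shown below as stated in this study (Gellish: 206.9 − 0.67 × age; traditional: 220 − age). The function names are illustrative.

```python
def hrmax_gellish(age):
    """Age-predicted maximal heart rate (beats/min), Gellish formula."""
    return 206.9 - 0.67 * age


def hrmax_220_minus_age(age):
    """Age-predicted maximal heart rate (beats/min), traditional formula."""
    return 220 - age


def prediction_gap(age):
    """How much higher the traditional estimate is than the Gellish estimate."""
    return hrmax_220_minus_age(age) - hrmax_gellish(age)
```

The gap between the two simplifies to 13.1 − 0.33 × age, so the traditional formula gives the higher estimate for anyone younger than about 40 years and the lower estimate beyond that age.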
Cardiorespiratory fitness plays an integral role in the walking cadence required to meet a relative measure of moderate intensity. 15 When physical activity is monitored relatively via a physiological measure, such as %HRR, and thus individualised via intensity thresholds, a walking cadence prescription is only useful when walking is performed in short bursts. In such circumstances, the delay in the heart rate response means that intensity cannot be registered quickly enough. Although previous studies and public health guidelines have monitored/promoted physical activity using absolute measures such as step count (typically 10,000 steps per day, or 3,000 steps in 30 minutes),12,45,46 using a relative measure of physical activity intensity as shown in the current study is a more accurate measure of physical activity intensity. The growing popularity of wearable devices now makes this individualised approach a reality, and at scale. Ultimately, the use of individualised relative intensity measures for physical activity monitoring and prescription has potential implications for mortality and morbidity risks, improving people's quality of life and subsequently reducing pressures on health services. As such, a move toward greater emphasis on individualised physical activity intensity monitoring and prescription should be incorporated into public health guidelines.
Measuring physical activity accurately on a population scale is complex. However, based on the findings from the current study, the Apple Watch would overestimate MVPA when compared to measuring MVPA using the criterion %VO2R (or %HRR). The mean treadmill speed at which the Apple Watch registered physical activity on the green ‘exercise’ ring was 0.8 km.h−1 slower than that for the bespoke MVPA app (5.4 km.h−1 compared to 6.2 km.h−1). Currently, the Apple Watch records physical activity via an accumulation of accelerometry data and intermittent heart rate data. 47 This is likely configured to prolong the battery life of the wearable device, as using the PPG sensor continually requires additional battery power. Therefore, as a relative measure of intensity is not used regularly, the Apple Watch is likely overestimating physical activity, because it measures physical activity mostly via an absolute accelerometry measurement. Additionally, the point at which 3 METs was reached (in this case METs were calculated via oxygen consumption divided by a resting oxygen consumption value of 3.5 mL.kg−1.min−1) also indicates a substantial overestimation of physical activity intensity when compared to criterion %VO2R (or %HRR). At 3 METs, %HRR mean (SD) was 25 (8)% and %VO2R was 23 (8)%, substantially under the 40% moderate intensity boundary. Additionally, cadence was 104 (9) steps.min−1, which may indicate why 100 steps.min−1 has been advocated as the walking cadence required to elicit moderate intensity.11,48 However, the findings of the current study support that using %HRR, which takes into consideration two measures of physiological capacity (resting HR and maximal HR), is likely to measure physical activity intensity more accurately on an individual basis. Given the potentially large overestimation of MVPA throughout the day when using the Apple Watch Activity app (green ‘exercise’ ring), a greater degree of personalisation is required.
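The gap between absolute and relative intensity at 3 METs can be illustrated with a short calculation. This is a sketch under stated assumptions and not the study's analysis code: METs follow the definition given above (oxygen consumption divided by 3.5 mL.kg−1.min−1), %VO2R is computed with the standard reserve formula using the same 3.5 mL.kg−1.min−1 as the resting value (an assumption), and the VO2max of 35 mL.kg−1.min−1 is a hypothetical value chosen purely for illustration.

```python
RESTING_VO2 = 3.5  # mL/kg/min, the resting value used for METs in the text


def mets(vo2: float) -> float:
    """Absolute intensity: oxygen consumption divided by the resting value."""
    return vo2 / RESTING_VO2


def percent_vo2r(vo2: float, vo2_max: float, vo2_rest: float = RESTING_VO2) -> float:
    """Relative intensity as a % of oxygen consumption reserve.

    Assumption: resting VO2 defaults to 3.5 mL/kg/min; a measured resting
    value could be passed instead.
    """
    if vo2_max <= vo2_rest:
        raise ValueError("vo2_max must exceed vo2_rest")
    return 100.0 * (vo2 - vo2_rest) / (vo2_max - vo2_rest)


if __name__ == "__main__":
    vo2_at_3_mets = 3 * RESTING_VO2   # 10.5 mL/kg/min
    hypothetical_vo2_max = 35.0       # illustrative only, not study data
    print(f"METs: {mets(vo2_at_3_mets):.1f}")
    print(f"%VO2R: {percent_vo2r(vo2_at_3_mets, hypothetical_vo2_max):.1f}")
```

For this hypothetical individual, 3 METs falls well below 40%VO2R, and the higher the VO2max, the lower the relative intensity at the same absolute workload.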
Personalising exercise intensity thresholds requires either more continual use of background heart rate measurements by the Apple Watch during everyday ambulation, or greater personalisation of the data (feedback) given to users to ensure walking cadence meets relative moderate intensity thresholds. Currently the Apple Watch has heart rate zones for physical activity displayed in the Workout app 49 that are based on %HRR. The default zone 1 begins at 60%HRR, or vigorous intensity as defined by the ACSM. 7 These zones, and the monitoring of exercise via %HRR, are not used to quantify exercise in the native Apple Watch Activity app during everyday ambulation, or when an exercise mode is selected in the Workout app. Although heart rate zones can be modified manually, it is unlikely that Apple Watch users will change the zones to replicate ACSM guidelines. 7 Starting the first heart rate zone at vigorous intensity is unlikely to increase adherence to exercise. 50 The default Apple Watch heart rate zones would benefit from being aligned to the moderate and vigorous exercise intensity guidelines recommended by the ACSM, 7 by setting zone 1 (moderate intensity) as 40–59%HRR and zone 2 (vigorous intensity) as 60–90%HRR.
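The proposed zone alignment can be sketched as a simple classification of a measured heart rate. This is an illustrative sketch only and does not represent how the Apple Watch or the bespoke MVPA app is implemented: the boundaries are those stated above (moderate 40–59%HRR, vigorous 60–90%HRR), everything below 40%HRR is grouped as 'light' for simplicity, and the function names and example values are hypothetical.

```python
def percent_hrr(hr: float, resting_hr: float, hr_max: float) -> float:
    """Current heart rate expressed as a % of heart rate reserve."""
    if hr_max <= resting_hr:
        raise ValueError("hr_max must exceed resting_hr")
    return 100.0 * (hr - resting_hr) / (hr_max - resting_hr)


def classify_intensity(pct_hrr: float) -> str:
    """Map %HRR to an intensity label using the boundaries given in the text.

    Simplification: all values below 40%HRR are labelled 'light'.
    """
    if pct_hrr < 40:
        return "light"
    if pct_hrr < 60:
        return "moderate"      # proposed zone 1
    if pct_hrr <= 90:
        return "vigorous"      # proposed zone 2
    return "above vigorous zone"


if __name__ == "__main__":
    # Hypothetical user: resting HR 60, HRmax 180 beats/min
    for hr in (90, 108, 135, 170):
        pct = percent_hrr(hr, resting_hr=60, hr_max=180)
        print(f"HR {hr}: {pct:.0f}%HRR -> {classify_intensity(pct)}")
```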
Study limitations
This study has focussed on using %HRR, which can be measured by wearable devices and is therefore accessible to the general population. Nevertheless, there are limitations in using this method that are worth noting. While %HRR is individualised to an extent, the method uses fixed percentage boundaries to define intensity categories (e.g., moderate intensity is defined as 40–59%HRR). Physiological markers such as lactate or ventilatory thresholds may offer a more individualised, and consequently better, method of monitoring physical activity intensity as they reflect a real physiological threshold for each individual. 51 Additionally, ventilatory thresholds have been reported to better anchor physical activity intensity as the metabolic stimulus is better normalised across people with varying fitness levels. 52 However, at the present time wearable devices do not have the ability to measure such physiological thresholds, and future research in this field may depend on wearable device manufacturers making these measurements available. Therefore, %HRR currently offers the most accurate relative measure available for use at a population level. Nevertheless, future research may wish to identify how accurately the training zones offered on wearable devices such as the Apple Watch correspond to physiological phenomena like ventilatory thresholds.
A potential limitation of the current study is that the Apple Watches used were Series 5, with the latest Watches now available being Series 10. Although there is no public information available on whether, or how, the Apple Watch has changed how it measures ‘exercise’ via the green ring, it is possible that some changes have been made to improve the accuracy of the Watch for measuring MVPA. One of the problems that all research studies examining consumer technology suffer from is the rapid advancement of technology, and the inevitable difference between the technology used in the study for data collection and that available to the consumer when the study is published.
Conclusion
The bespoke MVPA app measured relative moderate intensity more accurately than the native Apple Watch Activity app (green ‘exercise’ ring) when assessed against the criterion %VO2R measure. Exercise guidelines and wearable devices like the Apple Watch should incorporate relative measures of physical activity to individualise physical activity monitoring and prescription more accurately, with a concomitant move away from arbitrary absolute values that apply a single exercise goal to the whole population, such as a walking cadence of 100 steps.min−1. A shift towards individualised physical activity intensity prescription is now possible via wearable technology, and incorporating individualised prescription has the potential to improve population health.
