Abstract
Objective:
Echo intensity measurements are highly influenced by the ultrasound system and the parameters used for measurement, making it difficult to compare results obtained from different ultrasound machines. Therefore, it is necessary to understand how reliability changes when using different ultrasound systems and parameters.
Materials and Methods:
ALOKA SSD4000 and GE LOGIQ P6 systems were used to compare rectus femoris echo intensity in 16 healthy young subjects (eight women) using different depths (D), gains (G), and frequencies (F). The following settings were adopted: ALOKA1 (D6/G30/F7.5), ALOKA2 (D6/G45/F7.5), ALOKA3 (D6/G30/F10), LOGIQ1 (D6/G50/F15), LOGIQ2 (D6/G0/F15), LOGIQ3 (D6/G0/F10), and LOGIQ4 (D6/G30/F10). Intraclass correlation coefficient, standard error of the measure, minimum difference, and Bland-Altman tests were performed to calculate reliability and agreement between systems’ settings.
Results:
ALOKA1 × LOGIQ1, ALOKA1 × LOGIQ4, and ALOKA3 × LOGIQ1 showed moderate to high ICCs and agreement on the Bland-Altman test.
Conclusion:
Echo intensity varies between systems and parameters, but reliability can be increased by adjusting the ultrasound settings.
Ultrasound (US) measurements of muscle echo intensity (EI) have been widely used in the literature as a measurement of muscle quality.1,2 EI values have been used to evaluate the effects of aging,3–6 neuromuscular diseases,7–9 strength training adaptations,10,11 and exercise-induced muscle damage.12–15 This technique allows us to quantify an image’s shades of gray, which in turn allows the identification of the presence of connective tissue, intramuscular fat, or edema, which makes the image whiter (i.e. lower muscle quality), or healthy muscle tissue, which makes the image darker (i.e. higher muscle quality). 1
Significant positive associations have been found in different studies between EI values and isometric peak torque, rate of force development, functional tasks such as 30-second sit-to-stand and preferred gait speed in the elderly,2,16 sprinting performance in healthy young individuals, 17 stair-climbing performance in professional firefighters, 18 and agility in middle school boys. 19 On the other hand, negative associations have been reported between EI values and exercise capacity in patients with heart failure. 20 This is evidence that EI can be used as an important noninvasive tool to study functionality and performance.
Changes in parameters such as depth, gain, and frequency can be used to generate excellent quality measurements. However, changing these parameters can affect the image’s shades of gray, thereby affecting the EI values and muscle quality results. Depth (also known as range), gain (time gain compensation [TGC]), and frequency (rate of ultrasound propagation) play a crucial role in the image’s shades of gray and consequently in muscle EI values.
For a measurement to be reliable, it is essential to know which factors affect it. This allows for controlling intervening variables and ascertaining whether changes in the participant’s results are truly due to an adaptation and not to changes in the US system parameters. Reliability studies have been conducted to quantify the influence of moments,21,22 raters,23,24 transducer tilt, 25 muscle site, 24 image analysis technique,26,27 and analysts24,28 on the results, finding that the reliability of EI measurements is high to very high. However, few studies have evaluated the influence of using different US systems on muscle EI values. Pillen et al. 29 used a phantom (a combination of pig muscles with known shades of gray) and seven healthy subjects to derive a conversion equation. This allowed them to obtain EI normal values with the Sonos 2000 Phased Array Imaging System (Hewlett-Packard, Andover, MA) and apply them to measurements made by the Philips IU22 (Philips Healthcare, Eindhoven, The Netherlands). The results helped to improve this technique in clinical practice. Meanwhile, O’Brien et al. 30 compared EI measurements between four different machine/transducer combinations (two machines and two transducers). The US machines used did not have post-processing filters and did not allow changes in the parameters. They found high reliability in their comparisons; however, the two systems used were identical models. No study was found in the literature that focused on determining the reliability of EI measurements obtained from two different US systems.
Because US is a more accessible imaging tool in clinical practice than magnetic resonance imaging (MRI) and computed tomography, it is in increasing demand, resulting in an increasing variety of US systems available on the market. Thus, clinical studies are conducted in different locations and with different US systems. Normally, commercial US systems have built-in postprocessing filters, with the goal of enhancing image resolution and identifying individual structures. However, these filters alter the images’ shades of gray and can affect the EI results.
Because published studies often discuss their results while comparing the values found with those available in the literature,10,16 it is important to identify the influence of using different US systems when determining the muscle’s EI. Therefore, the aim of the present study was to verify the reliability and the agreement of rectus femoris (RF) muscle EI values calculated from images of a similar muscle location, using two different US systems.
Materials and Methods
This study was conducted according to the Declaration of Helsinki, and the local Institutional Research Ethics Committee approved all procedures (project number 708.362). All participants were informed of the benefits and risks prior to signing an institutionally approved consent document for study participation.
Sample size was determined using mean and standard deviation data from a study that evaluated the same variable in a similar population. 22 Sixteen healthy young subjects (eight men and eight women; age, 27.56 ± 1.71 years; body mass index [BMI], 23.64 ± 2.79 kg/m²) volunteered to participate in the study. All US images were obtained on the same day while subjects were lying on a stretcher in a supine position with their knees fully extended. Prior to assessment, volunteers rested for a period of 5 to 10 minutes in order to reestablish body fluids. 31 Three RF images were obtained from each thigh by the same experienced rater. The transducer from each system was placed at the same position, at 50% of the thigh length, transversely to the muscle fibers. 1 All images were analyzed in ImageJ software (Version 1.43u; National Institutes of Health, Bethesda, MD) by the same experienced analyst, using a rectangular region of interest that included as much of the muscle as possible.26,27 Images were obtained using both ALOKA (SSD4000; Aloka, Tokyo, Japan) and LOGIQ (P6; General Electric, Milwaukee, WI) US systems, with all their respective settings. The order in which each system and setting was used was randomized.
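The EI value of a rectangular region of interest is, in essence, the mean gray level of the pixels it contains. The sketch below illustrates that calculation with NumPy rather than ImageJ. It assumes an 8-bit grayscale image (0 = black, 255 = white), which the text does not state explicitly. The function name `echo_intensity`, its parameters, and the synthetic image are hypothetical and serve only as illustration.

```python
import numpy as np

def echo_intensity(image, top, left, height, width):
    """Mean gray level (a.u.) inside a rectangular region of interest.

    Assumes `image` is a 2-D grayscale array (e.g. 8-bit, 0-255).
    """
    img = np.asarray(image, dtype=float)
    if img.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    roi = img[top:top + height, left:left + width]
    if roi.size == 0:
        raise ValueError("region of interest is empty")
    return float(roi.mean())

# Synthetic image (not study data): dark background (gray level 20)
# with a brighter rectangular patch (gray level 60).
image = np.full((100, 120), 20, dtype=np.uint8)
image[30:70, 40:100] = 60

# ROI placed exactly on the bright patch
ei_inside = echo_intensity(image, top=30, left=40, height=40, width=60)
# Taller ROI that also includes some of the dark background
ei_mixed = echo_intensity(image, top=20, left=40, height=60, width=60)
print(ei_inside, round(ei_mixed, 2))
```

The second call shows why ROI placement matters: including pixels outside the brighter patch lowers the mean gray level, so the ROI must be drawn consistently across images.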
ALOKA and LOGIQ US systems were compared using different combinations of depths (D), gains (G), and frequencies (F). The values were chosen based on a previous qualitative analysis of the images. The values for G were chosen by using the minimum values for both systems (30 for ALOKA and 0 for LOGIQ), plus one intermediate value for ALOKA (45) and two for LOGIQ (30 and 50). The F values chosen for ALOKA were the minimum and maximum available (7.5 and 10), while for LOGIQ, the maximum value and a value common to both US systems were chosen (15 and 10, respectively). Depth was set at 6 for both systems. Therefore, for ALOKA, three different settings were adopted: ALOKA1 (D6/G30/F7.5), ALOKA2 (D6/G45/F7.5), and ALOKA3 (D6/G30/F10). For the LOGIQ, four different settings were adopted: LOGIQ1 (D6/G50/F15), LOGIQ2 (D6/G0/F15), LOGIQ3 (D6/G0/F10), and LOGIQ4 (D6/G30/F10) (Figure 1).

Figure 1. Images obtained from the same position of the rectus femoris (RF) muscle from a representative subject with both systems and their respective settings: (a) ALOKA1 (D6/G30/F7.5), (b) ALOKA2 (D6/G45/F7.5), (c) ALOKA3 (D6/G30/F10), (d) LOGIQ1 (D6/G50/F15), (e) LOGIQ2 (D6/G0/F15), (f) LOGIQ3 (D6/G0/F10), and (g) LOGIQ4 (D6/G30/F10).
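The seven settings listed above can be written as a small lookup table, from which every between-system pairing follows directly. The sketch below is illustrative only. The dictionary layout and variable names are hypothetical, and the numbers are the depth, gain, and frequency values exactly as reported in the text.

```python
from itertools import product

# Settings as (depth, gain, frequency), copied from the text
ALOKA = {
    "ALOKA1": (6, 30, 7.5),
    "ALOKA2": (6, 45, 7.5),
    "ALOKA3": (6, 30, 10),
}
LOGIQ = {
    "LOGIQ1": (6, 50, 15),
    "LOGIQ2": (6, 0, 15),
    "LOGIQ3": (6, 0, 10),
    "LOGIQ4": (6, 30, 10),
}

# Every ALOKA setting paired with every LOGIQ setting
comparisons = list(product(ALOKA, LOGIQ))

# Pairs whose nominal depth, gain, and frequency are identical
matched = [(a, l) for a, l in comparisons if ALOKA[a] == LOGIQ[l]]

print(len(comparisons))  # number of between-system comparisons
print(matched)           # pairs with identical nominal settings
```

Three ALOKA settings and four LOGIQ settings give 12 between-system comparisons, of which only ALOKA3 × LOGIQ4 shares the same nominal depth, gain, and frequency.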
All ALOKA settings were compared with all LOGIQ settings. An intraclass correlation coefficient (ICC), its 95% confidence interval (CI), standard error of the measure (SEM), and minimum difference (MD) were calculated. This was done to quantify reliability between combinations, reliability between the three images obtained (intrarater reliability), and reliability between images analyzed twice by the same analyst for ten of the study’s participants (intra-analyst reliability). For ICC classification, the following criteria were adopted: no correlation (
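As an illustration of the reliability statistics named above, the following sketch computes an ICC, the SEM, and the MD from a subjects-by-measurements table. The source does not state which ICC model or which SEM and MD formulas were used. The code therefore assumes the two-way random, absolute-agreement, single-measure ICC(2,1), SEM = SD × √(1 − ICC) with SD taken over all scores, and MD = SEM × 1.96 × √2. The ratio of the MD to the SEM reported in the Results (roughly 2.77 to 2.79) is close to 1.96 × √2 ≈ 2.77, which is consistent with the last formula but does not confirm it. Function names and the example scores are hypothetical and are not study data.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` has one row per subject and one column per measurement
    (e.g. one column per system setting). Assumed model, see lead-in.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between measurements
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

def sem_and_md(data, icc):
    """SEM = SD * sqrt(1 - ICC); MD = SEM * 1.96 * sqrt(2) (assumed formulas)."""
    sd = np.std(np.asarray(data, dtype=float), ddof=1)  # SD of all scores
    sem = sd * np.sqrt(1.0 - icc)
    md = sem * 1.96 * np.sqrt(2.0)
    return float(sem), float(md)

# Hypothetical ratings: 6 subjects (rows) x 4 measurements (columns)
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(scores)
sem, md = sem_and_md(scores, icc)
print(round(icc, 2), round(sem, 2), round(md, 2))
```

A consistency-type ICC would ignore systematic offsets between columns and would generally give a higher value for the same data, so the choice of model should be reported alongside the coefficient.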
Results
Table 1 shows mean and standard deviations for the measures obtained by each US setting. Intrarater reliability was very high (ICC = 0.999; 95% CI, 0.998–0.999; SEM = 0.99 a.u.; MD = 2.76 a.u.). Intra-analyst reliability was also very high (ICC = 0.999; 95% CI, 0.999–0.999; SEM = 0.92 a.u.; MD = 2.55 a.u.). Table 2 shows the reliability results from all the comparisons. ALOKA1 × LOGIQ4 and ALOKA3 × LOGIQ1 comparisons showed a high correlation (
Table 1. Mean and Standard Deviation Values for Echo Intensity Analysis Using Different Settings in Both Limbs.
Table 2. Reliability of the Rectus Femoris Echo Intensity Measurements Obtained From the Two Ultrasonic Systems Using Their Respective Settings.
Reliability results are expressed by the intraclass correlation coefficient (ICC), 95% confidence interval (CI), standard error of the measure (SEM), minimum difference (MD), and
A Bland-Altman plot was generated for each of the aforementioned comparisons (ALOKA1 × LOGIQ1, ALOKA1 × LOGIQ4, ALOKA3 × LOGIQ1, and ALOKA3 × LOGIQ4) (Figure 2). Bland-Altman plots for ALOKA1 × LOGIQ1, ALOKA1 × LOGIQ4, and ALOKA3 × LOGIQ1 revealed a relatively homogeneous dispersion of EI values within the limits of agreement; however, in the ALOKA3 × LOGIQ4 comparison, there was a slight positive trend, demonstrating that, for higher EI values (above 40 a.u.), the differences between systems were greater, likely due to higher values obtained from ALOKA.

Figure 2. Bland-Altman analysis showing the echo intensity agreement of the rectus femoris muscle between the two ultrasonic systems. Dashed lines represent the mean difference and the upper and lower 95% limits of agreement (mean difference ± 1.96 SD). The continuous line is a reference for zero mean difference.
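The quantities drawn in a Bland-Altman plot can be computed in a few lines. The sketch below returns the mean difference (bias), the 95% limits of agreement (bias ± 1.96 SD of the differences), and the slope of the differences regressed on the pairwise means, which is one simple way to detect a trend such as the one described for ALOKA3 × LOGIQ4. The function name `bland_altman`, the use of a regression slope as the trend check, and the example values are assumptions for illustration and are not study data.

```python
import numpy as np

def bland_altman(a, b):
    """Bias, 95% limits of agreement, and trend slope for paired measurements."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    diff = a - b                   # y-axis of the plot
    pair_mean = (a + b) / 2.0      # x-axis of the plot
    bias = diff.mean()
    sd = diff.std(ddof=1)          # sample SD of the differences
    # Slope of differences against means: near zero when there is no trend
    slope = np.polyfit(pair_mean, diff, 1)[0]
    return {
        "bias": float(bias),
        "lower": float(bias - 1.96 * sd),
        "upper": float(bias + 1.96 * sd),
        "slope": float(slope),
    }

# Hypothetical EI values (a.u.) from two systems for five limbs
system_a = [30, 35, 40, 45, 50]
system_b = [28, 34, 38, 44, 48]
result = bland_altman(system_a, system_b)
print(result)
```

A bias far from zero indicates a systematic offset between systems, while a slope clearly different from zero indicates that the disagreement depends on the magnitude of the EI value.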
Discussion
The results of the present study demonstrate that EI values differ between US systems and are greatly influenced by settings such as frequency and gain. Therefore, absolute values obtained from different studies should not be compared without establishing normal values for each system, even if the reported parameters for depth, gain, and frequency are similar. It is advisable to report results as a percentage of these normal values.
By varying the settings of both US systems and comparing results among them, the current study verifies that simply pairing the settings was not enough to make the EI measurements equal between the systems. From the four comparisons that showed a moderate or high correlation, only one had the same settings in both US systems (ALOKA3 × LOGIQ4). However, for the same comparison, the Bland-Altman plot did not show agreement between the EI measurements, while all three other comparisons with different parameters did.
A few factors can explain the differences observed in the present study. Each US system has a built-in postprocessing filter that improves image quality, but this alters the shades of gray depicted and limits their uniformity across systems. Furthermore, gain is not scaled in the same way in every system; for example, 30% gain in one system does not necessarily correspond to 30% gain in another.
Some strategies can be used to improve the reliability between images obtained with different US systems, making them comparable. Pillen et al. 29 used a phantom to obtain a conversion equation that allowed for reliable use in children. O’Brien et al. 30 used a US system without any postprocessing and were able to use the technique reliably. In the present study, different combinations of settings were tested, and some yielded moderate or high EI reliability. This made it possible to find a combination of settings that could be used with satisfactory confidence, without a phantom and without a specific system that has no post-processing. The use of a phantom typically adds extra work, and a system without post-processing must be specifically designed, as it is not typically found on the market. By comparing selected combinations of parameters and finding the most similar ones, hospitals and laboratories that work with distinct US systems can compare their data, if necessary. This could greatly improve their efficiency, resulting in benefits to the patient population.
The EI values obtained did not seem to be influenced either by the rater or by the analyst, as indicated by the very high reliability scores found. However, some limitations of the present study should be acknowledged. The values obtained for different systems and parameters presented large standard deviations, which may have negatively affected the accuracy of the comparisons. The current study has a limited sample of healthy young subjects; therefore, extrapolation of these results should be approached with caution. Finally, only a few frequencies and gains were tested for each system, and those values were chosen arbitrarily; settings with other values might yield combination pairs with higher reliability.
In conclusion, this study analyzed images obtained from two different and popular US systems, using multiple combinations of depth, gain, and frequency. The results indicate that the absolute values obtained from these systems cannot be compared even when the parameters are identical in both systems. However, some combinations of parameters render the measurements obtained by the different systems more similar to each other. Comparisons between two US systems should nevertheless be approached with caution, since even the comparison that obtained the best reliability scores did not present perfect agreement between systems. Further studies should seek to replicate these findings using different US systems and different parameters. This would provide clinicians and researchers with more options to employ different systems when necessary. Clinicians could also benefit greatly from a collaboration between manufacturers aimed at mitigating the differences in images obtained using different US systems.
