Most psychometric tests and questionnaires are currently administered with a Likert format in which respondents mark their position along a discrete range with somewhere between four and seven levels of gradation. Depending on the item content, this gradation may represent levels of agreement with a statement, levels of severity of some symptom, or temporal frequency of occurrence of that symptom. Such response levels are regarded as ordinal steps along a presumed continuous dimension and the Likert format is generally preferred over a simpler yes–no format that forces respondents to express only all–or–none positions. Thus, switching from just two (i.e., yes–no) to more response alternatives per item is regarded as allowing for finer granularity that, in turn, might result in more precise measures or more ability to discriminate respondents from one another (see, for example, Alan & Kabasakal, 2020; Donnellan & Rakhshani, 2023; Maydeu-Olivares et al., 2009; Müssig et al., 2022; Shi et al., 2021). In the limit, a continuous response format can be implemented via a visual analog scale (VAS) on which respondents mark any position along a continuous line, limited only by the resolution with which the mark can be made and subsequently read off. This should provide the largest possible precision on the assumption that respondents can actually use this response format consistently and in full, which does not seem to be the case (see, for example, Furukawa et al., 2021; Gideon et al., 2017; Krosnick, 1991; Preston & Colman, 2000).
Digital technology allows administering tests with VAS or Likert formats equally easily (see, for example, Kinley, 2022; Reips & Funke, 2008) and some studies have investigated the classical psychometric properties (i.e., reliability, validity, factor structure, etc.) of the same tests administered under each response format (see, for example, Kuhlmann et al., 2017; Preston & Colman, 2000; Simms et al., 2019). Generally, these studies have not found any evidence that a potentially limited ability of respondents to use the VAS format in full has deleterious effects on those properties but, at the same time, the continuous VAS format does not seem to improve those properties beyond what a discrete Likert format provides.
Then, in principle, psychometric instruments administered with Likert or VAS formats might be considered equivalent as regards their global properties, but it is not immediately obvious that this is the result of analogous response processes on the part of the respondents. Justification for a VAS format arises from consideration that respondents may feel that their position lies somewhere between any two consecutive discrete landmarks in a Likert scale. Then, because the VAS format allows respondents to express their position with higher resolution, a seemingly reasonable surmise is that a Likert response is simply a quantization of the corresponding VAS response. Thus, one may surmise (a) that the VAS continuum is partitioned into a discrete number of exhaustive and mutually exclusive intervals and (b) that the Likert response is that associated with the interval on which the VAS response would have fallen. Analyzing VAS and Likert data produced by a sample of respondents in consecutive administrations of the same test with each format would reveal whether or not this is the case and it could additionally illustrate how the VAS continuum is partitioned, with direct attention to the issue of whether or not the partition involves intervals of the same width within and across items.
Several studies have addressed in various forms the question of whether the steps in a Likert format involve jumps of the same magnitude. For instance, Knutsson et al. (2010) had respondents indicate the perceived magnitude with which they felt that typical category labels on a Likert item express severity of symptoms, frequency of symptoms, or level of agreement. Their results indicated that progressive grading via labels does not map onto constant increases in perceived magnitude. Toland et al. (2021) administered a four-item instrument with the VAS format and subsequently discretized the responses into 10, five, or four discrete categories by partitioning the VAS continuum into intervals of equal size. They argued that if the equal-interval assumption held for the VAS scale, analysis of the discretized responses via polytomous item response theory (IRT) models would render category location parameters that are equispaced. Their results indicated otherwise and they rejected the equal-interval assumption. An analogous approach based on checking for equidistant category threshold parameters in polytomous IRT accounts of Likert data was used by Sideridis et al. (2023) and they concluded that their results ruled out the equal-interval assumption.
The work described in this article addressed this question differently and using publicly available data collected by Kuhlmann et al. (2017) in a study involving dual administration of three personality scales, once with a VAS format and once with a 5-point Likert format. IRT allows obtaining separate characterizations for Likert data via the graded response model (GRM; Samejima, 1969) and for VAS data under the continuous response model (CRM; Samejima, 1973). A comparison of both IRT characterizations reveals the similarities and differences between VAS and Likert response processes, including a quantitative description of how Likert responses may arise from partitioning the VAS continuum into a number of intervals whose boundaries are implicitly indicated by GRM and CRM item parameter estimates. Estimates of these boundaries allow a direct test of the equal-interval assumption.
Theoretical justification for the specific form of our analysis is given in the next section, which shows that CRM item parameters can be transformed into equivalent GRM item parameters under arbitrary assumptions about the discretization process by which continuous (VAS) responses map onto discrete (Likert) responses. This theoretical analysis also indicates how CRM and GRM item parameter estimates should be used to estimate the boundaries of the discretization intervals by which VAS responses are mapped onto Likert responses on each individual item. A simulation study then assesses the accuracy with which these relations hold in finite samples, also looking at the corresponding item and test information functions and at the accuracy of estimation of respondents’ trait levels under each response format. Finally, an analogous analysis conducted on Kuhlmann et al.’s (2017) data reveals characteristics that are compatible with the theoretical and simulation results presented earlier, indicating that VAS and Likert response processes share common features that are not immediately apparent in a simple comparison of raw VAS and Likert data. Estimates of discretization boundaries varied greatly across items and they did not support the equal-interval assumption. In addition, this analysis did not find any sign that a VAS format actually improves the accuracy with which respondents’ trait levels can be estimated. Implications of these results for empirical practice are then discussed, with emphasis on the limited advisability of the VAS response format as a replacement for the conventional Likert response format.
Formal Relation Between CRM and GRM
In Samejima’s (1969) normal-ogive graded response model (NO-GRM) for a Likert item with K ordered response categories, the probability that a respondent with trait level θ responds above category k is

P(X > k | θ) = Φ(a(θ − b_k)),  1 ≤ k ≤ K − 1, (1)

where Φ is the unit-normal distribution function, a > 0 is the item discrimination parameter, and b_1 < b_2 < … < b_{K−1} are the category threshold parameters. The probability of a response in category k is then

P(X = k | θ) = Φ(a(θ − b_{k−1})) − Φ(a(θ − b_k)), (2)

with the conventions Φ(a(θ − b_0)) ≡ 1 and Φ(a(θ − b_K)) ≡ 0. Note that the (discrete) response X is an ordinal indicator of position along the trait continuum, with categories that need not represent equal increments on that continuum.
Samejima (1973, 2005) defined the CRM to be the limiting case of the NO-GRM as the number of categories approaches infinity. Then, the expression in Equation 2 invariably yields a probability of zero for any individual response value, and the model is stated instead in terms of the conditional distribution of the continuous response Y ∈ (0, 1) given θ, namely,

P(Y ≥ y | θ) = Φ(a(θ − b − (1/α) ln(y / (1 − y)))), (3)

where a and b are discrimination and location parameters analogous to those of the NO-GRM and α > 0 is a scaling parameter governing the mapping of the latent continuum onto the response range. Figure 1 illustrates this conditional distribution at several trait levels for two sets of item parameters.

Figure 1. Conditional Distribution of the Continuous VAS Response Y at Several Trait Levels θ (From −3 to 3 in Steps of 0.5). The Red Sigmoid on the Bottom Plane Describes the Expected Item Score as a Function of θ, that is, How the Expected Value of Y Varies with Trait Level. (A) Item Parameters are a = 2, b = 0, and α = 1. (B) Item Parameters are a = 4, b = 0.5, and α = 1.8
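For concreteness, the conditional distribution just illustrated can be sampled directly. The sketch below assumes a formulation of the CRM in which the logit Z = ln(Y/(1 − Y)) of the continuous response is conditionally normal with mean α(θ − b) and standard deviation α/a; the function name and parameter values are illustrative and not part of the original study.

```python
import numpy as np

def sample_crm_responses(theta, a, b, alpha, rng):
    """Draw continuous responses Y in (0, 1) for trait levels theta.

    Assumes Z = ln(Y / (1 - Y)) is conditionally normal with mean
    alpha * (theta - b) and standard deviation alpha / a.
    """
    z = rng.normal(loc=alpha * (np.asarray(theta) - b), scale=alpha / a)
    return 1.0 / (1.0 + np.exp(-z))  # inverse logit maps Z back to (0, 1)

rng = np.random.default_rng(1)
y = sample_crm_responses(theta=np.zeros(100_000), a=2.0, b=0.0, alpha=1.0, rng=rng)
```

With θ = b, the conditional distribution of Y is symmetric around 0.5, so the sample median should sit near that value.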
Although NO-GRM and CRM parameterizations appear incommensurate at first glance, a simple relation between them can be formally derived under the assumption that Likert responses arise by quantization of the continuous response dimension. This assumption is often made when VAS responses are discretized into a small number of ordered categories for analysis. Specifically, let the VAS continuum be partitioned by boundaries 0 = c_0 < c_1 < … < c_{K−1} < c_K = 1 and let the Likert response be X = k whenever the continuous response Y falls in the interval [c_{k−1}, c_k). Then, from Equation 3,

P(X = k | θ) = Φ(a(θ − b − (1/α) ln(c_{k−1} / (1 − c_{k−1})))) − Φ(a(θ − b − (1/α) ln(c_k / (1 − c_k)))), (4)

with the two terms taken as 1 and 0 when k = 1 and k = K, respectively. Note that Equation 4 has exactly the form of Equation 2, so that discretized continuous responses behave as graded responses under an NO-GRM whose parameters are determined by the CRM parameters and the boundaries c_k. Figure 2 illustrates this transformation.

Figure 2. Transformation of Continuous VAS Responses Into Discrete Likert Responses by Partition of the VAS Continuum Into Intervals Separated by the Boundaries c_k
Three aspects of this formal equivalence are worth pointing out. First, the number and locations of the boundaries c_k are arbitrary and need not be the same for all items, so that each item may carry its own partition of the VAS continuum. Second, the NO-GRM parameters implied by Equation 4 are

a* = a (5)

and

b*_k = b + (1/α) ln(c_k / (1 − c_k)) (6)

for 1 ≤ k ≤ K − 1, that is, discrimination is preserved whereas the thresholds depend on the CRM location and scaling parameters and on the boundaries. Third, Equation 6 can be rearranged into

c_k = 1 / (1 + exp(−α(b*_k − b))) (7)

for 1 ≤ k ≤ K − 1, which permits estimating the discretization boundaries of each individual item when parameter estimates are available under both models.
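Equations 6 and 7 form an invertible pair, a property that can be checked numerically. In this sketch the function names and the equispaced boundary values are illustrative; only the algebraic relations are taken from Equations 6 and 7.

```python
import numpy as np

def grm_thresholds_from_boundaries(b, alpha, c):
    """Equation 6: map discretization boundaries c_k in (0, 1) to NO-GRM
    threshold parameters b*_k (discrimination carries over as a* = a)."""
    c = np.asarray(c, dtype=float)
    return b + np.log(c / (1.0 - c)) / alpha

def boundaries_from_grm_thresholds(b, alpha, b_star):
    """Equation 7: recover the boundaries from the CRM parameters (b, alpha)
    and the NO-GRM thresholds b*_k by inverting Equation 6."""
    b_star = np.asarray(b_star, dtype=float)
    return 1.0 / (1.0 + np.exp(-alpha * (b_star - b)))

c_true = np.array([0.2, 0.4, 0.6, 0.8])  # equispaced partition, for illustration
b_star = grm_thresholds_from_boundaries(b=0.5, alpha=1.8, c=c_true)
c_back = boundaries_from_grm_thresholds(b=0.5, alpha=1.8, b_star=b_star)
# c_back reproduces c_true up to floating-point error.
```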
The formal relation in Equation 4 allows expressing the operational characteristics of the same item under both the CRM and the NO-GRM. This, in turn, permits a theoretical comparison of the potential of the two response formats in practical applications, particularly as regards the accuracy of estimation of respondents’ trait levels. In IRT, this is captured by the item information function (IIF). Under the CRM, Samejima (1973) showed that

I(θ) = a², (8)

that is, the IIF is constant and independent of θ. In contrast, the IIF under the NO-GRM is

I(θ) = Σ_{k=1}^{K} [P′_k(θ)]² / P_k(θ), (9)

with item parameters from Equations 5 and 6, and where P_k(θ) is the category response probability in Equation 2 and P′_k(θ) is its derivative with respect to θ.

Figure 3. Item Information Functions for the Items in Figures 1A (Left Panel) and 1B (Right Panel) Under the Continuous Response Format (Green Horizontal Lines) and the Likert Response Format After Discretization (Red Curves)
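The contrast between the two information functions can be illustrated numerically. The sketch below assumes that the CRM information equals a² (constant in θ) and uses the standard polytomous-IRT information formula for the NO-GRM; the item parameters and the equispaced partition are arbitrary choices for illustration.

```python
import numpy as np
from math import erf, sqrt, pi, exp

def phi_cdf(x):
    """Unit-normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi_pdf(x):
    """Unit-normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def grm_iif(theta, a, thresholds):
    """NO-GRM item information: sum over categories of P_k'(theta)^2 / P_k(theta)."""
    cum = [1.0] + [phi_cdf(a * (theta - bk)) for bk in thresholds] + [0.0]
    dcum = [0.0] + [a * phi_pdf(a * (theta - bk)) for bk in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]       # category probability P_k(theta)
        dp = dcum[k] - dcum[k + 1]    # its derivative with respect to theta
        if p > 1e-12:
            info += dp * dp / p
    return info

a, b, alpha = 2.0, 0.0, 1.0
crm_info = a ** 2  # constant information under the CRM
# Thresholds implied (via Equation 6) by an equispaced partition of the continuum.
c = np.array([0.2, 0.4, 0.6, 0.8])
b_star = b + np.log(c / (1.0 - c)) / alpha
center = grm_iif(0.0, a, b_star)
tail = grm_iif(3.0, a, b_star)
# Information of the discretized item collapses in the tails, unlike the
# constant information of its continuous counterpart.
```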
Simulation Study
This proof-of-concept simulation assesses the feasibility of a quest for common processes underlying VAS responding and Likert responding in finite samples. Initial data sets were generated by simulating continuous responses to CRM items; Likert responses were created either by discretization of continuous responses with arbitrary boundaries or by independent simulation of responses to equivalent NO-GRM items. Estimates of item parameters and respondents’ trait levels were obtained from CRM and NO-GRM data sets and compared according to the theoretical relations presented earlier. Results are presented in detail for the main (and representative) simulation run; results for other runs are summarized at the end of this section.
Data Generation
The simulation run described here in detail involved a 15-item test and 10,000 respondents. True trait levels were randomly drawn from a unit-normal distribution. For direct comparison with estimates of trait levels (see below) the resultant values were rescaled to zero mean and unit standard deviation. True CRM item parameters were randomly drawn from uniform distributions, in the range [1.5, 2.5] for the discrimination parameter a and in suitably chosen ranges for the location parameter b and the scaling parameter α.
Continuous responses were then generated for each respondent–item pair by random draws from the conditional distribution in Equation 3, given the respondent’s trait level and the item’s parameters, and finally rounded to three decimal places.
Likert responses were generated in two different ways: deterministically, by discretizing the continuous responses according to the boundaries assigned to each item, and in a disengaged manner, by independent simulation of graded responses under the NO-GRM with item parameters derived via Equations 5 and 6.
Other simulation runs were analogous and involved different numbers of items, different numbers of respondents, and different discretization boundaries that were fixed for all items or that varied randomly across items.
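The two generation schemes can be sketched as follows for a single item, under the assumption that the logit of the continuous response is conditionally normal as in a cumulative-normal CRM; the parameter values and the (deliberately unequal) boundaries are hypothetical stand-ins for the randomly drawn true values of the simulation.

```python
import numpy as np

rng = np.random.default_rng(7)
n, a, b, alpha = 10_000, 2.0, 0.0, 1.0
cuts = np.array([0.2, 0.35, 0.55, 0.8])  # hypothetical, unequal discretization boundaries

theta = rng.standard_normal(n)

# Continuous (VAS) responses: the logit is conditionally normal given theta.
z = rng.normal(alpha * (theta - b), alpha / a)
y = np.round(1.0 / (1.0 + np.exp(-z)), 3)

# Deterministic Likert responses: the category of the interval containing y.
x_det = np.searchsorted(cuts, y) + 1  # categories 1 to 5

# Disengaged Likert responses: an independent draw from the same conditional
# distribution, discretized with the same boundaries.
z2 = rng.normal(alpha * (theta - b), alpha / a)
x_dis = np.searchsorted(cuts, 1.0 / (1.0 + np.exp(-z2))) + 1
```

Because both discrete variables derive from the same trait levels, the deterministic and disengaged Likert responses remain strongly related even though individual responses need not coincide.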
Parameter Estimation
CRM parameters were estimated from VAS data with the R package EstCRM (Zopluoglu, 2012) using default options. This package returns item parameter estimates plus estimates of respondents’ trait levels scaled to have zero mean and unit standard deviation. Maximum-likelihood estimates of NO-GRM parameters from Likert data were obtained with custom software written to estimate item parameters under this model and to return estimates of respondents’ trait levels scaled to have zero mean and unit standard deviation.
Results
CRM Data
Figure 4 shows scatter plots of estimated against true CRM item and respondent parameters from the simulation involving 15 items and 10,000 respondents. All data points lie almost on the diagonal identity line in each panel, particularly for item parameters (Figure 4A, 4B, and 4C), and reporting measures of agreement (e.g., root mean square error (RMSE)) seems unnecessary. Of particular interest is the fact that trait estimates are no less accurate at extreme true trait levels than they are at centrally located true trait levels (Figure 4D). Simulation runs in which the numbers of respondents were smaller resulted in less accurate item parameter estimates; simulation runs in which the numbers of items were smaller resulted in less accurate estimates of trait levels.

Figure 4. Scatter Plots of Estimated CRM Parameters Against True CRM Parameters for Continuous Data in the Simulation Involving 15 Items and 10,000 Respondents. Each Symbol in Panels (A)–(C) Stands for an Individual Item. Each Symbol in Panel (D) Stands for an Individual Respondent. A Diagonal Identity Line is Plotted for Reference in Each Panel
Deterministic NO-GRM Data
Deterministic conversion of continuous data into Likert data only lowers the resolution of responses without altering the outcome of the random process by which each continuous response had been generated. In other words, this transformation of continuous data mimics what respondents would do if, instead of marking a position on the VAS line, they simply reported the interval of the partition on which that position falls.
NO-GRM item parameter and trait level estimates obtained from discretized data are shown in Figure 5 against their counterparts. For item parameters, the counterparts are the NO-GRM item parameters predicted via Equations 5 and 6 from the true CRM item parameters; for trait levels, the counterparts are the true trait levels used to generate the data.

Figure 5. Scatter Plots of Estimated NO-GRM Parameters Against Predicted or True Parameters for Deterministic NO-GRM Data Obtained by Discretization of Continuous Data
Yet, it is not immediately obvious why true trait levels in the central range are no more accurately estimated from discretized data (Figure 5C) than they were from continuous data (Figure 4D). As shown in Figure 3, the IIFs of Likert items are higher in the central range than the IIFs of their continuous counterparts. This should hold analogously for the 15 items on this test and it should show in the test information function (TIF), which is the sum of the IIFs. Then, within the central range of trait levels, the standard error of estimation (SE), defined as the inverse of the square root of the TIF, should also be lower in the discrete case and, accordingly, trait estimates should be more accurate. Figure 6 shows the TIF and the SE for the continuous and discretized versions of this 15-item test. The TIF and SE for the continuous version (green horizontal lines) are computed from estimated CRM item parameters and are indistinguishable from those computed from true parameters, given the estimation accuracy displayed in Figure 4A. The TIF and SE for the discretized version differ slightly depending on whether they are computed from estimated NO-GRM item parameters (red curves) or from true (i.e., predicted from true CRM item parameters via Equations 5 and 6) item parameters (blue curves), owing to the slight mismatch displayed in Figure 5A. Then, despite the clear superiority of the Likert version of the test as regards the TIF in the central range of trait levels (Figure 6A), the actual advantage in terms of expected estimation accuracy in the central range is virtually negligible (Figure 6B).

Figure 6. Test Information Functions (A) and Standard Error of Estimation (B) for the Continuous Version of the Test (Green Horizontal Lines) and for the Discretized Version of the Test Using Either Estimated NO-GRM Item Parameters (Red Curves) or True Item Parameters Predicted From True CRM Item Parameters Via Equations 5 and 6 (Blue Curves)
Disengaged NO-GRM Data
Disengaging Likert responses from continuous responses requires replacing the discretized data analyzed in the preceding section with actual Likert data generated by the NO-GRM response process for the same respondents on items whose true parameter values are derived by Equations 5 and 6 from those of their CRM counterparts. To illustrate the differences in approach and to appreciate their potential implications, Figure 7 shows the correspondence between continuous responses and Likert responses obtained in either form. By discretization (Figure 7A), Likert responses are deterministically placed in the category pertaining to the interval on which the original continuous response had fallen. By independent simulation of Likert responses to an item whose NO-GRM parameters are derived via Equations 5 and 6 from the CRM item parameters that produced the continuous data (Figure 7B), continuous responses that had fallen into any given interval may nevertheless be associated with Likert responses in any other category, contingent on the parameters of the NO-GRM item.

Figure 7. Correspondence Between Continuous Data and Likert Data Obtained in Two Ways. The Height of Each Individual Bar is Proportional to the Number of Respondents (Out of 10,000 in this Simulation) in Each Bin (0.01 Units in Width) Along the Continuous Dimension in Each Likert Category. (A) Likert Data Obtained by Discretizing Continuous Data With the Discretization Boundaries Used Throughout the Simulation. (B) Likert Data Obtained by Independent Simulation Under the Equivalent NO-GRM
In the isolated analysis of disengaged NO-GRM data thus generated, the differences illustrated in Figure 7 in relation to continuous data should be inconsequential. After all, Likert responses are simulated under the NO-GRM for items with known parameters (irrespective of the origin of these parameters) and, then, a comparison of estimated and true item parameters and respondents’ trait levels should indicate typical estimation accuracy for NO-GRM data. Figure 8 shows that this is the case and, in fact, data in each panel of Figure 8 are identical within sampling error to those in the corresponding panel of Figure 5. This is unsurprising on consideration that (a) data in Figure 5 conform to the model in Equation 4 (for discretization of continuous responses) whereas data in Figure 8 conform to the model in Equation 2 (for direct generation of Likert responses) but (b) both models are formally identical and the simulation produced data from matched sets of true parameters. Then, the (minor) differences between results in Figures 5 and 8 only reflect sampling variability. In addition, the slight overestimation of item discrimination parameters that was observed in Figure 5A is clearly not a consequence of discretization of continuous responses, given that it is identically observed in Figure 8A where no discretization was involved. Misestimation of discrimination parameters is ubiquitous in simulations involving graded responses to Likert items (see, for example, García-Pérez, 2017; García-Pérez et al., 2010; Kieftenbeld & Natesan, 2012).

Figure 8. Scatter Plots of Estimated NO-GRM Parameters Against True Parameters for NO-GRM Data Obtained by Independent Simulation of Likert Responses Under the Model of Equation 2, With True Item Parameters Determined via Equations 5 and 6 by the CRM Item Parameters Used to Simulate Continuous Responses, Which Are Not Otherwise Involved Here. Graphical Conventions as in Figure 5
The importance of this simulation strategy is that it allows investigating the possibility of identifying the boundaries by which continuous responses would map onto Likert responses using only the CRM and NO-GRM item parameter estimates obtained in separate analyses of the two data sets.
The conditions of this simulation ensure the validity of Equation 6 for the true parameter values used in this simulation, but one would also expect this equation to be valid for parameter values estimated from data. The same holds regarding the validity of the rearrangement of Equation 6 into Equation 7. Not surprisingly, Figure 9A shows that these relations hold for simulated data: Estimates of discretization boundaries obtained via Equation 7 are in good agreement with the true boundaries used to generate the Likert data.

Figure 9. Scatter Plots of Estimated Discretization Boundaries (via Equation 7) Against True Discretization Boundaries
In sum, Equation 7 is useful to estimate discretization boundaries for each individual item when continuous and Likert responses are both collected for the same set of items in a within-subjects design.
Other Simulation Runs
The simulation whose results have been presented in detail was accompanied by other runs involving the exact same steps for the generation of continuous responses and deterministic versus disengaged generation of Likert responses. These alternative runs differed only as regards the number of respondents (5,000, 1,000, 600, and 300), the number of items (12 and 8), and the locations of discretization boundaries, which could be identical for all items or vary randomly across items. In the latter case, the locations of the discretization boundaries were drawn at random for each item, subject to the constraint that they remain ordered along the VAS continuum.
The results were analogous in all cases except for the accuracy of parameter estimates, which naturally deteriorated as the size of the data sets (i.e., number of respondents and number of items) decreased. For the record, Section 1 of the Supplementary Material provides full graphical results (analogous to those in Figures 4–8 here) for the most extreme case involving 300 respondents, eight items, and discretization boundaries that vary randomly across items. Note that the recovery of discretization boundaries in this case is already illustrated here in Figure 9C.
Analysis of Empirical Data From Kuhlmann et al. (2017)
Description and Analysis of the Data
Kuhlmann et al. (2017) investigated measurement equivalence of VAS and Likert formats in a within-subjects design involving 879 respondents who took three personality scales under both formats: a Conscientiousness (C) scale with 12 items, an Excitement Seeking (ES) scale with eight items, and a Narcissism (N) scale with six items. Respondents included students attending a seminar, each of whom was additionally instructed to recruit at least 20 other respondents from the general population in exchange for course credit. There is no reason to think that this recruitment procedure yielded peculiar data compared with what other recruitment procedures would have produced. Data for each personality scale were collected on a 5-point Likert scale and on a 101-unit VAS, with the order of administration of the VAS and Likert response formats counterbalanced across respondents. An unrelated filler scale involving a free response format was administered to separate the first and second administrations of the personality scales. Kuhlmann et al.’s main research question was stated as “what we gain from implementing VASs as the response format, in comparison to Likert-type response scales” (p. 2175), and their analyses under classical test theory revealed nearly identical psychometric properties of each scale under both response formats as regards means and standard deviations of score distributions, internal consistencies (Cronbach’s alphas), intercorrelations among personality scales, and correlations with age and gender. In other words, no meaningful advantage was associated with the presumably richer VAS response format, at least in terms of the psychometric characteristics of the resultant instrument.
Kuhlmann et al.’s (2017) data are publicly available at https://osf.io/gvqjs and we subjected them to CRM and NO-GRM analyses with our alternative purpose of estimating IRT item parameters and discretization boundaries for each item in each scale. Pre-processing of the data first consisted of removing respondents who had not answered all of the items in both formats on the scale under analysis. This removal left 590, 594, and 599 respondents for the analysis of the C, ES, and N data, respectively. These three separate samples share 570 respondents who answered all items on all scales in both formats. VAS and Likert responses were inverted for items worded in reverse (Items 3, 6, 9, and 11 on the C scale and Items 2 and 4 on the ES scale), and VAS responses were finally rescaled from the original range [1, 101] to the range [0, 1].
CRM and NO-GRM item parameters and respondents’ trait levels were independently estimated for each scale from the corresponding VAS and Likert data, using the software described above for estimation from simulated data. Item parameter estimates for each scale under each IRT model are tabulated in Section 2 of the Supplementary Material. In addition, sum scores were obtained for each respondent in each version of each scale by adding up the Likert scores on each item (each rescaled from the original range [1, 5] to the range [0, 4]) and by adding up the VAS scores that had already been placed in the range [0, 1] for each item.
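The rescaling and summation of scores just described can be sketched as follows; the response vectors are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

# Hypothetical raw responses for three respondents on a four-item scale.
likert_raw = np.array([[1, 3, 5, 2],
                       [4, 4, 2, 5],
                       [3, 1, 1, 2]])   # original range [1, 5]
vas_raw = np.array([[1, 55, 101, 30],
                    [80, 76, 25, 101],
                    [50, 4, 1, 33]])    # original range [1, 101]

likert_sum = (likert_raw - 1).sum(axis=1)    # each item rescaled to [0, 4]
vas_sum = ((vas_raw - 1) / 100).sum(axis=1)  # each item rescaled to [0, 1]
```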
Results
Comparison of VAS and Likert Data and Their IRT Descriptions
This section starts by describing some of the surface-level features of the raw data for which Kuhlmann et al. (2017) had already presented results. These features are displayed here in graphical form to highlight aspects that are relevant to the forthcoming IRT analyses and to the check of consistency of the processes underlying VAS and Likert responding.
Figure 10 shows histograms of VAS responses from respondents who marked each Likert category on each item on the ES scale. Analogous plots for all scales are provided in Section 3 of the Supplementary Material. Note that these plots are similar to that in Figure 7B, that is, Likert responses occurred in categories that are not consistent with the interval on which the (independent) VAS response had fallen. Note also that Items 2 and 4 (which were reverse-worded) seem peculiar in that a few respondents gave opposite responses under VAS and Likert formats, that is, responses near one end of the VAS scale and responses near the other end of the Likert scale. This feature was also present in data from the four reverse-worded items on the C scale (Items 3, 6, 9, and 11; see Section 3 of the Supplementary Material). It is not clear whether this is saying something about the (in)convenience of using reverse-worded items alongside other items that are not reverse-worded, but this outcome certainly adds fuel to the controversy over the use of reverse wording (see, for example, García-Fernández et al., 2022; Józsa & Morgan, 2017; Kam, 2023; Suárez-Álvarez et al., 2018; Swain et al., 2008; Vigil-Colet et al., 2020).

Figure 10. Correspondence Between VAS and Likert Responses for Each Item on the ES Scale. Graphical Conventions as in Figure 7. Bin Width Along the VAS Dimension is 0.05 Units. Estimated Discretization Boundaries for Each Item Are Indicated by the Lines on the Bottom Plane That Run Along the Likert Dimension at the Corresponding Locations on the VAS Continuum
For a global look at the relation between VAS and Likert scores, Figure 11 shows scatter plots of Likert sum scores against VAS sum scores. The relation is moderately tight along the diagonal line in all scales, although the high values of Pearson correlation seem to overstate the agreement. There are also no signs of nonlinear regimes in these relations. The correlation given in each panel of Figure 11 is nearly identical to the corresponding correlation reported by Kuhlmann et al. (2017) in their table 2. The minute differences are surely due to the different numbers of respondents included for the computations in each case (i.e., respondents who did not omit any item on the two versions of the corresponding scale here versus presumably all respondents in the case of Kuhlmann et al.).

Figure 11. Scatter Plots of Likert Sum Scores Against VAS Sum Scores on Each Scale. Pearson Correlations for Data in Each Panel are Given in the Insets
Turning now to the accuracy of IRT estimates of trait levels and their relation to sum scores, Figure 12 first shows the estimated TIFs and SEs for each of the three scales using CRM and NO-GRM parameter estimates. In contrast to Figure 6 for simulated items with other true parameter values, the VAS format certainly appears to have more potential than the Likert format on all scales throughout the entire range of trait levels, not just at the extremes. Yet, these differences also seem to be largely inconsequential. To illustrate, Figure 13 shows, for each scale, scatter plots of CRM against NO-GRM estimates of trait levels (Figure 13A), VAS sum scores against CRM estimates of trait levels (Figure 13B), and Likert sum scores against NO-GRM estimates of trait levels (Figure 13C). Red curves in Figures 13B and 13C are the test characteristic functions (TCFs) that describe expected test score (given the estimated item parameters) as a function of trait level. Note in Figure 13A that the relation between estimated trait levels under each response format agrees with the corresponding relation between VAS and Likert sum scores (see Figure 11) and the correlations are similarly high. On the other hand, VAS sum scores and CRM estimates of trait levels follow the relation indicated by the TCF for each scale (Figure 13B) and the same holds for Likert sum scores and NO-GRM estimates of trait levels (Figure 13C). Note that the vertical spread around the TCF is smaller in the latter case. The range spanned by the vertical axis in each panel is four times broader in Figure 13C than it is in Figure 13B, but the common size of the vertical axes in these plots reveals that the normalized dispersion of Likert sum scores is smaller than that of VAS sum scores.

Figure 12. Test Information Functions (Top Row) and Standard Error of Estimation (Bottom Row) for Each Scale (Columns) Under the VAS Format of Administration (Green Lines) and the Likert Format (Red Curves)

Figure 13. Scatter Plots of Scores and Estimates of Trait Levels for Each Scale (Columns). (A) CRM Estimates of Trait Levels Against NO-GRM Estimates of Trait Levels. Pearson Correlation is Given in the Insets and a Diagonal Line is Plotted for Reference. (B) VAS Scores Against CRM Estimates of Trait Levels. The Red Curve in Each Panel is the Corresponding Test Characteristic Function. (C) Likert Scores Against NO-GRM Estimates of Trait Levels. The Red Curve in Each Panel is the Corresponding Test Characteristic Function
Compatibility of CRM and NO-GRM Accounts of the Data
The preceding results indicate that separate analyses of VAS data under the CRM and Likert data under the NO-GRM result in similar estimates of respondents’ trait levels (see Figure 13A), which surely reflect comparable sum scores under each format of administration (see Figure 11). These results are to be expected if VAS and Likert response processes are actually related as our theoretical analysis surmised, namely, that a VAS response to an item is the result of a draw from the distribution in Equation 3 while a Likert response to the same item is the result of an independent draw from the same distribution that is subsequently discretized according to a partition with the boundaries that hold for the item. Use of Equation 7 to estimate the discretization boundaries for each item from the corresponding CRM and NO-GRM item parameters renders the results shown in Figure 14 for each item on each scale.

Figure 14. Discretization Boundaries (Colored Circles) for Each Item on Each Scale (Columns). Item Numbers are Indicated on the Left of Each Panel. For Reference, Vertical Lines at Locations 0.2, 0.4, 0.6, and 0.8 Along the VAS Continuum Indicate Where the Boundaries Would Fall for a Partition Into Intervals of the Same Size
Discretization boundaries vary greatly across items and the boundaries within each item generally lie away from the locations (vertical lines in each panel) that would provide a partition of the VAS continuum into intervals of the same size. The only case in which boundary locations for all items appear to be placed where an equispaced partition of the VAS continuum would suggest is limited to one of the four boundaries.
To further assess the agreement between VAS and Likert response processes, Figure 15 plots the category response functions (CRFs) for each item on the ES scale as directly estimated by the NO-GRM account of Likert data (i.e., Equation 2; black curves) and as predicted from the CRM account of VAS data (i.e., via Equations 4–7; red curves). Analogous plots for all scales are provided in Section 4 of the Supplementary Material. The agreement is not as good as one might have expected, but this is mostly caused by higher estimates of the NO-GRM item discrimination parameters relative to their CRM counterparts, which the formal equivalence in Equation 5 would require to be identical.

Figure 15. Category Response Functions for the Items on the ES Scale, as Estimated From Likert Data (Black Curves) and as Predicted From VAS Data (Red Curves)
Discussion
This work set out to investigate the correspondence between continuous (VAS) and discrete (Likert) data provided by the same respondents upon answering the same set of items under both formats. The guiding principle of the study was that every item presumably has unique functional characteristics that are independent of the format of its administration and that these characteristics simply manifest differently under different formats, particularly in the IRT characterization of the items obtained from either data set. The ultimate goal of the study was to use such IRT characterizations to investigate whether the discrete steps inherent to the Likert response format represent equal magnitudes along the underlying dimension, both within and across items.
Summary of Results
A formal analysis first showed that if an item administered with a VAS format conforms to the CRM, then the item has an equivalent expression under the NO-GRM when administered with a Likert format.
Simulations then confirmed that these relations hold in finite samples both when the Likert response is obtained as a mere discretization of the original VAS response and, more realistically, when the Likert response is generated anew and independently from the original VAS response. At the same time, there was no clear sign that estimates of trait levels differ in accuracy according to the format in which the items are administered, with the exception that extreme trait levels were generally estimated more accurately with the VAS format due to the lack of floor and ceiling effects.
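The discretization step used in the first type of simulation can be sketched as follows. This is a minimal illustration with hypothetical boundary locations on the (0, 1) VAS continuum, not the article's actual simulation code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical discretization boundaries on the (0, 1) VAS continuum
# for a 5-point Likert item; deliberately not equispaced.
boundaries = np.array([0.15, 0.35, 0.60, 0.85])

# Simulated VAS responses in (0, 1).
vas = rng.uniform(0.0, 1.0, size=10)

# The Likert response is the index of the interval that contains the
# VAS mark, yielding integer scores 0..4.
likert = np.searchsorted(boundaries, vas)
```

With this scheme, any monotone set of boundaries maps continuous marks to ordered categories; the second, more realistic simulation instead generates the Likert response anew rather than by discretizing the same draw.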
An analysis of empirical data from nearly 600 respondents who took three personality scales administered with both response formats revealed characteristics that matched those observed in simulated data, both in terms of the observable aspects of the raw data and in terms of the IRT accounts obtained via CRM and NO-GRM parameter estimates. CRM and NO-GRM item parameters permitted estimating discretization boundaries that map the continuous dimension onto discrete Likert responses. These discretization boundaries differed greatly across items within and across the three personality scales and they generally partitioned the continuum into intervals of different widths. This outcome lends little support to the notion that consecutive steps on a discrete Likert scale represent constant increments in magnitude along the underlying dimension.
Comparison With Earlier Attempts to Investigate the Equal-Interval Assumption
The Introduction mentioned earlier approaches to investigate the equal-interval assumption. These are discussed here in more detail and in relation to the approach taken in this article.
Knutsson et al. (2010) investigated whether the verbal labels (for frequency, intensity, or agreement) that often accompany successive response options on a Likert item are perceived as representing equal increases in magnitude along the underlying continuum. For this purpose, they had respondents indicate such perceived magnitudes on a VAS line. Thus, if each verbally labeled step were perceived to represent the same increase in magnitude, a plot of average VAS setting against (ordered) verbal label would display a linear trend, but their results displayed nonlinear trends instead (see their figure 1).
These results rule out the equal-interval assumption but it should be noted that they provide a global outlook that only applies to the category labels themselves. It is unlikely that the partition that these results reveal will be universal and independent of the specific content of the item with which such labels are used. In contrast, the approach taken in this article allows assessing the form that the partition takes for each individual item on a test, which turned out to differ across items (see Figure 14). Then, even if the category labels for Likert response options did not universally and per se represent steps of equal size on the underlying judgments, item content seems to modulate the relation in ways that need to be assessed on an item-by-item basis.
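The linearity check underlying Knutsson et al.'s analysis can be sketched numerically. The placements below are illustrative values, not Knutsson et al.'s data:

```python
import numpy as np

# Hypothetical mean VAS placements (on a 0-1 line) for five ordered
# verbal labels; values are illustrative, not Knutsson et al.'s data.
mean_vas = np.array([0.05, 0.18, 0.50, 0.78, 0.97])

# Under the equal-interval assumption the placements would increase
# linearly with label rank, i.e., successive differences would be equal.
steps = np.diff(mean_vas)
equal_interval = np.allclose(steps, steps[0])
```

Here the successive differences vary (0.13 to 0.32), so the trend is nonlinear and the check fails, mirroring the nonlinear trends in their figure 1.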
Toland et al. (2021) took a different route and actually investigated the issue on an item-by-item basis. They administered a four-item test in VAS format and subsequently discretized the responses into ordered Likert categories, examining whether the category threshold parameters estimated from the discretized data were equispaced.
Unquestionably, Toland et al.’s results show that estimated category threshold parameters are not equispaced, but it is not at all clear that this fact speaks to the equal-interval assumption. The intervals of concern lie along the response continuum and not along the trait dimension on which category threshold parameters are located. Figure 7 and its legend already showed that equispacing along the response continuum does not render equispaced category threshold parameters. This fact can be easily proved for an arbitrary number K of response categories: each category threshold parameter is a nonlinear transformation of the corresponding discretization boundary on the response continuum and, thus, the distance between consecutive category threshold parameters, for 2 ≤ k ≤ K − 1, is constant only when the transformed boundaries are themselves equispaced, which an equispaced partition of the response continuum does not produce.
In sum, checking for equispaced category threshold parameters in IRT models assesses an equispaced partition of the latent trait dimension, but it does not address the question of whether Likert response categories partition the response continuum into intervals of the same size. This also applies to the analogous strategy followed by Sideridis et al. (2023) in their quest to establish the validity of the equal-interval assumption.
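This distinction can be illustrated numerically. The logit is used here only as one example of a nonlinear monotone mapping from the response continuum to the trait dimension; it is an assumption for illustration, not necessarily the exact mapping in the article's equations:

```python
import math

# Equispaced boundaries on the (0, 1) response continuum
# (a five-category partition into intervals of the same size).
taus = [0.2, 0.4, 0.6, 0.8]

# Illustrative nonlinear monotone mapping (logit) from the response
# continuum to the trait dimension, an assumption for this sketch.
thresholds = [math.log(t / (1 - t)) for t in taus]

# Distances between consecutive thresholds on the trait dimension.
gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
```

The resulting gaps are unequal (about 0.98, 0.81, 0.98) even though the taus are equispaced, so equispaced boundaries on the response continuum do not yield equispaced thresholds on the trait dimension, and vice versa.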
Implications for Scoring of Likert Items
Likert items are conventionally scored with the integers from 0 to K − 1 (for K response categories), which implicitly assumes that consecutive categories represent equal increments in magnitude. The discretization boundaries estimated here suggest instead a weighted scoring in which each category receives a score determined by the location of its interval on the response continuum (Equation 15).
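A weighted scoring scheme along these lines can be sketched as follows. The boundaries are hypothetical, and scoring each category by its interval midpoint is one simple weighting choice made for illustration; the article's Equation 15 may define the weights differently:

```python
import numpy as np

# Hypothetical estimated discretization boundaries for one
# 5-category item on the (0, 1) response continuum.
boundaries = np.array([0.10, 0.30, 0.65, 0.90])

# Score each Likert category by the midpoint of its interval
# (an illustrative weighting choice, not necessarily Equation 15).
edges = np.concatenate(([0.0], boundaries, [1.0]))
weights = (edges[:-1] + edges[1:]) / 2  # one weight per category

likert_responses = np.array([0, 2, 4, 1])       # integer scores 0..4
weighted_sum = weights[likert_responses].sum()  # weighted Likert sum score
```

Unlike integer scoring, the increments between consecutive category weights here reflect the unequal widths of the estimated intervals.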
Figure 16 shows weighted Likert sum scores against NO-GRM estimates of trait levels for each of the personality scales, with TCFs (red curves) recomputed accordingly. Compared with the analogous plot involving conventional Likert sum scores (Figure 13C), the relation is slightly tighter here, particularly for the N scale. At the same time, the relation is much tighter than it was in Figure 13B for actual VAS scores.

Scatter Plots of Weighted Likert Sum Scores Against NO-GRM Estimates of Trait Levels
Although the use of weighted Likert scores as defined above seems more appropriate than the use of conventional Likert scores, the difficulties associated with obtaining the former must be acknowledged. The scoring function in Equation 15 can only be obtained by dual administration of the test in VAS and Likert formats to estimate the needed discretization boundaries, which limits its applicability in practice.
Choice of Response Format
One may ask at this point whether there is any strong evidence supporting a preference for Likert or VAS response formats in practical applications. Ease of administration is no longer a criterion with the current availability of digital technology. The VAS response format has indeed started to be used on the intuition that it provides higher precision (see, for example, Dragan et al., 2022; Liu et al., 2019; Weigl et al., 2021; Wissing & Reinhard, 2018), although no solid evidence to this effect has ever been reported. In contrast, recent and direct evidence reported by Kuhlmann et al. (2017) indicates that scales administered in a within-subjects study with Likert and VAS formats do not differ in their score distributions or classical psychometric properties, which shows that the surmised higher precision provided by a continuous response format is only an unfounded myth. Additional analyses reported here of the same data under applicable IRT models show that the continuous VAS format also does not provide any advantage over the discrete Likert format when data are scrutinized from within this alternative framework.
At the same time, there is also no sign that the VAS format brings up issues that should cause concern to practitioners. These conclusions align with those of other studies in the literature (e.g., Simms et al., 2019). In these circumstances, the choice of format stands only as a matter of convenience or personal preference, with no consequences for the quality of measurement.
The only issue that seems to remain unexplored in this context is whether any extra measurement precision can arise from the continuous response format when the VAS line includes intermediate tick marks along its length. In Kuhlmann et al.’s (2017) study, the VAS line was unmarked (see their figure 1), but a recent study has shown that intermediate tick marks help respondents make more accurate settings in comparison with those produced on an unmarked line (García-Pérez & Alcalá-Quintana, 2023). Yet, whether more accurate settings result in more accurate measurement remains unclear.
Supplemental Material
Supplemental material, sj-pdf-1-epm-10.1177_00131644231164316 for Are the Steps on Likert Scales Equidistant? Responses on Visual Analog Scales Allow Estimating Their Distances by Miguel A. García-Pérez in Educational and Psychological Measurement