In their article on theory-based measurement, Borgstede and Eggert (2023) argue that a substantive formal psychological theory that is capable of predicting expected measurement outcomes for the theoretical objects of measurement it posits to exist is both necessary and sufficient for psychological measurement. They reveal that measurement in psychology mostly concerns the estimation of latent variables and compares unfavorably to the development of measurement in the history of physics. They, however, fail to include a comparison with the great advances in theory-based measurement achieved in modern physics. In this commentary, I describe how measurement is formalized in classical physics and examine what would be required to formalize the physical measurement of psychological phenomena. I conclude that, without an examination of the theoretical assumptions underlying current measurement procedures and a formal notion of psychological measurement, it is unlikely that psychological science will be able to generate the substantive theories suggested by Borgstede and Eggert.
A difficulty of much psychological theorizing is vagueness in the terms employed.
(Ashby, 1945, p. 15)
In their article, Borgstede and Eggert (2023) argue that a substantive formal psychological theory is both necessary and sufficient for psychological measurement. Crucial to understanding their claim is the distinction between the instrumental and the scientific value of measurement procedures. When measurement outcomes can be associated with the content of a formal theory, it can be said to have scientific value because it will allow us to decide between the veracity of divergent predictions by competing theories. Absent such theories, latent variable models, as well as psychological tests and standardized experimental procedures, can have instrumental value, but they do not have any surplus interpretability, which many physical measurements do possess—for example, “the law of equipartition links the values obtained from a mercury thermometer to the abstract concept of energy” (Borgstede & Eggert, 2023, pp. 131–132).
Borgstede and Eggert (2023) compare the current state of psychological measurement to the development of measurement in physics and conclude that psychological measurement is still in the prescientific stage. Although I agree with their conclusion, it is rather curious that the authors never discuss what constitutes a physical measurement in the formal sense. Based on the examples presented in their article, their history of measurement in physics appears to end with the emergence of classical mechanics. I assume that Borgstede and Eggert are referring to classical physical measurement whenever they use the term. This represents a missed opportunity to learn about the highly successful formal approaches to measurement that have been developed in modern physics. For example, physical theories about the quantum world—specifically, quantum electrodynamics—provide the most precise and accurate predictions of measurement outcomes ever observed in the history of science (Aoyama et al., 2008; Bennett et al., 2006). One of the reasons for this success has been the formalization of every aspect of the measurement process, which can yield different theories of measurement for different physical domains (e.g., classical physics, general relativity, quantum physics) but remains relatively independent from the specific content of the theories that operate within the domains (see Ludwig, 1985). Borgstede and Eggert (2023) seem to assume that the measurement problem will be resolved as soon as psychological science is able to produce substantive formal theories: “formal theory does not only provide meaning to the concepts used in a theory; it also provides rules about how to apply the theory, rendering a specific measurement theory obsolete” (p. 131). 
This strengthens the assumption that, to the authors, measurement concerns ideal classical physical measurement, which has arguably been the default interpretation of psychological measurement throughout the history of the discipline (Luce & Narens, 1983).
The first claim I will defend in this commentary is that even if a revolution happens in psychological theorizing that will finally provide us with substantive formal theories, without a formalization of the process of measurement of psychological phenomena, the discipline will be back to square one: inferring the best-fitting parameters of statistical (latent variable) models from noisy data but still lacking a clear notion of how to incorporate the measurement context and the act of measurement into the description of the psychological phenomenon (Hasselman et al., 2019). The second claim I will defend concerns the following statement: “Our critique of LVM [latent variable modeling] is independent of the specific assumptions made in latent variable models, like quantitative structure, ergodicity, local independence, and so forth” (Borgstede & Eggert, 2023, p. 127). Analogous to Dennett’s (1995) famous dictum about philosophy-free science, I argue that there is no such thing as theory-free measurement; there is only measurement whose theoretical baggage is taken on board without examination. In what follows, I first describe how the measurement procedure has been formalized for classical physical theories and subsequently explore whether this approach can be used to describe psychological measurement. Finally, I argue that psychological science, perhaps inadvertently, has been testing a formal theory for many decades, which can be stated as a question: Can psychological phenomena be considered to originate from an ergodic system?
Formalizing physical measurement
In classical physics, a measurement brings about a correlation between a quantity 𝒜 of a physical system 𝒮 and a quantity ℛ, which is a characteristic of another physical system, the measuring apparatus ℳ. Before the measurement, 𝒜 is assumed to have a value a; it is the objective of the measurement process to reveal this value. After the measurement, ℛ takes on the value r = f(a), in which f is a one-to-one mapping (bijection) of the possible values of 𝒜 before the measurement onto the possible values of ℛ after the measurement (Hilgevoord, 2009, pp. 160–161). We can apply this formal notion to the following measurement context: quantify the temperature 𝒜 of the water 𝒮 contained in a paper cup, which is standing on a table in the center of a room where the ambient temperature is held constant at 18 °C. The measurement apparatus ℳ is a mercury thermometer, of which the measurement error is known, indicating about 18 °C at the start of the measurement. The measurement procedure concerns putting the thermometer into the cup, waiting 5 minutes, and reading the temperature ℛ. The unobservable quantity 𝒜 is made observable by the thermometer indicating r = f(a) = 12.5 °C. Note that for a correlation to occur between 𝒜 and ℛ, there must be an interaction between 𝒮 and ℳ. The state of 𝒮, the water in the cup, could be affected by the act of putting the thermometer in the cup: water molecules might be disturbed, leading to a slight increase in temperature. The question is: Should this interaction effect be included in our interpretation of the measurement outcome?
As an example of a nontrivial interaction between the system 𝒮 and the apparatus ℳ, consider another thermometer that has been prepared in a special way: it has been kept in a freezer for 8 hours prior to the measurement and now reads −10 °C at the start of the measurement. The exact same measurement procedure is followed, and now we get a measurement outcome that is considerably lower compared to the previous measurement. In this case, the thermometer ℳ interacted with the state of 𝒮 by cooling the water down to change the state 𝒜 to a value a′. In general, the purpose of measurement is to reveal the value of 𝒜 before the interaction with ℳ. If the interaction influences 𝒜, we would like to be able to predict, from the value r and the interaction between 𝒮 and ℳ, the value of a. If such a prediction is possible, the measurement procedure is said to prepare a state of 𝒮 in which 𝒜 takes on the value a′. If a = a′, this is an ideal, or nondisturbing, measurement. In this example, one could use thermodynamics to estimate the effect of the interaction and get a correct estimate for a. This appears to be what Borgstede and Eggert (2023) mean by “empirically applying” the theory that prescribes the measurement procedure and ultimately the expected measurement outcomes, leading them to conclude that a separate theory of measurement would be redundant. However, the formal conception of physical measurement is independent from the theory of thermodynamics and can also be used to calculate the orbits of planets. Modern physical theories operate within a formalism that defines a specific domain in reality, which applies to all theories seeking to explain the phenomena defined within the domain. A formalism generally connects concepts such as “physical system,” “state,” “quantity,” and “measurement” to mathematical concepts in a way that is specific for each domain. The quantum formalism also includes postulates that define what happens to the state of a system due to the act of measurement.
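The thermodynamic correction mentioned for the pre-cooled thermometer can be sketched with a simple heat balance. The sketch below is only an illustration under strong assumptions that are not part of the original example: water and thermometer exchange heat only with each other (the cup and the room are ignored), the reading equals their shared equilibrium temperature, and the water mass and the thermometer's heat capacity are hypothetical numbers. The water temperature before the measurement is written `a` and the disturbed temperature after the measurement `a_prime`.

```python
# Lumped heat-balance sketch of a disturbing temperature measurement.
# All numbers are hypothetical; heat exchange with the cup and room is ignored.

WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate textbook value for liquid water


def disturbed_value(a, water_mass_kg, probe_heat_capacity, probe_start_temp):
    """Temperature a' shared by water and thermometer once they equilibrate."""
    water_capacity = water_mass_kg * WATER_SPECIFIC_HEAT  # J/K
    total = water_capacity + probe_heat_capacity
    return (water_capacity * a + probe_heat_capacity * probe_start_temp) / total


def recover_initial_value(a_prime, water_mass_kg, probe_heat_capacity, probe_start_temp):
    """Invert the heat balance to estimate the pre-measurement value a."""
    water_capacity = water_mass_kg * WATER_SPECIFIC_HEAT  # J/K
    return a_prime + probe_heat_capacity * (a_prime - probe_start_temp) / water_capacity


# Hypothetical setup: 0.2 kg of water at 12.5 C, thermometer with a heat
# capacity of 20 J/K that starts at -10 C.
a_true = 12.5
a_prime = disturbed_value(a_true, 0.2, 20.0, -10.0)
a_estimate = recover_initial_value(a_prime, 0.2, 20.0, -10.0)
print(f"disturbed value a' = {a_prime:.2f} C, recovered a = {a_estimate:.2f} C")
```

The point of the sketch is that the correction only works because the interaction between water and thermometer is known well enough to be inverted. Setting the thermometer's heat capacity to zero reproduces the ideal, nondisturbing case in which the value before and after the measurement coincide.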
To summarize, irrespective of the specifics of the phenomenon of interest, the measurement process in physics concerns two aspects, state preparation and measurement, and the important questions to resolve are whether the measurement procedure is disturbing or not, and whether it is possible to predict the effects of the disturbance, which requires knowledge about the nature of the interaction between the system 𝒮 and the apparatus ℳ. The answers can be evident: the measurement of the path of a celestial object across the night sky, irrespective of the nature of ℳ (a telescope or the naked eye), will not disturb the state of the celestial object. This means that the interpretation of a measurement outcome r as a representation of the value a before the act of measurement will be unproblematic: the measurement itself is nondisturbing with respect to 𝒜. In quantum measurement, the scale differences between the measurement objects and the measurement apparatus are such that there will always be an interaction, and all measurement outcomes should be interpreted as the result of an interaction (de Muynck, 2006).
Psychological measurement is disturbing
Can the formalization of the measurement process in physics be a model for psychological measurement? Psychological measurement generally concerns the quantification of internal physiological, emotional, or psychological states, or mental phenomena that are not directly perceivable by an outside observer. This is different from measuring the position of a planet but may be similar to measuring the temperature of water or the mass of your body, which are also quantities that are not directly observable. I suggest that it is reasonable to assume that all psychological measurement concerns at least disturbed measurement; that is, the interaction between the system 𝒮 and the apparatus ℳ will always affect the value of the quantity 𝒜. Consider asking a participant in an experiment to rate their current experience of happiness by evaluating the statement “Right now, I feel happy” and reporting their degree of agreement with the statement on a scale of 1 to 7. The object of the measurement is to reveal the value of the internal state of happiness 𝒜 before the measurement takes place. The system 𝒮 is the participant; the measurement apparatus ℳ consists of the instruction, the question, and the rating scale. The projection of the current experienced level of happiness 𝒜 onto an arbitrary ordinal scale with seven values means that r can take on 1, 2, 3, 4, 5, 6, or 7. Suppose the process returns a value r = f(a) = 1. Obviously, the very act of asking someone to project their current internal state of happiness onto an arbitrary ordinal scale will interfere with their state of happiness before the measurement. It is likely that the measurement procedure concerns a preparation of a state of 𝒮 in which 𝒜 takes on the value a′ after the measurement, such that a ≠ a′. To proceed, we need to answer the following questions: What do we know about the interaction between 𝒮 and ℳ that would allow us to reveal the value a before the measurement? And what are the values that 𝒜 can theoretically take on? This is required to construct the mapping r = f(a).
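Why the set of possible values matters for the mapping can be made concrete with a toy example. The sketch below assumes, purely for the sake of argument, that the pre-measurement state of happiness is a number between 0 and 1 and that the participant bins it into seven equal-width categories. Neither assumption comes from the text or from any theory of happiness; the function names `likert_7` and `preimage` are hypothetical.

```python
# Hypothetical illustration: assume, purely for the sake of argument, that the
# pre-measurement state a is a number in [0, 1]. Nothing in the text fixes this.

def likert_7(a):
    """Project a in [0, 1] onto the ordinal responses 1..7 (equal-width bins)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1] under this toy assumption")
    return min(7, int(a * 7) + 1)


def preimage(r):
    """Interval of a-values that yield response r; the top bin also includes a = 1."""
    return ((r - 1) / 7, r / 7)


# Two clearly different states produce the same outcome r = 1 ...
r_low, r_also_low = likert_7(0.02), likert_7(0.10)
# ... so an observed r = 1 only narrows a down to an interval, not to one value.
candidates = preimage(1)
print(r_low, r_also_low, candidates)
```

Under these assumptions the mapping is many-to-one rather than a bijection, so even a nondisturbing seven-point rating could not reveal the value before the measurement in the classical sense. With a different assumed value set the conclusion could change, which is why the question about possible values has to be answered first.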
The sobering conclusion must be that psychological science does not have a formal conceptualization of exactly what measurement entails in this context. Even if we had a formal theory of happiness that would yield numerical predictions for the value a of the quantity 𝒜, without detailed knowledge of the mapping r = f(a) and the nature of the interaction between the system 𝒮 and the apparatus ℳ, the interpretation of the measurement outcome r is problematic. There are additional complications, some of which may be unique to psychological phenomena. For example, does 𝒜, the level of happiness, exist as an identifiable state of 𝒮 before the measurement takes place? If so, what timescale before the measurement will a refer to: 1 second, 1 hour, or the average of last week? If it does not exist but somehow emerges due to the act of measurement itself, how should we interpret the outcome? Also, an argument can be made that ℳ should include the participant providing the self-report. After all, the individual is the “apparatus” that generates the projection of 𝒜 onto the arbitrary ordinal scale. If this is the case, there is a problem with defining 𝒮 and ℳ as separate physical systems and 𝒜 and ℛ as separate states. The entanglement of systems and mixing of states is a problem that, for very different reasons, also occurs in quantum measurement. Although their article does not discuss these problems, Borgstede and Eggert (2023) might suggest that it is the substantive theory that has to resolve these matters through the empirical application of posited principles and laws. If so, it is likely that one would end up with something similar to the quantum formalism, specified for psychological measurement, in which 𝒮, 𝒜, ℳ, and ℛ are formally defined for psychological systems, psychological states, and the nontrivial effects of the act of psychological measurement.
Theoretical baggage in psychological methods and models
The statistical models used in contemporary psychology require measurement outcomes to be stationary, homogeneous, and independent; that is, beyond a sufficiently short timescale, repeated observations should no longer be correlated. The latter assumption is called the memorylessness property (Ramachandran, 1979) and, together with stationarity and the homogeneity of central moments, puts very specific constraints on the data-generating process and the kind of physical system that can produce such data. This is known as ergodicity (Molenaar, 2004, 2008), which refers to the condition in which ensemble averages of variables observed in samples of sufficiently many systems of the same identity are expected to be arbitrarily similar to the time averages of variables evolving over a sufficiently long interval of time in a single system, irrespective of the set of possible initial conditions. For example, if we were able to throw 1,000 fair dice all at once, the observed distribution of values is expected to be arbitrarily similar to the distribution we would obtain by throwing a single fair die 1,000 times in a row. The assumption of ergodicity constitutes the basis of all statistical models, including latent variable models. These assumptions are theory-laden and, as such, make specific theoretical claims about the object of measurement and inform the measurement procedure (random sample, factorial design, random assignment, etc.) and interpretation (group-to-individual generalization). Is it really the case that psychological science studies the behavior of ergodic systems?
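The dice example can be simulated directly. The following is a minimal sketch using Python's standard pseudorandom generator as a stand-in for fair dice; the seed and the variable names are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

N = 1000
rng = random.Random(2023)  # fixed seed so the sketch is repeatable


def proportions(throws):
    """Relative frequency of each face 1..6 in a list of throws."""
    counts = Counter(throws)
    return [counts[face] / len(throws) for face in range(1, 7)]


# Ensemble: 1,000 different fair dice, each thrown once.
ensemble = [rng.randint(1, 6) for _ in range(N)]
# Time series: one fair die thrown 1,000 times in a row.
single_die = [rng.randint(1, 6) for _ in range(N)]

ensemble_props = proportions(ensemble)
time_props = proportions(single_die)
largest_gap = max(abs(e - t) for e, t in zip(ensemble_props, time_props))

print("ensemble:", ensemble_props)
print("time    :", time_props)
print("largest gap:", largest_gap)
```

In the simulation the two sampling schemes are the same procedure, independent draws from one fixed distribution, which is exactly what makes the process ergodic. The two sets of proportions should differ only by sampling noise, and the gap should shrink as the number of throws grows.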
Recent observations of discrepancies between inferred properties at the ensemble level (interindividual) and the individual level (intra-individual) indicate an emerging consensus that ergodicity does not apply to psychological measurements (Fisher et al., 2018; Wolfers et al., 2018). Olthof et al. (2020) showed that multivariate time series of self-reports of human experience in the context of psychopathology violate all of the ergodicity assumptions and more likely reflect the properties of the nonlinear dynamics of complex adaptive systems. A prominent result was the nonstationarity of the autocorrelation function, which implies a temporal structure in the data that is likely multifractal in nature, a phenomenon that has been suggested to play a role in the reproducibility crisis in psychology (Wallot & Kelty-Stephen, 2017).
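A discrepancy between the ensemble level and the individual level can be illustrated with a deliberately simple simulation. The sketch below shows only one way in which ergodicity can fail, namely stable differences between persons. It is not a model of the nonstationary, multifractal dynamics reported in the studies cited above, and the number of persons, the person-specific levels, and the noise level are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed for repeatability
N_PERSONS, N_OCCASIONS = 700, 1000

# Hypothetical persons whose own stable level differs: 1, 2, ..., 7, 1, 2, ...
levels = [1 + (i % 7) for i in range(N_PERSONS)]


def observe(level):
    """One noisy observation around a person's own level (noise sd is arbitrary)."""
    return level + rng.gauss(0.0, 0.5)


# Ensemble average: every person observed once, on the same occasion.
ensemble_average = mean(observe(level) for level in levels)
# Time average: the first person (level 1) observed on many occasions.
time_average = mean(observe(levels[0]) for _ in range(N_OCCASIONS))

print("ensemble average:", ensemble_average)
print("time average of person 0:", time_average)
```

Here the ensemble average settles near the middle of the scale while the time average of the first person stays near that person's own level, so the group-level estimate says little about this individual. Collecting more observations does not close the gap, in contrast to the dice case.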
Contrary to what Borgstede and Eggert (2023) suggest, there is an abundance of theory-based measurement in psychological science; it just happens to be based on erroneous theoretical assumptions about the object of measurement. If a formalism for the physical measurement of psychological phenomena is defined, the substantive theories will follow.