Abstract

In their article on theory-based measurement, Borgstede and Eggert (2023) argue that a substantive formal psychological theory that is capable of predicting expected measurement outcomes for the theoretical objects of measurement it posits to exist is both necessary and sufficient for psychological measurement. They reveal that measurement in psychology mostly concerns the estimation of latent variables and compares unfavorably to the development of measurement in the history of physics. They, however, fail to include a comparison with the great advances in theory-based measurement achieved in modern physics. In this commentary, I describe how measurement is formalized in classical physics and examine what would be required to formalize the physical measurement of psychological phenomena. I conclude that, without an examination of the theoretical assumptions underlying current measurement procedures and a formal notion of psychological measurement, it is unlikely that psychological science will be able to generate the substantive theories suggested by Borgstede and Eggert.

Keywords

disturbed measurement, ergodicity, measurement theory, preparation procedure

A difficulty of much psychological theorizing is vagueness in the terms employed.

(Ashby, 1945, p. 15)

In their article, Borgstede and Eggert (2023) argue that a substantive formal psychological theory is both necessary and sufficient for psychological measurement. Crucial to understanding their claim is the distinction between the instrumental and the scientific value of measurement procedures. When the outcomes of a measurement procedure can be associated with the content of a formal theory, the procedure can be said to have scientific value because it allows us to decide between the divergent predictions of competing theories. Absent such theories, latent variable models, as well as psychological tests and standardized experimental procedures, can have instrumental value, but they do not have the surplus interpretability that many physical measurements possess. For example, “the law of equipartition links the values obtained from a mercury thermometer to the abstract concept of energy” (Borgstede & Eggert, 2023, pp. 131–132).

Borgstede and Eggert (2023) compare the current state of psychological measurement to the development of measurement in physics and conclude that psychological measurement is still in the prescientific stage. Although I agree with their conclusion, it is rather curious that the authors never discuss what constitutes a physical measurement in the formal sense. Based on the examples presented in their article, their history of measurement in physics appears to end with the emergence of classical mechanics. I assume that Borgstede and Eggert are referring to classical physical measurement whenever they use the term. This represents a missed opportunity to learn about the highly successful formal approaches to measurement that have been developed in modern physics. For example, physical theories about the quantum world—specifically, quantum electrodynamics—provide the most precise and accurate predictions of measurement outcomes ever observed in the history of science (Aoyama et al., 2008; Bennett et al., 2006). One of the reasons for this success has been the formalization of every aspect of the measurement process, which can yield different theories of measurement for different physical domains (e.g., classical physics, general relativity, quantum physics) but remains relatively independent from the specific content of the theories that operate within the domains (see Ludwig, 1985). Borgstede and Eggert (2023) seem to assume that the measurement problem will be resolved as soon as psychological science is able to produce substantive formal theories: “formal theory does not only provide meaning to the concepts used in a theory; it also provides rules about how to apply the theory, rendering a specific measurement theory obsolete” (p. 131). 
This strengthens the assumption that, to the authors, measurement concerns ideal classical physical measurement, which has arguably been the default interpretation of psychological measurement throughout the history of the discipline (Luce & Narens, 1983).

The first claim I will defend in this commentary is that even if a revolution happens in psychological theorizing that will finally provide us with substantive formal theories, without a formalization of the process of measurement of psychological phenomena, the discipline will be back to square one—inferring the best-fitting parameters of statistical (latent variable) models from noisy data but still lacking a clear notion of how to incorporate the measurement context and the act of measurement into the description of the psychological phenomenon (Hasselman et al., 2019). The second claim I will defend concerns the following statement: “Our critique of LVM [latent variable modeling] is independent of the specific assumptions made in latent variable models, like quantitative structure, ergodicity, local independence, and so forth” (Borgstede & Eggert, 2023, p. 127). Analogous to Dennett’s (1995) famous dictum about philosophy-free science, I argue that there is no such thing as theory-free measurement; there is only measurement whose theoretical baggage is taken on board without examination. In what follows, I first describe how the measurement procedure has been formalized for classical physical theories and subsequently explore whether this approach can be used to describe psychological measurement. Finally, I argue that psychological science, perhaps inadvertently, has been testing a formal theory for many decades. Can psychological phenomena be considered to originate from an ergodic system?

Formalizing physical measurement

In classical physics, a measurement brings about a correlation between a quantity $A$ of a physical system $S$ and a quantity ℛ, which is a characteristic of another physical system, the measuring apparatus ℳ. Before the measurement, $A$ is assumed to have a value $a$; it is the objective of the measurement process to reveal this value. After the measurement, ℛ takes on the value $r = m(a)$, in which $m$ is a one-to-one mapping (bijection) of the possible values of $A$ before the measurement onto the possible values of ℛ after the measurement (Hilgevoord, 2009, pp. 160–161). We can apply this formal notion to the following measurement context: quantify the temperature $A$ of the water $S$ contained in a paper cup, which is standing on a table in the center of a room where the ambient temperature is held constant at 18 °C. The measurement apparatus ℳ is a mercury thermometer, of which the measurement error is known, indicating about 18 °C at the start of the measurement. The measurement procedure concerns putting the thermometer into the cup, waiting 5 minutes, and reading the temperature ℛ. The unobservable quantity $a$ is made observable by the thermometer indicating $r = m(a) = 12.5$ °C. Note that for a correlation to occur between $A$ and ℛ, there must be an interaction between $S$ and ℳ. The state of $S$, the water in the cup, could be affected by the act of putting the thermometer in the cup: water molecules might be disturbed, leading to a slight increase in temperature. The question is: Should this interaction effect be included in our interpretation of the measurement outcome?

As an example of a nontrivial interaction between $S$ and ℳ, consider another thermometer that has been prepared in a special way: it has been kept in a freezer for 8 hours prior to the measurement and now reads −10 °C at the start of the measurement. The exact same measurement procedure is followed, and now we get a measurement outcome that is considerably lower than the previous one. In this case, the thermometer ℳ interacted with the state of $S$ by cooling the water down, changing the value of $A$ from $a$ to $a'$. In general, the purpose of measurement is to reveal the value of $A$ before the interaction with ℳ. If the interaction influences $A$, we would like to be able to predict, from the value $a$ and the interaction between $S$ and ℳ, the value of $A$ after the interaction. If such a prediction is possible, the measurement procedure is said to prepare a state of $S$ in which $A$ takes on the value $a'$. If $a = a'$, this is an ideal, or nondisturbing, measurement. In this example, one could use thermodynamics to estimate the effect of the interaction and get a correct estimate for $A$. This appears to be what Borgstede and Eggert (2023) mean by “empirically applying” the theory that prescribes the measurement procedure and ultimately the expected measurement outcomes, leading them to conclude that a separate theory of measurement would be redundant. However, the formal conception of physical measurement is independent of the theory of thermodynamics and can also be used to calculate the orbits of planets. Modern physical theories operate within a formalism that defines a specific domain in reality, which applies to all theories seeking to explain the phenomena defined within the domain. A formalism generally connects concepts such as “physical system,” “state,” “quantity,” and “measurement” to mathematical concepts in a way that is specific for each domain. The quantum formalism also includes postulates that define what happens to the state of a system due to the act of measurement.
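The correction for the frozen thermometer can be sketched numerically. The following is a minimal illustration, not a validated thermodynamic model: it assumes that water and thermometer exchange heat only with each other (the room is ignored), that both have a constant lumped heat capacity, and that the thermometer reading equals the common equilibrium temperature $a'$. The function names and the heat capacities `C_WATER` and `C_THERMO` are hypothetical values chosen for illustration.

```python
def equilibrium_temperature(a, t0, c_water, c_thermo):
    """Common temperature a' after water (initially at a) and
    thermometer (initially at t0) have equilibrated.
    Assumes a closed system with constant lumped heat capacities (J/K)."""
    return (c_water * a + c_thermo * t0) / (c_water + c_thermo)


def recover_initial_temperature(a_prime, t0, c_water, c_thermo):
    """Invert the heat balance to estimate the value a that the water
    had before it interacted with the thermometer."""
    return a_prime + (c_thermo / c_water) * (a_prime - t0)


# Hypothetical heat capacities, for illustration only.
C_WATER = 800.0   # J/K, a small cup of water (assumed)
C_THERMO = 20.0   # J/K, the thermometer (assumed)

a = 12.5  # value of A before the measurement, in degrees Celsius

# Same procedure, two differently prepared thermometers.
a_prime_room = equilibrium_temperature(a, 18.0, C_WATER, C_THERMO)
a_prime_frozen = equilibrium_temperature(a, -10.0, C_WATER, C_THERMO)

# Knowing the interaction lets us predict a from the disturbed outcome a'.
a_estimate = recover_initial_temperature(a_prime_frozen, -10.0, C_WATER, C_THERMO)

print(f"reading, thermometer prepared at 18 C:  {a_prime_room:.2f}")
print(f"reading, thermometer prepared at -10 C: {a_prime_frozen:.2f}")
print(f"estimate of a recovered from frozen reading: {a_estimate:.2f}")
```

Under these assumptions both measurements are disturbing ($a \neq a'$), but because the interaction between $S$ and ℳ is known, the value $a$ can still be recovered from either reading.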

To summarize, irrespective of the specifics of the phenomenon of interest, the measurement process in physics concerns two aspects, state preparation and measurement, and the important questions to resolve are whether the measurement procedure is disturbing or not, and whether it is possible to predict the effects of the disturbance, which requires knowledge about the nature of the interaction between $S$ and ℳ. The answers can be evident: the measurement of the path of a celestial object across the night sky, irrespective of the nature of ℳ (a telescope or the naked eye), will not disturb the state of the celestial object. This means that the interpretation of a measurement outcome $r$ as a representation of the value $a$ before the act of measurement will be unproblematic: the measurement itself is nondisturbing with respect to $A$. In quantum measurement, the scale differences between the measurement objects and the measurement apparatus are such that there will always be an interaction, and all measurement outcomes should be interpreted as the result of an interaction (de Muynck, 2006).

Psychological measurement is disturbing

Can the formalization of the measurement process in physics be a model for psychological measurement? Psychological measurement generally concerns the quantification of internal physiological, emotional, or psychological states, or mental phenomena that are not directly perceivable by an outside observer. This is different from measuring the position of a planet but may be similar to measuring the temperature of water or the mass of your body, which are also quantities that are not directly observable. I suggest that it is reasonable to assume that all psychological measurement is at least disturbed measurement; that is, the interaction between $S$ and ℳ will always affect the value of $A$. Consider asking a participant in an experiment to rate their current experience of happiness by evaluating the statement “Right now, I feel happy” and reporting their degree of agreement with the statement on a scale of 1 to 7. The object of the measurement is to reveal the value of the internal state of happiness $A$ before the measurement takes place. The system $S$ is the participant; the measurement apparatus ℳ consists of the instruction, the question, and the rating scale. The projection of the current experienced level of happiness $A$ onto an arbitrary ordinal scale with seven values means that $r$ can take on 1, 2, 3, 4, 5, 6, or 7. Suppose the process returns a value $r = m(a) = 1$. Obviously, the very act of asking someone to project their current internal state of happiness onto an arbitrary ordinal scale will interfere with their state of happiness before the measurement. It is likely that the measurement procedure prepares a state of $S$ in which $A$ takes on the value $a'$ after the measurement, such that $a \neq a'$. To proceed, we need to answer the following questions: What do we know about the interaction between $S$ and ℳ that would allow us to reveal the value $a$ before the measurement? And what are the values that $a$ can theoretically take on? The answer to the latter question is required to construct the mapping $r = m(a)$.
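Why the set of possible values of $a$ matters can be illustrated with a toy mapping. The sketch below assumes, purely for illustration, that the latent level of happiness is a continuous value in the interval [0, 1] and that the rating is obtained by cutting this interval into seven equal bins. Neither assumption comes from a psychological theory, and `likert_mapping` and `preimage` are hypothetical names. Under these assumptions $m$ is many-to-one and therefore not the bijection that classical measurement requires.

```python
import math


def likert_mapping(a, n_categories=7):
    """Hypothetical m: project a latent value a in [0, 1] onto the
    ordinal responses 1..n_categories using equal-width bins."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1] in this toy model")
    return min(n_categories, math.floor(a * n_categories) + 1)


def preimage(r, n_categories=7):
    """Interval of latent values that this toy mapping sends to response r."""
    return ((r - 1) / n_categories, r / n_categories)


# Several distinct latent values produce the same observed outcome r = 1.
latent_values = [0.01, 0.05, 0.13, 0.50, 0.99]
responses = [likert_mapping(a) for a in latent_values]

print(responses)     # three different values of a collapse onto r = 1
print(preimage(1))   # r = 1 only locates a within an interval
```

Observing $r = 1$ in this toy model constrains $a$ to an interval rather than revealing its value, and that is before any disturbance by the act of measurement is taken into account.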

The sobering conclusion must be that psychological science does not have a formal conceptualization of exactly what measurement entails in this context. Even if we had a formal theory of happiness that would yield numerical predictions for $a$, without detailed knowledge of the mapping $r = m(a)$ and the nature of the interaction between $S$ and ℳ, the interpretation of the measurement outcome $r$ is problematic. There are additional complications, some of which may be unique to psychological phenomena. For example, does $A$, the level of happiness, exist as an identifiable state of $S$ before the measurement takes place? If so, what timescale before the measurement will $a$ refer to: 1 second, 1 hour, the average of last week? If it does not exist but somehow emerges due to the act of measurement itself, how should we interpret the outcome? Also, an argument can be made that ℳ should include the participant providing the self-report. After all, the individual is the “apparatus” that generates the projection of $a$ onto the arbitrary ordinal scale. If this is the case, there is a problem with defining $S$ and ℳ as separate physical systems and $A$ and ℛ as separate states. The entanglement of systems and mixing of states is a problem that, for very different reasons, also occurs in quantum measurement. Although their article does not discuss these problems, Borgstede and Eggert (2023) might suggest that it is the substantive theory that has to resolve these matters through the empirical application of posited principles and laws. If so, it is likely that one would end up with something similar to the quantum formalism, specified for psychological measurement, in which $S$, $A$, ℳ, and ℛ are formally defined for psychological systems, psychological states, and the nontrivial effects of the act of psychological measurement.

Theoretical baggage in psychological methods and models

The statistical models used in contemporary psychology require measurement outcomes to be stationary, homogeneous, and independent; that is, beyond a sufficiently short timescale, repeated observations should no longer be correlated. The latter assumption is called the memorylessness property (Ramachandran, 1979) and, together with stationarity and the homogeneity of central moments, puts very specific constraints on the data-generating process and the kind of physical system that can produce such data. This is known as ergodicity (Molenaar, 2004, 2008), which refers to the condition in which ensemble averages of variables observed in samples of sufficiently many systems of the same identity are expected to be arbitrarily similar to the time averages of variables evolving over a sufficiently long interval of time in a single system, irrespective of the set of possible initial conditions. For example, if we were able to throw 1,000 fair dice all at once, the observed distribution of values is expected to be arbitrarily similar to the distribution we would obtain by throwing a single fair die 1,000 times in a row. The assumption of ergodicity constitutes the basis of all statistical models, including latent variable models. These assumptions are theory-laden and, as such, make specific theoretical claims about the object of measurement and inform the measurement procedure (random sample, factorial design, random assignment, etc.) and interpretation (group-to-individual generalization). Is it really the case that psychological science studies the behavior of ergodic systems?
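The dice example can be simulated directly. The sketch below uses Python's standard pseudorandom generator as a stand-in for fair dice; the seed and the helper names `relative_frequencies` and `max_frequency_gap` are arbitrary choices made for illustration.

```python
import random
from collections import Counter

FACES = range(1, 7)


def relative_frequencies(outcomes):
    """Relative frequency of each face in a list of die outcomes."""
    counts = Counter(outcomes)
    return {face: counts[face] / len(outcomes) for face in FACES}


def max_frequency_gap(freq_a, freq_b):
    """Largest absolute difference between two frequency distributions."""
    return max(abs(freq_a[face] - freq_b[face]) for face in FACES)


rng = random.Random(2023)  # fixed seed so the run is reproducible
n = 1000

# Ensemble: 1,000 fair dice, each thrown once.
ensemble = [rng.randint(1, 6) for _die in range(n)]
# Time series: a single fair die thrown 1,000 times in a row.
single_die = [rng.randint(1, 6) for _throw in range(n)]

ensemble_freq = relative_frequencies(ensemble)
time_freq = relative_frequencies(single_die)
gap = max_frequency_gap(ensemble_freq, time_freq)

print("ensemble:", ensemble_freq)
print("one die :", time_freq)
print("largest difference in relative frequency:", round(gap, 3))
```

Because every throw is an independent draw from the same distribution, the two lists are generated by exactly the same code. That is the content of the ergodicity assumption in this example: it makes no difference whether observations are collected across systems or across time.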

Recent observations of discrepancies between inferred properties at the ensemble level (interindividual) and the individual level (intra-individual) indicate an emerging consensus that ergodicity does not apply to psychological measurements (Fisher et al., 2018; Wolfers et al., 2018). Olthof et al. (2020) evidenced that multivariate time series of self-reports of human experience in the context of psychopathology violate all of the ergodicity assumptions and more likely reflect the properties of the nonlinear dynamics of complex adaptive systems. A prominent result was the nonstationarity of the autocorrelation function, which implies a temporal structure in the data that is likely multifractal in nature—a phenomenon that has been suggested to play a role in the reproducibility crisis in psychology (Wallot & Kelty-Stephen, 2017).
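A discrepancy between the ensemble level and the individual level is easy to produce in a toy process. The sketch below is not a reanalysis of any of the cited studies and does not model the nonstationarity they report; it only assumes, hypothetically, that each simulated person fluctuates around a stable person-specific mean. The parameter values and the name `simulate_people` are illustrative.

```python
import random
import statistics


def simulate_people(n_people, n_timepoints, rng, between_sd=1.0, within_sd=0.2):
    """Toy non-ergodic ensemble: each person has a stable mean
    (varying between persons) plus small fluctuations over time."""
    series = []
    for _ in range(n_people):
        person_mean = rng.gauss(0.0, between_sd)
        series.append([rng.gauss(person_mean, within_sd)
                       for _ in range(n_timepoints)])
    return series


rng = random.Random(7)  # fixed seed so the run is reproducible
data = simulate_people(n_people=200, n_timepoints=200, rng=rng)

# Ensemble (interindividual) variance: all persons at one time point.
ensemble_variance = statistics.pvariance([person[0] for person in data])

# Time (intra-individual) variance: each person over all time points.
time_variances = [statistics.pvariance(person) for person in data]
mean_time_variance = statistics.mean(time_variances)

print("interindividual variance at one time point:", round(ensemble_variance, 3))
print("average intra-individual variance:", round(mean_time_variance, 3))
```

In this toy ensemble the variability observed across persons is much larger than the variability of any single person over time, so a statistic estimated at the group level does not describe the individual systems that make up the group.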

Contrary to what Borgstede and Eggert (2023) suggest, there is an abundance of theory-based measurement in psychological science; it just happens to be based on erroneous theoretical assumptions about the object of measurement. If a formalism for the physical measurement of psychological phenomena is defined, the substantive theories will follow.

Footnotes

I would like to thank Ralf Cox for introducing me to the ideas behind modern physical measurement, as well as for his efforts to translate the formal concepts so that they can be applied to psychological measurement. This commentary would not have been possible without our discussions.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Fred Hasselman

Author biography

Fred Hasselman is an assistant professor in the Behavioural Science Institute at Radboud University and a founding member of the Radboud Interfaculty Complexity Hub in Nijmegen, the Netherlands. He uses a complex systems approach to study phenomena of the body and mind, inspired by ecological psychology and informed by physical principles and laws. His main focus is on developing a so-called person-specific (idiographic) research paradigm in which the modeling, quantification, and qualification of intra-individual variability is more important than interindividual variability. A secondary goal is to develop methods that can be applied in (clinical) practice, either by informing diagnosis or as tools to enhance the efficacy of interventions. His recent publications include “Early Warning Signals in Phase Space: Geometric Resilience Loss Indicators from Multiplex Cumulative Recurrence Networks” in Frontiers in Physiology (2022) and (with M. Olthof, F. Oude-Maatman, A. M. T. Bosman, & A. Lichtwarck-Aschoff), “Complexity Theory of Psychopathology” in the Journal of Psychopathology and Clinical Science (in press).

References

Aoyama, Hayakawa, Kinoshita, & Nio (2008). Revised value of the eighth-order QED contribution to the anomalous magnetic moment of the electron. Physical Review D, 77(5), Article 053012. https://doi.org/10.1103/PhysRevD.77.053012

Ashby, W. R. (1945). The physical origin of adaptation by trial and error. Journal of General Psychology, 32(1), 13–25. https://doi.org/10.1080/00221309.1945.10544480

Bennett, Bousquet, Brown, Bunce, Carey, Cushman, Danby, Debevec, Deile, Deng, Deninger, Dhawan, Druzhinin, Duong, Efstathiadis, Farley, Fedotovich, Giron, Gray, . . . Zimmerman (2006). Final report of the E821 muon anomalous magnetic moment measurement at BNL. Physical Review D, 73(7), Article 072003. https://doi.org/10.1103/PhysRevD.73.072003

Borgstede & Eggert (2023). Squaring the circle: From latent variables to theory-based measurement. Theory & Psychology, 33(1), 118–137. https://doi.org/10.1177/09593543221127985

de Muynck, W. M. (2006). Foundations of quantum mechanics, an empiricist approach. Springer Science & Business Media.

Dennett, D. C. (1995). Darwin’s dangerous idea: Evolution and the meanings of life. Simon & Schuster.

Fisher, A. J., Medaglia, J. D., & Jeronimus, B. F. (2018). Lack of group-to-individual generalizability is a threat to human subjects research. Proceedings of the National Academy of Sciences of the United States of America, 115(27), E6106–E6115. https://doi.org/10.1073/pnas.1711978115

Hasselman, F., Seevinck, M. P., & Cox, R. F. A. (2019). “So you confirmed, replicated and emptied your file-drawer. . . now what?” A structural realist’s guide to theory evaluation in psychological science. PsyArXiv. https://doi.org/10.31234/osf.io/b8csj

Hilgevoord (2009). Foundations of quantum mechanics (10th ed.). Institute for the History and Foundations of Science, University of Utrecht.

Luce, R. D., & Narens (1983). Symmetry, scale, types, and generalizations of classical physical measurement. Journal of Mathematical Psychology, 27(1), 44–85. https://doi.org/10.1016/0022-2496(83)90026-3

Ludwig (1985). The measurement process and the preparation process. In Ludwig (Ed.), Foundations of quantum mechanics (pp. 303–354). Springer. https://doi.org/10.1007/978-3-662-28726-2_9

Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research & Perspective, 2(4), 201–218. https://doi.org/10.1207/s15366359mea0204_1

Molenaar, P. C. M. (2008). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. Developmental Psychobiology, 50(1), 60–69. https://doi.org/10.1002/dev

Olthof, M., Hasselman, F., & Lichtwarck-Aschoff, A. (2020). Complexity in psychological self-ratings: Implications for research and practice. BMC Medicine, 18(1), Article 317. https://doi.org/10.1186/s12916-020-01727-2

Ramachandran (1979). On the “strong memorylessness property” of the exponential and geometric probability laws. Sankhyā: The Indian Journal of Statistics, Series A, 41(3–4), 244–251. https://www.jstor.org/stable/25050199

Wallot & Kelty-Stephen, D. G. (2017). Interaction-dominant causation in mind and brain, and its implication for questions of generalization and replication. Minds and Machines, 28(2), 353–374. https://doi.org/10.1007/s11023-017-9455-0

Wolfers, Doan, N. T., Kaufmann, Alnaes, Moberget, Agartz, Buitelaar, J. K., Ueland, Melle, Franke, Andreassen, O. A., Beckmann, C. F., Westlye, L. T., & Marquand, A. F. (2018). Mapping the heterogeneous phenotype of schizophrenia and bipolar disorder using normative models. JAMA Psychiatry, 75(11), 1146–1155. https://doi.org/10.1001/jamapsychiatry.2018.2467