Abstract
Although Krantz and Wallsten (2019) claim that interval and ratio scales abound in psychology, they miss the opportunity to deliver specific evidence for their existence. Michell (2019), on the other hand, misconstrues my objection against the practical usefulness of conjoint measurement (Trendler, 2019). Furthermore, he underestimates the critical role humans play as measurement instruments—that is, as detectors of magnitudes of psychological attributes as derived quantities—and he also misunderstands the meaning of the Millean Quantity Objection. Finally, in answer to Krantz and Wallsten, I specify my position with regard to the connection between scientific stagnation, measurability, and reproducibility.
When confronted with a quantity objection, the opponent is in quite a comfortable position: he or she has to point out just one single case where measurement has been successfully established. In essence, what must be demonstrated, for at least one psychological attribute A (e.g., ability), is that the ratio A1/A2 between two magnitudes of the quantity is constant. For example, in the case of the Rasch hypothesis θ = A/D, it must be shown that for two persons A1 and A2, and for different items D1, D2, D3, …, the ratio A1/A2 = θ11/θ21 = θ12/θ22 = θ13/θ23 = … = const. This is what measurement is all about. What should be added as a supplementary requirement to the invariance criterion is that a reported finding must be replicated by at least one independent researcher or research group (Trendler, 2013; see also Nozick, 2001). Therefore, giving just one example of a firmly established and generally accepted metric measurement scale, rather than merely presenting, in the manner of Krantz and Wallsten (2019), a list of publications that potentially contain the evidence, would not only clarify the matter substantially but would also set a standard for the attainability of measurement in psychology. 1 It is not up to the critics to search for proof of the existence of measurement in psychology.
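The invariance criterion can be illustrated numerically. The following is a minimal sketch in Python, not an empirical test: it assumes that the multiplicative Rasch form θ = A/D holds exactly, and the person values (6 and 2), the item values (1, 3, and 5), and the function names are hypothetical, chosen only for illustration. Under these assumptions, the ratio θ1j/θ2j equals A1/A2 for every item j.

```python
from fractions import Fraction

def rasch_theta(a, d):
    """Rasch hypothesis in multiplicative form: theta = A / D."""
    return a / d

def theta_ratios(a1, a2, item_values):
    """Return theta_1j / theta_2j for each item j."""
    return [rasch_theta(a1, d) / rasch_theta(a2, d) for d in item_values]

# Hypothetical magnitudes, stored as exact fractions to avoid rounding error.
a1, a2 = Fraction(6), Fraction(2)
item_values = [Fraction(1), Fraction(3), Fraction(5)]

ratios = theta_ratios(a1, a2, item_values)

# The criterion is met if the ratio is the same for every item
# and equal to A1 / A2.
is_invariant = all(r == a1 / a2 for r in ratios)
print(ratios, is_invariant)
```

The sketch shows only what would have to be observed if the hypothesis were true; it presupposes that the values θij are themselves available as measured quantities, which is precisely what is in dispute.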
Furthermore, it should be noted that the view expressed by Krantz and Wallsten (2019) that “[i]nterval or ratio scales abound” (p. 130) is not generally shared among psychologists. This stands in contrast to physics, where the existence of ratio measurement is not contested. For example, in his assessment of the three volumes of Foundations of Measurement, Schönemann (1994) does not recognize the abundance described. On the contrary, what he detects is a “virtually perfect absence of empirical support” (p. 150) for axiomatic measurement theories and, in particular, with regard to conjoint measurement he notes that “[w]hatever utility such measurement may have, it is a far cry from ‘FM [fundamental measurement] in the same sense that it is possible in physics’” (p. 154). In his comprehensive study of measurement in psychology, Michell (1999) also does not note the alleged plentiful availability of metric scales. This does not, of course, mean that Krantz and Wallsten’s view is incorrect; it may only be a communication problem or some kind of bias on the part of the majority against acknowledging that “scales of the highest repute: interval and ratio scales” (Luce & Tukey, 1964, p. 4) are already available in psychology.
With regard to Michell’s (2019) criticism of my argumentation (Trendler, 2019), I would first like to point out that the purpose of investigating the abstract-mathematical and the practical-concrete role of the standard sequence procedure in the representational measurement theory is to illuminate its relation to the classical or traditional concept of measurement. The result of the comparison is that, in essence, the concept of measurement is the same in both theories. This is the light in which my treatment of the method of solving inequalities should be viewed; i.e., in abstract form the standard sequence procedure underlies this “measurement procedure” as well. Neither is it my thesis that, in general, constructing standard sequences is the only practical method to discover ratios between magnitudes of quantity nor, in particular, that the method of solving inequalities necessarily presupposes the construction of standard sequences. Nonetheless, every measurable attribute can be imagined as a standard sequence.
So, what is my thesis? My argument is that, since magnitudes of derived quantities cannot be determined without the help of quantitative indicators, derived measurement is preferable to conjoint measurement because it is simpler in practical application. I have also pointed out why the problem with psychological attributes as derived attributes (i.e., that they are not fundamentally measurable) is not immediately recognized in psychology: it is more or less tacitly assumed that humans have the capabilities of measurement instruments. This is, I believe, the main cause of the illusion that all quantities are fundamental quantities, a view which is endemic to representational measurement theory.
In response to Michell’s (2019) objections, some specifications are therefore necessary. First, the question is not whether the human body can serve as a measurement instrument (e.g., the heart rate for time measurement) or whether the human participant can differentiate between magnitudes of physical stimuli (e.g., light intensity, sound intensity, or length), but whether humans can unequivocally identify magnitudes of psychological attributes qua derived quantities. 2 Second, in order to challenge my view, it is sufficient to indicate one single psychological attribute that is fundamentally measurable, at least on a nominal scale. As pointed out (Trendler, 2009, 2013), when it comes to quantities, nominal measurement is far from a trivial matter. That is, the following question must be answered by specifying a concrete measurement procedure: How can we determine, for instance, whether two persons A1 and A2 possess the same amount of ability (i.e., a1 = a2), or how can we find out whether the same person A1 has the same amount of ability at different times (i.e., a1 = b1 = c1 = …), so that we can confidently conclude that the same point on the quantitative dimension has been identified (for how the task of identifying “fixed points” is accomplished in physics, see Chang, 2004)?
What also seems to escape Michell’s (2019) attention is that, when the quantitative hypothesis is tested, the issue investigated is not only whether the relevant psychological attributes are quantitative; what inevitably enters as an auxiliary hypothesis is the question of whether humans have the capabilities of measuring devices, regardless of whether the test participant or the researcher is aware of this. In what sense are humans conceived as measuring instruments? They are considered as such not under all circumstances, but only when it is assumed that the observed behavior conveys directly or indirectly quantitative information about the relevant psychological attributes. This is in general the case when the quantitative hypothesis is tested by asking test participants questions about the position of magnitudes on a quantitative dimension (e.g., Michell, 1990, 1994). 3 More precisely, what is tacitly assumed is that, first, humans have “internally” the capability to determine magnitudes of psychological attributes, compare them for more or less, or determine ratios between them and, second, that they are able to communicate, partly or completely, the result of the “internal” measurement operations “outwardly” to the experimenter. Accordingly, as Sixtl (1982) notes, methods for data collection can be differentiated into direct and indirect methods. In the case of direct methods of data collection, test participants are required to provide metric information about psychological factors directly (e.g., estimations of ratios between levels of psychological attributes). If indirect methods are used, then test participants are merely required to deliver nominal (e.g., yes/no answers) or ordinal data (e.g., judgments about more or less). In this case it is assumed that metric information is provided implicitly.
It is important to understand that if the verification of the quantitative hypothesis fails, it does not necessarily follow that the investigated factors are non-quantitative; it is also conceivable that humans do not have the capabilities of measurement instruments or that as such they are impaired in their function. In short, the validity of inferences about the theoretical meaning of negative empirical results depends on the issue of the undisturbedness of humans as measuring instruments (for details on the theory of measuring devices see Janich, 1985). Therefore, in the face of negative empirical evidence, if one does not want to abandon the hypothesis that the investigated psychological attribute is quantitative (e.g., in the cases described in Michell, 1990, Chapters 5–7), one will have to make sure before repeating an experiment that the test participants are valid and undisturbed devices for measurement. In the case of artificial, man-made instruments it is clear how this can be done. But how are we to proceed with human beings? We cannot simply call the craftsman or the mechanic to check and, if necessary, fix them. The only alternative consists in the assumption that humans are by nature perfect, i.e., undamageable measuring devices. In my view this hypothesis is problematic because in the real world where disturbances abound there are no such things as perfect instruments; i.e., they can always break down, in which case they must be repaired or replaced.
These, then, are in essence the reasons why I think that the hypothesis that humans have the capabilities of measuring devices is unrealistic; though it is logically coherent and though, when considered superficially, it has the appearance of a testable empirical hypothesis. Note that this is a variant of what I have called the Millean Quantity Objection (Trendler, 2009). Michell (2019) questions the power of the objection by stating that “mental phenomena are captured via experimental apparatus (viz. psychological test items), not with the precision physics displays, but with a useful degree of verisimilitude” (p. 141). This misrepresents the meaning of the objection: The question is not if psychologists need experimental apparatus to capture mental phenomena, but if humans themselves, as test participants, can satisfy the role of experimental or measuring machines, as prescribed by measurement theory.
On a final note, some clarifying words about the connection between scientific stagnation, measurability, and reproducibility may be permitted. My claim is not that in the history of psychology no real discoveries have ever been made. What I have in mind, when describing contemporary experimental psychology as a stagnant science, is what has been called the “neo-Galtonian research paradigm” (Lamiell, 2003, p. 185). Lamiell explicates that “what is actually analyzed through the statistical techniques proper to neo-Galtonian inquiry (i.e., the data analysis procedures issuing in the putatively explanatory models) is variation around [an] overall mean” (p. 185). 4 Since the advent of the so-called reproducibility debate, it should be clear to everyone that the number of real (i.e., replicable) effects claimed to have been discovered may be strongly inflated.
It is noteworthy that in the meantime the often scientifically questionable quality of “psychological knowledge” is also acknowledged outside the ivory tower of academia as a problem to be dealt with. As was already noted by Ziskin (1970): “psychiatric and psychological evidence … frequently does not meet reasonable criteria of admissibility and should not be admitted in a court of law” (as cited in Faust, 2012a, p. xiii). In particular, as a consequence of the introduction of the Daubert standard—which specifies guidelines for admitting scientific expert testimony—“there has been a dramatic increase in litigation concerning whether expert testimony in many different scientific disciplines should be admitted into evidence in courts of law. Psychological expert testimony is frequently the subject of such litigation, in both civil and criminal cases” (Petrosinelli, 2012, p. 36). The reason for this is the finding that, “[m]ental health professionals may claim that their field is a science, with all the weight and prestige connoted by that assertion. In many cases, however, the imputed knowledge of the discipline is based on foundations that are either nonscientific or represent weak or problematic science” (Faust, 2012b, p. 42).
In modern test theory, the problem of the lack of reproducibility and its connection to the question of measurability has been known for a long time. In particular, two scholars, Gerhard Fischer (1968, 1974) and Friedrich Sixtl (1980, 1981, 1982, 1985, 1993, 1998), have addressed the problem. Sixtl (1985), for instance, points out that the arithmetic mean—n.b., under the premise that the relevant psychological attributes are measurable (e.g., that numbers of items solved N is directly proportional to ability A)—“can indicate the real central value of a parameter” (p. 338) only if the influence of systematic disturbances is negligible. But since “every person represents a unique individual” (p. 338), it can be ruled out that systematic disturbances are in general under control. In consequence, whenever systematic disturbances are active, the mean does not represent the “true value” of a random distribution anymore, but it is “not further interpretable” (p. 322). 5
Furthermore, the means obtained by repeating the same experiment with different samples will unpredictably fluctuate depending on the unique composition of each sample. As Sixtl (1985) notes, depending on the distribution of the organism variable O in a sample, one can “produce almost any mean” (p. 321), so that with different samples even antithetical hypotheses may be found to be empirically “true.” The reason for this is that instead of depending on a specific value of O, the observed variations in reaction “depend on the distribution of the organism variable; they are therefore artifacts of the respective population or sample of individuals. This explains the lack in replicability of empirical findings in the behavioral sciences” (Sixtl, 1981, p. 63). 6 Accordingly, Sixtl calls the commonly shared view that the mean is “a reliable measure of a stable characteristic” (Speelman & McGann, 2013, heading 5), “the fundamental error of contemporary psychology” (Sixtl, 1998, p. 525) or the “myth of the mean” (Sixtl, 1993, p. 399). As argued, a solution to the problem of measurement intrinsically implies a solution to the problem of systematic error (Trendler, 2009).
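Sixtl’s point about the dependence of the mean on sample composition can be made concrete with a deliberately simple sketch. It is not Sixtl’s own model: it assumes, hypothetically, that the organism variable O takes only two values (type A and type B) and that the same treatment produces an effect of +1 in persons of type A and −1 in persons of type B. The effect sizes, sample sizes, and names are illustrative assumptions only.

```python
from statistics import mean

# Hypothetical individual effects of one and the same treatment,
# depending on the value of the organism variable O.
EFFECT_BY_TYPE = {"A": 1.0, "B": -1.0}

def sample_mean_effect(n_type_a, n_type_b):
    """Mean effect in a sample with the given composition of O."""
    effects = ([EFFECT_BY_TYPE["A"]] * n_type_a
               + [EFFECT_BY_TYPE["B"]] * n_type_b)
    return mean(effects)

# Two samples of equal size that differ only in the distribution of O.
mean_mostly_a = sample_mean_effect(n_type_a=8, n_type_b=2)
mean_mostly_b = sample_mean_effect(n_type_a=2, n_type_b=8)

print(mean_mostly_a, mean_mostly_b)
```

Under these assumptions the two samples yield means of opposite sign, although no individual effect has changed; only the distribution of O differs. In this toy case, neither mean describes any single person in the sample.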
These are, in short, the reasons why measurement matters. Unfortunately, the problem of measurability is not perceived as the primary cause of the failure to replicate; what has been identified instead as the main issue is an inappropriate and dysfunctional use of established methods of statistical analysis (Asendorpf et al., 2013; Francis, 2012). Therefore, what will be found if the neo-Galtonian path is pursued, even if updated and refurbished (Asendorpf et al., 2013; Borsboom & Cramer, 2013; Epskamp, Rhemtulla, & Borsboom, 2017; Resnick, 2018; Zwaan, Etz, Lucas, & Donnellan, 2018), is that the signals formerly believed to have been discovered will eventually vanish in the noise. 7 But until then, many articles will be published, much taxpayers’ money will be spent, and great academic careers will be made and yet, justifiably so, the public’s perception of psychology (Ferguson, 2015) will not improve.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
