Abstract
The procedure of establishing a measure of an attribute consists of the assignment of numbers, according to rules, to objects whose attributes show some variability. These rules are chosen so that the assigned numbers contain some ‘information’ about the differing variants of the attribute. In the article, we discuss a heuristic, a scaling, and a representational approach. Within the heuristic approach, rules can be based on a verbal argument heuristically linking the variability in the attribute to differences in the measurements. In this case, the specific information that is represented by the measurements is very hard to determine due to the lack of a formal model. Within the scaling approach, a formal model is used to derive rules for the assignment of numbers to the variants of the attribute. From a scaling model, conclusions about the specific information assumed to be represented in the measurements can be derived. Both approaches depend on the assumption that there is something to measure, namely that the attribute that is going to be measured exists in a realm different from the numerical one. Within the representational approach, one tries to clarify what conditions must be met by an attribute to be considered measurable so that relations between the measurements can be interpreted as reflecting relations between the variants of the attribute. By specifying the conditions an attribute must meet to be measurable at all, measurement theory opens an alternative way to rules and thus to measurements. Following this approach, it is no longer necessary merely to assume that there is some measurable attribute; one can find out whether this is indeed the case. Moreover, the interdependence of the definition of an attribute and its measurability, as well as the role theory plays in defining certain attributes, can be clarified.
Introduction
Measure what can be measured, and make measurable what cannot be measured.
Although it appears doubtful that this quote could really be attributed to Galileo (Kleinert, 2009), it nevertheless addresses a fundamental scientific approach, at least in the natural sciences. The act of measurement is a central operation in science. Why is that the case and how can the obligation be fulfilled? In this article, we discuss the construction of a measurement instrument based on the representational theory of measurement for an attribute in order to discuss the role of measurement in theorising about the world. Why the representational theory of measurement? Because it is the established theory about the relationship between empirical attributes and numerical values that constitute measurements. The aim of the representational theory of measurement is to solve four problems, which we shall discuss in the course of our argument: a) Can an attribute be measured at all and what conditions must an attribute fulfil to be able to be measured? b) How freely can numerical values be assigned? c) Which propositions based on the measurements are valid also for the attribute? d) How can one assign specific numerical values to different expressions of the attribute in a practical way? As one can see from these questions, the representational theory of measurement is concerned with the fundamental questions that have to be asked when talking about measurements and how to establish them.
Following measurement theory in such a strict sense, measurement is based on a defined operation of comparing expressions of the attribute and results in numerical values which give information about the specific quality or quantity of an attribute of objects of the world. An attribute is defined by a certain comparing operation that results in measurements, allowing the empirical test of theories about that attribute. Thus, the question of measurability is also the question of which theoretical terms can be used in a meaningful way. Against this background, it is surprising that there appears to be little debate on the theory of measurement in the social sciences. This is in striking contrast to the wide use of ‘measurements’, which pays tribute to the quoted approach with which we started this introduction.
In order to understand why this approach, which advises dealing with aspects of the world by making them measurable, is so important for science, it is necessary to understand what is meant by ‘measure something’ and by ‘make something measurable’ (for a discussion of these problems in sociology see Abell, 1968, 1969; Pawson, 1986; Saylor, 2013; for the same in psychology see Cliff, 1992; Mausfeld, 1993; Michell, 1997; Aftanas & Solomon, 2018). By doing so, one comes to realise the fundamental role measurement plays not only in deriving numerical representations of certain attributes of objects in the world but also with regard to the questions ‘What can be said about these attributes?’ and ‘What can count as an attribute at all?’. Since the question of how attributes are related to each other is fundamentally intertwined with what can be said at all about the attributes, the answers to these questions are fundamentally connected to any theorising about the world, at least if one agrees that propositions about the relations of theoretical terms that represent distinguishable aspects of the world form the core of theories about the world.
For any social science that agrees to such a conceptualisation of theory, measurement is a core scientific concept in just the same way as it is in the natural sciences. This does not imply that such a kind of social science is reducible to the natural sciences, because the theoretical terms need not be derivatives of the theoretical terms of the natural sciences. Furthermore, the problem of measurement is not confined to the so-called quantitative branch of social science, where the use of numbers is abundant and formal models of data are a very important tool. In qualitative social science, one does not necessarily need numbers in order to label the categories of theoretical importance. Nevertheless, a categorical structure needs to be established and the different categories need to be labelled to indicate to which class of phenomena a certain case should be related (for an example of classification issues from the history of sociology, see Holzhauser, 2015a). Labels used for this purpose may be verbal but could also be numerical, since the numbers in this case fulfil (only) the function of naming a certain category, even if these categories can be ordered in a meaningful way. In the course of our argument we will see that the establishment of a categorical structure is a basic operation in the establishment of measurements according to measurement theory. Thus, the problems discussed are problems of any science looking for an empirical foundation of its theoretical terms, whether it calls itself quantitative or qualitative.
The interplay between theory, theoretical terms within a theory and the role of measurements therein can be analysed by means of a very simple example: constructing measurements for a certain attribute. We will use such an example to address the role of measurements in theorising about the world. By doing so, we are able to distinguish different approaches to measurement followed in the social sciences and to analyse fundamental problems associated with them. This will lead to a critical discussion of the use of the term ‘measurement’ and of the use of measurements based on different concepts in research practice.
In order to make it easy to follow this approach, an obstacle that may cause some more or less serious misunderstandings needs to be removed. The obstacle derives from the ubiquity of measuring instruments in everyday life. This ubiquity can lead to the impression that measuring something is really a very trivial operation. And it is (even though the handling of a measurement instrument can be far from trivial), but only if you have a measurement instrument at hand that operates in such a way that the attribute to be measured is indeed measured by it. The supposed triviality of the whole operation fades very rapidly if one asks what is measured by a certain measurement instrument and insists that the answer cannot be given by producing a verbal label (for instance ‘weight’) or a logically empty ‘explanation’ like ‘it is measuring what it is measuring’ (for instance ‘it is measuring how much I weigh’). It gets even more confusing if one asks what shall count as a measurement instrument (in the social sciences this often seems to coincide with the researcher or the person asking questions from a questionnaire) and how one can distinguish the measurement instrument from the person using it and from the object under observation.
To exemplify, let us assume we want to determine the height of a person. We can take a metre stick and measure the person’s height or we can just ask the person for it (which is what is done for so many variables through questionnaires in the social sciences; for a discussion of an example of methodological problems associated with questionnaires, see Holzhauser, 2015b). The result may be, in both cases and if we are lucky, the assignment of the same number. But only the first is a measurement of height by means of a metre stick. The second is an assignment of a number to a question, assuming that the person (or somebody else) did the measurement (or did not; without us and/or her knowing the procedure), with a measurement device (which we did not see, so maybe it has not been used at all), some time before (uncertain), and gives a presumably accurate account of that measurement (which is at least highly doubtful). Have you ever thought about why your doctor (if she or he is not lazy) does not just ask you for your height? It would be a lot easier, wouldn’t it? Your height should have been measured more than once in your life and should be known to you. But instead the doctor (or the doctor’s assistant) measures it again, probably at a bigger check-up (and yes, surprise, it changes a bit in different directions as we grow older, first as children and later as adults). And, for the sake of the argument, to be clear here: reading the stated height of a person from her ID card does not count as a measurement of height either; it would only qualify as taking a number from one piece of paper and writing it onto another. This may sound pedantic, but there is a crucial difference between measuring and doing something else which we are also quite used to and often think and talk of as synonymous with measurement. There is a reason why we trust physical measurement devices more than we trust what people ‘measure’, guess, assume or say. Telling people things is neither measuring nor measurement.
But even though physical devices seem to be naturally there to use them in our daily life, they are themselves constructed by the practical use of theory.
We believe, and this is one of the main issues this text is dealing with, that a non-negligible share of measurement problems in the social sciences are consequences of basic misunderstandings concerning the question of what a measurement instrument is, and what functions it fulfils in the research process, how it is constructed properly and how this construction is linked to the definition of what is to be measured.
Theoretically, measurement instruments do not result in measures but are based on measures that need to be established prior to the possible construction of the device itself. In other words, one needs to already have (defined) a measure (via one or more objects) to construct a measurement device. For example, in his introduction James S. Coleman (1964) uses a balance to illustrate the measurement of mass and chooses one of the objects under comparison to be the ‘unit mass’ in order to establish a combination operation. For a similar purpose of demonstration, we will use an even simpler example. Only if measurement instruments are based on measurements can they be calibrated, and only if they have been calibrated are they able to produce measurements in a strict sense.
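The idea of choosing a unit object and a combination operation can be illustrated with a toy simulation. The following is only a sketch under stated assumptions: the object names, the hidden masses and the functions `balance` and `measure` are hypothetical and are not taken from Coleman (1964). The hidden masses stand in for the physical world and are consulted only by the simulated balance, never by the measuring procedure itself.

```python
# Toy simulation: the "world" holds masses that are never read directly;
# the only empirical operation is placing objects on the two pans of a balance.
# All object names and values are hypothetical.
_HIDDEN_MASS = {"unit": 1, "stone": 3, "brick": 5, "pebble": 2.5}

def balance(left, right):
    """Compare two collections of objects: 1 if left is heavier, -1 if lighter, 0 if level."""
    left_total = sum(_HIDDEN_MASS[o] for o in left)
    right_total = sum(_HIDDEN_MASS[o] for o in right)
    return (left_total > right_total) - (left_total < right_total)

def measure(obj, unit="unit", max_copies=1000):
    """Count how many copies of the unit object balance `obj`.

    Returns the count, or None if no whole number of copies balances it
    (a finer unit would then be needed).
    """
    for n in range(1, max_copies + 1):
        outcome = balance([obj], [unit] * n)
        if outcome == 0:
            return n          # exactly n unit copies balance the object
        if outcome < 0:
            return None       # the object lies between n-1 and n unit copies
    return None
```

In this sketch the number assigned to an object is nothing but the outcome of repeated comparisons with combined copies of the unit object, which is the sense in which the measure has to exist before a device can be calibrated against it.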
Measurements in everyday life also usually come with units. The standard unit of length, the metre, is nowadays defined as ‘the length of the path travelled by light in vacuum during a time interval of 1/299 792 458 of a second’ (BIPM, 2019), a certain distance that does exist in the real world. The length of the path travelled by light in vacuum during a time interval of 1/100 000 000 of a second would also be a possible unit; it is just not the one used to define the unit ‘metre’ for the attribute length. Throughout history there have been hundreds of thousands of different units to measure length as an attribute of things in the world (for a history of the metre see Alder, 2003). Length could be measured in arbitrary units other than the metre, the inch and so on, giving different numbers as a resulting length, but the comparison of the differences in the length of existing objects in the real world would still result in the same relations. Notice: while the units may be arbitrary, because they are just taken from a certain expression of the attribute for a specifically chosen object and thus result in different numbers assigned to the objects in different unit systems, the relations between these numbers in each system will not be arbitrary. They need to represent the relations between the objects in the real world. And in order to make sense of the concrete numerical value of a length measurement, it has to come with a unit.
Although there have been so many different units and names for length, they all share two basic commonalities. First, all of them have been used to theorise about and to measure the same attribute of the world, which we call length (width, depth, distance). Second, none of them allowed the measurement of this attribute without a comparison operation with given objects in the world. Since material things naturally occupy a certain space in the world, they exist prior to and independently of their measurement. In contrast, measurements, being representations of relational structures, do not naturally exist. They can only be produced by establishing measurements, followed by the construction of measurement instruments which themselves depend on the same operation of comparison that is fundamental to measurement. This suggests that we can only define attributes of objects that are measurable based on the fact that they are comparable. The units in which length is measured are not ‘natural’ attributes of objects, but arbitrary choices singling out certain expressions of the attribute as being the ones given a measurement value of ‘1’.
But what if we do not see a natural material form of the attribute, or maybe do not see the object at all? What if we are not sure whether the attribute under investigation exists at all, because we cannot access it (or at least not easily) for various reasons?
So, one central question seems to be how we measure something in the first place that has no or only a very unclear physically accessible form, such as psychological or sociological attributes. And how are we able to calibrate a measurement device constructed from such a measure? To shift the goal immediately from discussing a physical measure such as length and its measurement instrument (e.g. a ruler) to measuring psychological and social phenomena would mean jumping to conclusions and would not be appropriate. If we look for answers to the question of whether and how it is possible to measure (in a strict sense) the sorts of attributes that social scientists are interested in, we need to understand how measurement works in the first place, that is, in the natural sciences. As a result, we need to clarify how to get measures of attributes before having measurement instruments.
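That the numbers depend on the unit while the relations between them do not can be checked with a few lines of code. This is a minimal sketch: the three rods and their lengths are hypothetical, and the helper names `in_unit` and `ratios` are introduced only for illustration. The alternative unit is the light path in 1/100 000 000 of a second mentioned above.

```python
import math

# Hypothetical lengths of three objects in metres (illustrative values only).
lengths_m = {"rod_a": 0.5, "rod_b": 1.25, "rod_c": 2.0}

METRES_PER_INCH = 0.0254                          # exact by definition of the inch
METRES_PER_ALT_UNIT = 299_792_458 / 100_000_000   # light path in 1/100 000 000 s

def in_unit(lengths, metres_per_unit):
    """Re-express every length in a unit that is `metres_per_unit` metres long."""
    return {name: value / metres_per_unit for name, value in lengths.items()}

def ratios(lengths):
    """All pairwise ratios between the numbers assigned to the objects."""
    names = sorted(lengths)
    return {(x, y): lengths[x] / lengths[y] for x in names for y in names}

lengths_in = in_unit(lengths_m, METRES_PER_INCH)
lengths_alt = in_unit(lengths_m, METRES_PER_ALT_UNIT)

# The assigned numbers differ between unit systems, the ratios do not.
same_ratios = all(
    math.isclose(ratios(lengths_m)[pair], ratios(lengths_in)[pair])
    and math.isclose(ratios(lengths_m)[pair], ratios(lengths_alt)[pair])
    for pair in ratios(lengths_m)
)
```

Under these assumptions `same_ratios` is true although no object receives the same number in any two of the three unit systems.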
Measurements are established through the definition of a number of procedures, which are used to compare the specific expressions of an attribute that itself is defined by these procedures. This means that the attribute itself needs to be at least to a certain degree theoretically understood or theoretically embedded so that a procedure of comparison can be described before one can construct a measurement instrument. Put simply, one needs to know what one is looking at (objects) and how one is looking at it (comparison procedure for different expressions of the attribute of the objects). Based on these comparisons, numerical values can be assigned to the different expressions in such a way that a numerical representation of the attribute can be constructed. Different types of numerical representations are usually named as different types of scales (nominal, ordinal, metric) and differ in the richness of information about the attribute. The type of numerical representation that can be constructed depends on the structure of the attribute, and the demands that an attribute has to meet in order to construct different types of scales can be formalized. This also means that in order to clarify what is meant when talking about the attribute one needs to specify a procedure by means of which objects can be compared with regard to the attribute. This procedure at the same time serves as an operational definition of the attribute.
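How a comparison procedure can give rise to a numerical representation may be sketched for the ordinal case. In this illustration the objects, the hidden lengths and the function names are hypothetical; `at_least_as_long` stands in for a physical side-by-side comparison, and the counting rule used here is only one of many possible assignments that preserve the order.

```python
# Hypothetical stand-in for a physical side-by-side comparison of rods.
_HIDDEN_LENGTH = {"a": 12.0, "b": 7.5, "c": 12.0, "d": 3.1}

def at_least_as_long(x, y):
    """Outcome of the comparison procedure: True if x is at least as long as y."""
    return _HIDDEN_LENGTH[x] >= _HIDDEN_LENGTH[y]

def ordinal_scale(objects, compare):
    """Assign to each object the number of objects it is at least as long as."""
    return {x: sum(1 for y in objects if compare(x, y)) for x in objects}

objects = ["a", "b", "c", "d"]
scale = ordinal_scale(objects, at_least_as_long)

# Any strictly increasing transformation gives an equally valid ordinal scale.
rescaled = {x: value ** 2 + 10 for x, value in scale.items()}
```

Only the order of the assigned numbers reflects the outcome of the comparisons; differences or ratios between them carry no information about the attribute, which is why `scale` and `rescaled` are equally acceptable at this level.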
From such a theoretical point of view, an attribute is defined operationally by a procedure that compares different expressions of the attribute, enabling a numerical representation of that attribute. The numerical representations of different attributes are then used in formal models of the world where dependencies or interdependencies of attributes are represented by mathematical structures, giving rise to formal theories about the world that in turn are connected to empirical observations by measurements. Hypotheses derived from such theories are then usually hypotheses about the expected values of certain measurements. If the operational definition of the attribute fails to produce numerical representations of the attribute and as a consequence the ‘measurement instrument’ does give rise to ‘measurements’ that produce meaningless numbers (meaning that in this case the structure of relations between the empirical expressions of the attribute cannot be represented in the structure of numerical relations between the numbers), anything that follows in the process of analysing and modelling these numbers is profoundly flawed. Measurements are what connects the world with theories and vice versa, and, moreover, what defines which attributes of the world are to be distinguished and thus can be theorised about in a meaningful way. No wonder measurements and their theoretical foundation are central to the natural sciences, and should be to any science, including the social sciences.
In the social sciences the common understanding of measurement differs markedly from this established and theoretically well founded meaning of measurement and merely refers to the assignment of numbers to objects according to some more or less arbitrary rules. We cannot discuss this point in more detail here but extended discussions can be found elsewhere (Cliff, 1992; Gane, 2011; Labovitz, 1972; Michell, 1997; Narens & Luce, 1993; Trendler, 2009; Trendler, 2013). The resulting ‘measurements’ are based on some kind of heuristic that makes it more or less plausible that the numbers contain some information about the relevant hypothesized attribute under investigation. It is assumed that the attributes exist before the attempt to measure them. They do not derive from an operational definition which is based on comparisons of the attribute in the first place but stem from the imagination of a theoretician or a practitioner, trying to understand some phenomena. Unfortunately, such a heuristic way of ‘measurement’ leaves it uncertain whether the proposed attribute to be ‘measured’ really exists, and, in the case where it does exist, how it can be measured without having constructed a sound measurement device.
Accordingly, problems arise if a measurement procedure ‘produces’ ‘measures’ that represent something other than intended (in the worst case nothing), or ‘produces’ ‘measures’ which one does not really know how to connect to well-established measurements, and so on. Nevertheless, ‘measurement instruments’, like for instance psychological tests, questionnaires or social surveys, constructed under such a pragmatic approach may produce numbers that to some degree are helpful in some practical sense even if they are not measurements in the strict sense (e.g. Lampland, 2010). They may not be measurements, but they may possibly be used for prediction purposes or in the process of assigning different people to different tasks. Approaches in this tradition have been criticised by referring to the social consequences such ‘measures’ help to establish (Brighenti, 2018). But a more fundamental critique can be formulated by pointing out that these procedures are not sound in establishing measurements at all.
A critical analysis of the use of numbers in social decisions can be motivated by insisting that a number has to be a measurement to qualify as such. Creating a procedure which produces numbers is not sufficient to establish a measurement and consequently a measurement instrument. The procedure of producing numbers must deliver something much more important which is: appropriately construct a numerical representation of the structure of the world that one is aiming to investigate by theorising about the objects and their relations in the world. Thus, the question of measurability is in this sense also the question of which theoretical terms can be used in a meaningful way. Theorising about an attribute that fails to be measurable is at best speculative.
‘Measurement’ in the metaphorical sense may be used to pseudo-rationalise certain social practices and rituals by trying to evoke the impression of objectivity and ideology-free specification of differences between people or social entities in an empirically meaningful way. For instance, in psychology, personality tests are interpreted as if they describe certain attributes of individuals that constitute systematic differences between people. Analysed in some more detail, it appears that they just try to quantify the degree to which certain social labels are attributed to individuals. The social judgements depend on the way the person ‘speaks’ about herself or himself by means of completion of a questionnaire and the way others speak about people who are labelled in that way by means of the construction of that questionnaire and the selection of the appropriate statements that qualify as examples of that way (Buntins et al., 2016). In sociology, social status is surprisingly often ‘measured’ as a derived ‘measure’ through income (often defined as earnings from employment) and education (often defined as school degrees) of individuals without discussing whether such a derived ‘measure’ really can count as a measurement at all and what unit could be derived from the units of the basic measurements that are combined to produce the derived measurement. We cannot go into detail here, but at least we want to raise the sociologically interesting question of these measurements’ non-scientific functions, such as a pseudo-rationalised ‘measurement’ — without discussing the artificiality and social embeddedness that these sorts of ‘measures’ fulfil — as a social practice of science (for further discussion see also Brighenti, 2018; Lampland, 2010; Ogien, 2010), and raise awareness about the implications that come with this way of treating ‘measurement’.
However, measurements based on measurement theory are the basis of meaningful statements about the world. Meaningful statements and not-meaningful statements can be distinguished by specifying the richness of information that is available from the measurements by determining which type of scale can be constructed. If ‘measurement instruments’ are not based on sound measurements of operationally defined attributes, they do not allow one to distinguish meaningful propositions about the attributes of the world from speculations that may or may not be true or may not even have the status that allows a truth value to be assigned. We cannot touch on all the problems of measurement in sociology or psychology and we do not intend to try, but since so little attention is paid in the social sciences to the basic question of what can really count as a measurement and how measurement instruments can be constructed, we aim to shed light on this basic question in order to clarify the interplay of theory, attributes, measures and measurement instruments.
This means we need not only to understand how to measure something with a given measurement instrument, but also to clarify how to get measures of attributes before instruments have been established. We will continue to do this through the very basic example of length, not because it is trivial (it is not) but because it allows one to understand the principles of what we need to know theoretically to do it and to transfer the procedures needed into the social sciences.
While the construction of a measurement instrument is commonly treated as a mere methodical or even practical concern, sufficiently well addressed as a practical procedure of assigning numbers to objects according to rules, it is in fact mainly a matter of theory. Theoretically, a measurement instrument is a device that assigns numerical values to objects in a way that these numerical values contain some ‘information’ about an attribute shown by the objects. The information which is contained in the numerical values depends on the rules that can be used to describe the procedure the measurement instrument follows to assign numbers dependent on the specific expression of the attribute. This also means that in order to clarify what is meant when talking about an attribute one needs to specify the procedural rules. Measurement theory (in a very broad sense) is concerned with these rules. Characterising measurement theory in such a broad sense includes ways of establishing measurements that lead to some very serious problems, and it appears justified to question whether numbers derived from these problematic procedures can be meaningfully called measurements at all. Therefore, a discussion of measurement theory in a broad as compared with a narrow sense seems fruitful, and an additional discussion of meaningful procedures and numbers appropriate.
Different approaches to measurement
The procedure to establish measurements for an attribute consists of the assignment of numbers, according to rules, to objects which show some variability in the attribute under consideration. These rules are chosen so that the assigned numbers contain some ‘information’ about the differing variants of the attribute that is to be measured. Thus, the rules establish a link between the attribute and the numbers which then are considered measurements of the attribute. If they are really linked, they allow conclusions to be drawn from relations between the measurements to relations between different variants of the attribute. In other words, the act of measurement is not the assignment of numbers but of meaningful numbers, and not of any or unclear meaning but of a theoretically defined meaning.
The last consideration leads to the answer to why sciences (at least the natural sciences) tend to value the establishment of measurements as an important part of the scientific endeavour. Specifying the relations between the measurements of different attributes constitutes a formal model of the relations of the underlying attributes in the world. This is seen as favourable in comparison to a purely verbal (heuristic) representation mainly for two reasons: first, it is seen to allow a more precise characterisation of the kind of relation one assumes, and second, it appears to be easier to connect different theories by means of their formal models than by referring to verbal representations of the relations of the underlying attributes. Even if there is no theoretical account, formal models may be more informative in predictions, which may also be seen as a currency of scientific success.
There are different approaches to arrive at the rules that are needed to establish measurements. One can follow a heuristic approach, one can make use of scaling models or one can base the measurements on measurement theory.
Following the first approach, the rules will be based on a verbal argument heuristically linking the differences in the attribute to differences in the measurements in a way that the measurements are not independent of the variants of the attribute. For instance, this path of reasoning is often followed when constructing questionnaires and using some kind of counting index as a measure of the attribute (for a different way of conceptualising what is going on see Buntins et al., 2016). In this case no formal model linking the attribute and the measurements is specified. The resulting measurements do not derive from an operational definition which is based on comparisons of the attribute in the first place but stem from the imagination of a theoretician or practitioner, trying to understand some phenomena. The measurements are established by arguing that it is reasonable to assume that variants of the attribute and measurements are linked in a more or less defined way. The specific ‘information’ that is represented in the differences between these measurements is at least very hard, if not impossible, to specify due to the lack of a formal model that can be used to justify conclusions about the relation between certain differences between variants of the attribute and differences between the numbers considered as measurements. Nevertheless, it appears not to be uncommon to use heuristic rules as justification for establishing measurements (for an example and some discussion see Yusoff & Mohd Janor, 2014). At least in psychology this seems to be a very common practice, not only in the applied parts of this science, despite the serious concerns one may have about the heavy reliance on assumptions in this approach.
If the assumptions are specific enough to allow for building a formal model, the way in which an attribute and its measurements are linked can be specified by this model. Thus, it can be used to derive rules for the assignment of numbers to the variants of the attribute. From the model, conclusions about the specific information represented in the measurements can be derived. Such models are usually referred to as scaling models (for example Thurstone’s Law of Comparative Judgement: Thurstone, 1927).
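How a scaling model turns observations into numbers can be sketched for the simplest variant of Thurstone’s model, usually called Case V, which additionally assumes equal and uncorrelated discriminal dispersions. The proportions below are hypothetical, the function name `thurstone_case_v` is introduced only for this illustration, and the row-mean solution shown is one common textbook estimate rather than the only possible one.

```python
from statistics import NormalDist

def thurstone_case_v(proportions):
    """Scale values as row means of z-transformed choice proportions (Case V)."""
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    n = len(proportions)
    return [sum(z(p) for p in row) / n for row in proportions]

# proportions[i][j]: hypothetical share of judgements "stimulus i exceeds stimulus j".
# Entries must lie strictly between 0 and 1; the diagonal is set to 0.5.
proportions = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.8],
    [0.1, 0.2, 0.5],
]
scale_values = thurstone_case_v(proportions)
```

The resulting values have an arbitrary origin and unit, and the distances between them can be interpreted as reflecting differences in the attribute only to the extent that the assumptions of the model hold for the judgements at hand.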
For instance, a number of scaling models have been developed in psychophysics, where the attribute (the sensation) is theoretically considered to be a function of a stimulus that itself can be described by measurements of physical attributes that have been established in advance. By specifying the so-called psychophysical function, one can derive a numerical representation of the sensations depending on physical stimuli and get an idea about the information that is contained in the measurements. Other examples of a more formal approach are the models from item response theory (for instance, the Rasch model: Rasch, 1960) used in psychometrics.
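The Rasch model can serve as a brief illustration of what it means to derive conclusions about the information contained in the numbers from the model itself. The sketch below uses the standard logistic form of the model; the parameter values are hypothetical and the function names are introduced only for this example.

```python
import math

def rasch_probability(theta, b):
    """Probability of a positive response for person parameter theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p):
    """Logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical person and item parameters.
theta_high, theta_low = 1.5, 0.5
item_difficulties = [-1.0, 0.0, 2.0]

# Difference in log-odds between the two persons, computed separately for each item.
differences = [
    log_odds(rasch_probability(theta_high, b)) - log_odds(rasch_probability(theta_low, b))
    for b in item_difficulties
]
```

If the model holds, the difference in log-odds between two persons is the same for every item and equals the difference of their person parameters, here 1.0. Whether the model holds for a given set of responses is an empirical assumption that the sketch does not test.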
The heuristic and the scaling approach are well-established research practices despite their dependence on assumptions that are not easily or not at all justifiable, even if they may be true. Nevertheless, measurements constructed under these approaches may be helpful in some practical sense. For instance, they can be used for prediction purposes or in the process of justifying differential behaviour in the context of social institutions that include the assignment of different consequences according to different measurements. To rely on measurements may in these cases evoke the impression of objectivity and ideology-free specification of differences in an empirically meaningful way and thus serve specific functions in social behaviour.
Both of these approaches have a fundamental problem that leads to our central question about the role of measurement in theorising. They depend on the assumption that there is something to measure. It is assumed that the attribute that is going to be measured exists in a realm different from the numerical one and that it is measurable. The first part of this assumption gets its justification from more or less specified theoretical propositions that state the existence of the attribute and give a more or less compelling argument for why this should be the case. The definition of the attribute is theoretically based and thought to be independent of the question of how measurements for the attribute can be established. Measurements for an attribute introduced this way are often established by constructing something considered to be a measurement instrument following the heuristic approach (for instance, a questionnaire). The second part of the assumption, namely that the attribute is measurable, needs a bit more clarification, and this clarification comes from measurement theory (in a much narrower sense than we have dealt with before).
Measurement theory (in its narrow sense) tries to clarify which conditions must be met by an attribute to be considered measurable so that numerical relations between the measurements can be interpreted as reflecting empirical relations between the variants of the attribute. Since measurement theory is concerned with the representability of the structure of the empirical relations by the structure of the numerical relations, it is – in this narrow sense – usually referred to as the representational theory of measurement (for more details and a formal treatment see Krantz et al., 1971; Luce et al., 1990; Narens & Luce, 1986; Roberts, 1979; Suppes et al., 1989; Suppes & Zinnes, 1963).
Thus, by specifying the conditions an attribute has to meet if measurable, measurement theory specifies what can be measured at all. The heuristic approach as well as the use of scaling models only specify how to choose more or less specific numerical values as measurements.
Moreover, measurement theory itself opens an alternative way to rules which can be used to assign numerical values and thus establish measurements. Following this approach, one need not merely assume that there is some measurable attribute; one is able to reach a conclusion about whether this is indeed the case. Furthermore, we will see that the definition of an attribute can be founded on a sound base by referring to an operation that compares different objects with each other in a specified way. The interdependence of the definition of an attribute and its measurability, as well as the role theory plays in defining certain attributes, can also be clarified by following this route of reasoning. The conclusion we are heading to is that measurement is of such importance not because it is concerned with how to numerically represent attributes that we assume to be measurable, but because it is concerned with how to decide which attributes, and therefore which theoretical terms, can be used in propositions about the world in a meaningful way (and what is meant by ‘meaningful’).
In the following section we are going through the steps needed to establish measurements based on measurement theory as well as construct a measurement instrument that can be used to produce these measurements. We use a very simple case as an example to demonstrate the principles that are important. We are going to see that the relationship between measurements and attributes may be exactly the opposite from what we expect it to be.
How to establish a measurement
A measurement instrument is a tool that connects the realm of empirical variants of an attribute with that of the numerical values that represent these variants. If a measurement instrument is used in the way it has to be used, the empirical expression of the attribute to be measured is confronted with an inbuilt reference expression of the very same attribute. The way the measurement instrument is constructed makes it produce a numerical value formally representing the variant of the empirical attribute. The confrontation procedure in this process is in the simplest case one that compares empirical expressions of the attribute to other empirical expressions of the same attribute built into the measurement instrument.
Comparing one object to another object in the world in a certain way is the basis of not only applying a measurement instrument but also of constructing one in the first place. A very simple example (Fig. 1) of what we are talking about may be illuminating, even if we are going to be breathtakingly informal:

Fig. 1. A number of lines.
These lines are very simple objects, since, with a little imagination, we can see them as objects with only one distinguishable attribute. We distinguish them easily with regard to this characteristic by comparing them to each other. Imagine that there is no name for it yet.
Due to the simplicity of the objects, there are not many procedures by which they could be compared. One of them would include the parallel alignment of the lines as the first step. How do we do that? We arrange them in such a way that they do not intersect or touch at any point (even if we were to extend the lines to imaginary infinity in either direction). In addition, we want our lines to be horizontal (but this is totally arbitrary; we could have chosen any other direction as well). The result looks like this (Fig. 2):

Fig. 2. Lines aligned.
The second step in order to compare our objects involves the construction of a (virtual) line that is perpendicular to the direction in which we aligned the lines in the first step. This means our virtual new line intersects all the other lines (or their extensions into infinity) in a right angle (90 degrees). Then we let each line ‘start’ at this (virtual) line (Fig. 3):

Fig. 3. Lines aligned and perpendicular to a common virtual line.
Now we can define our first empirical relation between a pair of lines (actually, we could have done that already after step 1, but it is easier to see now). We call them equivalent (meaning that they have the same expression of the attribute) when a second (virtual) line, defined by the endpoints of the two lines we are comparing, is parallel (see above for the meaning of this) to the first (virtual) line. If that is the case for lines a and b, being two of our lines, we write:
\[ a \sim b \]
If that were the only way we could compare pairs of lines, we would only be able to construct a so-called nominal scale by assigning the same numerical values to objects equivalent to each other and different numerical values to objects not equivalent to each other. If we call f(a) and f(b) the numerical values (i.e. the measurements) of a and b, we could write:
\[ a \sim b \iff f(a) = f(b) \]
Instead of stopping here, a second relation between pairs of lines that are not equivalent to each other (Fig. 4 and 5) can be defined on the basis of our comparison procedure. We call a line a being in an order relation with line b, when the second (virtual) line, defined by the endpoints, intersects the first virtual line on the side of line b, but not on the side of line a (Fig. 4).

Fig. 4. A second virtual line helping with the order relation, version 1.

Fig. 5. A second virtual line helping with the order relation, version 2.
If that is the case for a and b, we write:
\[ a \succ b \]
We call a line b being in an order relation with line a, when the second (virtual) line, defined by the endpoints, intersects the first virtual line on the side of line a, but not on the side of line b (Fig. 5).
If that is the case for a and b, we write:
\[ b \succ a \]
If we can establish not only an equivalence relation but also an order relation on our objects, we are able to construct a so-called ordinal scale by assigning the same numerical values to objects that are equivalent and, if objects are not equivalent, assigning different numerical values in accordance with the order relation in the following way:
\[ a \succ b \iff f(a) > f(b) \]
If we want to construct a metric scale, so that we can use the measurements in the usual formal models that allow us to perform calculations, we need something more than just the two relations we established so far. In this case, we need an operation in the empirical realm that is represented by an addition in the numerical realm. Following the reasoning of Helmholtz (1887) and Hölder (1901), an additive representation, or a so-called extensive measurement, requires an empirical operation that defines a specific way of combining two expressions of the attribute so that this operation is represented by adding two measurements in the numerical realm. This operation is usually called a concatenation.
If we concatenate objects a and b, we write:
\[ a \circ b \]
With our lines, one way (but not the only one possible) of combining the objects that can be used as a concatenation, is as follows: if we take two of the aligned lines and let the endpoint of the first line be the starting point of the second line, we get a new line, that is a combination of these two lines, leading from the starting point of the first line to the endpoint of the second. Then, the resulting (or concatenated) line can be compared to other lines (concatenated ones or simple ones) the same way as simple lines can be compared with each other.
If there is a combination procedure which can serve as a concatenation, we can assign numerical values in such a way that:
\[ f(a \circ b) = f(a) + f(b) \]
We now have what we need for a metric scale. Furthermore, usually there is one (often virtual) object (null) that is wisely assigned the numerical value of 0. This is the object that, if concatenated with any other object a, will result in assigning the same numerical value to the concatenated object as to the object to which it was concatenated:
\[ f(a \circ \mathrm{null}) = f(a) \]
If there is such an object, a ratio scale can be constructed by assigning numbers according to the rules described so far. The numerical values themselves will become interpretable, when one object (unit) is singled out and given the numerical value of 1, thus becoming the unit of that ratio scale. The choice of unit is arbitrary.
By constructing a so-called standard sequence by repeatedly concatenating unit with itself, we arrive at objects that can be used to calibrate a ruler, as a measurement instrument. The verbal label which we give to the attribute that has now become measurable is arbitrary. In our case, let’s call it ‘length’.
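The construction of a ruler from a standard sequence can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class `Line` and the functions `exceeds`, `concatenate` and `measure` are hypothetical names, and the hidden `_extent` merely stands in for the physical line so that the example can run. The point is that `measure` never reads a number off an object; it only uses the order relation and the concatenation.

```python
from fractions import Fraction


class Line:
    """A line whose extent is hidden from the measurement procedure.

    The private `_extent` stands in for the physical object (an assumption
    made only so that the sketch is runnable).
    """

    def __init__(self, extent):
        self._extent = Fraction(extent)


def exceeds(a, b):
    """Empirical order relation: True when a is 'more' than b."""
    return a._extent > b._extent


def concatenate(a, b):
    """Empirical concatenation: lay b end to end with a."""
    return Line(a._extent + b._extent)


def measure(obj, unit, max_steps=10_000):
    """Count how many copies of `unit` fit into `obj`.

    Walks along the standard sequence unit, unit∘unit, ... and stops at the
    first element that exceeds `obj`.
    """
    count = 0
    mark = unit  # first element of the standard sequence
    while not exceeds(mark, obj):
        count += 1
        if count >= max_steps:
            raise ValueError("object too large for this ruler")
        mark = concatenate(mark, unit)
    return count


unit = Line(1)
print(measure(Line(3), unit))                # 3
print(measure(Line("7/2"), unit))            # 3 (between 3 and 4 units)
print(measure(Line("7/2"), Line("1/2")))     # 7 (a finer unit resolves it)
```

Choosing a smaller unit refines the ruler; the choice itself remains arbitrary, as stated above.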
The relationship between measurement, attribute, and theory
We now have a relatively clear understanding of what is meant, when we are talking about ‘length’. We see that length is an attribute that is defined by how expressions of the attribute are compared to each other. We can imagine other ways to compare objects and that would lead to other attributes; how the objects are compared with each other defines the attribute we are talking about. If there is to be an attribute, there has to be an operational definition of this attribute by describing a specific way to compare the expressions of the attribute to each other.
We can see now that ‘length’ has been defined by making it measurable. As simple as it appears to be to measure height or length with established measurement instruments such as rulers in daily life, in fact it is simple only because one already has a theoretically and empirically founded measurement instrument at hand.
In other words, the way we establish measurements defines the way we construct the representation of the attributes of objects of the world and this leads to certain ways of theorising about the world. The different ways in which objects of the world can be compared to each other define the possible attributes that can be used to theoretically distinguish the objects from each other. Objects that cannot be compared to each other constitute members of different categories of objects that are defined by being not comparable in a certain way. Thus, the categorical structure fundamental to any theorising depends on the procedures used to establish measurements.
If we can find a way to compare the objects with each other and assign numbers to them so that for each and every pair of a and b the following three propositions are true, then the way to compare the objects with each other defines an attribute that can be measured on a metric scale.
\[ a \sim b \iff f(a) = f(b) \tag{10} \]
\[ a \succ b \iff f(a) > f(b) \tag{11} \]
\[ f(a \circ b) = f(a) + f(b) \tag{12} \]
Given that this is the case (and we chose null and unit according to the reasoning we presented above), we can construct a measurement instrument like the ones we routinely use in everyday life, which produces numerical values by comparing the empirical expression of an attribute with an inbuilt reference to the unit of the scale. With our lines, one of the simplest measurement instruments would be a ruler. The ruler is calibrated against lines whose lengths have been determined by setting a unit and assigning numbers to the objects so that the three propositions (10)-(12) given above hold true for each pair of objects.
In case only one proposition (10) can be made true for each and every pair of objects when compared to each other, the attribute can be measured on a nominal scale; in case propositions (10) and (11) but not (12) can be made true for each and every pair of objects when compared to each other, the attribute can be measured on an ordinal scale.
We even can derive the conditions which must be met by the attributes to be measurable on a given scale type. The solution of this problem is given by conditions that the relations that are determined by comparing objects must fulfil in order to be able to construct a certain scale.
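Remember that the nominal scale is based on the relation of equivalence. Such a nominal scale can only be constructed if the equivalence relation for the attribute fulfils the following conditions:
\[ a \sim a \tag{13} \]
\[ a \sim b \implies b \sim a \tag{14} \]
\[ a \sim b \text{ and } b \sim c \implies a \sim c \tag{15} \]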
These three conditions have to be true for each and every triple of a, b and c. At first glance these conditions appear to be relatively trivial and one may be inclined to take them as naturally being true. Isn’t it obvious that in the case of (13) the attribute of an object has to be equivalent to that object’s attribute? It may very well be so, but only if the attribute is not changing during the establishment of measurements. This is a big issue in the social sciences where attribute expressions may change from one moment to the other or where it is at least very unclear what is happening to a certain attribute and its expressions while being measured. If the attribute and/or its expression changes with time, a at time t2 may not be equivalent to a at time t1. In the case of (14) this condition may not hold if the comparing procedure is not symmetrical and if interactions between the objects that change with that asymmetry occur. Condition (15) may cause problems if the boundaries between different expressions of the attribute are inherently fuzzy thus giving rise to inconsistencies with regard to condition (15).
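Whether a given comparison procedure satisfies conditions (13) to (15) can be checked empirically on a finite set of objects. The following sketch is one way to do this; the function names `check_equivalence` and `fuzzy_equivalent` are hypothetical, and the tolerance-based comparison is our own example of the ‘fuzzy boundary’ problem mentioned above, not a procedure taken from the literature.

```python
from itertools import product


def check_equivalence(objects, equivalent):
    """Collect violations of reflexivity (13), symmetry (14), transitivity (15)."""
    violations = {"reflexivity": [], "symmetry": [], "transitivity": []}
    for a in objects:
        if not equivalent(a, a):
            violations["reflexivity"].append((a,))
    for a, b in product(objects, repeat=2):
        if equivalent(a, b) and not equivalent(b, a):
            violations["symmetry"].append((a, b))
    for a, b, c in product(objects, repeat=3):
        if equivalent(a, b) and equivalent(b, c) and not equivalent(a, c):
            violations["transitivity"].append((a, b, c))
    return violations


def fuzzy_equivalent(x, y, tolerance=1.0):
    """A comparison with a fuzzy boundary: 'indistinguishable within tolerance'."""
    return abs(x - y) <= tolerance


report = check_equivalence([0.0, 1.0, 2.0], fuzzy_equivalent)
print(report["transitivity"])  # [(0.0, 1.0, 2.0), (2.0, 1.0, 0.0)]
```

The fuzzy comparison is reflexive and symmetric but not transitive, so it does not define an attribute that is measurable even on a nominal scale.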
Finding these conditions is important because they point out that an attribute may be measurable (on a certain type of scale or at all) only under some restrictions. There may be (and often are) restrictions such that conditions (13) – (15) hold only if some boundary conditions are met. Within these boundaries the conditions hold and thus the attribute is measurable on a certain scale. Boundaries may for example depend on space, time or category of object. If for example no boundary condition exists for space (or time), the measurements of the attribute are independent of space (or time) and can be compared across space (or time). Therefore, the importance of conditions like (13) – (15) does not primarily lie in determining whether they hold for a certain attribute (obviously this has to be determined under certain restrictions) but in determining under which conditions propositions based on the measurements are also invariant. Thus, these conditions are intimately connected to the question of to what extent a theory is valid before it becomes invalid.
Another problem measurement theory deals with is the question of how free we are in choosing the numerical values when constructing a certain scale. The answer is directly connected to the conditions about the relationship between the attributes and the measurements described in (10) – (12) and is usually presented by giving the permissible transformations that can be applied to the arbitrarily chosen numerical values assigned during the comparison stage of constructing the scale. The only permissible transformations are those that preserve the validity of the respective conditions, that is, the conditions that settled the first problem of whether the relations between different expressions of the attribute can be represented by the relations between the measurements.
For our nominal scale, the definition of permissible transformations has to make sure that, if condition (10) holds before the transformation, then it also holds after the transformation. That is the case if the transformation that results in the transformed numerical values f’ fulfils the following condition for each and every pair of a and b:
\[ f(a) = f(b) \iff f'(a) = f'(b) \]
Having found the permissible transformations allows dealing with the question of which propositions based on the measurements are meaningful. We discussed that question before and pointed to the fact that only propositions whose truth value is independent of the particular numerical values chosen can be called meaningful. Only such propositions (meaningful ones) contain information about the attribute. We can now connect the solution to the question regarding meaningful propositions about the attribute to the question of permissible transformations: Meaningful propositions’ truth values are invariant across all permissible transformations. For measurements on a nominal scale such a meaningful statement could for example be ‘half of the objects show the same expression as object a’ or ‘none other than object a shows the same expression as object a’.
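The invariance criterion can be illustrated with a small sketch. The object names, the numerical values and the relabelling below are all invented for the example; the only assumption carried over from the text is that, on a nominal scale, any one-to-one relabelling of the values is a permissible transformation.

```python
def half_share_expression_of(values, target):
    """Proposition: 'half of the objects show the same expression as target'."""
    same = sum(1 for v in values.values() if v == values[target])
    return 2 * same == len(values)


def is_smaller(values, x, y):
    """Proposition: 'the value of x is smaller than the value of y'."""
    return values[x] < values[y]


# Hypothetical nominal measurements of four objects.
f = {"a": 1, "b": 1, "c": 2, "d": 3}

# A permissible transformation for a nominal scale: a one-to-one relabelling.
relabel = {1: 10, 2: 7, 3: 4}
f_prime = {obj: relabel[value] for obj, value in f.items()}

# Meaningful: the truth value survives the transformation.
print(half_share_expression_of(f, "a"), half_share_expression_of(f_prime, "a"))  # True True

# Not meaningful on a nominal scale: the truth value changes.
print(is_smaller(f, "c", "d"), is_smaller(f_prime, "c", "d"))  # True False
```

A proposition whose truth value flips under a permissible transformation says something about the chosen numbers, not about the attribute.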
Having solved the fundamental problems of measurement for an attribute leaves us with the so-called ‘scaling problem’ which is concerned with how to assign specific numerical values to the objects. The theoretically sound method to do this is, for example for a nominal scale, to compare each and every pair of objects for which we want to assign a numerical value and determine whether the equivalence relation is valid for each and every pair and assign numbers according to the equivalence relation.
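For a nominal scale, the assignment step can be sketched as follows. The function `nominal_scale` and the comparison `same_extent` are hypothetical names; the lines are represented as pairs of start and end coordinates purely so that the example can run. As a shortcut, each object is compared with one representative per class rather than with every other object, which is only justified if conditions (13) to (15) have already been established.

```python
def nominal_scale(objects, equivalent):
    """Assign the same number to equivalent objects, different numbers otherwise."""
    representatives = []  # one object per equivalence class found so far
    values = []
    for obj in objects:
        for value, representative in enumerate(representatives):
            if equivalent(obj, representative):
                values.append(value)
                break
        else:
            # No known class matches: open a new class with a new number.
            representatives.append(obj)
            values.append(len(representatives) - 1)
    return values


def same_extent(line_a, line_b):
    """Stand-in for the empirical comparison of two aligned lines."""
    return (line_a[1] - line_a[0]) == (line_b[1] - line_b[0])


lines = [(0, 3), (0, 5), (0, 3), (0, 2), (0, 5)]
print(nominal_scale(lines, same_extent))  # [0, 1, 0, 2, 1]
```

The particular numbers 0, 1, 2 carry no information beyond sameness and difference; any one-to-one relabelling would serve equally well.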
From the discussion about the criteria used to state equivalence (or the assignment of an order relation) with regard to the attribute, it should be obvious now that even in our very simple example one needs to have a theory about the space in which our lines are located, and about its geometry. We can theorise the world in our simple example as a plane in which the lines have only one measurable attribute, but one still needs to understand alignment, parallelism, perpendicularity, angles measured in degrees, and intersection, not to forget the whole mathematical treatment of these phenomena. The attribute of length cannot be defined without reference to a whole set of theoretical propositions which are needed to define the way we compare these objects.
While length is a physical attribute with a physical expression which can be compared relatively easily, there may be other attributes such as weight (or mass) that are more complicated to define, since a different kind of way of comparing is needed. Sometimes, in order to be able to define an attribute, one has to use a setting (or a device in a certain setting, such as a balance in a gravity field) in which the effects of different expressions of the attribute can be observed, because the attribute itself is not observable. In this case the theoretical structure needed to give meaning to the ways of comparing and thus to the definition of the attribute may even be more complicated.
The numerical information that is encoded in measurements is quite different for different types of measurement scales (nominal, ordinal, metric). The actual numerical values on a nominal scale are meaningless; the only thing that matters about them is whether they are the same or different. The only information we can get from them about the attribute is whether expressions are equivalent or not. This is still not nothing; on the contrary, it is the basic operation that needs to be done to be able to measure at all, because if we are not able to differentiate between objects with regard to the attribute, there is no distinguishable attribute that could be measured in any way. Even with the values from an ordinal scale, information about the attribute is very restricted: equivalence can be determined as with values from a nominal scale, and additionally one can see whether an expression is ‘more than’, ‘greater than’ or whatever defines the order relation, but that’s it. Here also, the actual numerical values are meaningless, meaning that one cannot calculate with them as one would naively assume for numbers. The numbers themselves get some meaning only if one is able to construct a ratio scale and defines the unit of that scale. In that case, propositions about the actual numerical values are meaningful if the values are accompanied by their unit. This is the case with the usual and well-known physical measures like length. And that is the reason why these measures come with a unit.
The extent to which the numerical values of a given scale are meaningful separates propositions that are meaningful with regard to the attribute from propositions that are informative only about the more or less arbitrarily chosen numerical values. Only if a proposition’s truth value (which can be true or false, 1 or 0) is invariant with regard to the actual numerical values that have been chosen does the proposition tell us something about the attribute. If we are talking about an attribute that can be measured on a metric scale and a proposition’s truth value changes when we change the unit, then the proposition tells us something about the chosen unit but not about the attribute that has been measured. Or, to put it in other words, the proposition ‘The length of the line is 10 units’ is meaningless, because its truth value changes with the specified unit, while the sentence ‘The length of the line is 10 centimetres’ is meaningful. The set of theoretical and empirical propositions that can be regarded as meaningful statements about the attribute under consideration is restricted because it depends on whether the operations of comparing and concatenating can be defined in a way that results in meaningful numerical values (which are the numbers accompanied by the unit in which they are measured). Consequently, the construction of measurements restricts the systems of logically connected propositions we usually call theories in two ways: first, by defining the attributes about which we can theorise, and second, by restricting the set of meaningful statements about these attributes and their interconnections.
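The role of the unit can be made explicit with a short worked equation. We assume here, as is standard for ratio scales, that the permissible transformations are multiplications by a positive constant; the symbols \alpha and f' are our own notation for this sketch.

```latex
% Change of unit on a ratio scale (assumed form of the permissible transformation)
\[ f'(x) = \alpha \, f(x), \qquad \alpha > 0 \]

% A proposition about ratios is invariant under every change of unit:
\[ \frac{f'(a)}{f'(b)} = \frac{\alpha \, f(a)}{\alpha \, f(b)} = \frac{f(a)}{f(b)} \]

% A proposition about a bare number is not:
\[ f(a) = 10 \;\Longrightarrow\; f'(a) = 10\,\alpha \neq 10 \quad \text{whenever } \alpha \neq 1 \]
```

Naming the unit fixes \alpha, which is why ‘10 centimetres’ has a stable truth value while ‘10 units’ does not.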
Moreover, if we examine the process of defining an attribute by constructing a measurement of it in some detail, we see that the construction itself is theory-laden. We need already theoretical insight and an understanding of the phenomenon and the system in which it is embedded to be able to define operations of comparison. We went through this process in some detail in the previous section by trying to construct measurements for a very simple attribute of very simple objects in order to now be able to see the dependence of the construction process on a whole set of theoretical terms (alignment, parallel, line, perpendicular, intersecting) that are needed to construct the measurement of length.
If we think a little bit further, it is possible to recognise that the whole theory of the geometry of the space in which we are going to measure lengths is intertwined with the process of defining the operations by which the attribute is defined. Length is not a simple attribute of objects per se but depends in its meaning on a whole theory of the space in which our lengths occur. Thus, the meaning of length is only understandable with reference to this theory. Given this important role of theory for the measurement of such seemingly simple attributes of objects in the world like length, shouldn’t it be much more of a task to understand and construct measurement instruments?
The good, the bad, and the ugly
In the introduction we distinguished between three ways of establishing measurements, the heuristic one, the scaling method, and the one based on measurement theory. We demonstrated the last one by using a very simple example. We did so in order to discuss the role of measurement in theorising about the world. We saw that the establishment of measurements as a central operation in the sciences – in a strict sense – is based on a defined operation of comparing objects. It results in numerical values, the measurements which give information about the specific quality or quantity of an attribute of objects of the world. An attribute is defined by a certain comparing operation which results in measurements, allowing the empirical test of quantitative theories about that attribute. Thus, the question of measurability is at the same time the question of which theoretical terms can be used in a meaningful way. Although social scientists seem to assume the process of measuring as being somehow more complicated in the social sciences, they tend to agree with this reasoning and ascribe authority predominantly to quantitative measures (Espeland & Stevens, 2008; Goldthorpe, 2001). Simultaneously, in the common practices of social sciences such as psychology and sociology there appears to be an overwhelming dominance of heuristically based measurements and a widespread ignorance of measurement theory (for more detailed accounts on this problem, see for example Cliff, 1992; Gane, 2011; Labovitz, 1972; Michell, 1997; Narens & Luce, 1993; Trendler, 2009, 2013).
Measurement is usually performed by means of a measuring instrument. As mentioned before, the omnipresence of such devices may lead to the impression that measurement is a relatively trivial operation. This indeed may be true if there is an established measurement instrument available. But what if it’s not the case? We have seen that the development of such devices is far from trivial and intertwined with the development of theory. We believe – and this is the reason why this text is dealing with it – that a lot of theoretical problems in the social sciences are consequences of basic misunderstandings concerning the question of what constitutes a measurement, and which functions it fulfils in the research process.
The common understanding seems to be that measurement is sufficiently well addressed as a practical procedure of assigning numbers to objects according to more or less arbitrary rules, often derived heuristically. Consequently, a major problem appears to be that the construction of measurement instruments does not seem to be treated as a theoretical concern, but more as a practical one, while in fact it is ‘a problem within the very fabric of […] theory’ (Abell, 1968, 1969). In other words, the act of measurement should not be the assignment of numbers but the assignment of meaningful numbers, and not of any unclear meaning but of a theoretically defined meaning. The answer to the question of how numerical values are assigned alone does not solve any of the fundamental problems of measurement. It merely constitutes an illusion of measurement or to make use of a term introduced by Richard Feynman, it is an example of ‘cargo cult science’ (Feynman, 1974).
In conclusion, the obligation to make measurable the theoretical terms from which we started restricts the set of theoretical terms to the ones where measurement is possible. Following our line of reasoning, the conclusion reaches even further, to the point that attributes that are not measurable are not attributes at all, since they have not been defined properly. If the introduction of theoretical terms is freed from the obligation to measure, theoretical terms that have no reference to the world may be introduced, possibly leading to systems of propositions (a.k.a. theories) without any empirical meaning. The resulting science would be primarily concerned with the use of theoretical terms, weighing the pros and cons of using one term or the other, looking for consistent systems of propositions containing these theoretical terms, and might show abundant theoretical discussions without the possibility of deciding any theoretical dispute empirically.
If such a science would try to copy the successful model of the natural sciences with its formal models and calculations, it would be destined to also establish ‘measurements’ since without them, there is nothing that can be calculated and it would be obvious that there is no empirically meaningful interpretation of their formal models. Such a science would be inclined to ignore measurement theory since it would very possibly have its theoretical terms questioned and they would very possibly fail. So, other ways of establishing ‘measurements’ would be needed and the way to them would be the heuristically justified construction of ‘measurement instruments’ that produce numbers in a more or less regular way, eventually leading to the claim that measurement indeed is nothing else than a rule-obeying assignment. The discussion concerned with these instruments would be centred around criteria like reliability or validity, ignoring the problem of their unfoundedness as measurement instruments and whether there is anything at all that can be measured (for a discussion of some of the aspects mentioned here see Buntins et al., 2017; and Borgstede, 2019). If parts of that science would be a bit more sophisticated, they would prefer the use of scaling models, maybe trying to argue that – due to the inclusion of a formal model – this shares the same dignity as establishing a measurement following the demands of measurement theory, criticising maybe the sloppiness of the heuristic approach. Even the concept of the operational definition of attributes could be brought forward – although not very seriously – in that it would not lead to the conclusion that the attribute that is measured in a certain procedure (for instance, in a test) is nothing more than the way in which the subject reacts to that procedure (which would be a perfectly valid statement with regard to the demands of an attribute posed by measurement theory).
The formal models of such a science would very likely be models of the data produced by the so-called measurement instruments, preferably statistical models used for prediction purposes. This would also reflect the fundamentally practical interest the heuristic approach is usually following. The models would not be meant to be formal representations of theoretical propositions about the world but formal descriptions of the structure of the data used for practical reasons.
Even if we do not really know what such a science would be talking about, it still could be of some benefit. As mentioned before, the models from and about the data may be of some use in prediction. The establishment of measurements even in such a sloppy way may be of some help in justifying social practices and rituals by producing a kind of pseudo-rationalisation by trying to evoke the impression of objectivity and ideology-free specification of differences between people or social entities in an empirically meaningful way.
One may argue that a science that operates like the one described above is just not yet developed enough to be able to establish measurements in a sound way, and one may point to the measurement problems that the natural sciences struggled with as well on their way to success (see, for instance, Chang, 2004, for a historical account of the attempts to ‘measure’ temperature). That may be the case, and if so, we would like to argue that progress in such a science may be achieved by recognising the importance of measurement theory and its role in theorising about the world.
Let’s be pragmatic, shall we? – Well, we don’t think so
Why delve into this so deeply? Why not just make a rule that gives numbers and go on?
Well, we can do that, and often it is helpful in a more practical sense. Numbers in bank accounts are useful in assigning different consequences to different (groups of) people. Grades are useful in assigning different consequences to different students. Psychometric tests are useful in assigning different consequences to different people. Ratings are useful in the same way, and so on.
Lots of social practices include the use of numbers that result from the application of certain rules. These numbers constitute equalities or inequalities between subjects or objects, they constitute ordinal structures like ranks or hierarchies between subjects or objects, and they behave as if they were measurements of some kind of attribute. The question is whether they really are measurements, or whether they are only rule-derived numerical values that may (or may not) covary with other attributes and may (or may not) contain some indeterminate information about some undefined attribute of these subjects or objects, and as such are pseudo-measurements.
Let’s have a look at a very common social practice with the background knowledge we just discussed. Let’s talk about grades. In Germany, the grades in schools usually range from 6 (‘well, not really good’) to 1 (‘well, it’s ok’, or rather ‘very good’). We are obviously kidding with these descriptions of the expressions of the attribute to be measured, but what were we measuring again? You get the idea, and the problem. The grades from different years and different classes are usually added, and the arithmetic mean of such grades is of major importance in regulating access to educational resources (and, further on, to economic and social resources). But by adding grades and calculating the mean, it is implicitly stated that grades are metric measures of a one-dimensional attribute that differentiates between people. Furthermore, it is implicitly stated that the differences between adjacent grades in the ordinal structure correspond to equal differences in the attribute. By adding grades from different times, classes and teachers, one implies that the conditions needed for a metric measurement of the proposed attribute are invariant under variations in these boundary conditions. By comparing the mean of grades from different schools, federal states and so on, the same invariance is implicitly stated with regard to these possible boundary conditions.
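What is at stake in the assumption of equal differences can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the two students, their grades, and the relabelling `RELABEL`, which keeps the order of the grades intact but changes the spacing between them. If grades carried only ordinal information, such a relabelling would be just as admissible as the original numbering.

```python
from statistics import mean, median

# Hypothetical grade lists on the German scale (1 = best, 6 = worst).
student_a = [1, 1, 5]
student_b = [3, 2, 3]

# Hypothetical relabelling: strictly increasing, so the order of the grades
# is preserved, but the distance between grade 4 and grade 5 is stretched.
RELABEL = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10, 6: 11}


def relabelled(grades):
    """Apply the order-preserving relabelling to a list of grades."""
    return [RELABEL[g] for g in grades]


def better_student(statistic, grades_a, grades_b):
    """Name the student with the lower (better) summary value, or 'tie'."""
    value_a, value_b = statistic(grades_a), statistic(grades_b)
    if value_a < value_b:
        return "A"
    if value_b < value_a:
        return "B"
    return "tie"


if __name__ == "__main__":
    for name, statistic in (("mean", mean), ("median", median)):
        before = better_student(statistic, student_a, student_b)
        after = better_student(
            statistic, relabelled(student_a), relabelled(student_b)
        )
        print(f"{name}: better before = {before}, after relabelling = {after}")
```

Under these assumptions, the mean favours student A with the original numbering and student B after the relabelling, whereas the median favours student A in both cases. A decision based on the mean therefore depends on the spacing of the numbers, which is exactly what the implicit claim of a metric measurement would have to justify.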
None of these implicitly stated propositions can easily be defended by anyone.
Isn’t that a strange situation? Well – we think so.
We have tried to argue that the justification for using numerical values in such cases should not include the claim that something has been measured. Such a claim would be, to say the least, extremely difficult to defend. In our humble opinion, social practices with important consequences need to be justified by reason. Measurement theory may help to identify cases in which such a justification is still missing, even when numbers, models and calculations obscure this fact.
Acknowledgements
We would like to thank the anonymous reviewers for their appreciation, criticism and recommendations, which helped to sharpen the argument and improve the article. Thanks to Andrea Mubi Brighenti and Peter Wagner for their effort, the opportunity to publish this article, and their patience with the authors.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
