Abstract
It is often argued that an order exists in a cross-tabulation when the table’s margins have such a structure. We can free ourselves from this point of view and clearly define an order on the table itself. As Louis Guttman noted previously in the case of scalogram analysis, one must often move rows and columns about to be able to create a scale. In this case, it is the order of the table’s structure which induces an order on the margins and not the reverse. However, Goodman and Kruskal, when they proposed the gamma index that defines the strength of an association in the ordered case, only use the margins’ order, and they have since then been followed by most researchers. One should return to the original intuition of Guttman and show that at least an approximate order is almost always present in a table. The ordered cross-tabulation generated by ordered questions is only one case among many others and conversely a table with a strong order structure induces an order on question modalities. With real examples, we show that the criteria are available to define an order on a table, that there are formalized methods to reveal the associated structure, that there are also different indices to measure the degree of association, and finally that there are tests to assess the level of significance.
Measuring the degree of association between rows and columns in a cross-tabulation is an issue that has been discussed for more than a century. If we retain only the methods still in use, we find Karl Pearson’s contingency coefficient (1904), Tschuprow’s coefficient (1925) and Cramér’s coefficient (1946). This issue was addressed in a series of four articles in the Journal of the American Statistical Association by Goodman and Kruskal (1954; 1959; 1963; 1972), who then assembled the articles in a book (1979). On the other hand, one question has engendered little research: if there is an order structure on the rows and columns, does the resulting table have a particular structure that can be identified? This question is addressed in this article’s first section, before the question of the degree of association. The result we obtain is a unification of all types of cross-tabulations, ordered or not, in a single type of table where cross-tabulations differ only by the intensity, whether significant or not, of the order we found. 1
Order Structure of an Ordered Margins Table
Louis Guttman, in the fourth volume of The American Soldier (1950), laid the foundations of scalogram analysis, with which he ordered a table crossing answers and individuals, seeking by alternating shifts of rows and columns to achieve a homogeneous form from which he deduced an order that he called a scale. It is the ordered table that induces an order on the margins and not the inverse. This technique was further developed by Bertin (1967). Before returning to this idea, we show through examples how the problem arises.
Let us suppose we have a 2 x 2 table AiBj where margins Ai and Bj have a defined order structure as follows A1 > A2 and B1 > B2. Consider the following example:
The margins are ordered, but the table itself does not necessarily have an order structure, as in the following case:
Indeed, for the first cell A1B1, we see that the product of the margins divided by the total equals 56, the expected frequency, which is exactly the observed value. We are therefore in the case of independence between rows and columns, even though the margins are ordered.
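The independence check can be sketched in a few lines of Python. The margins used here (140 and 60 for A, 80 and 120 for B, total 200) are not printed in the text: they are inferred from the figures quoted in this section (56 expected in A1B1, a maximum of 80 for that cell, 61 possible tables), so they should be read as an assumption. The function name is illustrative.

```python
def expected_counts(row_totals, col_totals):
    """Expected frequencies under independence: row margin x column margin / total."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Margins inferred from the figures quoted in the text (assumption).
ROW_TOTALS = [140, 60]   # A1, A2
COL_TOTALS = [80, 120]   # B1, B2

print(expected_counts(ROW_TOTALS, COL_TOTALS))
# [[56.0, 84.0], [24.0, 36.0]]
```

A table whose observed counts equal these expected counts has deviations of zero in every cell, which is the situation of independence described above.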
To change this situation, you can either add or subtract an individual in the cell A1B1. If added, it is assumed that there is an attraction between A1 and B1, and therefore, since this is a fixed margins framework, an opposition between A1 and B2 and between A2 and B1 and finally another attraction between A2 and B2. The elementary displacement is the following:
The table resulting from this change is as follows:
This table has an order structure that is defined by the structure of the signs of deviations from independence. Before considering a definition of the order structure, we can say that a 2 x 2 table is ordered when one of the diagonals has positive deviations and the other negative ones.
We can repeat elementary changes several times, but a maximum is reached when the cell A1B1 is 80 because of the margin constraint (and the A2B1 cell is thus equal to zero):
One has thus made 24 elementary changes or shifts that corresponded to a progressive decrease in the A2B1 cell from 24 to 0.
The elementary shifts with reverse signs lead to a table that has the reverse order for A and B:
And the maximum association in the opposite direction of the association between A and B is:
In this case, 36 elementary shifts were necessary. There is a total of 60 shifts to which must be added the case of independence; that is to say, 61 possible situations (which corresponds to the smallest margins + one unit). As one cell defines the entire table for one single degree of freedom, we can summarize all possible cases as follows, taking as reference A2B2:
Maximum association: A2B2 = 60 | Independence: A2B2 = 36 | Max. inverse association: A2B2 = 0
If A2B2 is between 37 and 60, the table is ordered in the direction of A and B; if A2B2 is between 35 and 0, the table is ordered in the opposite direction of A and B. All of these tables (except for independence) have an order structure defined by the two elementary displacements and their possible repetitions.
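To make the counting of possible tables concrete, here is a minimal sketch that rebuilds every 2 x 2 table compatible with fixed margins from the single free cell A2B2. The margins (140/60 and 80/120) are again inferred from the figures in the text rather than read from a printed table, and the function names are illustrative.

```python
def feasible_a2b2(row_totals, col_totals):
    """All values the A2B2 cell can take when the margins are fixed."""
    n = sum(row_totals)
    low = max(0, row_totals[1] + col_totals[1] - n)
    high = min(row_totals[1], col_totals[1])
    return range(low, high + 1)


def table_from_a2b2(a2b2, row_totals=(140, 60), col_totals=(80, 120)):
    """Rebuild the whole 2 x 2 table from A2B2 (one degree of freedom)."""
    a2b1 = row_totals[1] - a2b2
    a1b1 = col_totals[0] - a2b1
    a1b2 = row_totals[0] - a1b1
    return [[a1b1, a1b2], [a2b1, a2b2]]


values = feasible_a2b2((140, 60), (80, 120))
print(len(values))            # 61 possible tables
print(table_from_a2b2(60))    # maximum association: [[80, 60], [0, 60]]
print(table_from_a2b2(36))    # independence: [[56, 84], [24, 36]]
print(table_from_a2b2(0))     # maximum inverse association: [[20, 120], [60, 0]]
```

Each step of one unit in A2B2 corresponds to one elementary shift, so the 24 shifts above independence and the 36 shifts below it appear as the distances from 36 to 60 and from 36 to 0.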
As the situation of independence plays a central role, we can, by subtracting the value of independence from the previous scale, consider the scale of deviations from independence, always for the A2B2 cell:
Max. association: +24 | Independence: 0 | Max. inverse association: -36
Consider for example the table of deviations for A2B2 = 50:
And the table for A2B2 = 30:
This scale also provides us with an index of intensity of the link or association by indicating a deviation from the maximum. For the cell A2B2 = 50, the difference is 14, compared to a maximum of 24, and is therefore 14/24 x 100 = 58.3 percent of the maximum, an index which will be called percentage of maximum deviation from independence or, in French, the PEM for Pourcentage de l’Écart Maximum (Cibois, 1993). For the cell A2B2 = 30, the maximum deviation is in the negative direction, -36, the difference is -6 which is -6 / -36 x 100 = 16.7 percent of the maximum, which is by convention given a negative sign to indicate that it is a negative deviation.
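The two percentages above can be reproduced with a short sketch. It assumes the A2 margin (60), the B2 margin (120) and the total (200) inferred from the figures of this section; `pem_2x2` is an illustrative name, not the author's program.

```python
def pem_2x2(observed, row_total, col_total, n):
    """Percentage of maximum deviation from independence for one cell.

    Positive deviations are compared with the largest count the margins allow,
    negative deviations with the smallest one; the sign of the deviation is kept.
    """
    expected = row_total * col_total / n
    deviation = observed - expected
    if deviation >= 0:
        max_deviation = min(row_total, col_total) - expected
    else:
        max_deviation = expected - max(0, row_total + col_total - n)
    if max_deviation == 0:
        return 0.0
    return 100 * deviation / max_deviation


print(round(pem_2x2(50, 60, 120, 200), 1))   # 58.3
print(round(pem_2x2(30, 60, 120, 200), 1))   # -16.7
```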
As we have shown (Cibois 1993), it is possible to extend this procedure of looking for a PEM to each of the table’s cells. All that is needed is to isolate the cell for which we want to know the intensity of association, merge all the other rows into a single row and all the other columns into a single column, which brings us back to a 2 x 2 table.
In conclusion, as soon as a 2 x 2 table is not in a situation of independence, it always has an order structure identifiable by the existing margin order. Since, in general, the situation of independence rarely occurs with observed data, one can say that the order structure is practically the general case.
A Real Example - London 1911
We will now work on a real table from Kendall and Stuart (1961: 558), showing the results of a survey made in London in 1911 (noted London 4x6). 2 The table shows the distribution of 1,725 school children who were classified (1) in rows according to their standard of clothing (Very well clad, Well clad, Poor but passable, Very badly clad), and (2) in columns according to their intelligence (Very able, Distinctly capable, Fairly intelligent, Slow but intelligent, Dull, Mentally deficient or slow and dull), respectively:
The following table, with 3 rows and 3 columns (noted London 3x3), is obtained by combining the columns in pairs and merging rows 3 and 4:
London 3x3 can be decomposed into the sum of two tables corresponding to independence and the deviations from independence:
Around the first diagonal where the differences are all positive (in bold), all differences are negative. However, the notion of a diagonal must be specified if the number of rows and columns are not equal, and even in the event of equality when the margin structure has distorting effects.
Number of Rows and Columns are Different
Let us form a new table (London 2x3) where the columns are grouped as above and rows 2 to 4 are grouped. We have the following decomposition:
We see that in column 2, the positive deviation is in the second row.
Constrained Margins
A new 3 x 3 table (London33B) is made, keeping the same column grouping but with less balanced row margins. It merges rows 1 and 2 (now CladA) and leaves the two remaining rows unchanged (POOR becomes CladB and VBAD becomes CladC). We have the following decomposition:
We see this time that the effect of the diagonal is still present, but it has been deformed (positive differences in bold). The high weight of the margin CladA pulled the diagonal of positive deviations to the right and upward. So, for reasons of size or for reasons of margin constraints, only the extreme diagonal cells (for a diagonal following the margin order) always have positive deviations. To go from one end to the other, the path of positive deviations may deviate more or less from the diagonal, but positive deviations are always contiguous (laterally) or adjacent (diagonally). It is the existence of this “ridge”, where the positive deviations lie, isolating all the negative differences, which will be the definition of a table with an order.
Definition: a table has an order when the diagonal, which connects (by lateral contiguity or diagonal adjacency) the extreme cells defined by the margin order, has positive deviations from independence. Negative deviations are on both sides of the diagonal.
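As a rough illustration of this definition, the sketch below tests whether the cells with positive deviations form a connected path (lateral or diagonal neighbours) from the top-left cell to the bottom-right cell. It checks only the connectivity part of the definition, not that the negative deviations lie on both sides of the ridge, and the example tables use the 2 x 2 margins inferred earlier (an assumption). Function names are illustrative.

```python
from collections import deque


def deviations(table):
    """Observed counts minus expected counts under independence."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[table[i][j] - row_totals[i] * col_totals[j] / n
             for j in range(len(col_totals))]
            for i in range(len(row_totals))]


def has_ridge(table, eps=1e-9):
    """True if positive deviations connect the two extreme diagonal cells."""
    dev = deviations(table)
    n_rows, n_cols = len(dev), len(dev[0])
    start, goal = (0, 0), (n_rows - 1, n_cols - 1)
    if dev[0][0] <= eps or dev[goal[0]][goal[1]] <= eps:
        return False
    seen, queue = {start}, deque([start])
    while queue:                      # breadth-first search over positive cells
        i, j = queue.popleft()
        if (i, j) == goal:
            return True
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if ((di or dj) and 0 <= a < n_rows and 0 <= b < n_cols
                        and (a, b) not in seen and dev[a][b] > eps):
                    seen.add((a, b))
                    queue.append((a, b))
    return False


print(has_ridge([[57, 83], [23, 37]]))   # True: one shift toward association
print(has_ridge([[56, 84], [24, 36]]))   # False: independence
print(has_ridge([[55, 85], [25, 35]]))   # False: ordered in the opposite direction
```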
Finally, let us decompose the original table of London 1911 whose deviations from independence are:
A “ridge” runs clearly from both ends of the diagonal, and it is more or less wide. All positive deviations situated on this line are contiguous and/or adjacent; all the negative deviations are located on either side of the ridge.
Reciprocal Situation
Let us now look at the problem initiated by Guttman and ask the reverse question: if we find an order structure in a table, what does this imply for its rows and columns? Take for example the following table: it is a table from a survey of political and union opinions of French workers in 1970 (Adam, 1970) from which we extract a table of confidence in unions depending on the union chosen during voting on the job.
The table rows are ordered (“To defend your interests in labor disputes, you’re Very confident in them, Somewhat confident, Not confident, Not confident at all”), but the columns are responses to the question “In case of union elections in your firm, would you prefer to vote for a list led by FO (“Force Ouvrière”), CFDT (“Confédération Française Démocratique du Travail”), non-unionized workers, CGT (“Confédération Générale du Travail”), an autonomous or independent union, CFTC (“Confédération Française des Travailleurs Chrétiens”), or would you not vote at all?” Here is the observed table and the table of deviations from independence:
Graphically highlighting the positive deviations from independence, one can note a similarity of profiles between CGT and CFDT (positive differences for high degrees of confidence), between FO and CFTC (positive differences for intermediate degrees), and between Autonomous, non-unionized and non-voters (positive differences for the lowest degrees of confidence). We can reorder the table so as to find the order structure previously defined where positive deviations partition the table around the first diagonal, the negative differences being on either side:
It remains to define the order of the columns: as in France, the two unions CGT and CFDT are the protest unions while CFTC and FO positions are less radical and independent unions are most often unions created by employers and used to oppose union protests, we can reinterpret the question based on the responses obtained. The order on the unions shows the degree of opposition to the established order (Cibois, 1984: 20-21).
Searching for an Order Structure
The previous problem was particularly simple since there was already an order structure on the rows, and it was enough to make a few permutations on the columns to reset the order structure of the table. To address the generalized problem, we will use the technique of correspondence analysis since Benzécri (1976: 279-80) shows that if there is an order structure on the rows and columns, the first factor of a correspondence factor analysis manifests that order. We can verify it with the table above in the following bi-plot (Figure 1) with the first factor as the horizontal axis and second factor as the vertical axis.

French workers’ opinions on unions with the first factor as the horizontal axis and the second factor as the vertical axis.
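The reordering by the first factor can be sketched without any external library. The code below is a minimal illustration, not the procedure used for the figures: it extracts the first non-trivial axis of a correspondence analysis by power iteration on the matrix of standardized residuals, then sorts rows and columns by their scores. The small table `SCRAMBLED` is a hypothetical example (a band-diagonal table whose rows and columns were shuffled), not data from the survey, and the sketch assumes the table is not exactly at independence.

```python
import math


def first_factor_scores(table, iterations=200):
    """Row and column scores on the first correspondence-analysis axis."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]                 # row masses
    c = [sum(col) / n for col in zip(*table)]           # column masses
    rows, cols = range(len(r)), range(len(c))
    # Standardized residuals from independence.
    s = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in cols] for i in rows]
    v = [float(j + 1) for j in cols]                    # arbitrary start vector
    for _ in range(iterations):                         # power iteration
        u = [sum(s[i][j] * v[j] for j in cols) for i in rows]
        v = [sum(s[i][j] * u[i] for i in rows) for j in cols]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(s[i][j] * v[j] for j in cols) for i in rows]
    row_scores = [u[i] / math.sqrt(r[i]) for i in rows]
    col_scores = [v[j] / math.sqrt(c[j]) for j in cols]
    return row_scores, col_scores


def reorder(table):
    """Sort rows and columns of the table by their first-factor scores."""
    row_scores, col_scores = first_factor_scores(table)
    row_order = sorted(range(len(table)), key=lambda i: row_scores[i])
    col_order = sorted(range(len(table[0])), key=lambda j: col_scores[j])
    return [[table[i][j] for j in col_order] for i in row_order]


# Hypothetical shuffled table, for illustration only.
SCRAMBLED = [[10, 10, 40], [0, 40, 10], [40, 0, 10]]
print(reorder(SCRAMBLED))
# [[40, 10, 0], [10, 40, 10], [0, 10, 40]]
```

The sign of a factor is arbitrary, so in general the recovered order may come out reversed; for this symmetric toy table both directions give the same reordered table.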
We can complete this graphic with the representation of the intensity of ties by computing all PEM positive cells and then connecting the dots with a line whose thickness corresponds to the strength of the PEM (Figure 2).

Visualization of the strength of ties for local PEMs in the factorial plan.
With this example, we can specify the procedure for calculating the PEM for a cell (such a PEM is called “local”). We seek the degree of attraction between the row “very confident” and the CGT union. We reduce the table to a 2 x 2 table on which one can operate as before:
The observed deviation from independence is 137 - (208 x 317 / 844) = 58.9. The deviation from independence in the maximum case is 208 - (208 x 317 / 844) = 129.9. The local PEM is 58.9 / 129.9 x 100 = 45.3 percent. We proceed in this way for each cell of the table:
The order structure of the table indeed follows the correspondence analysis first factor (horizontal axis).
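The local PEM calculation can be sketched as follows. The collapsed 2 x 2 table is rebuilt from the figures that reproduce the deviations quoted above (observed count 137, row margin 208, column margin 317, total 844); the three other cells are obtained by subtraction and are therefore an inference, not a printed table. `local_pem` is an illustrative name.

```python
def local_pem(table, i, j):
    """Local PEM of cell (i, j), in percent.

    Using the row total, the column total and the grand total amounts to
    merging all other rows and all other columns into a 2 x 2 table.
    """
    n = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    expected = row_total * col_total / n
    deviation = table[i][j] - expected
    if deviation >= 0:
        max_deviation = min(row_total, col_total) - expected
    else:
        max_deviation = expected - max(0, row_total + col_total - n)
    return 100 * deviation / max_deviation if max_deviation else 0.0


# Rows: "very confident" / all other answers; columns: CGT / all other choices.
COLLAPSED = [[137, 71], [180, 456]]
print(round(local_pem(COLLAPSED, 0, 0), 1))   # 45.3
```

Because the function only needs the margins of the chosen cell, it can be applied unchanged to every cell of a larger table.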
We now have a procedure for finding an order structure for any cross-tabulation. Now let us consider the degree of association.
Degree of Association
We are looking for an indicator giving us the degree of association between the order of rows and the order of columns. We start with the work of Goodman and Kruskal (1954), who treated the problem comprehensively and proposed indices of association that were no longer based on the chi-square because “The fact that an excellent test of independence may be based on χ² does not at all mean that χ², or some simple function of it, is an appropriate measure of degree of association” (1954: 740). We then criticize this index and propose a generalization of the PEM.
Goodman and Kruskal’s Gamma
To present this indicator, we will reuse the London data first as the 2 x 2 table as follows:
On such a 2 x 2 table, Yule (1900) had defined a coefficient of association using the cross products (850 x 219 = 186,150 and 119 x 537 = 63,903); if they are equal, there is independence, and the association coefficient Q is the ratio of their difference to their sum: Q = (186,150 - 63,903) / (186,150 + 63,903) = 0.489.
Goodman and Kruskal return to this idea of using the cross products. They call concordant the pairs counted by the cross product of the first diagonal, 850 x 219 (and the symmetric 219 x 850), for which, when moving from one cell to the other, the rank order rises for both rows and columns. Symmetrically, they call discordant the pairs for which the rank order increases for the rows but decreases for the columns (or vice versa). This is the case in the second diagonal where, from 119 to 537, we go from cell Clad-Inf – Intelligence-Sup to cell Clad-Sup – Intelligence-Inf: we go upward in the order of clothing, but downward in the order of intelligence. It is therefore a case of discordant pairs. Formally, Goodman and Kruskal (1954: 749) define the cases (in proportion) as follows:
Πs= Pr {a1 < a2 and b1 < b2; or a1 > a2 and b1 > b2}, same order
Πd =Pr {a1 < a2 and b1 > b2; or a1 > a2 and b1 < b2}, discordant order
Πt = Pr {a1 = a2 or b1 = b2}, ties.
The case of equality here corresponds to the pairs 119-850, 119-219, etc. and pairs corresponding to the identity 119-119, etc. They are not taken into account in the calculation of Gamma.
Goodman and Kruskal’s Gamma is defined as γ = (Πs - Πd) / (Πs + Πd), which in the case of a 2 x 2 table corresponds to Yule’s Q. But they generalize it to larger tables: to understand what happens, let us return to the London 2 x 3 data.
If we take the pair of cells in opposition in the first diagonal, we see that if we start with 268, compared to 233, it goes in the order of rows and columns. But it is also the case for 268 to 322 and 620 to 233. Let us visualize these concordant and discordant pairs:
We calculate Gamma from product values of concordant pairs and discordant pairs, and we have:
The rationale for these calculations by Goodman and Kruskal is as follows: “Suppose that two individuals are taken independently and at random from the population. Each falls into some (Aa, Bb ) cell. (…) If there is high association one expects that the order of the a’s would generally be the same as that of the b’s.”
Taking the products of the concordant pairs is equivalent to counting the pairs of individuals that are in a situation of order, and taking the products of the discordant pairs is equivalent to counting the pairs of individuals that are not. The closer the situation is to total order, the greater the association. Several coefficients use counts of the number of pairs: Kendall’s Tau, Stuart’s Tau-C, Somers’ asymmetric D. When the rows and columns do not have an order structure, these techniques cannot be used, and we observe that users often return to indicators derived from the chi-square, despite the criticism of Goodman and Kruskal.
The difficulty with this procedure is that the search for concordant and discordant pairs does not take account of the observed structure of the table and is based only on the a priori order of rows and columns, whereas an order structure may exist in the table itself. We overcome this difficulty by ordering the rows and columns with the first factor of a correspondence analysis, which always gives an order that we can use for developing an index of association derived from the PEM.
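The counting of concordant and discordant pairs can be sketched for a table of any size, assuming its rows and columns are already in their intended order. On the London 2 x 2 table quoted earlier (850, 537, 119, 219) the result coincides with Yule's Q. Function names are illustrative.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D), from products of cell counts.

    C sums the products of cells where row and column ranks move together,
    D the products where they move in opposite directions; ties are ignored.
    The ratio is undefined when C + D is zero (for example a single row).
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # cells in a lower row
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)


def yules_q(a, b, c, d):
    """Yule's Q for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)


LONDON_2X2 = [[850, 537], [119, 219]]
print(round(goodman_kruskal_gamma(LONDON_2X2), 3))   # 0.489
print(round(yules_q(850, 537, 119, 219), 3))         # 0.489
```

Each pair is counted once here rather than twice (the symmetric product is omitted), which leaves the ratio unchanged.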
The Global PEM
The proposed general association coefficient is a measure of association between rows and columns. It rests on the following assumptions and requirements:
If an order structure is known for the rows and columns, it can be observed empirically and conversely.
If an order structure has not been identified, it may however exist, even if the order is not very pronounced.
The coefficient can be used to determine the degree of the association for a table cell and for the entire table.
As recommended by Goodman and Kruskal above, it will not use the chi-square.
Its value will be zero in case of independence.
It will vary between -1 and 1 from dependence in one direction to dependence in the other (the sign is conventional). Values close to the maximum must correspond to situations that occur empirically.
The index values must be comparable from one table to another, even if they differ for the size of the populations or concerning the numbers of rows or columns.
The principle of the coefficient should be simple to understand, even if it is the result of lengthy operations that can be done by hand only in elementary cases.
As the situation of independence is well defined and always indicates no association, we measure the association (in the logic of local PEMs) by the ratio of the sum of the observed positive deviations from independence to the sum of the positive deviations in the case where the link would be at its maximum.
Let’s consider the table ordered by the first factor of the correspondence analysis of the survey on confidence in unions. Returning to the table of deviation from independence, we see that the sum of the positive deviations from independence is equal to 176.26.
We must now define the maximum. We have an ordered table of which we retain only the margins: because the table is ordered, the diagonal of positive differences starts either from the attraction between CGT and “very confident” or from the cell Non-voting - “not confident at all”. The choice of starting point is irrelevant and leads in both cases to the same result. Let us start from the cell at the top left. All CGT voters, who number 317, cannot be “very confident in the unions” because the corresponding margin is only 208, but conversely all the “very confident” can be put in the CGT cell. There remain 317 - 208 = 109 CGT voters, whom we put in the nearest (laterally) adjoining cell, “somewhat confident”. The table will be as follows:
All numbers in row 1 and column 1 are now distributed. In row 2, there remain 337 (margin) - 109 (CGT) = 228 “somewhat confident”. They can be divided among CFDT (87), CFTC (23) and FO (78); the remaining 40 are put in “autonomous”. The entire second row is placed, and we go back to the column, where there are still 101 - 40 = 61 autonomous to be placed, which we put in the adjoining “not confident” cell. All autonomous are placed, but there are still 129 - 61 = 68 “not confident”, which are placed in non-unionized workers; the remaining 23 non-unionized workers and all the non-voters are “not confident at all”. This gives the final table (which could be obtained with the same algorithm starting from the “Non-voting” - “Not confident at all” cell). The solution is unique and the algorithm is used in the program in the Appendix.
We can easily verify that the sum of the positive differences from independence in the case of this maximum table is 464.53. The global PEM is the ratio of the two sums (positive differences observed, differences in the case of maximum), in percentage: 176.26 / 464.53 x 100 = 37.9 percent.
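The construction of the maximum table and the resulting global PEM can be sketched as below. This is an illustration in Python, not the R program of the Appendix. The filling rule is the one described above (it resembles the "north-west corner" rule used for transportation problems). The margins are taken from the text; the ones not stated explicitly (170 for the last row, 91 for non-unionized workers, 147 for non-voters) are obtained by arithmetic from the quoted figures and should be read as inferred. The observed sum of positive deviations, 176.26, is the value given in the text.

```python
def maximum_table(row_totals, col_totals):
    """Fill the table from the top-left cell, exhausting margins in order."""
    rows, cols = list(row_totals), list(col_totals)
    table = [[0] * len(cols) for _ in rows]
    i = j = 0
    while i < len(rows) and j < len(cols):
        moved = min(rows[i], cols[j])     # as many individuals as both margins allow
        table[i][j] = moved
        rows[i] -= moved
        cols[j] -= moved
        if rows[i] == 0:
            i += 1                        # row exhausted: move down
        else:
            j += 1                        # column exhausted: move right
    return table


def sum_positive_deviations(table):
    """Sum of the positive deviations from independence."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            deviation = count - row_totals[i] * col_totals[j] / n
            if deviation > 0:
                total += deviation
    return total


# Rows: very confident, somewhat confident, not confident, not confident at all.
ROW_TOTALS = [208, 337, 129, 170]
# Columns: CGT, CFDT, CFTC, FO, autonomous, non-unionized, non-voting.
COL_TOTALS = [317, 87, 23, 78, 101, 91, 147]

MAX_TABLE = maximum_table(ROW_TOTALS, COL_TOTALS)
max_sum = sum_positive_deviations(MAX_TABLE)
print(round(max_sum, 2))                    # 464.53
print(round(100 * 176.26 / max_sum, 1))     # global PEM: 37.9
```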
On real data, as it is always possible to order the rows and columns according to the first factor of the correspondence analysis, we can say that there is always an order structure and that it is always possible to calculate a global PEM. This result may seem hazardous because, in some cases, this order can be entirely due to a random structure of data that are not actually ordered. We confront this situation in the following case, where we know a priori that there is no order on the rows and columns.
Compatibility of Astrological Signs for Married People
We study a case where the order structure is absent and we submit it to the procedures for searching for an order structure. Below is a table that was constructed to show the meaninglessness of astrology (data presented in Cibois, 1997). For a population of 68,000 married couples, we construct a table of 12 rows and 12 columns, the rows corresponding to the astrological signs of the men and the columns for those of the women. We note at the intersection of a row and a column, the number of couples for given signs.
If we do a correspondence analysis of this table, we may be disturbed by the factor graph (Figure 3), which highlights similarities between signs, outlined below by ovals. Indeed, for 10 out of 12 signs, the points for men and women of the same sign lie close together (the only clear exception being Taurus; Aquarius is less clear).

Astrological signs for married people – Diagonal effect.
This can be explained if we examine the deviations from independence: here we retain only the positive deviations greater than 9. We see that all deviations on the diagonal are positive, which explains the previous proximities:
However, if we look at all the individual PEMs, we see that these diagonal deviations are in the same order of magnitude as the others and they are also fewer:

Astrological signs for married people – All PEMs.
Even if the first factor (horizontal) of the correspondence analysis does offer an order, this order corresponds to a virtual absence of ties, since the global PEM is equal to 2.0 percent. Another clue that the table is very close to independence is provided by the first eigenvalue of the correspondence analysis, which is very low and equal to 0.0006. As for the diagonal effect, it is explained by the fact that people who believe in astrology believe that people of the same sign attract each other. It is therefore a small but noticeable self-fulfilling effect.
Before studying the problems of significance of the results, let us compare the different indices used for different examples discussed here:
The Gamma index has been calculated on the last two tables, assuming they had an order structure obtained by the first factor of a correspondence analysis. In this case, Gamma yields results similar to those of the PEM, which are equally interpretable.
Although criticized for its use of the chi-square, Cramér's V reacts like the other indices, but on another range; like the others, it is very weak when it is close to independence (zodiac signs).
We call the Cramér percentage the index defined by Cramér himself and not as it has been interpreted by other authors who then took the square root of it. Indeed, Cramér said (1946) that “φ² / (q - 1) [q is the smallest dimension of the table] may be used as a measure, on a standardized scale, of the degree of dependence between the variables” (1946: 282). The proportion of this maximum can be read as a percentage. We note that this index is very pessimistic.
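The relation between the two readings of Cramér's index can be sketched as follows: the ratio φ² / (q - 1), with φ² equal to the chi-square divided by the total, and its square root, usually called Cramér's V. The example reuses the London 2 x 2 table quoted earlier; function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic of a contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (count - expected) ** 2 / expected
    return total


def cramer_indices(table):
    """Return (phi^2 / (q - 1), Cramér's V), q being the smallest dimension."""
    n = sum(sum(row) for row in table)
    q = min(len(table), len(table[0]))
    ratio = chi_square(table) / (n * (q - 1))
    return ratio, math.sqrt(ratio)


ratio, v = cramer_indices([[850, 537], [119, 219]])
print(round(100 * ratio, 1), round(v, 3))   # Cramér percentage and Cramér's V
```

Since V lies between 0 and 1, its square is never larger than V itself, which is one way to see why the percentage form gives lower values than V.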
The significance of the PEM depends on the significance of the table from which it came. When the PEM is calculated on a non-significant table, we cannot exclude the situation of independence and therefore the nullity of the PEM. This is the case here for the last table.
Concerning the range of PEM values, experience shows that interesting PEMs lie between 10 percent and 50 percent. Stronger ties often indicate a redundancy between indicators. When the tie is less than 10 percent, it may be the result of chance, and the chi-square test can show this.
Conclusion
The global PEM can be used as an indicator of the intensity of a link between rows and columns in all cross-tabulations. If there is an existing order on the rows and columns, it will be found by the first factor of the correspondence analysis. If this is not the case, it may be necessary to challenge the order defined a priori or to understand why there is a difference. If we trust the order defined previously, we can then use it to calculate the maximum table. If the order determined by the first factor of the correspondence analysis is not interpretable, we are in a situation related to a structure of random deviations: the PEM will probably be small and the table not significant in the sense of the chi-square.
The PEM has the advantage of not being calculated with an index derived from the chi-square (in opposition to Cramér's V). It does not assume there is an order on the table margins, but identifies such an order, if there is one (in opposition to indices calculated from matched pairs). The minimum corresponds to independence and the maximum is well defined and is realistic in the sense that a value close to 100 percent can actually be observed (if we cross two indicators of the same dimension). It does not depend on the size or the number of rows and columns. For detailed analysis of a table, it can be used for each cell in its local version. And it’s easy to understand.
The PEM is available in the Trideux 3 and Modalisa 4 software packages. Programming it poses no special problems: the Appendix contains a program in R written by Nicolas Robette. 5
