Improving Likert Scale Raw Scores Interpretability with K-means Clustering

Abstract

In this article, k-means clustering is applied to obtain cut-off points for recoding raw scale scores into a fixed number of groupings that preserve the original scoring. The method is demonstrated on a Likert scale measuring xenophobia that was used in a large-scale sample survey conducted in Northern Greece by the National Centre for Social Research. The stability of the proposed solution is validated empirically by applying split-half samples and fuzzy c-means clustering. Testing its performance against three single indicators of xenophobia shows that it differentiates well between non-xenophobic and xenophobic respondents. The proposed method may easily be applied to facilitate interpretation by providing a more concise and meaningful “profile” of Likert scale (or subscale) raw scores, especially at the negative and positive ends of the scale, for evaluation and social policy purposes.

Keywords

Likert scale, k-means cluster analysis, fuzzy c-means cluster analysis, xenophobia

Introduction

In 1932 Rensis Likert developed a scale for measuring attitudes that has been widely used in the social sciences and in educational, medical and health research (Likert, 1932). The items (opinion statements) comprising a Likert scale (or subscales) are normally assigned five response categories scored from 1 to 5 and usually labelled strongly agree, agree, neither agree nor disagree, disagree, strongly disagree, although a different number of categories, such as seven or nine, could be used (Hartley and Betts, 2009; Moser and Kalton, 1975). Once the psychometric properties of the scale or subscales have been assessed and their reliability and validity ascertained, the respondent’s attitude is usually measured by summing up his or her responses to each of the items. Likert scaling theory requires the items to be worded alternately as positive and negative in the case of theory development (subscales are not predetermined by theory as dimensions). In the case of theory testing (subscales are predetermined by theory), this requirement is applied within each subscale (see for instance Adorno et al., 1950; Marshall and Hays, 1994). Therefore, to make the total score meaningful, the scoring of the positive or negative items is reversed, depending on the definition of the negative-positive ends of the overall scale or subscale (Moser and Kalton, 1975).
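The reversal and summation just described can be illustrated with a minimal sketch. The item positions, the responses and the function names below are hypothetical and serve only to show the arithmetic for five response categories; they are not taken from the survey.

```python
# Minimal sketch: reverse-score selected items, then sum to a raw scale score.
# All data and names here are illustrative assumptions.
N_CATEGORIES = 5


def reverse_score(score, n_categories=N_CATEGORIES):
    """Map 1->5, 2->4, ..., 5->1 for an item scored 1..n_categories."""
    return n_categories + 1 - score


def raw_sum_score(responses, reversed_items):
    """Sum the item responses after reversing the items in reversed_items."""
    return sum(
        reverse_score(r) if i in reversed_items else r
        for i, r in enumerate(responses)
    )


responses = [5, 1, 4, 2]   # one respondent's answers (made-up data)
reversed_items = {1, 3}    # hypothetical positions of the reversed items
total = raw_sum_score(responses, reversed_items)
print(total)  # 5 + 5 + 4 + 4 = 18
```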

The most popular method of computing Likert scale (or subscale) raw scores is to sum up the defining items, identified from the results of exploratory factor analyses (theory development) or of exploratory and confirmatory factor analyses (theory testing) performed to assess the scale’s construct validity. Composite scales may also be constructed from these factor analysis results by applying one of four methods: regression, Bartlett, Anderson-Rubin and Thompson factor scores (Thompson, 2005). However, unlike summing up the items, these composite scales do not preserve the variation in the original data (DiStefano et al., 2009). In most exploratory research situations, summing up the items is used extensively and, when factor analysis shows that simple structure is present, its application is straightforward (DiStefano et al., 2009).

Though Likert scales (or subscales) are easily constructed by summing up their defining items, the resulting raw scores are difficult to interpret or to compare between scales because, even if the same number of response categories is assigned to the items, the range of the raw scores depends on the number of defining items included in the scale. Various solutions to this problem are adopted, depending on the context and the use of the scale. In health and medical research, a common practice is to average the items, so that scale scores range according to their original scoring (usually from 1 to 5), or to transform them to a scale ranging from 0 to 100 (Stewart and Ware, 1992). In these cases, the scale’s descriptive statistics are usually presented by demographic and social characteristics (see for example Marshall and Hays, 1994). In educational and psychological testing, raw test scores are usually rescaled by applying one of three methods: percentiles, standard and standardized scores, and normalized scores (American Educational Research Association, 1999; Murphy and Davidshofer, 2001; Nunnally and Bernstein, 1994; Streiner and Norman, 2003). However, these methods require normative data and are therefore rarely applied to attitude scaling. To the best of our knowledge, only Carter (1996), investigating racial identity attitude measures, constructed a Likert scale by both summing up and averaging the items and transformed the raw scores using percentile norms. In attitude scaling, a “profile” of Likert scale raw scores is usually obtained by applying cluster analysis to items or factors and reporting descriptive statistics and further statistical analyses based on the resulting clusters (see for example Carter, 1996; Whittaker and Neville, 2009; Worrell et al., 2006). In this respect, cluster analysis “has come to describe grouping people on the basis of the similarity of their profiles (score vectors)” (Nunnally and Bernstein, 1994: 599). However, this type of analysis is applied in a similar fashion to factor analysis, with clusters treated as dimensions measuring underlying constructs (Everitt et al., 2011; Nunnally and Bernstein, 1994). In this article, we propose k-means clustering to identify natural groupings in a large data set in order to produce a concise representation of the respondents’ scores and to transform the overall Likert scale (or subscale) raw scores into meaningful groupings. Setting k equal to the number of response categories and applying a simple transformation preserves the original scoring, which is more meaningful than the arbitrary range of the raw scores. This improves conceptual interpretation, especially at the negative and positive ends of the scale, which are crucial to evaluators and policy makers.
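The two rescaling practices mentioned above for health and medical research, averaging the items and transforming to a 0 to 100 range, can be sketched as follows. The linear form used for the 0 to 100 transformation is one common choice and is an assumption here, as are the function names; the source does not specify the exact formula.

```python
# Minimal sketch of two simple rescalings of a summed Likert score.
# Assumes every item is scored 1..n_categories; names are illustrative.

def average_score(raw_sum, n_items):
    """Return the mean item score, which lies on the original 1..5 range."""
    return raw_sum / n_items


def percent_of_range(raw_sum, n_items, n_categories=5):
    """Linearly map the possible raw range onto 0..100 (assumed form)."""
    lowest = n_items * 1
    highest = n_items * n_categories
    return 100.0 * (raw_sum - lowest) / (highest - lowest)


avg = average_score(45, n_items=15)      # 45 / 15
pct = percent_of_range(45, n_items=15)   # (45 - 15) / (75 - 15) * 100
print(avg, pct)  # 3.0 50.0
```

Both transformations are monotone, so they change the range of the scores but not the ordering of respondents.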

K-means clustering, developed by MacQueen (1967), is an iterative partitioning method of cluster analysis commonly used to classify a data set into a pre-specified number (k) of groups based on a Euclidean distance measure (Everitt et al., 2011; Lorr, 1983; see also Wu, 2007). Various methods have been proposed in the literature for deciding on the number of groups (Chiang and Mirkin, 2010; Everitt et al., 2011; Lorr, 1983). Spanierman et al. (2006, 2009), following Gordon (1999), applied hierarchical clustering to identify the number of cluster groups before performing k-means clustering. In other applications, solutions with different numbers of groups are usually computed and compared based on Hartigan’s rule (see for instance Neville and Lilly, 2000) and/or interpretability (see for example Thalhammer et al., 2001). However, since our main concern is to preserve the original scoring in order to facilitate interpretation, as in the case of averaging items (DiStefano et al., 2009), k is defined as the number of response categories assigned to the items. The proposed method results in a transformed overall scale (or subscales) that may easily be analysed when a detailed “‘profile’ of scale scores” (Streiner and Norman, 2003: 108) is of the utmost importance for social policy purposes.
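To make the partitioning step concrete, the following is a minimal sketch of k-means on one-dimensional scale scores. It uses a batch (Lloyd-style) iteration with evenly spaced starting centres, which is an assumption made for reproducibility and not necessarily the variant or initialization used by any particular statistical package. The scores are made-up data.

```python
# Minimal 1-D k-means sketch (batch iteration, evenly spaced initial centres).
# Illustrative only; the data and the initialization rule are assumptions.

def kmeans_1d(values, k, n_iter=100):
    lo, hi = min(values), max(values)
    # Assumed initialization: k centres spread evenly over the observed range.
    centres = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: each score joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[j].append(v)
        # Update step: each centre moves to the mean of its members.
        new_centres = [
            sum(members) / len(members) if members else centres[j]
            for j, members in enumerate(clusters)
        ]
        if new_centres == centres:  # no change, so the partition is stable
            break
        centres = new_centres
    return sorted(centres)


# Made-up summed scores on a 15-item scale, with k set to the number of
# response categories (five), as proposed in the text.
scores = [20, 21, 22, 34, 35, 36, 44, 45, 46, 54, 55, 56, 66, 67, 68]
centres = kmeans_1d(scores, k=5)
print(centres)  # [21.0, 35.0, 45.0, 55.0, 67.0]
```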

The method is demonstrated empirically using Likert scale data from a large-scale sample survey intended to measure xenophobia, conducted in Northern Greece (Macedonia) by the National Centre for Social Research (Michalopoulou et al., 1998). Symeonaki and Michalopoulou (2011) applied k-means clustering to these data, defining different patterns of xenophobia in a similar way to factor analysis, and used fuzzy partitioning for the recoding of raw scores on a single overall Likert scale. In the present article, k-means clustering is applied to transform the raw Likert scale (or subscale) scores.

For the development of the scale, 18 items rated from 1 to 5 were used. Symeonaki et al. (2015) presented in great detail the assessment of the scale’s construct validity and reliability, which indicated a single overall scale constructed by summing up the 15 items included in the analysis. By applying k-means clustering to the overall scale raw scores, we identify five clusters that correspond to the rating of the items. A simple cross-tabulation of the scale by the clusters provides the cut-off points for the recoding of the raw scores into five groupings.

The proposed method is validated empirically by comparing the cluster centres of the overall sample to those resulting from randomly splitting the sample into two halves (Everitt et al., 2011). Fuzzy c-means clustering is also applied to investigate ambiguities in the proposed solution. Fuzzy c-means is a data clustering technique in which each data point belongs to each cluster to a degree specified by a membership grade. The method was originally introduced by Bezdek (1981) as an extension of previous clustering methods and groups data points in multidimensional space into a pre-specified number of clusters. The central idea in fuzzy clustering is the non-unique partitioning of the data into a collection of clusters (Everitt et al., 2011).
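The idea of membership grades can be sketched for one-dimensional scores as follows. This is a minimal illustration of the standard fuzzy c-means updates with fuzzifier m = 2 and evenly spaced starting centres; both choices, the data and the function names are assumptions, and the sketch is not the MATLAB implementation used in the study.

```python
# Minimal 1-D fuzzy c-means sketch. Illustrative assumptions: m = 2,
# evenly spaced initial centres, made-up scores.

def memberships(x, centres, m=2.0):
    """Membership grades of score x in each cluster; they sum to 1."""
    d = [abs(x - c) for c in centres]
    if 0.0 in d:  # x lies exactly on a centre: full membership there
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    inv = [di ** -p for di in d]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_c_means_1d(values, c, m=2.0, n_iter=200, tol=1e-6):
    lo, hi = min(values), max(values)
    centres = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    for _ in range(n_iter):
        u = [memberships(x, centres, m) for x in values]
        # Each centre is the mean of all scores weighted by membership**m.
        new = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return centres, [memberships(x, centres, m) for x in values]


scores = [20, 21, 22, 34, 35, 36, 44, 45, 46, 54, 55, 56, 66, 67, 68]
fcm_centres, grades = fuzzy_c_means_1d(scores, c=5)
# The "hard" label of a score is the cluster with its largest grade.
hard_labels = [row.index(max(row)) for row in grades]
print([round(v, 2) for v in fcm_centres])
```

Scores with one grade close to 1 are unambiguous; scores whose largest grades are similar in size lie near a boundary between two groupings.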

The transformed scale may easily be applied to obtain a detailed demographic and social “profile” of raw scale scores. The performance of the proposed solution is assessed by using three single questions treated in the literature as indicators of xenophobia (Eurobarometer, 1989; Leong and Ward, 2006; Thalhammer et al., 2001; see also Welch et al., 2007).

Method

Procedure and Participants

Given the lack of an appropriate sampling frame, a stratified three-stage quota sample was used, and 1,200 individuals completed the questionnaire. The total sample consisted of 601 men (50.1 percent) and 599 women (49.9 percent). Half of the participants (49.9 percent) were 18 to 45 years old (mean age = 45 years; standard deviation = 16.7), 73.1 percent were married, 78.6 percent held a secondary or lower education certificate and 54.4 percent were economically active (for a detailed presentation see Michalopoulou et al., 1998; see also Symeonaki et al., 2015).

The survey was conducted according to the International Statistical Institute (1985) code of ethics.

Measures

For the development of the Likert scale measuring xenophobia, 18 items rated from 1 to 5 were used. Symeonaki et al. (2015) presented in great detail a re-assessment of the scale’s psychometric properties based on current theory and practice (see also Michalopoulou et al., 1998). Preliminary item analysis resulted in the exclusion of three items. Exploratory factor analysis performed to investigate the scale’s structure resulted in three underlying dimensions. However, inspection of the subscales’ internal consistencies indicated that these subscales were not warranted and should be combined into an overall scale, which was found to be both valid and reliable (Cronbach’s alpha = .856; split-half reliability coefficient = .779). The overall scale was computed by both summing up and averaging the 15 items included in the analysis, with low and high scores indicating non-xenophobic and xenophobic attitudes, respectively.
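For readers unfamiliar with the internal consistency coefficient reported above, the following minimal sketch computes Cronbach’s alpha from its usual definition, k/(k − 1) × (1 − sum of item variances / variance of the total score). The responses are made-up data for three items and are not related to the survey; the function name is illustrative.

```python
# Minimal sketch of Cronbach's alpha on made-up item responses.
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (items already reversed)."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


rows = [[4, 5, 4], [2, 1, 2], [3, 3, 4], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(rows)
print(round(alpha, 3))
```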

Also, three single questions treated in the literature as indicators of xenophobia were used to assess the performance of the proposed solution. These questions measured xenophobia based on the perception of the number of non-EU nationals in one’s country (there are too many, there are many but not too many, there are not many) and the reactions (disturbing, not disturbing) to the presence of “others” of another nationality or religion.

Statistical Analysis

K-means clustering was applied to the overall scale computed by both summing up and averaging the 15 items included in the analysis. The number of clusters was set to five (k = 5) in order to preserve the original scoring and its conceptual interpretation. Cross-tabulating the overall scale raw scores by their cluster membership indicated the cut-off points for the recoding of the raw scores into five groupings in order to provide a better representation of scale scores. The method was validated empirically by comparing the cluster centres for the total sample to those resulting from the two randomly split half samples. Also, fuzzy c-means clustering was applied to investigate ambiguities in the proposed solution.
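The split-half check described above can be sketched as follows: the sample is shuffled and divided into two halves, k-means is run on each half, and the corresponding cluster centres are compared. The k-means routine, its evenly spaced initialization, the seed and the synthetic scores are all illustrative assumptions and do not reproduce the SPSS analysis.

```python
# Minimal sketch of split-half validation of 1-D k-means cluster centres.
# Data, seed and initialization are assumptions made for illustration.
import random


def kmeans_1d(values, k, n_iter=100):
    lo, hi = min(values), max(values)
    centres = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[j].append(v)
        new_centres = [
            sum(members) / len(members) if members else centres[j]
            for j, members in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)


def split_half_centres(values, k, seed=0):
    """Randomly split the sample in two and cluster each half separately."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    half1, half2 = shuffled[:mid], shuffled[mid:]
    return kmeans_1d(half1, k), kmeans_1d(half2, k), len(half1), len(half2)


# Synthetic scores: 300 respondents spread around five made-up levels.
scores = [g + d for g in (21, 35, 45, 55, 67) for d in (-1, 0, 1)
          for _ in range(20)]
c1, c2, n1, n2 = split_half_centres(scores, k=5)
max_diff = max(abs(a - b) for a, b in zip(c1, c2))
print(c1, c2, max_diff)
```

Small differences between the two sets of centres, relative to the distance between neighbouring centres, suggest a stable solution.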

For the sake of brevity, the transformed scale was cross-tabulated with only certain demographic and social characteristics, as well as with the three questions used as relevant indicators to assess its performance.

Statistical data analysis was performed using IBM SPSS Statistics Version 20. Fuzzy c-means clustering was implemented with the Fuzzy Logic Toolbox of MATLAB R2012b and its clustering GUI tool.

Results

Table 1 illustrates the cluster centres with k-means and fuzzy c-means clustering (total sample) and k-means clustering (half samples 1 and 2) for the overall scale constructed by both summing up and averaging the items. The locations of the cluster centres differ only slightly between the various analyses. In all cases of k-means clustering, analysis of variance, used only for descriptive purposes, resulted in significantly different mean values between the five groups. These results indicate that the proposed solution is stable.

Table 1.

Cluster centres for five clusters (k = 5) with k-means and fuzzy c-means clustering

		Cluster centres
Method of clustering	N	1	2	3	4	5
Overall scale (S: 15-75)
K-means	1,200	25.61	35.87	45.64	55.11	65.12
Fuzzy c-means	1,200	26.15	36.26	46.61	54.93	64.84
K-means (half sample 1)	612	25.11	35.93	45.77	54.67	64.35
K-means (half sample 2)	588	26.98	36.85	46.40	55.66	65.37
Overall scale (A: 1.07-5.00)
K-means	1,200	1.71	2.38	3.04	3.67	4.34
Fuzzy c-means	1,200	1.74	2.42	3.10	3.66	4.32
K-means (half sample 1)	612	1.67	2.40	3.05	3.64	4.29
K-means (half sample 2)	588	1.80	2.46	3.09	3.71	4.36

The method of constructing the overall scale measuring xenophobia and the range of raw scores are presented in parentheses: S = summing up the items; A = averaging the items.
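The link between cluster centres and cut-off points can be illustrated with the k-means centres reported in Table 1 for the summed scale (total sample). In one dimension each score belongs to its nearest centre, so the minimum and maximum score assigned to each cluster give the recoding ranges. This is a minimal sketch under that nearest-centre assumption, not the cross-tabulation procedure run in SPSS; the score range 16 to 75 is taken from the note to Table 2.

```python
# Minimal sketch: derive recoding ranges from the Table 1 k-means centres
# (summed scale, N = 1,200) by nearest-centre assignment of integer scores.
centres = [25.61, 35.87, 45.64, 55.11, 65.12]


def nearest_cluster(score):
    """Return the 1-based index of the centre closest to the score."""
    return 1 + min(range(len(centres)), key=lambda j: abs(score - centres[j]))


cut_offs = {}
for score in range(16, 76):  # observed raw-score range of the summed scale
    g = nearest_cluster(score)
    low, high = cut_offs.get(g, (score, score))
    cut_offs[g] = (min(low, score), max(high, score))

print(cut_offs)
# {1: (16, 30), 2: (31, 40), 3: (41, 50), 4: (51, 60), 5: (61, 75)}
```

The resulting ranges coincide with the transformation stated in the note to Table 2.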

Table 2 presents a brief demographic and social “profile” of the raw scale scores transformed using k-means clustering. As shown, age, education, self-rated political orientation and frequency of church attendance play a central role in the formation of xenophobic and non-xenophobic attitudes towards the “other”. The respondents demonstrating xenophobic attitudes are older, less educated, more frequent churchgoers and more right-wing in their political inclinations, confirming relevant empirical results and theory (Eurobarometer, 1989; Thalhammer et al., 2001; Welch et al., 2007). Therefore, the proposed clustering solution classifies scores in a meaningful way, differentiating well between xenophobic and non-xenophobic respondents, and thus provides a concise “profile” of scale scores that is consistent with the three indicators of xenophobia.

Table 2.

The demographic and social “profile” of the Likert scale measuring xenophobia transformed raw scores using k-means clustering

Variables		Xenophobia transformed scale
		scores (%)					Total
		(−) 1	2	3	4	5 (+)	%
Age*
	18-29	12.8	27.4	36.1	17.5	6.2	100.0
	30-44	13.2	26.9	27.5	22.3	10.1	100.0
	45-59	7.8	25.6	26.4	22.5	17.8	100.1
	60-80	4.8	10.0	28.0	32.5	24.7	100.0
Education*
	Illiterate/uncompleted primary education	3.7	9.9	17.3	42.0	27.2	100.1
	Primary or lower secondary education certificate	5.3	17.0	30.4	28.1	19.2	100.0
	Secondary or higher education certificate	13.1	29.0	32.6	17.2	8.2	100.1
	Higher education or postgraduate degree	27.0	38.2	22.5	9.0	3.4	100.1
Left/Right scale of self-rated political orientation**
	1-2 (Left)	19.7	30.3	21.2	15.2	13.6	100.0
	3-4	14.1	29.2	28.1	20.3	8.3	100.0
	5-6	9.9	22.1	29.8	24.5	13.7	100.0
	7-8	6.2	20.8	31.5	25.3	16.3	100.1
	9-10 (Right)	4.2	10.2	36.4	28.8	20.3	100.1
Church attendance*
	Every Sunday or more often	3.7	17.0	30.3	28.3	20.7	100.0
	Twice or three times a month	6.8	17.6	29.2	26.8	19.6	100.0
	A few times in a year	9.7	25.7	29.8	22.7	12.1	100.0
	Only at Easter	19.8	28.5	28.4	16.4	6.9	100.0
	Never	26.2	21.4	28.6	16.7	7.1	100.0
Perception of non-EU nationals in one’s country*
	There are too many	6.2	20.1	29.7	26.8	17.2	100.0
	There are many but not too many	15.2	29.4	29.4	17.9	8.1	100.0
	There are not many	46.7	16.7	26.7	3.3	6.7	100.1
The presence of “others” of another nationality^a
	Disturbing	3.2	8.8	26.3	31.5	30.2	100.0
	Not disturbing	13.9	30.6	31.4	18.9	5.3	100.1
The presence of “others” of another religion*
	Disturbing	3.3	11.8	26.9	32.1	25.9	100.0
	Not disturbing	13.6	29.5	31.4	18.3	7.3	100.1

Overall scale frequency distribution^b		9.7	22.5	29.5	23.7	14.6	100.0

*N=1,088; **N=1,018; ^a N=1,073; ^b N=1,090. The raw scores of the overall scale measuring xenophobia constructed by both summing up and averaging the items were transformed as follows: 16-30 = 1; 31-40 = 2; 41-50 = 3; 51-60 = 4; 61-75 = 5 and 1.07-2.00 = 1; 2.01-2.67 = 2; 2.68-3.34 = 3; 3.35-4.00 = 4; 4.01-5 = 5. All the cross-tabulation results are significant at p < .001.
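The recoding stated in the note above, and the row percentages of the kind shown in Table 2, can be sketched as follows. The upper bounds are taken from the note; the respondents and their age groups are made-up data, and the function names are illustrative.

```python
# Minimal sketch: recode raw scores with the cut-offs from the Table 2 note
# and compute row percentages for a made-up grouping variable.
from bisect import bisect_left
from collections import Counter, defaultdict

SUM_UPPER = [30, 40, 50, 60]           # 16-30, 31-40, 41-50, 51-60, 61-75
AVG_UPPER = [2.00, 2.67, 3.34, 4.00]   # 1.07-2.00, ..., 4.01-5


def recode(score, upper_bounds):
    """Return the grouping 1..5 whose range contains the score."""
    return bisect_left(upper_bounds, score) + 1


def row_percentages(pairs, upper_bounds):
    """pairs: (category, raw score). Returns percentages per category."""
    counts = defaultdict(Counter)
    for category, score in pairs:
        counts[category][recode(score, upper_bounds)] += 1
    table = {}
    for category, c in counts.items():
        n = sum(c.values())
        table[category] = [round(100 * c[g] / n, 1) for g in range(1, 6)]
    return table


# Hypothetical respondents: (age group, summed raw score).
respondents = [("18-29", 28), ("18-29", 44), ("18-29", 47),
               ("60-80", 52), ("60-80", 63), ("60-80", 58), ("60-80", 41)]
table = row_percentages(respondents, SUM_UPPER)
print(table["60-80"])  # [0.0, 0.0, 25.0, 50.0, 25.0]
```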

Conclusions

Likert scaling is central to attitude measurement in social survey research. The present article presents a new method for transforming Likert scale (or subscale) raw scores. Once the psychometric properties of the overall scale (or subscales) have been ascertained, the proposed method can easily be applied to construct a transformed single overall scale (or subscales) that provides a more concise interpretation of raw scores, preserving the original scoring as is the case in most exploratory social research situations.

The proposed method is demonstrated on a Likert scale that was used in a large-scale sample survey for measuring xenophobia. Transforming the raw scale scores by applying k-means clustering showed that scores are classified in meaningful groupings that differentiate well the xenophobic from the non-xenophobic respondents, thus providing a concise “profile” of scale scores, consistent with three indicators of xenophobia. Based on these groupings, a detailed demographic and social “profile” of raw scale scores was obtained, consistent with theory.

This article contributes to the growing number of studies on measuring social phenomena by demonstrating how k-means clustering may easily be applied to transform Likert scale (or subscale) raw scores. The transformation facilitates interpretation by providing a more concise and meaningful “profile” of raw scores, especially at the negative and positive ends of the scale, for evaluation and social policy purposes (for example, policy on immigration).

Footnotes

Acknowledgement

Grateful acknowledgement is made to Professor Clive Richardson for his comments.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Adorno, Frenkel-Brunswik, Levinson and Sanford (1950) The Authoritarian Personality. New York: Harper.

American Educational Research Association (1999) Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.

Bezdek (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.

Carter (1996) Exploring the Complexity of Racial Identity Attitude Measures. In: Impara (series ed.) and Sodowsky and Impara (vol. eds) Multicultural Assessment in Counseling and Clinical Psychology. Lincoln, NE: Buros Institute of Mental Measurements, 193–223.

Chiang and Mirkin (2010) Intelligent Choice of the Number of Clusters in K-means Clustering - An Experimental Study with Different Cluster Spreads. Journal of Classification 27(1): 3–40.

DiStefano, Zhu and Mîndrilă (2009) Understanding and Using Factor Scores - Considerations for the Applied Researcher. Practical Assessment, Research & Evaluation 14(20): 1–11.

Eurobarometer (1989) Racism and Xenophobia. Brussels: Commission of the European Communities.

Everitt, Landau, Leese and Stahl (2011) Cluster Analysis (5th edition). Chichester, UK: John Wiley.

Gordon (1999) Classification. Boca Raton, FL: Chapman & Hall.

Hartley and Betts (2009) Four Layouts and a Finding - The Effects of Changes in the Order of the Verbal Labels and Numerical Values on Likert-type Scales. International Journal of Social Research Methodology 13(1): 17–27.

International Statistical Institute (1985) Declaration on Professional Ethics. Available at: https://www.isi-web.org/index.php/news-from-isi/151-ethics1985

Leong and Ward (2006) Cultural Values and Attitudes toward Immigrants and Multiculturalism - The Case of the Eurobarometer Survey on Racism and Xenophobia. International Journal of Intercultural Relations 30(6): 799–810.

Likert (1932) A Technique for the Measurement of Attitudes. Archives of Psychology 140: 5–55.

Lorr (1983) Cluster Analysis for Social Scientists - Techniques for Analyzing and Simplifying Complex Blocks of Data. San Francisco, CA: Jossey-Bass.

MacQueen (1967) Some Methods for Classification and Analysis of Multivariate Observations. In: Le Cam and Neyman (eds) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability - Vol. 1. Berkeley, CA: University of California Press, 281–297.

Marshall and Hays (1994) The Patient Satisfaction Questionnaire Short-form (PSQ-18). Report no P-7865. Santa Monica, CA: Rand Corporation.

Michalopoulou, Tsartas, Giannesopoulou, Kafetzis and Manologlou (1998) Macedonia and the Balkans - Xenophobia and Development (in Greek). Athens: National Centre for Social Research-Alexandria Publications.

Moser and Kalton (1975) Survey Methods in Social Investigation. London: Heinemann Educational Books.

Murphy and Davidshofer (2001) Psychological Testing - Principles and Applications. New Jersey: Prentice-Hall.

Neville and Lilly (2000) The Relationship between Racial Identity Cluster Profiles and Psychological Distress among African American College Students. Journal of Multicultural Counseling and Development 28(4): 194–207.

Nunnally and Bernstein (1994) Psychometric Theory. New York: McGraw-Hill.

Spanierman, Poteat, Beer and Armstrong (2006) Psychological Costs of Racism to Whites - Exploring Patterns through Cluster Analysis. Journal of Counseling Psychology 53(4): 434–41.

Spanierman, Todd and Anderson (2009) Psychological Costs of Racism to Whites - Understanding Patterns among University Students. Journal of Counseling Psychology 56(2): 239–52.

Stewart and Ware Jr (eds) (1992) Measuring Functioning and Well-being - The Medical Outcomes Study Approach. Durham and London: Duke University Press.

Streiner and Norman (2004) Health Measurement Scales - A Practical Guide to Their Development and Use. Oxford: Oxford University Press.

Symeonaki and Michalopoulou (2011) Measuring Xenophobia in Greece - A Cluster Analysis Approach. Paper presented at the 14th ASMDA International Conference, Rome, 7–10 June.

Symeonaki, Michalopoulou and Kazani (2015) A Fuzzy Set Theory Solution to Combining Likert Items into a Single Overall Scale (or Subscales). Quality & Quantity 49(2): 739–762.

Thalhammer, Zucha, Enzenhofer, Salfinger and Orgis (2001) Attitudes towards Minority Groups in the European Union - A Special Analysis of the Eurobarometer 2000 Opinion Poll on behalf of the European Monitoring Centre on Racism and Xenophobia (Technical report). SORA Institute for Social Research and Analysis. Available at: http://ec.europa.eu/public_opinion/archives/ebs/ebs_138_tech.pdf

Thompson (2005) Exploratory and Confirmatory Factor Analysis - Understanding Concepts and Applications (2nd printing). Washington, DC: American Psychological Association.

Welch, Sikkink and Loveland (2007) The Radius of Trust - Religion, Social Embeddedness and Trust in Strangers. Social Forces 86(1): 23–46.

Whittaker and Neville (2009) Examining the Relation between Racial Identity Attitude Clusters and Psychological Health Outcomes in African American College Students. Journal of Black Psychology 36(4): 383–409.

Worrell, Vandiver, Cross Jr and Fhagen-Smith (2006) Generalizing Nigrescence Profiles - Cluster Analyses of Cross Racial Identity Scale (CRIS) Scores in Three Independent Samples. Counseling Psychologist 34(4): 519–47.

Wu (2007) An Empirical Study on the Transformation of Likert-Scale Data to Numerical Scores. Applied Mathematical Sciences 1(58): 2851–2862.