Abstract
Shame may increase HIV risk among stigmatized populations. The Personal Feelings Questionnaire–2 (PFQ-2) measures shame, but has not been validated in Spanish-speaking or nonclinical stigmatized populations disproportionately affected by HIV in resource-limited settings. We examined the psychometric properties of the Spanish-translated PFQ-2 shame subscale among female sex workers in two Mexico–U.S. border cities. From 2016 to 2017, 602 HIV-negative female sex workers in Tijuana and Ciudad Juarez participated in an efficacy trial evaluating a behavior change maintenance intervention. Interviewer-administered surveys collected information on shame (10-item PFQ-2 subscale), psychosocial factors, and sociodemographics. Item performance, confirmatory factor analysis, internal consistency, differential item functioning by city, and concurrent validity were assessed. Response options were collapsed to 3-point responses to improve item performance, and one misfit item was removed. The revised 9-item shame subscale supported a single construct and had good internal consistency (Cronbach’s α = .86). Notable differential item functioning was found but resulted in a negligible effect on overall scores. Correlations between the revised shame subscale and guilt (r = .79, p < .01), depression (r = .69, p < .01), and emotional support (r = −.28, p < .01) supported concurrent validity. The revised PFQ-2 shame subscale showed good reliability and concurrent validity in our sample, and should be explored in other stigmatized populations.
Shame is a painful self-focused dysphoric emotion often associated with guilt and depression that can evolve, in part, from repeated negative social evaluations and societal stigma (Dickerson et al., 2004; Tracy et al., 2007). Shame can cause an individual to feel insignificant or powerless, and lead to expectations of disapproval from others (D. W. Harder & Greenwald, 1999; Tangney et al., 1992; Tracy et al., 2007). Shame also often undermines self-worth and can cause an individual to want to hide or disappear (Tracy et al., 2007). While shame is not inherently a problematic or maladaptive emotion (Taihara & Malik, 2016), high levels of shame may lead individuals to develop maladaptive coping strategies (Rahim & Patton, 2015; Tracy et al., 2007), such as substance use (Dearing et al., 2005; Rahim & Patton, 2015; Tracy et al., 2007; Treeby & Bruno, 2012) and sexual risk taking (Rahim & Patton, 2015), to escape from or suppress feelings of worthlessness. More maladaptive responses to shame may, in turn, result in additional shame and increase vulnerability to depression (Andrews et al., 2002; Kim et al., 2011) and suicide (Tracy et al., 2007).
Among stigmatized populations, shame is a powerful psychological phenomenon that can profoundly impact an individual’s risk of adverse health outcomes, including HIV and other sexually transmitted infections (STIs) as well as their timely linkage to HIV/STI services (Fortenberry et al., 2002; Foster & Byers, 2016; Hutchinson & Dhairyawan, 2018; Ma et al., 2017; Wahed et al., 2017). Female sex workers (FSWs) are a highly stigmatized group that often faces significant social rejection resulting in high levels of shame and related internalizing (e.g., low self-esteem, self-isolation) and externalizing (e.g., substance use, condomless sex with clients) coping behaviors (Strathdee et al., 2012; 2015). Shame is modifiable (de Hooge et al., 2011; Lickel et al., 2014), and thus may provide an innovative intervention target to reduce HIV/STI morbidity. However, we currently lack a valid and reliable tool for assessing shame among socially marginalized populations, such as FSWs.
The Personal Feelings Questionnaire–2 (PFQ-2) is a revision of the Personal Feelings Questionnaire (PFQ) and is a self-report, adjective-based scale developed among college students to measure the frequency of shame and guilt experiences (D. H. Harder & Zalma, 1990). Numerous studies have examined psychometric aspects of the PFQ-2, including factor structure, internal consistency, test–retest reliability, differential item functioning (DIF), as well as concurrent, convergent, and discriminant validity within clinical and nonclinical samples across diverse settings (supplementary Table S1, available online). While most studies consider the PFQ-2 in its original English form, psychometric properties have also been reported for versions of the PFQ-2 translated into Italian (Di Sarno et al., 2019), German (Rüsch et al., 2007), and Croatian (Eterović et al., 2019). To our knowledge, the PFQ-2 has not yet been evaluated in Spanish. While one study examined the reliability and validity of the PFQ-2 within a population of gay and heterosexual men at risk for HIV infection (Bybee et al., 2009), more studies are needed to evaluate the psychometric properties of the PFQ-2 in low- and middle-income countries (LMIC) and among other stigmatized groups in nonclinical settings. Such work could help inform future research investigating the effect of shame on HIV/STI risk within these populations.
We examined the psychometric properties of the Spanish-translated 10-item PFQ-2 shame subscale in a sample of FSWs in Tijuana and Ciudad Juarez, Mexico using methods from Item Response Theory (IRT) and Classical Test Theory. Sex work is quasilegal in these Mexico–U.S. border cities, which cross major drug-trafficking routes into the United States and attract thousands of sex tourists annually. These conditions foster a complex HIV/STI risk environment characterized by poverty, substance use, gender inequity, and violence, which often precipitates women’s entry into sex work and can be exacerbated by the shame and stigma associated with sex work (Shannon et al., 2015; Strathdee et al., 2012; Strathdee & Magis-Rodriguez, 2008). FSWs in this setting are particularly vulnerable to HIV/STIs, with HIV prevalence (5.95%) among FSWs along the Mexico–U.S. border estimated to be over 30 times that of reproductive-aged women in Mexico (0.19%; Baral et al., 2012; Patterson et al., 2008). We hypothesized that the PFQ-2 shame subscale would be unidimensional given findings from previous studies (Eterović et al., 2019; Ferguson & Crowley, 1997; D. H. Harder & Zalma, 1990) that evaluated the factor structure of the PFQ-2 and concluded that a two-factor structure where shame and guilt items loaded predominately onto separate factors was reasonable. We anticipated that the shame subscale would have good internal consistency and also theorized a priori that there could be DIF by city due to sociostructural differences in sex work environments between Tijuana and Ciudad Juarez (Andrade et al., 2019; Beletsky et al., 2012; Ramos et al., 2009) that could result in systematically different responses to specific items. Lastly, we hypothesized that the PFQ-2 shame subscale would have moderate to strong positive correlations with guilt and depression, and a moderate to strong inverse correlation with emotional support.
Methods
Study Population and Design
This analysis used data from the baseline visit of a randomized controlled trial conducted between January 2016 and January 2017 to evaluate the efficacy of a 24-month, theory-based text messaging behavior change maintenance intervention for sexual risk reduction among FSWs in Tijuana (n = 302) and Ciudad Juarez (n = 300).
As previously described (Patterson et al., 2019; 2020), time–location sampling was used to recruit potential participants at known sex work locations (e.g., brothels, street corners, and bars) in each city. Individuals interested in participating were invited to undergo an eligibility screener at each city’s study site (i.e., an unmarked office building in Tijuana’s red-light district and a clinical setting in downtown Ciudad Juarez). Both study sites were staffed by local Spanish-speaking individuals with experience working with FSWs and other socially marginalized populations. Individuals were eligible to participate if they met the following criteria: cisgender female; at least 18 years of age; exchanged money, drugs, or other goods for sex in the past month; had condom-unprotected vaginal or anal sex with a male client in the past month; HIV-negative; willing to receive antibiotic treatment if positive for chlamydia, gonorrhea, or syphilis; and owned a cell phone. After providing written informed consent, participants underwent HIV testing to confirm that they were HIV-negative, completed a baseline interviewer-administered survey, participated in an interactive counseling session regarding sexual risk reduction, and were randomized to receive either text messages for behavior maintenance (intervention arm) or text messages about general health (control arm).
All study procedures were approved by the ethics committees at Xochicalco University in Tijuana, Salud y Desarrollo Comunitario de Ciudad Juárez (SADEC)-Federación Mexicana de Asociaciones Privadas (FEMAP) in Ciudad Juarez, and the University of California, San Diego.
Data Collection
The survey was translated from English into Spanish by a bilingual, native Spanish-speaking member of the research team with PhD-level training in Public Health. The translation was then checked for clarity and back-translated by another bilingual, native Spanish-speaking team member. Since scale validation was not the primary focus of the main study, translation and back-translation were not independent but instead collaborative processes. The survey was then administered one-on-one to study participants in Spanish by native Spanish-speaking interviewers using computer-assisted personal interviewing. The baseline survey took approximately 50 minutes to complete and collected information on shame, other psychosocial factors, and sociodemographics.
Shame
The Spanish-translated 10-item PFQ-2 shame subscale was used to measure frequency of shame experiences (e.g., “How often do you feel humiliated?” and “How often do you feel disgusting to others?”) using 5-point Likert-type responses (0 = never, 1 = rarely, 2 = some of the time, 3 = frequently but not continuously, 4 = continuously or almost continuously; Table 1; D. H. Harder & Zalma, 1990; D. W. Harder et al., 1992). A summed score was calculated from item responses, with higher scores indicating greater levels of shame (range = 0-40).
Table 1. Spanish Translation of the PFQ-2 Shame Subscale and Descriptive Statistics, n (%).
Note. Percentages may not sum to 100 due to rounding. M = mean, PFQ-2 = Personal Feelings Questionnaire-2, SD = standard deviation.
ᵃ Response options in English: 0 = never, 1 = rarely, 2 = some of the time, 3 = frequently but not continuously, 4 = continuously or almost continuously.
ᵇ Response options in Spanish: 0 = nunca, 1 = raramente, 2 = algunas veces, 3 = con frecuencia pero no continuamente, 4 = continuo o casi continuamente.
Guilt
Unlike shame, which is a global self-focused appraisal, guilt is a negative emotion resulting from specific behaviors whereby an individual feels remorse or regret over a specific act or actions (D. W. Harder & Greenwald, 1999; Tangney et al., 1992). The Spanish-translated 6-item PFQ-2 guilt subscale was used to measure frequency of guilt experiences (e.g., “How often do you feel regret?” and “How often do you feel you deserve criticism for what you did?”) using 5-point Likert-type responses (0 = never, 1 = rarely, 2 = some of the time, 3 = frequently but not continuously, 4 = continuously or almost continuously; D. H. Harder & Zalma, 1990; D. W. Harder et al., 1992). A summed score was calculated from item responses, where higher scores indicate greater levels of guilt (range = 0-24). The guilt subscale demonstrated good internal consistency in the current sample (Cronbach’s α = .83; McDonald’s ω total = .88; McDonald’s ω hierarchical = .76).
Depression
Depression refers to persistent feelings of sadness, hopelessness, or loss of interest and was measured using the 21-item Beck Depression Inventory–II (BDI-II; e.g., “In the past 2 weeks, how often have you felt sadness?” and “In the past 2 weeks, how often have you felt suicidal thoughts or wishes?”; Beck et al., 1996). Participants indicated their level of agreement using 4-point Likert-type responses with phrasing specifically adapted to each scale item (e.g., 0 = I do not feel sad, 1 = I feel sad much of the time, 2 = I am sad all the time, 3 = I am so sad or unhappy that I can’t stand it). A summed score was calculated from item responses with higher values indicating more depressive symptomology (range = 0-63). The BDI-II demonstrated good internal consistency in the current sample (Cronbach’s α = .94; McDonald’s ω total = .95; McDonald’s ω hierarchical = .82).
Emotional Support
Emotional support was measured via the 7-item Pearlin Emotional Support scale assessing support from family and friends (e.g., “There are people in your life that make you feel good about yourself” and “The people close to you let you know they care about you”; Pearlin et al., 1990). Participants were asked to provide their level of agreement using 4-point Likert-type responses (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree). A mean score was calculated from item responses, where higher values indicate greater emotional support. The Pearlin Emotional Support scale demonstrated good internal consistency in the current sample (Cronbach’s α = .96; McDonald’s ω total = .97; McDonald’s ω hierarchical = .89).
Sociodemographics
Sociodemographic data were collected on participants’ age (in years), highest level of education completed (1 = cannot read or write, 2 = some grade school without certificate, 3 = grade school, 4 = some secondary school without certificate, 5 = secondary school, 6 = some high school without certificate, 7 = high school, 8 = some university but no title, 9 = university with title, 10 = advanced degree [doctorate or masters], 11 = knows how to read and write without formal education), and marital status (1 = married, 2 = separated or filing for divorce, 3 = divorced but not remarried, 4 = widowed but not remarried, 5 = common-law marriage, 6 = single). Average monthly income in Mexican pesos (MXN) over the past 6 months was measured by asking participants “What was your average monthly income over the past 6 months, before taxes, including legal and illegal income?” (1 = no income, 2 = <MXN $1000, 3 = MXN $1000-$1499, 4 = MXN $1500-$1999, 5 = MXN $2000-$2499, 6 = MXN $2500-$2999, 7 = MXN $3000-$3499, 8 = ≥MXN $3500). Income was examined as a dichotomous variable (<MXN $3000 vs. ≥MXN $3000) based on the national monthly well-being lines representing approximate federal poverty limits for urban areas during our study period (Consejo Nacional de Evaluación de la Política de Desarrollo Social, 2019). Number of children was assessed by asking participants “How many children do you have?” and primary sex work venue was measured by asking participants “What is the main type of place where you work?” (1 = bar or nightclub, 2 = street, 3 = brothel, 4 = hotel or motel, 5 = customer’s vehicle, 6 = massage parlor, 7 = other).
Statistical Analysis
Sample Characteristics
Descriptive statistics were used to examine shame levels and characterize the study population overall and by city using means and standard deviations for continuous variables and frequencies for categorical variables.
Item Performance
We used IRT methods to examine the item-level properties of each of the 10 items that make up the PFQ-2 shame subscale. More specifically, we used option characteristic curves (OCCs) to examine the discrimination and difficulty of the response options within each item to assess whether items were useful in measuring levels of shame among participants (Baker, 2001; DeVellis, 2016). OCCs graphically display an item’s performance by plotting the relationship between the standard normal shame scores and the probability of endorsing each response option (DeVellis, 2016) and were estimated for each item with nonparametric kernel-smoothing using the ksIRT function in the KernSmoothIRT package (Mazza et al., 2014) in R. An item was considered to be well-performing if its response options had good discrimination across levels of shame severity (demonstrated by the steepness of the response curves) and difficulty (visualized by the crossing of two response curves where endorsing one response option became more likely than endorsing another).
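As an illustration of what an OCC estimates, the following Python sketch applies a Gaussian kernel smoother to a small set of made-up responses. It is not the ksIRT implementation: the function names, the bandwidth, and the toy data are assumptions chosen for illustration, and the trait proxy (standard normal quantiles of ranked summed scores) is only one common choice.

```python
import math
from statistics import NormalDist


def normal_scores(totals):
    """Map summed scores to standard normal quantiles using mid-ranks for ties."""
    n = len(totals)
    order = sorted(range(n), key=lambda i: totals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and totals[order[j + 1]] == totals[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]


def occ(item_responses, trait, option, grid, bandwidth=0.5):
    """Kernel-weighted share of respondents choosing `option` at each grid point."""
    curve = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in trait]
        hit = sum(w for w, r in zip(weights, item_responses) if r == option)
        curve.append(hit / sum(weights))
    return curve


# Hypothetical data: 6 respondents x 3 items, each scored 0-2 (not study data).
responses = [[0, 0, 0], [0, 1, 0], [1, 1, 1], [1, 2, 1], [2, 2, 1], [2, 2, 2]]
totals = [sum(row) for row in responses]
trait = normal_scores(totals)
grid = [-1.0, 0.0, 1.0]
first_item = [row[0] for row in responses]
curves = {opt: occ(first_item, trait, opt, grid) for opt in (0, 1, 2)}
for opt, curve in curves.items():
    print(opt, [round(p, 2) for p in curve])
```

In this toy example, a well-performing item shows the lowest option dominating at low trait values and the highest option dominating at high trait values, with the curves crossing in between.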
Confirmatory Factor Analysis
We theorized that the PFQ-2 shame subscale would have a one-factor structure based on previous studies (Eterović et al., 2019; Ferguson & Crowley, 1997; D. H. Harder & Zalma, 1990) that determined that a two-factor structure was reasonable for the full PFQ-2 scale, with shame and guilt items loading predominately onto separate factors. Confirmatory factor analysis (CFA) was used to examine the interrelationships among the items and determine whether the PFQ-2 shame subscale had an underlying one-factor structure. Given the ordinal nature of the item response options, CFA for ordinal data was conducted (Li, 2016; Rhemtulla et al., 2012) using the cfa function in the lavaan package (Rosseel, 2012) in R. Model fit was assessed using diagonally weighted least squares estimation of the Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), and Tucker–Lewis index (TLI), with a RMSEA value less than 0.10, and CFI and TLI values greater than 0.90 indicating an acceptable fit.
Internal Consistency
Internal consistency was estimated using three indicators of reliability: Cronbach’s α (Cronbach, 1951) as well as McDonald’s ω total and ω hierarchical (McDonald, 2013; Zinbarg et al., 2005). We included Cronbach’s α to allow for the comparison of our results with those from other studies, given its widespread use in the published literature. However, using Cronbach’s α to assess reliability has some limitations, including the strong assumption that the scale is tau-equivalent (i.e., each item contributes equally to the total score such that factor loadings are identical across items; McNeish, 2018; Sijtsma, 2009). In common scenarios where a scale is congeneric (i.e., the items measure the same latent construct but to differing degrees), Cronbach’s α would underestimate the scale’s true reliability, especially in shorter scales with fewer than 10 items (Graham, 2006). Cronbach’s α can also be inflated by the number of items and the number of redundant items in a scale (Streiner, 2003). For these reasons, we considered McDonald’s ω total and ω hierarchical as alternative reliability coefficients. McDonald’s ω total is another lower bound estimate of reliability and assumes a congeneric model, which is useful for scales that may be multidimensional because it estimates the proportion of the total variance due to a primary, general factor as well as other factors (Revelle & Zinbarg, 2009). McDonald’s ω hierarchical, on the other hand, is useful for estimating the proportion of the total variance that is due only to one primary, general factor (Revelle & Zinbarg, 2009; Zinbarg et al., 2005). All three reliability coefficients were calculated using the omega function in the psych package (Revelle, 2019) in R.
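For readers unfamiliar with the coefficient, Cronbach’s α can be computed directly from item and total-score variances as α = k/(k − 1) × (1 − Σ item variances / total-score variance). The Python sketch below shows that calculation on made-up data; it does not reproduce the psych package’s omega function, and the function name and toy responses are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 respondents x 3 items, each scored 0-2 (not study data).
toy = [[0, 0, 0], [0, 1, 0], [1, 1, 1], [1, 2, 1], [2, 2, 1], [2, 2, 2]]
alpha_toy = cronbach_alpha(toy)
print(round(alpha_toy, 2))
```

Because α depends only on these variances, it cannot distinguish a single general factor from several correlated ones, which is one reason the ω coefficients are reported alongside it.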
Differential Item Functioning by City
Due to documented sociostructural differences between the sex work environments in Tijuana and Ciudad Juarez (Andrade et al., 2019; Beletsky et al., 2012; Ramos et al., 2009), we assessed DIF by city to determine whether FSWs with the same level of shame responded differently to each scale item depending on their city of residence. DIF was assessed using likelihood ratio tests in two rounds. The first round was exploratory: each item was tested in turn by freeing its parameters and using all remaining items as anchors. The final round used as anchors a specific item set identified as having minimal DIF, namely the two items with the largest discrimination parameters and similar performance in both the reference and examined groups. We then estimated the effect size of significant DIF across items (Meade, 2016). We also plotted the expected scale scores to determine the impact of any notable item-level DIF on the scale-level scores, specifically whether items with notable DIF could still be retained or should be removed based on their effect on overall scores (Reise & Revicki, 2014).
Concurrent Validity
Concurrent validity was assessed by calculating the correlations between shame (PFQ-2 shame subscale) and (1) guilt (PFQ-2 guilt subscale), (2) depression (BDI-II), and (3) emotional support (Pearlin Emotional Support scale) using the proc corr statement in SAS.
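The correlation step can be illustrated with a short Python sketch, assuming the reported r values are Pearson product-moment coefficients. The function name and the toy scores are hypothetical and are not drawn from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five respondents (not study data).
shame = [2, 5, 7, 10, 14]             # summed shame scores
support = [3.8, 3.1, 3.3, 2.4, 2.0]   # mean emotional support scores
r_shame_support = pearson_r(shame, support)
print(round(r_shame_support, 2))
```

A negative coefficient here corresponds to the hypothesized inverse relationship between shame and emotional support.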
All analyses were conducted using R (Version 3.6.1) or SAS (Version 9.4).
Results
Sample Characteristics
On average, participants were 38 years old (standard deviation (SD) = 10.34) and had three children (SD = 1.73; Table 2). Less than half of our sample had completed secondary school (42%), 89% reported an average monthly income of $3,000 MXN or more (approximately US$150) in the past 6 months, 58% reported being single, and most reported that their primary sex work venue was either in the street (37%) or in a hotel or motel (38%).
Table 2. Characteristics of Female Sex Workers in the Mexico–U.S. Border Region by City (N = 602), n (%).
Note. Percentages may not sum to 100 due to rounding. M = mean, SD = standard deviation.
ᵃ p value from chi-squared test or Fisher’s Exact test (if categorical) or two-sided t tests (if continuous).
ᵇ Scored using the Personal Feelings Questionnaire–2 (PFQ-2) shame subscale (D. H. Harder & Zalma, 1990; D. W. Harder et al., 1992).
ᶜ Scored using the Personal Feelings Questionnaire–2 (PFQ-2) guilt subscale (D. H. Harder & Zalma, 1990; D. W. Harder et al., 1992).
ᵈ Scored using the Beck Depression Inventory–II (Beck et al., 1996).
ᵉ Scored using the Pearlin Emotional Support Scale (Pearlin et al., 1990).
Item Performance
When examining item-level performance, the OCCs from the 10-item subscale with 5-point responses suggested that two response options (1 = rarely; 3 = frequently but not continuously) had poor separation across all items, overlapping substantially within standard normal shame scores. In particular, the probability of endorsing “rarely” was never higher than the probability of endorsing other response options across any level of shame for all 10 items. These results suggested that these two response options did not differentiate well between participants with varying levels of shame. Additionally, one item (“stupid”) performed particularly poorly with a lack of clear separation between response options and low discrimination across levels of shame. Across all items, collapsing the 5-point responses to 3-point responses (collapsing responses 1 [rarely] and 2 [some of the time] as well as collapsing responses 3 [frequently but not continuously] and 4 [continuously or almost continuously]) resulted in clearer separation between response options and improved item performance. Figure 1 displays how item performance improved after collapsing to 3-point responses using a representative item (“ridiculous”). Therefore, in subsequent analyses, we only considered the collapsed 3-point responses.

Figure 1. Option characteristic curves for a representative item (“ridiculous”) using the original 5-point responses (left) and after collapsing to 3-point responses (right).
Confirmatory Factor Analysis
During CFA of the 10-item subscale with 3-point responses, the factor diagram indicated that one item (“stupid”) did not load onto the primary trait. This misfit item was removed given its low factor loading (factor loading = 0.30) and poor item performance as indicated by the OCCs. CFA on the revised 9-item shame subscale with 3-point responses (henceforth referred to as the revised shame subscale, range of scores = 0-18) indicated that a one-factor structure was a reasonable fit for the data (RMSEA = 0.07; CFI = 0.99; TLI = 0.99). All items had moderate to strong factor loadings ranging from 0.58 to 0.82 (Guadagnoli & Velicer, 1988).
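The scoring of the revised subscale described above (collapsing responses to three points, dropping “stupid,” and summing the remaining nine items) can be sketched in Python as follows. The function and variable names are assumptions, the example respondent is made up, and the tenth item label is a placeholder because that item is not named in this text.

```python
# Collapse 5-point responses to 3 points: 1 and 2 merge, 3 and 4 merge.
COLLAPSE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}
DROPPED = {"stupid"}  # misfit item removed from the revised subscale


def revised_score(answers):
    """Sum collapsed responses over retained items (possible range 0-18)."""
    return sum(COLLAPSE[v] for item, v in answers.items() if item not in DROPPED)


# Hypothetical respondent on the original 0-4 scale (not study data).
example = {
    "embarrassed": 2,
    "ridiculous": 1,
    "self-consciousness": 0,
    "humiliated": 3,
    "stupid": 4,
    "childish": 0,
    "feelings of blushing": 1,
    "laughable": 2,
    "disgusting to others": 4,
    "tenth item (placeholder label)": 0,
}
example_score = revised_score(example)
print(example_score)
```

With nine retained items each scored 0 to 2, the maximum possible score is 18, matching the range reported for the revised subscale.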
Internal Consistency
The revised shame subscale had an overall sample mean of 5.65 (SD = 3.98) and a Cronbach’s α of .86. McDonald’s ω total and ω hierarchical were .89 and .65, respectively.
Differential Item Functioning by City
In the first round of DIF testing using the model with all other items as anchors, “self-consciousness” and “disgusting to others” had the largest discrimination parameters and nonsignificant p values of 0.06 when comparing item parameters across groups. In the second round of DIF testing, the model was anchored by these two items and indicated that there was statistically significant DIF by city in five (“embarrassed,” “humiliated,” “childish,” “feelings of blushing,” and “laughable”) of the remaining seven items (Table 3). The five items with statistically significant DIF also had notable effect sizes that were at least 0.4 in magnitude. A visual inspection of the item characteristic curves from DIF items did not indicate significant differences in discrimination. To determine the impact of the notable item-level DIF on the scale-level scores for the revised shame subscale, we plotted the expected scale scores for each city and found that the comparable magnitudes but opposing directions of the DIF resulted in a negligible impact on the overall scores (Table 3).
Table 3. Two-Anchor Model Testing Differential Item Functioning (DIF) by Study Site in the Revised Shame Subscale, and the Impact of DIF on Overall Scores.
ᵃ Used in model as anchor item.
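To see how item-level DIF of comparable magnitude but opposite direction can largely cancel in the expected scale score, consider the following Python sketch. It assumes a graded response model with invented parameters for two 3-category items; the model choice, parameter values, and names are illustrative assumptions and are not the estimates from this study.

```python
import math


def expected_item_score(theta, a, thresholds):
    """Expected score under a graded response model: sum over k of P(X >= k)."""
    return sum(1 / (1 + math.exp(-a * (theta - b))) for b in thresholds)


A = 1.5             # hypothetical discrimination, shared by both items
BASE = (-0.5, 1.0)  # hypothetical thresholds in the reference group
SHIFT = 0.4         # hypothetical DIF: threshold shift in the focal group


def shifted(delta):
    return tuple(b + delta for b in BASE)


reference = [BASE, BASE]
focal = [shifted(+SHIFT), shifted(-SHIFT)]  # DIF in opposite directions
grid = [x / 10 for x in range(-30, 31)]     # trait values from -3 to 3


def scale_score(theta, items):
    return sum(expected_item_score(theta, A, th) for th in items)


# Largest gap between groups for one item and for the two-item scale.
item_gap = max(
    abs(expected_item_score(t, A, focal[0]) - expected_item_score(t, A, reference[0]))
    for t in grid
)
scale_gap = max(abs(scale_score(t, focal) - scale_score(t, reference)) for t in grid)
print(round(item_gap, 3), round(scale_gap, 3))
```

In this toy setup the group difference at the scale level is much smaller than the difference for either item alone, which is the pattern the expected scale score plots are used to check.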
Concurrent Validity
The revised shame subscale was significantly correlated with guilt (r = .79, p < .01), depression (r = .69, p < .01), and emotional support (r = −.28, p < .01).
Discussion
We conducted a psychometric evaluation of the Spanish-translated PFQ-2 shame subscale among FSWs in Tijuana and Ciudad Juarez, Mexico using methodology from IRT and Classical Test Theory to evaluate the properties of individual items and the characteristics of the scale overall, respectively. We hypothesized a priori that the shame subscale would be unidimensional, have good internal consistency, and possibly show regional DIF due to known differences between the sex work environments in Tijuana and Ciudad Juarez. We also hypothesized that the shame subscale would have moderate to strong positive correlations with guilt and depression, and a moderate to strong inverse correlation with emotional support.
We found in our item-level analysis that two response options were less effective across all scale items at discriminating between participants with differing levels of shame, and observed an improvement in item performance after collapsing from 5-point to 3-point response options. In line with our a priori hypothesis, CFA of the revised shame subscale indicated that a one-factor structure was reasonable, which is similar to other studies (Eterović et al., 2019; Ferguson & Crowley, 1997; D. H. Harder & Zalma, 1990) that conducted factor analysis on the full PFQ-2 scale and found that the shame items predominately loaded onto the same latent trait. Similar to other studies (Barr & Cacciatore, 2007; Di Sarno et al., 2019; Dorahy et al., 2013; Gambin & Sharp, 2018; D. H. Harder & Zalma, 1990; Rüsch et al., 2007; Tignor & Colvin, 2019), we also observed that the revised shame subscale had good internal consistency and could therefore be considered a reliable measure of the frequency of shame experiences in our sample. Our observed sample mean of 5.65 for the revised shame subscale appeared to be low given the possible range of scores (range = 0-18). This observed mean value from the revised subscale would have been comparable to a mean of 12.56 using all 10 items with 5 response options (range = 0-40). That mean value is similar to the means reported within nonclinical samples in three studies (Di Sarno et al., 2019; Eterović et al., 2019; Rüsch et al., 2007) that translated the PFQ-2 into other languages, but lower than the means reported within nonclinical samples in other studies (Barr & Cacciatore, 2007; Bybee et al., 2009; D. H. Harder & Zalma, 1990; D. W. Harder et al., 1992; D. W. Harder & Greenwald, 1999; Tignor & Colvin, 2019) that used the PFQ-2 in its original English form. 
The lower levels of shame observed in our sample may indicate that the Spanish-translated adjectives included in the PFQ-2 shame subscale, which were developed and tested in English among undergraduate students, may not hold the same negative emotional magnitude (e.g., “self-consciousness”) or be as relatable (e.g., “childish”) to the experiences of shame among our sample of FSWs in Mexico.
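The text does not state how the comparable mean of 12.56 on the original 0-40 scale was obtained. One reading, offered here as an assumption, is a proportional rescaling of the observed mean by the ratio of the two score ranges, which reproduces the reported value. The function name is hypothetical.

```python
def rescale_mean(mean, old_max, new_max):
    """Rescale a mean from a 0..old_max score range to a 0..new_max range."""
    return mean * new_max / old_max


# Revised subscale: 9 items x 0-2 (max 18); original: 10 items x 0-4 (max 40).
rescaled_mean = rescale_mean(5.65, old_max=18, new_max=40)
print(round(rescaled_mean, 2))
```

Such a rescaling treats the two scales as linearly interchangeable, so comparisons with means from other studies remain approximate.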
As we hypothesized, we found significant DIF by city, which indicates that FSWs with the same level of shame differed in their responses to particular scale items depending on whether they lived in Tijuana or Ciudad Juarez. While we observed DIF by city, another study (Di Sarno et al., 2019) among a nonclinical sample in Italy found DIF by gender in an Italian-translated version of the PFQ-2. When considered with our findings, this highlights the importance of evaluating DIF in psychometric analyses conducted among different populations than the population in which the scale was developed. While FSWs in Tijuana and Ciudad Juarez share many similar experiences of poverty, violence, substance use, and social rejection, differences between the sex work environments (e.g., prevalence of street-based sex work, gentrification, migration patterns, and policing practices) in these two cities may account for the DIF that was observed in our sample (Andrade et al., 2019; Beletsky et al., 2012; Ramos et al., 2009). Additionally, the DIF that we observed may also be due to the ways in which the scale was translated from English to Spanish. Even directly translated words or phrases may not have the same linguistic or cultural meaning within or across regions as in the original language, which may affect the way in which participants understand the translation. While we found notable DIF in five items, the expected scale scores suggested that these items could be retained because they had effect sizes of comparable magnitudes but in different directions, resulting in biases that seemingly canceled at the scale-level (Reise & Revicki, 2014). Ultimately, while the opposing directions of the item-level DIF resulted in a negligible effect on overall scores in our sample, the significant regional DIF that we found highlights the need for studies using the shame subscale in comparable settings to consider controlling for regional invariance.
We also found that the revised PFQ-2 shame subscale demonstrated good concurrent validity with the PFQ-2 guilt subscale, the BDI-II, and the Pearlin Emotional Support Scale, which is consistent with the trends reported in other studies using the 10-item shame subscale with 5 response options in different settings. The revised shame subscale’s strong correlation with the PFQ-2 guilt subscale in our sample was similar to the correlations reported among clinical samples in the United States (Averill et al., 2002; Gambin & Sharp, 2018) and undergraduate students (Tignor & Colvin, 2019). While we expected these related constructs to be correlated and our results are comparable to those reported in other studies, the strength of their correlation may be indicative of redundancy or poor discrimination between these two subscales. Our findings highlight the need for future work to further examine shame and guilt as distinct constructs in comparable samples. The moderately strong correlation that we observed between the revised shame subscale and the BDI-II was similar to that reported in a clinical sample in the United States (Gambin & Sharp, 2018) and stronger than correlations reported with other versions of the BDI in clinical (Averill et al., 2002; Crossley & Rockett, 2005) and nonclinical (Bybee et al., 2009; D. H. Harder & Zalma, 1990; D. W. Harder et al., 1992) samples. The concurrent validity that we observed suggests that the subscale’s latent shame construct functioned as hypothesized and that the revised shame subscale can be leveraged as a measure of the frequency of shame experiences in future studies examining the impact of shame on HIV/STI risk and other important health outcomes in comparable populations.
Overall, our findings suggest that the Spanish-translated, revised PFQ-2 shame subscale is a reliable measure of the frequency of shame experiences with good concurrent validity in our sample of FSWs in Tijuana and Ciudad Juarez. Our findings are similar to those of studies that found that the shame subscale items predominantly loaded onto a single latent trait, and also align with studies that reported that the shame subscale demonstrated good internal consistency and supported concurrent validity with other measures. Our comprehensive psychometric evaluation expanded on previous literature by reporting the psychometric properties of the PFQ-2 shame subscale in Spanish and within a highly stigmatized population in an LMIC. Our findings should be replicated among other socially marginalized populations in comparable Spanish-speaking settings. We also expanded on previous literature by using methods from IRT to evaluate and improve the performance of individual scale items, as well as to examine measurement invariance through DIF analysis. We observed from the OCCs that two response options performed poorly and item performance improved after collapsing from 5 to 3 response options, which suggests that participants in this setting had difficulty differentiating across the spectrum of the original 5-point Likert-type responses. Translating the original response options into Spanish may also have led to linguistic or cultural differences that made the spectrum of responses less relatable to FSWs in this setting. DIF is a common phenomenon that can bias estimates depending on the magnitude and directionality of the non-invariance in the sample, which makes DIF analyses important in comprehensive psychometric evaluations to identify and potentially mitigate this type of measurement bias. Prior studies that did not evaluate measurement invariance in their samples and consequently did not control for notable DIF may have reported biased estimates of the frequency of shame experiences or biased associations between shame and other health outcomes.
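As a rough illustration of the recoding step discussed above, the sketch below collapses 5-point responses into 3 categories and then computes Cronbach's α on the recoded items. The mapping in `COLLAPSE_MAP` is hypothetical: this section does not state which of the original response options were merged, so the map shows only one possibility, and the 0 to 4 coding of the original responses is likewise an assumption. The toy responses are invented for the example.

```python
from statistics import variance

# Hypothetical recoding of five response options (coded 0-4) into three.
# Which options were actually merged is not specified in this section.
COLLAPSE_MAP = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}


def collapse(responses, mapping=COLLAPSE_MAP):
    """Recode one respondent's item responses using the mapping."""
    return [mapping[r] for r in responses]


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Invented responses: 5 respondents x 3 items on the original 0-4 coding.
    raw = [
        [0, 1, 0],
        [1, 2, 1],
        [3, 3, 4],
        [4, 4, 3],
        [2, 1, 2],
    ]
    recoded = [collapse(row) for row in raw]
    print("alpha (5-point):", round(cronbach_alpha(raw), 3))
    print("alpha (3-point):", round(cronbach_alpha(recoded), 3))
```

Comparing α before and after recoding is only one of several checks; the option characteristic curves mentioned above would come from an IRT model fit, which this sketch does not attempt.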
Our study has several limitations. First, the PFQ-2 shame subscale evaluated in this analysis is a measure of frequency of shame experiences and should not be interpreted as an assessment of shame specifically resulting from sex work. Second, the convenience sampling methods we used to recruit FSWs may limit the generalizability of our findings. However, these sampling methods are considered an effective recruitment strategy within stigmatized populations and were critical to our ability to recruit FSWs who are socially marginalized in Tijuana and Ciudad Juarez. Third, our interviewer-administered surveys may have introduced social desirability bias if participants were less forthcoming about reporting their feelings of shame in the presence of the interviewer. To minimize the potential for this type of bias, all interviewers had extensive experience working with stigmatized populations in this setting and were trained to create a nonjudgmental environment. Fourth, our recommended revision to the shame subscale is preliminary given that the revised 9-item shame subscale with 3 response options was not cross-validated in a holdout sample or a new sample. We have provided the groundwork for future psychometric studies to perform this important work in similarly stigmatized and overlooked populations, including examining how the misfit item in our study (“stupid”) performs in other comparable samples. Fifth, the scales used to evaluate concurrent validity were validated in English, but were not validated after they were translated into Spanish by our research team, as the assessment of their psychometric properties was outside the scope of this analysis. We did, however, find that these measures of concurrent validity maintained good internal consistency in our sample. Sixth, we were unable to examine correlations of the PFQ-2 shame subscale with other shame measures because the main study was not designed to validate the PFQ-2 shame subscale and data on other shame measures were not collected. Future studies should examine the correlations between the PFQ-2 shame subscale and other measures related to frequency of shame experiences or temporary state of shame to evaluate the convergent validity of the PFQ-2 shame subscale in comparable populations and settings.
To our knowledge, this is the first psychometric analysis of the PFQ-2 shame subscale in Spanish and among FSWs in an LMIC. Our findings indicate that the Spanish-translated, revised shame subscale is a reliable measure of the frequency of shame experiences with good concurrent validity in a sample of FSWs in the Mexico–U.S. border region. These results suggest that the revised subscale may be a useful measure of shame among FSWs and should be explored and replicated in stigmatized populations in other settings. The significant DIF detected by city indicates that future analyses using the revised shame subscale should consider potential regional DIF.
Supplemental Material
Supplemental material, sj-pdf-1-asm-10.1177_1073191120981768 for Psychometric Evaluation of the Personal Feelings Questionnaire–2 (PFQ-2) Shame Subscale Among Spanish-Speaking Female Sex Workers in Mexico by Cristina Espinosa da Silva, Heather A. Pines, Thomas L. Patterson, Shirley Semple, Alicia Harvey-Vera, Steffanie A. Strathdee, Gustavo Martinez, Eileen Pitpitan and Laramie R. Smith in Assessment
Footnotes
Acknowledgements
The authors thank the study participants and research staff, without whom this study would not have been possible. The authors would also like to thank Dr. David Strong for his methodological expertise and support.
Authors’ Note
The code for this study is available in the online supplementary material and the data used for these analyses are available on request from the corresponding author, Dr. Laramie R. Smith.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Institute on Drug Abuse - grant numbers K01DA039767 (LRS), K01DA040543 (HAP), R01DA039071 (TLP), and T32DA023356 (CEDS).
