Abstract
For gray or achromatic objects, brightness is a relatively simple transformation where very low luminance levels are perceived as black and higher levels are perceived as white. For chromatic objects, the transformation is more complex, depending on color purity as well. This influence of color purity on a color’s perceived brightness is a well-established phenomenon known as the Helmholtz–Kohlrausch (H-K) effect. We investigated gender differences in the H-K effect by measuring brightness (via direct brightness matching [DBM]) and luminance (via heterochromatic flicker photometry [HFP]) at five wavelengths (450, 520, 560, 580, and 650 nm) perceived as blue, green, green-yellow, yellow, and red hues. We compared DBM/HFP ratios between 13 males and 18 females. Based on previous evidence of a female advantage in chromatic processes, we hypothesized that DBM/HFP ratios would be higher in female subjects. While HFP measures were essentially the same between male and female subjects, DBM measures and DBM/HFP ratios were significantly higher for female subjects than males. There were no significant effects of contraceptive use based on a post hoc comparison. We also derived simple models of brightness as a function of luminance and saturation, which further suggest gender dimorphism in the H-K effect.
There have been numerous investigations over the past several decades into gender differences in visual processing. The balance of the evidence from these suggests that females have visual systems better tuned for stationary and chromatic (particularly red-green) stimuli, and—more equivocally—males are better tuned for transient, low contrast, achromatic stimuli (see reviews: Alexander, 2003; Handa & McGivern, 2015). Support for gender dimorphism has accumulated from models such as dynamic visual acuity (Ishigaki & Miyao, 1994), saturation (Foutch & Peck, 2013b; Nichols, 1885), sorting colored caps (Bimler et al., 2004), duration of color afterimages (Hoynga & Wallace, 1979; McGuiness & Lewis, 1976), and contrast thresholds of stimuli biased toward separate parallel processes (Foutch & Peck, 2013a). Evidence for a male advantage in achromatic luminance and/or spatial vision is more equivocal, with support from studies (Abramov et al., 2012; Linn & Peterson, 1985; Oen et al., 1994) but others showing no gender difference (Foutch, 2018; Solberg & Brown, 2002; Zaroff et al., 2003).
Understanding the basis for these differences starts with understanding how light is transformed by the visual system. Visible radiation has two basic properties: radiant energy (Q_e) and wavelength (W). These physical properties are transformed into the psychophysical properties of luminance (L), wavelength (W), and purity (P). For self-luminant or aperture color objects, the perceptual attributes are brightness (B), hue (or the “name” given to the color), and saturation (colorfulness of an object as a proportion of its perceived brightness). While these are complex transformations that occur through two anatomically distinct pathways (Bassi & Lehmkuhle, 1990; Leventhal et al., 1981; Perry et al., 1984), contributions from achromatic and chromatic pathways can be quantified by methods such as HFP and DBM.
HFP experiments involve reducing the flicker seen when temporally alternating a monochromatic light spatially matched against a reference broadband light, and the luminance signal for each wavelength is calibrated psychophysically by a linear addition of contributions from long- (L) and medium-wavelength-sensitive (M) cone photoreceptors (Lennie et al., 1993). After an observer adjusts the intensity to minimize flicker perception, relative luminosity (RL; the ratio of the reference luminance to that of the color stimulus) is calculated for each wavelength. Although rates vary for different wavelengths, one study has suggested that the chromatic system is only fast enough to detect seven alternations per second (Ikeda & Shimozono, 1978). Beyond that alternation rate, equally bright flickering stimuli will appear to be continuous. HFP then isolates achromatic channels by staying below the critical flicker rate of the achromatic system (<35 Hz; Hecht & Verrijp, 1933) while avoiding influence of the chromatic channels (>7 Hz). Generally accepted flicker rates for HFP are 18 to 25 Hz (Ikeda & Shimozono, 1978). HFP measures have played an important role in forming the standard luminous efficiency function, V(λ), which is frequently used in calibrating visual displays (Lennie et al., 1993), because they are additive. That is, if color C results from the mixture of two colors, A and B, the luminance of C should be equal to the sum of luminance A and luminance B (Wagner & Boynton, 1972). HFP measures are also transitive. This means that if two colors, A and B, each match a reference white in perceived luminance, then A and B should also match each other in perceived luminance.
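The additivity property can be illustrated numerically. The following is a minimal sketch, not part of the study's methods: it assumes narrow-band lights, uses approximate CIE photopic V(λ) values at the five study wavelengths, and the radiance values and function names (`luminance`, `mix`) are hypothetical.

```python
import math

# Approximate CIE photopic luminous efficiency V(lambda) at the five
# wavelengths used in the study (nm -> V). Values are approximate.
V_LAMBDA = {450: 0.038, 520: 0.710, 560: 0.995, 580: 0.870, 650: 0.107}
K_M = 683.0  # lm/W, maximum photopic luminous efficacy


def luminance(radiance_by_wavelength):
    """Luminance (cd/m^2) of a mixture of narrow-band lights.

    radiance_by_wavelength maps wavelength (nm) to radiance (W/sr/m^2).
    """
    return K_M * sum(
        V_LAMBDA[wl] * rad for wl, rad in radiance_by_wavelength.items()
    )


def mix(a, b):
    """Superimpose two lights by summing radiance at each wavelength."""
    out = dict(a)
    for wl, rad in b.items():
        out[wl] = out.get(wl, 0.0) + rad
    return out


# Hypothetical radiances chosen only for illustration.
light_a = {450: 0.10}  # a blue light
light_b = {650: 0.05}  # a red light
light_c = mix(light_a, light_b)

# Additivity: the luminance of the mixture equals the sum of the parts.
additive = math.isclose(
    luminance(light_c), luminance(light_a) + luminance(light_b)
)
print(additive)
```

Luminance is additive here by construction, because it is a weighted linear sum. Brightness obtained by direct matching does not obey this rule, which is the additivity failure discussed in the next paragraph.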
Direct brightness matches are not additive and are experimentally derived by simultaneously presenting two stimuli side by side in a bipartite field separated by a small—but important—gap. Without the gap, observers may simply minimize the perceived border between the two stimuli. In doing so, they ignore the brightness of the color fields, and the matches are additive and based on luminance (Kaiser et al., 1990). In DBM, the observer instead adjusts the intensity of the color field until it matches the reference field in perceived brightness. While less repeatable (Roufs, 1978) and more subject to daily shifts in criteria (Yaguchi et al., 1993) than HFP measures, additivity failures from the contribution of chromatic channels enhance brightness, particularly in the more saturated blue (short wavelength) and red (long wavelength) portions of the spectrum (Guth & Lodge, 1973). Brightness is less enhanced at certain wavelengths but also in less saturated color objects containing more white or in certain combinations of colors such as prismatic yellows or green-yellows. This complex interaction of wavelength and saturation on perceived brightness is known as the Helmholtz–Kohlrausch (H-K) effect (reviewed by Donofrio, 2011; Figure 1). As shown in Figure 1, colored lights near the ends of the visible spectrum (e.g., reds and blues) appear brighter than those in the middle (e.g., green-yellows and yellows) when matched for luminance. In the limit, a completely unsaturated white light containing equal mixtures of blues, greens, and reds would be the dimmest of all stimuli when matched in luminance to a colored object. A simple summary explanation of the H-K effect is that colored lights appear brighter than white lights of the same luminance.

Demonstration of the Helmholtz–Kohlrausch Effect. The upper portion of the figure shows five different colored patches (loosely representing those used in the current experiment) against a gray background. The patches are all converted to gray scale in the lower portion and have approximately the same luminance, depending on monitor calibration. For most individuals, the blue and red patches will appear the brightest followed by the green patch. The green-yellow and yellow (or amber) patches will be the dimmest.
The H-K effect is, however, anything but simple. It varies among individuals (Kurtenbach et al., 1997) and will also be affected by the lightness and color of surrounding objects, the effect being most extreme when colored lights are viewed in a completely darkened space (Donofrio, 2007; Lotto & Purves, 1999). The H-K effect has been shown both theoretically and experimentally to be derived from the ratio of chromatic to achromatic activation (Nayatani, 1997, 1998; Schanda & Bukta, 1999). It is reasonable then that it may differ between men and women, especially taken together with previous evidence of a female advantage in chromatic processes. The present study tested the hypothesis that the H-K effect is stronger in women by comparing DBM, HFP, and DBM/HFP ratios collected at five wavelengths: 450 nm, 520 nm, 560 nm, 580 nm, and 650 nm (perceived as blue, green, green-yellow, yellow, and red hues). DBM findings were considered measures of total brightness, while HFP findings were measures of luminance. DBM/HFP ratios were used as estimates of the H-K effect, hypothesized to be higher in female subjects.
Methods
Subjects
We based an a priori power analysis on a very high effect size (η2 > 0.7) found in a previous investigation into gender differences in color contrast thresholds (Foutch & Peck, 2013b). To achieve a power (1-β) of 0.80 in a one-way analysis of variance (ANOVA) using a very high expected effect size (η2 = 0.6), a statistical significance level (α) of .05, and a single fixed factor with two groups (male and female; k = 2), we needed at least 12 males and 12 females. As the current study uses a more powerful repeated-measures design, we considered this to be a conservative estimate. Subjects were then recruited by convenience, and informed consent was obtained from 13 males and 21 females. All volunteers were naïve to color research and were paid for participating. Subjects were eligible if they were between 18 and 45 years old, had best-corrected visual acuity of 20/25 or better in each eye, and had normal color vision tested with pseudoisochromatic plates and Farnsworth D-15 color panels. For convenience in recruiting, use of hormone-releasing contraception (HRC) was permissible and considered as a variable in a separate within-females analysis. Moderate tobacco use was allowed for any subject if the number of pack years (number of packs smoked per day × number of years smoked) was below 10 pack years, the amount previously shown to affect color sensitivity or discrimination (Bimler & Kirkland, 2004; Erb et al., 1999). Ultimately, no tobacco users volunteered for the study. Exclusion criteria included a self-reported history of neurological or psychiatric disorders, use of medications or nutritional supplements known to affect color vision, or family history of color defective vision. In addition, female subjects were ineligible if they were menopausal or pregnant.
Because females heterozygous for color vision defects (CVDs) may not be detected by pseudoisochromatic plate or D-15 testing, they were also tested for abnormal color discrimination with the Medmont C-100 (Medmont International PTY Ltd., Nunawading, Australia) and Nagel anomaloscope. Based on all these criteria, three female volunteers with color-deficient fathers were excluded. The institutional review board of the University of Missouri–St. Louis approved the experimental protocol, completed by 13 males (ages: 25.7 ± 5.2; range: 22–42 years) and 18 females (ages: 26.1 ± 4.7; range: 22–40 years). Seven female participants (ages: 26.0 ± 6.2; range: 23–40 years) reported use of some form of HRC. Eleven female subjects (ages: 26.2 ± 3.7; range: 22–33 years) were not HRC users.
Calibration
A three-channel optical Newtonian-view system was used to produce a 2-degree, circular field for both experimental tasks. A uniform circular field was used for HFP, while a side-by-side bipartite field was used for DBM. Splitting the output light from a 1,000-watt Xenon arc lamp (color temperature, 5,800 K) through an antireflective window formed two illumination channels. The intensity of the test channel was adjusted with an iris aperture and a motorized, computer-controlled set of dual counter-rotating variable neutral density filters. A motorized, computer-controlled narrow band-pass interference filter wheel produced each of the five test wavelengths. The intensity of a second spectrally broad reference channel was also adjusted using a variable neutral density filter and an iris aperture. The reference channel was then further split into two channels via a front surface mirror that could be translated in and out of the reference beam. With the mirror in place for the HFP task, the test and reference beams were spatially merged yet temporally separated via a remote-controlled mirrored optical chopper rotating at 18 cycles/second (Hz). The test and reference beams alternately illuminated the 1.9 cm diameter end of an acrylic cylinder. The other optically frosted end of the cylinder served as a diffuse circular viewing screen.
For the DBM experiments, a fixed front surface mirror reflected the reference beam onto the left viewing half of a bipartite viewing field, separated from the monochromatic test field by a 0.5-mm thick sheet of aluminum. The fused and whole cylinders were part of a calibrated, mounted set that was remotely translated depending on task. A chin rest was used to position each subject 54.5 cm from the viewing end of the optic, which subtended 2° of visual angle.
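The stated 2° field can be checked from the reported geometry (a 1.9 cm diameter viewing end seen from 54.5 cm). The sketch below uses the standard visual-angle formula; the function name `visual_angle_deg` is ours and is not taken from the study's software.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size
    viewed from a given distance, both in the same units."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Geometry reported in the text: 1.9 cm field at 54.5 cm viewing distance.
angle = visual_angle_deg(1.9, 54.5)
print(round(angle, 2))
```

The result rounds to 2.0°, in agreement with the field size given in the text.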
Because time did not permit a direct measurement of the stimuli after each trial, stimulus luminance was calculated from filter wheel settings (from 50° to 330°) for each of the five wavelengths. These output angles were modeled to luminance using a spectroradiometer (Photo Research, Inc., Syracuse, NY, USA) for every 10th degree shown on the output controller according to the following equation:
Each morning of data collection, the luminance in the reference channel was adjusted to 5.0 cd/m2. This corresponds to a low photopic level, ensuring reliable, uniform cone contributions to both achromatic and chromatic systems (Lee, 1999) while staying in the recommended range for accurate HFP measures (De Vries, 1949).
Experimental Procedure
A single experimental session began by adapting each subject to the background room luminance (∼ 0.5 cd/m2) for at least 5 minutes. Subjects then practiced DBM and HFP matches while the Xenon bulb warmed up for at least 30 minutes. The practice time concluded when the range of four trials at each wavelength fell within one standard deviation of the mean for both DBM and HFP tasks.
For each trial, subjects adjusted the intensity of the test stimulus until it matched the reference field (for DBM) or the flicker sensation was minimized (for HFP), and the filter settings were automatically transferred into an Excel spreadsheet (Microsoft Corporation, Redmond, WA, USA). Custom software then remotely decreased the filter wheel setting by a randomly determined value between 50 and 100 degrees (approximately 1–2 log units of luminance, depending on the wavelength and current intensity). Four trials were performed for each wavelength (order randomized) for both the HFP and DBM tasks.
Analysis
The filter settings were converted into luminance values using the calibration regression equations. For each task (DBM and HFP), luminance values from the four trials were averaged at each wavelength, and RL was determined by dividing the reference luminance (5 cd/m2) by the average luminance of the color stimulus. RL was analyzed across wavelengths by repeated-measures ANOVAs with task and gender as fixed factors.
DBM/HFP values were calculated as the simple ratio of these values for each wavelength. To confirm a significant contribution across wavelengths to the latent trait (contribution of chromaticity to brightness) being studied, we performed a confirmatory factor analysis on DBM/HFP ratios overall and for males and females separately. The ratios were then analyzed across wavelengths by repeated-measures ANOVAs with gender as a fixed factor. To determine the effects of HRC on all measures, we repeated all analyses separately for female subjects with contraceptive use as a fixed factor. All analyses were performed using SPSS (IBM Corp., Armonk, NY, USA).
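The RL and DBM/HFP computations described above can be sketched as follows. This is an illustration only: the trial luminances are hypothetical, the function names are ours, and the step that converts filter settings to luminance (the calibration regression) is assumed to have been applied already.

```python
from statistics import mean

REFERENCE_LUMINANCE = 5.0  # cd/m^2, the reference level reported in the text


def relative_luminosity(trial_luminances, reference=REFERENCE_LUMINANCE):
    """RL = reference luminance / mean luminance of the color stimulus
    across the matching trials at one wavelength."""
    return reference / mean(trial_luminances)


def dbm_hfp_ratio(dbm_trials, hfp_trials):
    """Ratio of the DBM-based RL to the HFP-based RL at one wavelength."""
    return relative_luminosity(dbm_trials) / relative_luminosity(hfp_trials)


# Hypothetical luminance settings (cd/m^2) from four trials per task.
hfp_trials = [10.0, 9.5, 10.5, 10.0]  # mean 10.0 -> RL = 0.5
dbm_trials = [5.5, 4.5, 5.0, 5.0]     # mean 5.0  -> RL = 1.0

rl_hfp = relative_luminosity(hfp_trials)
rl_dbm = relative_luminosity(dbm_trials)
ratio = dbm_hfp_ratio(dbm_trials, hfp_trials)
print(rl_hfp, rl_dbm, ratio)
```

In this made-up case the observer needs half as much luminance to match the reference in brightness as to null flicker, so the ratio is 2.0. A ratio above 1 indicates a chromatic contribution to brightness beyond what luminance alone predicts.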
Results
Ratios of Normalized Versus Nonnormalized DBM and HFP Measures
If DBM and HFP measures are normalized for individuals (i.e., dividing HFP and DBM measures by the maximum measures for each subject to yield a HFP and DBM value of “1” at the maxima), the variability in the ratios may be too constrained to be affected by changes in levels of luminance and/or saturation. However, there is value in normalizing the measures when comparing them graphically between subjects. To test which ratio is most appropriate, a confirmatory factor analysis was performed for all subject data first using DBM/HFP ratios of normalized measures. Two factors were extracted, one composed of variance from the 450 and 580 nm stimuli (34% of the variance) and a second composed of variance components from the 560, 580, and 650 nm stimuli (an additional 20% of variance). Variance in measures from the 520 nm stimulus did not contribute significantly to either factor for ratios of normalized measures. However, a single extracted factor was composed of variance components from all wavelengths (50% of the overall variance) when using DBM/HFP ratios of nonnormalized measures. In addition, a second factor composed of variance components from the 520 and 650 nm stimuli was extracted for females but not males. Based on these results, ratios of nonnormalized measures (hereafter referred to as DBM/HFP ratios) were more suitable estimates of the H-K effect.
Gender Findings
The main effects of task, F(1, 29) = 5.46, p = .027, and wavelength, F(4, 26) = 40.3, p < .001, and their interaction, F(4, 26) = 23.3, p < .001, were all significant. The main effect of gender, F(1, 29) = 8.77, p = .006, and the interaction of task and gender, F(1, 29) = 16.9, p < .001, were also significant (refer to Figure 2). When RL measures were analyzed separately by repeated measures across wavelengths, the effect of gender was significant for DBM, F(1, 29) = 19.8, p < .001, but did not reach significance for HFP, F(1, 29) = 4.01, p = .055. Females did have significantly higher RL measures at each wavelength, except for HFP at 450 nm, t(29) = –0.590, p = .560, and DBM at 450 nm, t(29) = 1.09, p = .286.

Relative Luminosity (by Task and Gender) Across Wavelengths. The abscissa values are the five wavelengths used in the study but are offset in the curves to help visualize the error bars (indicating ± 1 SEM).
DBM/HFP ratios were then analyzed across wavelengths by repeated-measures ANOVAs with gender as a fixed factor. The main effects of wavelength, F(4, 26) = 81.2, p < .001, and gender, F(1, 29) = 16.9, p < .001, were both significant (refer to Figure 3). Although the interaction of gender and wavelength was not significant, F(4, 26) = 1.52, p = .226, additivity failures seen in DBM measures are known to vary across wavelengths. Therefore, it was worthwhile to investigate the simple effects of gender on DBM/HFP ratios at each wavelength. Significant gender differences were found at 450 nm, t(29) = 2.75, p = .010; 520 nm, t(29) = 3.22, p = .003; 580 nm, t(29) = 2.93, p = .007; and 650 nm, t(29) = 2.69, p = .012. The gender difference was not significant at 560 nm, t(29) = 1.20, p = .241.

DBM/HFP Ratios (by Gender) Across Wavelengths. Error Bars Indicate ± 1 SEM.
Effects of Contraceptive Use
All analyses were repeated for females with HRC as a fixed factor. The main effects of task, F(1, 14) = 9.06, p = .009, and wavelength, F(4, 11) = 26.3, p < .001, and their interaction, F(4, 11) = 13.0, p < .001, were all significant. Neither the main effect of contraceptive use, F(1, 14) = 0.274, p = .609, nor the interaction of task and contraceptive use, F(1, 14) = 0.005, p = .943, was significant (refer to Figure 4). When analyzed separately, both HFP and DBM measures appeared to be equivalent between HRC users and nonusers.

Relative Luminosity of Female Subjects (by Task and Contraceptive Use) Across Wavelengths. The abscissa values are the five wavelengths used in the study but are offset in the curves to help visualize the error bars (indicating ± 1 SEM).
DBM/HFP ratios were then analyzed across wavelengths by repeated-measures ANOVAs with contraceptive use as a fixed factor. These ratios were essentially equivalent across and at each wavelength (see Figure 5). A post hoc power analysis using the calculated effect size (0.27) of the HRC comparisons revealed that we would have needed at least 55 subjects in each group to achieve a power of 0.80, a sample size that was impractical for the present study.

DBM/HFP Ratios of Female Subjects (by Contraceptive Use) Across Wavelengths. Error Bars Indicate ± 1 SEM.
Discussion
The major finding of this study was a stronger Helmholtz–Kohlrausch effect across wavelengths in women than in men. The effect was small (approximately 0.25 to 0.37 log units), but it was clear and statistically significant. There was no effect of HRC on any measures, but the comparisons were too statistically underpowered to draw any inferences. While we did not attempt to experimentally establish a mechanism for the gender findings, we believe this is the first report of such dimorphism in the H-K effect.
Reliability of Findings
Our HFP measures appear similar in shape to standardized luminosity functions, indicating an additive luminance mechanism assumed to be an achromatic process. Our direct brightness measures for all averaged subjects are also very similar in shape to other studies of heterochromatic brightness matches (Floyd et al., 2004; Harrington et al., 2005; Ikeda & Shimozono, 1978; Wagner & Boynton, 1972). While these previous studies did not compare genders, their results—as do ours—implicate the role of chromatic or opponent processes in increased brightness during brightness matches.
It is well known that subjective matches are easier to make in HFP than in DBM (Ikeda & Shimozono, 1978). Ives (1912) further referred to DBM measures as “uncertain … unsatisfactory … (and) at first attempt impossible.” Somewhat ironic in Ives’s argument is that DBM becomes easier when the colors being matched are more similar. That is, while the H-K effect is reduced for more unsaturated color mixtures (such as green-yellows and yellows), these colors are easier to match in brightness to a broadband reference stimulus than are more saturated blue and red stimuli. This was the case for many of our observers who experienced a “glow” when viewing red and, more so, blue stimuli. In early descriptions of the H-K effect (Kohlrausch, 1935; König, 1947), this effect is referred to as “Farbenglut” (or “color glow”).
Previous authors have used various methods to control for the variation in direct brightness measures. Kurtenbach et al. (1997) used the average of six brightness matches, each obtained by the method of ascending and descending limits. Floyd et al. (2004) employed a short (8 minute) adaptation and practice period and then used a staircase procedure that produced 10 reversals. The first six reversals were ignored, and they averaged the last four to determine the threshold. However, Ikeda and Shimozono (1978) used a very similar method to that of the present study. The only differences were that subjects adjusted the intensity of the reference stimulus and that five adjustments were averaged (instead of four). Harrington et al. (2005) used nearly the same method as described in the present study but described an increased signal-to-noise ratio in their brightness matches when compared with their flicker measures.
We attempted to control for the variability inherent in the method of adjustment by first adapting subjects to a very low background room luminance and then allowing them to practice for 30 minutes. This practice was monitored, and subjects needed to demonstrate consistent flicker and direct matches at each wavelength before starting the single experimental session. In addition, subject brightness “expectations” were controlled by randomly resetting the color intensity after each trial (see the Methods section for more details).
Despite this practice, our results do show moderate variability in DBM measures. Figure 6 shows all DBM matches for a randomly selected male subject. The percent variation in DBM measures (i.e., standard deviation of four trials/mean of four trials) for all subjects ranged from 1% to 25% (depending on the wavelength) and is plotted in Figure 7. (While we found no effect of HRC on any measures, we have plotted data for HRC users separately for convenience in viewing the data for 18 female subjects.) Even the higher variations (i.e., 25%) could not account for our gender difference in DBM/HFP ratios.
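The within-subject percent variation defined above (standard deviation of four trials divided by their mean) can be sketched as follows. The text does not state whether the sample or population standard deviation was used; this sketch assumes the sample form (n − 1 denominator), and the trial values are hypothetical.

```python
from statistics import mean, stdev


def percent_variation(trials):
    """Percent variation (coefficient of variation) of repeated matches:
    100 * SD / mean. Assumes the sample SD (n - 1 denominator)."""
    return 100.0 * stdev(trials) / mean(trials)


# Hypothetical DBM luminance matches (cd/m^2) from four trials.
dbm_trials = [4.0, 5.0, 5.0, 6.0]
pv = percent_variation(dbm_trials)
print(round(pv, 1))
```

For these made-up trials the variation is about 16%, which falls inside the 1% to 25% range reported for the study's subjects.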

Luminance Values for DBM Matches Found on Four Trials for One Male Subject. The variability in DBM matches was typical of that seen across subjects.

Percent Variation in DBM Measures Across Wavelengths for All Subjects.
The percent variations between subjects in HFP, DBM, and DBM/HFP values ranged from 30% to 200% depending on the wavelength and task and are plotted in Figure 8. The between-subject variations are lowest for all measures in the middle wavelengths and highest for the blue stimulus. While these values seem high, they are consistent with the findings of Sanders and Wyszecki (1958) who found very high between-subject variation in brightness/luminance ratios for blue and red stimuli.

Percent Variation in DBM Matches Across Wavelengths Between Subjects.
Data for all tasks and subjects are plotted in Figures 9 to 11. There is certainly more variability in DBM measures and DBM/HFP ratios than in HFP measures. However, DBM measures and DBM/HFP ratios are consistently higher in female subjects, and the within-subject variation does not appear to limit our inferences about gender differences.

Flicker Photometry Data for Male (A), Contraceptive Users (B), and Noncontraceptive Users (C). Subject ages are shown for comparison. Overall, HFP measures were equivalent between males and females across wavelengths. HFP data for HRC users appear to be more tightly spaced, but mean values were not significantly different from those for non-HRC users.

Direct Brightness Data for Male (A), Contraceptive Users (B), and Noncontraceptive Users (C). Subject ages are shown for comparison. While there is overlap between males and females, mean values are clearly higher for female subjects. The effect appears to be even higher in contraceptive users, but the within-females comparisons by contraceptive use did not reach statistical significance.

DBM/HFP Ratios for Male (A), Contraceptive Users (B), and Noncontraceptive Users (C). Subject ages are shown for comparison. As with DBM measures, there is considerable overlap between males and females, but mean values are clearly higher for female subjects. The effect appears graphically to be equivalent between HRC users and nonusers.
Modeling Brightness as a Function of Luminance and Saturation
The lack of a statistically significant interaction of gender and wavelength on DBM/HFP ratios implicates gender differences in parallel processes for colors other than red and blue in the present study (see Figure 3). In fact, the ratios were higher for women for all wavelengths other than 560 nm. These findings are consistent with at least one report implicating gender in chromatic differences across wavelengths (Foutch & Peck, 2013b). The simple effects at each wavelength deserve some consideration. First, the increased DBM/HFP ratios at 450 and 650 nm are not surprising, if in fact the effect is due to increased activation of chromatic mechanisms. After all, chromatic contributions involved in direct brightness matches cause additivity failures that are most pronounced at short and long wavelengths (Guth & Lodge, 1973).
A classic theme in brightness–luminance relationships is that they do not vary linearly (Fechner, 1860) but rather according to the well-known power law:
These simple models are also gender dimorphic, with female subjects’ DBM/HFP ratios depending more on saturation than do male subjects. In addition, the experimental data fit better to the model for females than for males (see Figure 12). Considering our findings of an increased H-K effect in females, the stronger relationship in females is not surprising.

A Simple Model of DBM/HFP Ratios as a Function of Saturation. Female subjects’ DBM/HFP ratios appear to depend more on saturation than do male subjects.
The most generalized form of Sagawa’s equivalent luminance model that we could fit with our data is
The models are again different between genders, with the brightness (DBM) depending more on saturation in females. However, the model fits the male experimental data better than the female data. The poorer fit of the experimental data to a photopic model (i.e., a = 1), the increased deviation of the rod-cone coefficient (a) from 1.00 in female subjects (0.91 compared with 0.95 in males), and the shift in peak female DBM measures to lower wavelengths (see Figure 2) all point to a possible contribution of rod-mediated processes in females that is less present in males at this high-mesopic/low-photopic reference light level. This possibility deserves more study.
We then calculated predicted DBM values and DBM/HFP ratios for both models using mean experimental HFP measures. These are shown, along with experimental data for comparison, in Figure 13.

DBM/HFP Ratios Modeled by Saturation Only (Dotted Lines) and by HFP and Saturation (Dashed Lines). The Black Lines Represent Females, and the Gray Lines Represent Males. The models for females are simply shifted from each other by a constant term and appear to better predict our experimental green-yellow, yellow, and red ratios while they underestimate ratios for blue and green. On the other hand, the models for males are different from one another, with the saturation-only model (dotted gray line) predicting most ratios well. The HFP and saturation model (dashed gray line) underestimates green and green-yellow ratios but overestimates ratios for red.
Possible Mechanisms
While the present design does not provide the means to fully differentiate between anatomical mechanisms for the gender brightness differences, there are important considerations.
Mesopic Light Levels
Perhaps the factor that most limits our findings is the high-mesopic/low-photopic level for our reference stimulus (5 cd/m2). We primarily chose this level because it theoretically should produce the most stable flicker (De Vries, 1949) and brightness (Lee, 1999) matches. Ikeda and Shimozono (1978) proposed the following log-linear combination of photopic and scotopic functions: log(meso[λ]) = x log SR(λ) + y log SA(λ) + z, where meso(λ) is the theoretical calculated mesopic efficiency, SR(λ) is the measured luminous efficiency function at the reference level, SA(λ) is the measured luminous efficiency at the adaptation luminance level, and x, y, and z are dimensionless regression coefficients. Applying this (Ikeda–Shimozono) formula for the adaptation (0.5 cd/m2) and reference levels (5 cd/m2) used in this study, x and y would be very close to 1 and 0, respectively, indicating that photopic luminous efficiency can be approximated, within a constant term, when collected at reference levels of 5 cd/m2.
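The consequence of x ≈ 1 and y ≈ 0 in the Ikeda–Shimozono combination can be shown with a short sketch. It assumes base-10 logarithms, and the efficiency values and the coefficient z are hypothetical; the function name `mesopic_efficiency` is ours.

```python
import math


def mesopic_efficiency(s_ref, s_adapt, x, y, z):
    """Log-linear combination from the text:
    log(meso) = x*log(S_R) + y*log(S_A) + z, returned on a linear scale.
    Base-10 logarithms are assumed."""
    return 10.0 ** (x * math.log10(s_ref) + y * math.log10(s_adapt) + z)


# Hypothetical luminous efficiencies at two wavelengths, measured at the
# reference (S_R) and adaptation (S_A) levels.
s_ref = [0.50, 0.90]
s_adapt = [0.20, 0.70]

# With x = 1 and y = 0, the adaptation-level term drops out and the
# result is S_R scaled by the constant factor 10**z at every wavelength.
z = 0.1  # hypothetical constant term
meso = [mesopic_efficiency(r, a, 1.0, 0.0, z) for r, a in zip(s_ref, s_adapt)]
scale = [m / r for m, r in zip(meso, s_ref)]
print(scale)
```

Both entries of `scale` equal 10**z, which is what "approximated, within a constant term" means here: the shape of the efficiency function across wavelengths is unchanged.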
Physiological Correlates
A possible mechanism for the increased variation in male HFP data at 450 nm is macular pigment optical density (MPOD). We know that MPOD levels vary significantly between individuals, and higher levels increase the intensity needed in blue stimuli to match reference stimuli (Hammond et al., 1996). Although MPOD was higher and more varied in males than in females, we were missing data from 11 subjects and unable to fully model the effects of MPOD on brightness across wavelengths. However, regarding the magnitude of our gender difference in the H-K effect, no previous MPOD models would predict any effect on the 650 nm stimulus. In addition, there is no reason to believe that MPOD levels would affect DBM and HFP measures differently, and our small but clear differences in DBM/HFP ratios would be unaffected by MPOD differences. This is an area for further investigation.
The effects of age on RL and color discrimination have been explained in part by lenticular changes that reduce light intensity, especially in the short wavelength portion of the visible spectrum (Knoblauch et al., 1987, 2001; Kraft & Werner, 1994, 1999). The context of these investigations varied, limiting the applicability to our results. We did, however, find negative correlations between age and all RL measures across wavelengths for all subjects as well as males and females, separately. The only significant correlation was for HFP at 450 nm (r = –.44, p = .035) for all subjects. The flicker results at 450 nm were similar between males (r = –.42) and females (r = –.34) but did not reach statistical significance for either. Findings at other wavelengths were less robust, as was the DBM reduction at 450 nm for older subjects (consistent with Kraft & Werner, 1994). In addition, males and females were basically age-matched in our sample. While we cannot fully address the question of age or lens effects, it is doubtful that age-related lenticular changes played a significant role in the DBM/HFP ratio differences in our relatively young population (aged 22–42 years).
Heterozygous Carriers for CVDs
It is generally accepted that the cone photoreceptor mosaic varies a great deal among genotypically normal individuals, but there is a lack of support for gender differences (Kimble & Williams, 2000). In addition, previous investigations have shown that even large individual differences in L/M cone ratios often result in virtually no color perception differences (Brainard et al., 2000; Neitz et al., 2002). More interesting is the role of heterozygous carriers for CVDs who were possibly included in the present study. Trichromatic vision in CVD carriers has been characterized as reduced (Bimler & Kirkland, 2009; Crone, 1959; Harris & Cole, 2005; Hood et al., 2006; Krill & Schneiderman, 1964; Lang & Good, 2001; Schmidt, 1955), essentially normal (Miyahara et al., 1998), and advantaged (Mollon, 1986), and it is possible that the screening regimen in the present study was insufficient to exclude all carriers. However, 3 of the 21 female volunteers were daughters of color-deficient men and were disqualified. This proportion (about 14%) is consistent with the percentage (15%) of carriers in the female population (Jordan & Mollon, 1993; Neitz & Jacobs, 1986). Even if carriers were present, it is doubtful that any advantage afforded them in color discrimination could significantly contribute to the overall gender effect on chromatic contribution to brightness across wavelengths. Considering that L- and M-cone mechanisms are minimally sensitive to short wavelengths, the robust effect of gender on DBM/HFP ratios at 450 nm cannot be accounted for by photopigment variations in CVD carriers.
Hormonal or Cyclical Effects
It is possible that differences found in the current study could be hormonal or cyclical. After all, estrogen can act directly and quickly on nerve cell membranes to increase blood flow, change regional activation patterns, or modulate neurotransmitter release or production (Smith & Zubieta, 2001; Toker et al., 2003; reviewed by Markou et al., 2005). An investigation of the effect of hormonal contraceptives on color discrimination in dental students revealed significantly more green, pink, and red color discrimination errors in contraceptive users (Da Silva et al., 2015). This contrasts with our findings, as we found no differences between normally cycling women and hormonal contraceptive users across wavelengths. However, there was a clear, though not statistically significant, increase in brightness of the green (520 nm) stimulus in HRC users (see Figure 4). This at least partly implicates the role of female sex hormones in the current results. Alternatively, Abramov et al. (2012) hypothesized that testosterone plays a role in their findings of sex differences in color sensations, possibly causing sex differences in lateral geniculate nucleus to cortex connectivities. Perhaps our clear and robust reduction in DBM measures for males across wavelengths implicates testosterone’s role in the increased additivity failures seen in males in our study. Direct hormonal measures and their effects on the chromatic contribution to brightness deserve further study.
Color Preferences
In a study of sex differences in hue preference among 208 British and Chinese subjects, color space basis functions contained negative red-green weightings for men and positive for women in both cultural groups (Hurlbert & Ling, 2007). Although the link is tenuous, these results are consistent with the findings in the current study of a second factor of DBM/HFP ratios in females composed of positive red and green weightings and further demonstrate a possible cross-cultural red-green advantage in females. There is extensive literature concerning the role of color preferences (reviewed by Palmer & Schloss, 2015). Cross-culturally, adults most prefer blue hues and least prefer green-yellows and yellows (Palmer & Schloss, 2010). This is consistent with the shape of the H-K effect found in the current study, but there is equivocal evidence regarding the role of saturation and lightness on color preference. Palmer and Schloss (2010) also found that adult subjects preferred more luminous and/or more saturated blues and green-yellows over “dark” ones. However, they found the opposite for reds and greens. They also found, most in contrast to the present findings, that men preferred more saturated colors and women preferred paler or desaturated colors. It has also been suggested that overall gender differences in visual system organization may be due to different life experiences (Jones, 1998; reviewed by Tsodyks & Gilbert, 2004) or socialization (Greene & Gynther, 1995; Parke & Sawin, 1976) between genders. While some authors’ findings have hinted that at least a portion of color preferences are innate (Palmer & Schloss, 2010; Teller, 1979), others have found that color preference curves are dramatically different between infants and adults (Taylor et al., 2013). It is possible then, that the large gender findings of the present study are in part due to differences in chromatic experiences afforded boys and girls. 
However, previous evidence is equivocal, and the present study did not make direct measures that weigh the argument in either direction.
Conclusions
The current findings provide robust support for the notion that the Helmholtz–Kohlrausch effect is more extreme in females. We successfully modeled brightness as a function of luminance and saturation, and the models differ for men and women, but we did not establish a mechanism for the difference. The present study does clearly establish brightness differences between adult men and women based on stimulus and experimental paradigms biased toward chromatic or achromatic processing. The advantage then is possibly due to a gender dimorphic interaction between parallel visual processes. Developing research and clinical tools to further model gender and, possibly, hormonal or cyclical effects on chromatic processing should be a priority.
Footnotes
Acknowledgements
This study was conducted at the University of Missouri–St. Louis, and the authors would like to thank Michael Howe and John Redd for their assistance with the hardware and software used in the experimental tasks. The authors would also like to thank two anonymous reviewers who made this work significantly better.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
