Abstract
Objective:
The objectives of this report are to review the publications resulting from National Institutes of Health (NIH)-funded phase 3 trials monitored by NIH for inclusion and to address the quality of the research conducted and the validity of the sex/gender-specific or sex/gender difference analyses reported.
Methods:
For intervention trials enrolling both women and men, this review links reports to NIH of completed enrollment to publications of trial results. Each publication was then reviewed for a variety of reported characteristics based on established measures of quality, bearing on whether or not the research will permit valid analyses of sex/gender differences.
Results:
Publications from 268 trials reported an overall average enrollment of 37% (±6% standard deviation [SD]) women, at an increasing rate over the years 1995–2010. Only 28% of the publications either made some reference to sex/gender-specific results in the text or provided detailed results including sex/gender-specific estimates of effect or tests of interaction.
Conclusions:
Efforts at including women in clinical research have increased the information captured relative to women's health. Sex/gender-specific information has been captured and should be available to other researchers for further analysis, including individual patient data meta-analyses. Improved reporting and dissemination of sex/gender-specific results will allow sex/gender-specific inferences and healthcare decisions.
Introduction
Although the National Institutes of Health (NIH) policy first suggested and later urged the inclusion of women in clinical research, it was not until the NIH Revitalization Act of 1993 (P.L. 103-43) that Congress mandated that the NIH ensure and monitor the inclusion of women in clinical research. That mandate was soon revised to ensure inclusion of minorities. Once implemented, 1 that mandate required NIH-funded investigators to report annually the numbers of women and minorities included in each clinical research project. Since 1997, NIH has reported to Congress on the progress in inclusion of women and minorities in clinical research. 2
The requirements for inclusion apply to all NIH-funded clinical research, with additional requirements on phase 3 research. 3
Each NIH-funded phase 3 clinical research study must have a sex/gender inclusion plan, which is judged Acceptable based on one of the following criteria:
A. Available evidence strongly indicates significant sex/gender differences of clinical or public health importance in intervention effect, and the study design is appropriate to answer two separate primary questions (one for males and one for females) with adequate sample size for each sex/gender.
B. Available evidence strongly indicates there is no significant difference of clinical or public health importance between males and females in relation to the study variables. Representation of both sexes/genders is not required.
C. There is no clear-cut scientific evidence to rule out significant differences of clinical or public health importance between males and females in relation to the study variables, and the study design includes sufficient and appropriate representation of both sexes/genders to permit valid analyses of differential intervention effect.
D. One sex/gender is excluded from the study because (1) inclusion of these individuals would be inappropriate with respect to their health, or (2) inclusion of these individuals would be inappropriate with respect to the purposes of the research (e.g., the research question addressed is only relevant to one sex/gender).
When no clear-cut scientific evidence exists to rule out significant differences of clinical or public health importance between males and females (criterion C), each clinical research study should contribute new knowledge relevant to their health, to future research hypotheses, to practice guidelines, or to health policy. Russek-Cohen and Simon 4 suggested that most trials will fall under criterion C. That potential to contribute new knowledge will be realized only with the publication of research results. The original guidelines conclude that “since a primary aim of research is to provide scientific evidence leading to a change in health policy or standard of care, it is imperative to determine whether the intervention or therapy being studied affects women or men or members of minority groups and their subpopulations differently.” 5 This review of publications from NIH-funded phase 3 clinical research addresses the reporting of sex/gender participation and outcomes, the potential for valid analyses of differential intervention effect, and the reporting of sex/gender differences.
Materials and Methods
Unlike systematic reviews of the published literature in a given clinical area or earlier reviews of the inclusion of women in clinical research, 6 –8 which typically begin with a search of the literature, the process of selecting publications for this review began with the identification by the NIH Office of Extramural Research of all NIH-defined phase 3 clinical research studies that reported enrollment to the NIH Tracking and Inclusion System. That tracking process began in 1994, as mandated by the NIH Revitalization Act of 1993.
Only those phase 3 clinical trials that completed enrollment between 1994 and 2007, enrolling both women and men, were included in this review. The review excluded studies enrolling groups rather than individuals, for example, enrolling schools, dental practices, or families. The review also excluded ancillary studies and follow-on studies to earlier trials, as the sex/gender distributions enrolled were determined by the parent study, thus not independent observations. Multicenter studies that reported center-specific enrollment to the Tracking and Inclusion system separately for each center were included only once. Some of the studies had reported completion of enrollment but were yet to be published. This was because of ongoing long-term follow-up or possibly because of the combined effect of negative results and publication bias. Some of the studies were excluded from this review because of single sex/gender inclusion criteria or lack of intervention (an observational study). A thorough literature search for any published report on each study used the investigators' names, the study name, the NIH grant/contract number, and the ClinicalTrials.gov number.
One or two abstractors, including abstractors from the NIH Office of Research on Women's Health or from the NIH Tracking and Inclusion Committee, reviewed each publication and responded to a standard questionnaire (Appendix) to ensure the abstraction of consistent information from each publication. They identified characteristics of each publication, adapted from numerous methods addressing research quality, 9 –16 that would have direct bearing on the ability to produce valid analyses. Those characteristics included the target population, the data collection processes, study design including rationale for the sample size, details of reported results, and any indication of a data repository. From the compiled abstraction results, descriptive statistics summarized those characteristics for all publications reviewed.
Results
Almost 4000 unique NIH-funded phase 3 grants or cooperative agreements were the source of the studies reporting completed enrollment in the years 1994–2007, as identified by the NIH Office of Extramural Research. After the exclusions described in Materials and Methods, 268 trial publications were reviewed and summarized. The triage from enrollment completion reports down to unique randomized trials is shown in Figure 1 below:

Selection process. CT, Clinical Trial; NIH, National Institutes of Health.
Distribution of journals
The publications reviewed appeared in a variety of peer-reviewed journals (Fig. 2). The journals in which they appeared most frequently were the New England Journal of Medicine, the Journal of Clinical Oncology, and the Journal of the American Medical Association.

Journal distribution of publications reviewed. AnnIntMed, Annals of Internal Medicine; ArchOphth, Archives of Ophthalmology; CA, Cancer; CID, Clinical Infectious Diseases; Circ, Circulation; DiabCare, Diabetes Care; IntRadOn, International Journal of Radiation Oncology, Biology, Physics; JAMA, Journal of the American Medical Association; JCO, Journal of Clinical Oncology; JNCI, Journal of the National Cancer Institute; JUrol, Journal of Urology; NEJM, New England Journal of Medicine; Ophthal, Ophthalmology.
Year of publications
The publications appeared between the years 1995 and 2010 (Fig. 3). The distribution of dates of publication was a function of the length of time for most trials to be completed. In the early years of NIH inclusion tracking, some of the trials reporting completion of enrollment had been designed and initiated enrollment before implementation of the 1993 NIH Revitalization Act.

Number of publications by year of publication.
Distribution of funding institutes and centers
The largest share (47%) of the NIH-funded phase 3 clinical studies were funded by the National Cancer Institute (NCI), the National Heart, Lung, and Blood Institute (NHLBI), and the National Institute of Allergy and Infectious Diseases (NIAID) (Fig. 4). Other institutes and centers funded fewer phase 3 clinical studies during this period, but all followed the same process for tracking of inclusion of women and minorities.

Distribution of funding institutes or centers. NCCAM, National Center for Complementary and Alternative Medicine; NCI, National Cancer Institute; NEI, National Eye Institute; NHLBI, National Heart, Lung, and Blood Institute; NIA, National Institute on Aging; NIAAA, National Institute on Alcohol Abuse and Alcoholism; NIAID, National Institute of Allergy and Infectious Diseases; NIAMS, National Institute of Arthritis and Musculoskeletal and Skin Diseases; NIDA, National Institute on Drug Abuse; NIDCD, National Institute on Deafness and Other Communication Disorders; NINDS, National Institute of Neurological Disorders and Stroke.
Distribution of sample sizes and percent women enrolled
The average sample size for all the publications reviewed is 772 (median 379) participants. The sample sizes for the clinical studies ranged from 9 (a study stopped for futility 17 ) to 24,335 in the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT). 18 The average percent of women enrolled over all the studies reviewed is 37% (95% confidence interval [CI] 31%-43%). Within the individual studies, the percent of women enrolled ranged from 1% to 92% (both extremes of the range because of early stopping before reaching the planned enrollment 19,20 ).
Study design characteristics
Most (78%) of these publications provided some rationale for the sample size, including, among other parameters, the assumed clinically important effect size and power. The publications reviewed did not address measures to prevent selection bias affecting participant enrollment. Insufficient detail about the participant selection to judge the potential for bias was seen in only a few (<5%) of the publications. One publication reported testing for the potential for selection bias, concluding that the test gave “some assurance that any treatment group imbalances on baseline factors and observed treatment effects are not due to selection bias.” 21
Allocation to treatment group was almost always (>99%) by random allocation, often described in detail, for example, permuted blocks. One instance of nonrandom allocation occurred, 22 with all participants receiving all three interventions in nonrandom order, and patients who did not complete all three were excluded, introducing some selection bias. Blinded or masked allocation to treatment group was reported in 39% of these trials, and 31% reported double-blinding, including blinded adjudications of outcome. Blinding or masking was not possible in some, for example, surgery vs. medical interventions. Efforts to disguise differences detectable by taste, smell, or sight were considered based on the descriptions of those efforts in the publications. There was little evidence of poor allocation concealment; 87% were judged adequate in the reporting of allocation concealment. Allocation via sealed envelopes is not often used in contemporary trials but was used in at least one trial in the operating room for allocation immediately before surgery. 23
Only two of the published reports failed to mention any Institutional Review Board (IRB) review or informed consent of the participants. A few of the 268 studies reviewed (9%) were stratified by sex/gender at treatment allocation.
Reporting enrollment by sex/gender and race/ethnicity
All the publications included in this review were known to have enrolled both women and men, not based on information provided in the publication but as reported by investigators to NIH through the Tracking and Inclusion System annual reports. However, only 93% of the publications reviewed reported the sex/gender distribution. Over the period of review, the percent of women enrolled in the published NIH-funded phase 3 clinical studies was 37%. The observed percent of women reported in publications was unstable in the earlier years because of the small number of publications but has been increasing over the period of this review to >40% (Fig. 5) in the most recent full years of this review. Only 66% of the publications reviewed reported the distribution of race/ethnicity despite reporting that information annually to NIH through the Tracking and Inclusion System annual reports.

Annual proportion of women enrolled and trend over time.
Analysis of sex/gender differences
This review identified 28% (75 of 268) of the publications with any reference to sex/gender-specific results. In some cases (<10%), the information was limited to a single sentence indicating no sex/gender differences were observed, without quantitative sex/gender-specific estimates published. Of the 75 publications reporting any reference to sex/gender-specific results, 29% were subgroup analyses providing sex/gender-specific estimates, and 71% included a sex/gender covariate or interaction in a regression model.
Discussion
The publications summarized here serve as the basis for inference and extrapolation of research results used to diagnose, treat, and prevent diseases in women as well as men. They also serve as the basis for planning the next generation of clinical research studies. The inclusion of both women and men in these studies should lead to more fully informed healthcare decisions; however, the reported results are limited in the level of detail directly applicable to women.
This report has limitations in its processes and scope. Some of the NIH-funded phase 3 clinical trials that had completed enrollment were not yet published, some may have appeared in publications not indexed by PubMed or other sources, some were published only as abstracts, and some may have been missed. Investigators occasionally may have followed the initial publication of results with later secondary-analysis publications addressing sex/gender differences, or preceded it with a publication giving more extensive details on the design and conduct of the study. The excluded publications, for example, those from studies randomized by group, may also contribute to the information on sex/gender differences.
The high impact of the journals in which these NIH-funded phase 3 research studies were published reflects the critical hypotheses being addressed, the strength of evidence presented, the robustness of the study design, and the relevance to clinical practice and to health policy. The concentration of most of these publications in a few journals means that the editorial policies of those journals have strongly influenced the available sex/gender-specific information about what is included in the publication and what is eliminated. Very few scientific journals require or encourage the analysis of sex/gender differences in their publication guidelines. One exception is the Journal of the National Cancer Institute, which has exhibited recognition of the importance of sex/gender factors by providing the following highlighted instruction to authors: 24
Where appropriate, clinical and epidemiologic studies should be analyzed to see if there is an effect of sex or any of the major ethnic groups. If there is no effect, it should be so stated in Results.
Several examples illustrate this constraint. The discussion of the rationale for the sample size may be considered superfluous when the trial is completed or when the rationale was reported in an earlier publication describing the trial design. The rationale and target sample size, however, do provide readers with information relevant to their own interpretation of results. Presentation of the distribution of minorities by treatment group is another example of information that is known yet not reported in publication. With the availability of supplemental material online from many of these journals, the historic page limitations should not prohibit providing the information to readers.
Stopping a trial earlier than planned, either for futility or for early evidence of benefit or harm, is problematic in providing information on sex/gender-specific results. Even studies stopped for futility or for early evidence of benefit or harm provide information to the reader on interpreting the results presented. There is potential information for researchers planning future studies, such as differential rates of recruitment, loss to follow-up, or missing data by sex/gender. At least 5 of the 268 publications reviewed mentioned stopping recruitment before reaching the target sample size because of recruitment challenges, that is, stopping for futility. The example presented earlier of stopping for futility (Jatoi et al. 17 ) reported only 1 woman in 9 enrollees. With early stopping for benefit or harm, rarely is the reader provided information by sex/gender on recruitment rates, loss to follow-up, or missing data. For example, if there are disparate rates of recruitment, with one sex/gender recruited faster than another, the cumulative enrollment patterns will resemble Figure 6A, whereas similar enrollment patterns will resemble Figure 6B.

Cumulative enrollment over time by disparate or similar rates of enrollment by any dichotomous baseline characteristic (e.g., by sex/gender).
In the case of both disparate enrollment rates and early stopping, the decision to stop will be based on proportionally more information from one sex/gender than the other. Similarly, if the missing data are disproportionately from one sex/gender, the analytic mechanisms employed to address missing data may be better suited to one sex/gender than the other. Usually, only those individuals conducting or monitoring the trial are familiar with this level of detailed information. The published information from the trial may be more relevant to one sex/gender than the other, but the reader is not sufficiently informed to make that judgment.
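The interplay between disparate enrollment rates and early stopping can be illustrated with a small sketch. The monthly enrollment counts, the stopping month, and the function names below are hypothetical and are not taken from any trial in this review; they only mimic the pattern of disparate rates described above.

```python
from itertools import accumulate

def cumulative_enrollment(monthly_counts):
    """Return the running total of participants enrolled after each month."""
    return list(accumulate(monthly_counts))

def share_at_stop(women_monthly, men_monthly, stop_month):
    """Fraction of enrolled participants who are women when enrollment stops.

    stop_month is 1-based: stop_month=6 means the first six months count.
    """
    women = cumulative_enrollment(women_monthly)[stop_month - 1]
    men = cumulative_enrollment(men_monthly)[stop_month - 1]
    return women / (women + men)

# Hypothetical 12-month trial: men enroll at a steady 10 per month,
# while women start slowly and catch up later (disparate rates).
men_monthly = [10] * 12
women_monthly = [2, 2, 4, 4, 6, 6, 10, 10, 14, 14, 16, 16]

planned = share_at_stop(women_monthly, men_monthly, stop_month=12)
early = share_at_stop(women_monthly, men_monthly, stop_month=6)
print(f"women at planned end: {planned:.0%}; at early stop: {early:.0%}")
```

In this made-up example, women would make up 104 of 224 participants at the planned end of enrollment but only 24 of 84 if the trial stopped after month 6, so a decision to stop at that point would rest largely on data from men.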
Given that the rationale for the sample size should have been provided at some point in the NIH funding process, the percent of publications providing that rationale is expected to be high. Although 78% of these publications overall provided a sample size rationale, it is not an aspect reported uniformly in the literature; that rationale was provided in far fewer than half of the publications reported by Sherer and Crawley 25 in the ophthalmology literature. Without the rationale, the target sample size, or the reason for ending the trial, the reader would not know if the study were stopped at a “random high.”
The NIH Revitalization Act of 1993 did not require specific proportions of women or men in a given trial or separate sample size calculations for women and men. As a result, secondary analyses of sex/gender-specific estimates or tests of sex/gender differences will have less precision or power than that projected for the primary analysis of the full study. Because of the limitations in power for tests of sex/gender differences, the aspects of research quality in design and conduct are even more important in assessing whether or not the research will permit valid analyses.
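The loss of power in secondary sex/gender analyses can be made concrete with a normal-approximation sketch. Everything here is an illustrative assumption rather than a calculation from any reviewed trial: a two-arm trial with 1:1 allocation, a continuous outcome measured in standard deviation units, a standardized effect of 0.2, and function names chosen for this example. The total of 772 participants and 37% women are simply the averages reported in this review.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sided(effect, std_error, alpha=0.05):
    """Approximate power of a two-sided z-test (ignores the far tail)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(effect) / std_error - z_crit)

def se_treatment_effect(n_total):
    """SE of a difference in means, in SD units, with 1:1 allocation."""
    return 2 / sqrt(n_total)

def se_interaction(n_total, frac_women):
    """SE of (effect in women minus effect in men), in SD units."""
    se_women = se_treatment_effect(frac_women * n_total)
    se_men = se_treatment_effect((1 - frac_women) * n_total)
    return sqrt(se_women ** 2 + se_men ** 2)

# Assumed inputs: average trial size and percent women from this review,
# plus a hypothetical standardized effect (and interaction) of 0.2.
n_total, frac_women, effect = 772, 0.37, 0.2

p_overall = power_two_sided(effect, se_treatment_effect(n_total))
p_women = power_two_sided(effect, se_treatment_effect(frac_women * n_total))
p_interaction = power_two_sided(effect, se_interaction(n_total, frac_women))
print(f"overall {p_overall:.2f}, women only {p_women:.2f}, "
      f"interaction {p_interaction:.2f}")
```

Under these assumptions the power is about 0.79 for the primary comparison, about 0.39 for the estimate in women alone, and about 0.27 for an interaction of the same magnitude. The pattern, not the specific numbers, is the point: a trial sized for its primary question has considerably less power for sex/gender-specific questions.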
The proportion of women enrolled in a particular study was a function of the prevalence of the disease, disorder, or condition under investigation among women, as well as a function of the specific sites funded in a multicenter trial. For example, in the early years of the HIV/AIDS epidemic, the rate of known infections was much lower among women than men. This is reflected in the proportion of women enrolled (e.g., an early study 26 of primary prophylaxis against fungal infections in HIV-infected patients enrolled only 5% women).
A major concern in the assessment of validity of sex/gender-specific or sex/gender difference estimates is whether or not the study results have been influenced in some systematic fashion, that is, biased, leading to erroneous inferences. The selection of study participants, the process of assignment or allocation to treatment, and the individuals with knowledge of that assignment are the aspects of study conduct that can impact potential bias. An example of selection bias is the study of those who successfully completed an earlier trial, which is one reason for excluding follow-on studies from this review.
The available approaches to blinding have been discussed by numerous authors, including the report of Boutron et al., 27 which has detailed descriptions in the online appendices. Poor allocation concealment is one of the seven potential sources of bias identified by Lewis and Warlow. 28 The changes in clinical trial infrastructure in recent years make the use of automated random allocation methods more feasible. Careful attention to the published description of the randomization process, blinding, and allocation concealment is essential to assessing the validity of any analyses from a randomized trial. Randomization, without stratification, is generally sufficient to balance the sex/gender distribution across treatment groups for sample sizes >100, and the balance improves as the sample size grows. A large study of homocysteine to prevent recurrent ischemic infarction 29 used treatment allocation stratified by sex/gender (and by clinic and age), which further ensured the balance of 37% women assigned to each treatment group. Balance of body mass index (BMI) (mean of 28.1 kg/m2 for men and 28.6 kg/m2 for women) across treatment groups, as well as numerous other measured and unmeasured factors, was assured by the randomization process. This study 29 also introduced, 5 months into the enrollment period, sex/gender-specific homocysteine eligibility criteria. Of those eligible, 80% of the men enrolled, but only 75% of the women enrolled. The authors did not address this difference, explore distributions by site or other parameters, explore possible causes, or suggest solutions for future studies. They did report finding no differential treatment effect by sex/gender.
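How closely unstratified randomization balances the sex/gender distribution depends on the sample size, and this can be checked with an exact calculation. The sketch below assumes simple (coin-flip) 1:1 randomization that is independent of sex/gender, which is a simplification of the permuted-block schemes many trials use. The 10 percentage point threshold, the sample sizes, and the function name are illustrative choices; the 37% women figure is the average from this review.

```python
from math import comb

def prob_imbalance(n_women, n_men, max_gap):
    """Exact probability that the two arms differ in the proportion of
    women by more than max_gap under simple 1:1 randomization."""
    n_total = n_women + n_men
    total_outcomes = 2 ** n_total
    # Number of ways to send w women (or m men) to arm A.
    women_ways = [comb(n_women, w) for w in range(n_women + 1)]
    men_ways = [comb(n_men, m) for m in range(n_men + 1)]
    imbalanced_outcomes = 0
    for w, ways_w in enumerate(women_ways):
        for m, ways_m in enumerate(men_ways):
            size_a = w + m
            size_b = n_total - size_a
            if size_a == 0 or size_b == 0:
                imbalanced = True  # an empty arm counts as imbalance
            else:
                gap = abs(w / size_a - (n_women - w) / size_b)
                imbalanced = gap > max_gap
            if imbalanced:
                imbalanced_outcomes += ways_w * ways_m
    return imbalanced_outcomes / total_outcomes

# Hypothetical trials with 37% women and a 10-point imbalance threshold.
p_100 = prob_imbalance(37, 63, 0.10)
p_1000 = prob_imbalance(370, 630, 0.10)
print(f"P(gap > 10 points): n=100 -> {p_100:.3f}, n=1000 -> {p_1000:.5f}")
```

Under these assumptions a gap of more than 10 percentage points is far less likely with 1,000 participants than with 100, which is why stratification by sex/gender matters most in smaller trials.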
Another reason to consider stratified randomization is to establish a priori the important subgroups for later analyses. In the overall review of publications, very few of the studies stratified by sex/gender at treatment allocation reported sex/gender-specific results, suggesting that the stratification was employed primarily for balance at baseline. Stratified randomization is not required for the sex/gender-specific analysis results to be valid.
An essential step in conducting valid analyses of differential intervention effects is to estimate the effect in each subgroup. NIH-funded phase 3 clinical research studies are not identified within the NIH Tracking and Inclusion System by whether primary results are provided separately by sex/gender. This information is available only through review of the published results. The subgroup analysis approach considers which subgroups show evidence of greater (or lesser) treatment effect. The covariate analysis adjusts the estimate of treatment effect for potential differences in effect within the covariate groups or tests treatment-covariate interactions. As Byar 30 urged, when these are exploratory analyses, the results should be reported skeptically as hypotheses to be directly investigated in subsequent studies. Russek-Cohen and Simon 4 offer a novel adaptive design that, in two stages, both tests sex/gender-by-treatment interactions and estimates sex/gender-specific treatment effects.
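When a publication reports sex/gender-specific estimates with confidence intervals, a reader can approximate a test of interaction from those numbers alone. The sketch below compares two independent subgroup hazard ratios on the log scale with a z-test; it is one common approximation, not the method used in any particular reviewed trial. The hazard ratios, confidence limits, and function names are hypothetical values invented for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def log_se_from_ci(lower, upper, level=0.95):
    """Standard error of a log hazard ratio recovered from its CI."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return (log(upper) - log(lower)) / (2 * z)

def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """z statistic and two-sided p-value for the difference between two
    independent subgroup hazard ratios, compared on the log scale."""
    se_a = log_se_from_ci(*ci_a)
    se_b = log_se_from_ci(*ci_b)
    z = (log(hr_a) - log(hr_b)) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value

# Hypothetical subgroup results: women, then men.
z_stat, p_value = interaction_test(0.95, (0.75, 1.20), 0.70, (0.56, 0.88))
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

A check like this is only possible when the sex/gender-specific estimates and their intervals are actually published, and, consistent with the caution urged for exploratory analyses, its result is best read as a hypothesis for later studies rather than a confirmed difference.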
Any discussion of observed sex/gender differences should address the original design of the trial, including specific power to detect such differences (Fig. 7). When sex/gender difference analyses are exploratory, the range of possible explanations for the observed difference should be discussed. A good example of such a presentation is in the publication of results from a lung cancer chemotherapy trial 31 :

Hazard ratios for death according to the subgroup analysis [Figure 3. From Sandler et al.31 Reprinted with permission. ©Massachusetts Medical Society.]
Exploratory analyses of the treatment groups according to baseline characteristics showed that bevacizumab was beneficial in all the subgroups assessed, with the possible exception of survival among women.…Possible explanations for this finding include imbalances between the two groups with respect to known or unknown prognostic factors, imbalances in the use of second- and third-line therapies, statistical chance, or a true sex-based difference.
Availability of data for further analyses
The recent emphasis on making data from NIH-funded phase 3 clinical research studies widely available prompted the tabulation within this review of that reported availability. Although none of the publications reviewed indicated the availability of a public use dataset, such datasets may exist. For example, the Combining Medications and Behavioral Interventions (COMBINE) study 32 reports on its website (
As suggested by Rochon et al., 35 the reporting of sex/gender information from clinical trials should include data presented separately by sex/gender to allow for meta-analyses and the analysis of treatment by sex/gender interactions. Presentation of those data would lead into discussions within the publication of differences in responses by sex/gender. They also suggest that reports of clinical trials should include in their discussions sex/gender-related limits to which the results can be generalized, that is, external validity. Too few publications included quantitative estimates of sex/gender-specific effect measures to assess further the validity of the estimates. In many cases, when there were no observed sex/gender differences, a sentence in the text reported that observation with no additional effect estimates. This practice not only limits the information provided to the reader but also hampers any later attempts at sex/gender-specific estimates from meta-analyses. Sex/gender-stratified results on efficacy and adverse effects are scarce, 36 limiting the development of sex/gender-specific evidence-based guidance. 37,38 Because of the 1993 NIH Revitalization Act, those sex/gender-specific data are being captured but not yet fully reported.
Conclusions
The impact of the 1993 NIH Revitalization Act clearly has been to increase the enrollment of women in clinical research, particularly in phase 3 clinical trials. That inclusion, however, has not led to detailed reporting of intervention effects in women, nor has it contributed to the extent possible to the available evidence indicating whether there is or is not a “significant difference of clinical or public health importance between males and females in relation to the study variables.” 3 Those limitations in the reported trial results in women continue to constrain inferences specific to women and to constrain the design of future research involving both women and men.
The publications of NIH-funded phase 3 clinical trials are of sufficient quality based on the published details of the trials to produce valid estimates of sex/gender-specific outcomes or of sex/gender differences. The information needed to evaluate the quality of the research, however, was not necessarily reported with as much detail as possible. Much information on the trial design and conduct is not reported, even though that information was captured during the course of the trial. To date, very few NIH-funded phase 3 clinical trials have final datasets available to other researchers, limiting the use of those datasets for planning future research or for individual patient data (IPD) meta-analyses. With the current emphases on registration of clinical trials, publishing results with supplemental material accessible online, and archived datasets for reanalysis by independent researchers, the information on sex/gender-specific outcomes or sex/gender differences may become more widely available.
Clinical research intended to inform treatment or prevention decisions for both women and men depends on well-chosen study designs that maximize efficiency, careful conduct with attention to the ultimate inferences, and dissemination of results using all available channels. Just as advances in inclusion of women in clinical research required attention and efforts far beyond the investigators conducting the studies, reporting of sex/gender-specific results will require that consumers of the research results demand that more of the available information be disseminated to them.
For a complete list of the publications included in this review, see
Acknowledgments
This work was completed while the author was serving as a Visiting Scientist with the NIH Office of Research on Women's Health, under an InterAgency Personnel Agreement. The author is grateful for the insights from many discussions with colleagues at the ORWH, particularly Dr. Vivian W. Pinn, who supported and encouraged this review, and Ms. Angela Bates, who provided valuable historical perspective. Dr. Sarah E. Fowler provided substantive comments on an earlier draft of this article. The following people contributed to the abstraction from published clinical research reports: Martha Barnes, David Contois, Dawn Corbett, Nida Corry, Clarissa Douglas, Francine Hill, Indira Jevaji, Shahnaz Khan, Wlodek Lopaczynski, Castilla McNamara, Dennis Mangan, Lynn Morin, Joan Nagel, Joanne Odenkirchen, Emilee Pressman, Karen Salomon, Dorothy Sanders, Shamala Sriniva, Tiina Urv, Charles Wells, Elyse Wiszneauckas, Kim Witherspoon, and Diane Yerg.
Disclosure Statement
The author has no conflict of interest to report.
