Abstract
A sample frame is the listing of the units from which a sample is to be selected. When deciding upon a file to serve as a source for a sample frame for a survey, perhaps the most important consideration is the extent to which the target population will be covered by the frame. However, other issues also come into play such as the accuracy of contact and other information appearing on the file as well as its cost. The American Medical Association Masterfile has long been considered the preferred choice for surveys of physicians, although it does have drawbacks. Here we consider two alternative files, discussing their relative advantages and disadvantages. For surveys of physician practices (or other organizations that employ physicians), there have been no files that are obvious choices to serve as the basis for a sample frame. Here we discuss data collection on physician practices from an analytic perspective and consider how sampling physicians to obtain practice level data may be a desirable approach.
Keywords
The primary focus of this article is on the use of physician listings as the source of sample frames for surveys of physicians or physician practices (with possible applications to other organizations that employ physicians). The establishment of such sample frames was a major topic of discussion at the workshop convened by the National Cancer Institute (NCI) in November 2010, where many methodological issues related to large-scale surveys of physicians and medical groups were addressed (see Klabunde et al., 2012).
For the sample selection of physicians, the American Medical Association (AMA) Masterfile of Physicians has long been regarded as the “gold standard” as a source for sample frames because of the high level of coverage of physician target populations it can be expected to provide. However, many researchers have found disadvantages to its use, mainly due to out-of-date information. Two alternative listings of physicians were considered at the workshop and will be discussed in some detail here. The relative merits of the three files will be evaluated as well as aspects about which there is some uncertainty. In addition, there will be a discussion of some things to consider when developing a sample design for a survey of physicians.
Another source of concern discussed at the workshop was the lack of files that might serve as the source of a sample frame for surveying medical groups (physician practices, clinics, etc.). Establishing such a frame can be costly and ensuring high coverage of a target population of medical groups is challenging to say the least. In this article, an alternative approach is proposed, discussed in the context of surveys of physician practices: practices are selected by sampling physicians and then obtaining survey data about or from the practice in which they work. Estimation and sample design strategies are addressed, assessing estimation strategies at both the physician and practice level.
This article has three main sections. The first focuses on the establishment of a sample frame for selecting a sample of physicians. A discussion of the elements that constitute a useful sample frame and related issues precedes consideration of three physician files that could be used as the source of a frame. The second discusses some additional issues to consider in the development of a sample design for a survey of physicians. The third considers sample design, analysis, and related issues for a study where the focus is on the practice for which a physician works. The usefulness of physician level estimates associated with the practices in which they work as well as practice level estimates is considered in terms of issues related to sample design, analysis, and survey operations.
Establishing a Sample Frame for a Survey of Physicians
The Definition of a Sample Frame and Related Issues
A sample frame is generally thought of as a file from which a sample is selected. The file may be listings that are electronic, paper, file cards, and so on. For physician studies, lists of physicians often serve as the frame.
However, Kish (1965a) points out that a sample frame may also include “procedures that can account for all the sampling units without the physical effort of actually listing them.” For instance, the National Ambulatory Medical Care Survey (NAMCS) is carried out annually by the National Center for Health Statistics (NCHS) and through 2011 has been an area probability sample of physicians, sampling areas nationwide and then physicians within the sampled areas. With an area probability sample, to paraphrase an additional point made by Kish, the frame can be viewed as a set of maps but need not be constructed by mapping out the whole target population. Sometimes, if there is evidence that listings available at a national level provide lower than desired coverage of the target population for a nationwide study, an area probability sample is selected and available listings are supplemented with information obtained within sampled areas, expanding the sample frame beyond the original listings. (As an aside, beginning in 2012, the NAMCS sample design dropped the area sample design, becoming a list sample from listings of physicians and community health centers; see Hing & Shimizu, 2012.)
As can be seen from the above discussion, there can be many different issues to consider in developing a sample frame as part of the sample design for a study. The focus of this article is on surveys of physicians and physician practices as well as listings that can serve as the basis of sample frames for the sample selection of physicians.
We will first consider what an ideal sample frame might look like, building on observations made by Kish (1965a) for dealing with frame problems and sampling from imperfect frames.
Features of an Ideal Sample Frame
There are a number of desirable features for an ideal sample frame of physicians. We will focus on the following: complete coverage of the target population under study is provided; all units on the frame are eligible for the study; each unit on the frame is represented exactly once (no duplication); all contact information is accurate; useful auxiliary information for sampling and weighting is available on the frame; and the cost of the file from which the frame is constructed is low.
Coverage
If a sample frame is incomplete (i.e., missing members of the target population(s) for a study), the frame undercovers the target population(s), raising concerns about the potential for bias in resulting study estimates. If study estimates of analytic interest include totals, bias due to undercoverage is of particular concern since resulting estimated totals can be expected to understate the actual totals. If estimates of interest include population means and proportions, bias due to undercoverage can arise to the extent that those not on the frame differ from those that are. This concern about bias depends on both the extent of such differences and the percentage of the target population not appearing on the frame.
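The way these two factors combine for a population mean or proportion can be illustrated with a minimal sketch. The function name and the numbers are hypothetical, chosen only for illustration, and the calculation ignores every other source of survey error.

```python
def undercoverage_bias(coverage_rate, mean_covered, mean_not_covered):
    """Bias of the covered-population mean relative to the full target mean.

    Target mean = C * mean_covered + (1 - C) * mean_not_covered, so
    bias = mean_covered - target mean
         = (1 - C) * (mean_covered - mean_not_covered).
    """
    if not 0.0 <= coverage_rate <= 1.0:
        raise ValueError("coverage_rate must be between 0 and 1")
    return (1.0 - coverage_rate) * (mean_covered - mean_not_covered)


# Hypothetical illustration: the frame covers 80% of the target population,
# and a proportion is 0.60 among physicians on the frame but 0.50 among
# those missing from it.
bias = undercoverage_bias(0.80, 0.60, 0.50)
print(round(bias, 3))
```

The bias vanishes when coverage is complete or when covered and uncovered physicians do not differ on the characteristic being estimated.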
Suppose, for example, a listing of those practicing family medicine (FM) contains 80% of all FMs, 85% of FMs in health maintenance organizations (HMOs), and 92% of FMs who are office based. Questions that might be posed in evaluating the use of such a listing as the basis for a sample frame include “What is the target population for the study at hand?” and “What is the expected impact of not fully including certain elements of the population under study?”
Note that steps that are undertaken to reduce survey costs can result in undercoverage. A file taken in its entirety may provide complete or virtually complete coverage of a target population of physicians. However, it may also include records for many physicians who are not members of the target population. Screening out ineligible physicians in the field can be costly, and it may appear desirable to use auxiliary information available on a frame to eliminate expected ineligible physicians prior to sample selection, relying on the accuracy of the information on the sample frame. For example, suppose a file contains a flag indicating whether or not a physician is office based and a study is focused solely on office-based physicians. If the office-based information is out of date to some degree, two prices will be paid. First, some physicians sampled will turn out not to be office based, adding to screening costs but not bias concerns. However, and more importantly from a bias perspective, some physicians dropped from the sample frame because they were not flagged as office based may actually be office based and thus members of the target population for the study. Such an exclusion raises the potential for biased estimates. A trade-off to be evaluated then, in developing the sample design, is the expected cost savings obtained through excluding specified records from the frame versus the increase in bias potential resulting from the exclusion. If it is known that coverage will only be slightly reduced by such exclusions, the expected cost savings might be worth a possible concern about potentially increasing bias. However, if the reduction in coverage is large or there is some degree of uncertainty about the extent of the reduction, lower costs may come at the price of study estimates of uncertain quality.
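A minimal sketch of this trade-off follows. It assumes that the frame flag can be cross-classified against true office-based status, for example from a pilot study or earlier fieldwork; the function name and all counts are hypothetical.

```python
def exclusion_tradeoff(flag_yes_true_yes, flag_yes_true_no,
                       flag_no_true_yes, flag_no_true_no):
    """Summarize the effect of keeping only records flagged as office based.

    Arguments are counts of frame records cross-classified by the frame
    flag (yes/no) and the physician's true status (yes/no).
    """
    target = flag_yes_true_yes + flag_no_true_yes   # true target population on the file
    kept = flag_yes_true_yes + flag_yes_true_no     # records remaining after exclusion
    total = kept + flag_no_true_yes + flag_no_true_no
    return {
        # Share of the target population that keeps a chance of selection.
        "coverage_after_exclusion": flag_yes_true_yes / target,
        # Share of sampled records expected to screen out, with exclusion.
        "ineligible_rate_with_exclusion": flag_yes_true_no / kept,
        # Share expected to screen out if the whole file were sampled.
        "ineligible_rate_without_exclusion":
            (flag_yes_true_no + flag_no_true_no) / total,
    }


# Hypothetical counts for a file of 10,000 records.
result = exclusion_tradeoff(7000, 1000, 500, 1500)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the exclusion halves the expected screening burden but removes about 7% of the target population from the frame, which is the kind of comparison the design decision rests on.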
Whenever exclusions of sample frame records are contemplated based on data appearing on a sample frame, consideration should be given to the expected accuracy of the data being used for exclusionary purposes. Some information may be considered more reliable than other information. For instance, date of birth might be considered highly likely to be accurate while geographic location of work place (office, hospital, etc.) less so.
Survey Eligibility
In addition to high coverage, it would, of course, be nice if the only records on the sample frame for a study were members of the target population for that study. For most studies of physicians or physician practices, unfortunately, this will not be the case. Studies generally focus on particular subpopulations of the full population of physicians, so it is important to consider the precise definition of the target population when choosing between possible sample frames. Moreover, physicians may retire, move from an office-based practice to a hospital as place of employment or vice versa, or otherwise change their eligibility for a study.
Concerns here include cost, bias, and analytic objectives. The cost issues include screening out sampled physicians who are not of analytic interest to the study (not members of the target population) and developing a set of screening questions that successfully distinguish between members and nonmembers. Issues of bias can arise if the screener is less than successful, perhaps due to issues of ambiguity or complexity. Members of the target population may be inadvertently excluded from study participation and/or nonmembers included. Modifications to analytic objectives can sometimes help to reduce the potential for bias or survey costs. This might involve altering the definition of the targeted population somewhat, for instance.
If the sample frame contains both the telephone number and the address of physicians, one approach to attempt to increase the collection of accurate eligibility information in an efficient fashion is to call the telephone number and collect from the physician’s office only the screening information highly likely to be accurate (e.g., whether or not the physician is still working at the location or if she or he has died or retired). Other potential eligibility related information such as number of hours per week spent seeing patients or, in some cases, perhaps even specialty might be only definitively obtained from the sampled physician. Such an approach would help avoid coverage reduction resulting from inappropriately screening out physicians eligible for the study while both eliminating ineligible physicians from further consideration and identifying physicians who need to be traced.
Duplication on the Sample Frame
Regardless of the source file for a sample frame, there is always the possibility of multiple records representing the same person, physician, practice, business, and so on. As a result, it is generally useful to attempt to identify the presence of duplicates prior to sample selection. Depending on the file, there may be more or less information available for unduplication purposes. For lists of physicians, name and contact information are routinely available, and there may be other variables such as telephone number, date of birth, age, gender, and so on, which may prove helpful. Exact matching is feasible with variables such as telephone number, date of birth, and age. Because of varying ways of storing name and address on an electronic database, exact matching may be somewhat problematic, although there exists software for standardizing such information (e.g., converting the alphabetic strings “Street,” “St.,” “St,” etc., into a single alphabetic string representing the word “street”). There also exists software for undertaking “probabilistic matching” where, after providing input on the relative importance to be placed on the successful matching of various variable values (including character strings), an overall value associated with the probability that two records represent the same person or entity is produced. The user also specifies a “threshold” value to which this overall value is compared. If the threshold value is exceeded, a match is considered to have been found.
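The standardization and threshold ideas can be sketched as follows. This is a deliberately simplified agreement score rather than the method of any particular matching package; the abbreviation table, field names, weights, and threshold are all hypothetical.

```python
import re

# Hypothetical abbreviation table; production software uses far larger ones.
STANDARD_FORMS = {"st": "street", "street": "street",
                  "ave": "avenue", "avenue": "avenue"}

# Hypothetical weights reflecting the importance placed on each field agreeing.
WEIGHTS = {"name": 0.4, "address": 0.2, "birth_date": 0.4}


def standardize(text):
    """Lowercase, strip punctuation, and map known abbreviations to one form."""
    tokens = re.sub(r"[^a-z0-9 ]", "", text.lower()).split()
    return " ".join(STANDARD_FORMS.get(token, token) for token in tokens)


def match_score(rec_a, rec_b):
    """Sum the weights of the fields on which two records agree."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if standardize(rec_a[field]) == standardize(rec_b[field])
    )


def is_match(rec_a, rec_b, threshold=0.75):
    """Declare a match when the overall score exceeds the threshold."""
    return match_score(rec_a, rec_b) > threshold


a = {"name": "Jane Q. Doe", "address": "12 Main St.", "birth_date": "1960-05-01"}
b = {"name": "Jane Q Doe", "address": "12 Main Street", "birth_date": "1960-05-01"}
c = {"name": "John Roe", "address": "12 Main St", "birth_date": "1971-11-30"}
print(is_match(a, b), is_match(a, c))
```

Actual probabilistic matching software typically derives field weights from estimated agreement probabilities among true matches and nonmatches and allows partial agreement on character strings; the fixed weights here only illustrate the role of the threshold.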
The identification of people or entities appearing multiple times is particularly important if multiple files are to be used to create one or more sample frames for a given study. When and how to do this can vary. If a single sample frame is to be constructed, one unduplication strategy is to undertake an initial unduplication effort to limit the number of sampled records with multiple chances of selection. Then additional information can be obtained from respondents, which may help further identify those who did appear multiple times on the sample frame despite the initial unduplication. If multiple frames are used for sample selection, it may be more efficient to unduplicate within each frame separately prior to sample selection and then focus on determining which of the respondents had multiple chances of selection taking advantage of survey and frame data. A challenge with a multiple frame approach is often the accurate assessment of the degree of overlap between frames. Variables that lend themselves to exact matching (e.g., date of birth) may not be available, adding a degree of uncertainty in the determination of the extent of overlap. When accurate assessments of overlap can be made, sample estimates can be developed in a number of different ways (see Lavallée, 2007; Lohr, 2011).
A multiple sample frame design, involving probabilistic matching to identify duplicates, was used for the 2008 National Sample Survey of Registered Nurses (NSSRN) conducted by the Health Resources and Services Administration (HRSA; see Fritz, DiGaetano, Green, & Clark, 2010). Multiple state-based sample frames of registered nurses were created from lists of those licensed as registered nurses in the 50 states and the District of Columbia. Date of birth was provided on the listings of 43 states as well as the District of Columbia, and state(s) of licensure was collected from survey respondents. The availability of these data helped to substantially increase the accuracy of the assessment of overlap. Sample estimates for the NSSRN were developed using a generalized weight-sharing approach (Kalton & Brick, 1995; Lavallée, 2007; Lohr, 2010).
The Accuracy of Contact Information
Inaccurate contact information is largely an issue of cost, although concerns about bias may arise as well. Many surveys of physicians or physician practices are conducted entirely or in part by telephone or mail. For instance, a telephone screener may be used, where possible, to gather some basic eligibility information to be followed by the survey questionnaire that is administered through the mail. If telephone numbers and/or contact addresses are out of date, a tracing effort is needed, adding to costs. If a sampled physician has died or retired, that can often be determined through initial screening contacts and thus the physician’s ineligibility for the survey (not a member of the target population) will have been ascertained. However, if neither the physician’s location nor his or her eligibility status can be ascertained, the physician is a survey nonrespondent, potentially contributing to bias.
Auxiliary Information Available on the Sample Frame
Information auxiliary to the basic identification information (name and contact information) can be extremely useful for both sampling and weighting purposes for a survey. For instance, physician specialty is commonly used to form explicit sample strata within which samples of physicians are selected. Also, geographic or demographic variables such as ZIP code, date of birth (or age), gender, and race/ethnicity can be used for sorting purposes, achieving an implicit stratification that can help ensure a proportional sample allocation within sample strata across the variables used in the sort. Such variables can also be used in the formation of cells to be used for adjusting sample weights to account for nonresponse since they are known at the time of sample selection and thus can be used to characterize both respondents and nonrespondents.
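Implicit stratification is usually obtained by sorting the frame and then sampling systematically. A minimal sketch, in which the record layout, sort variable, and sample size are hypothetical:

```python
import random


def systematic_sample(records, n, sort_key, seed=None):
    """Sort the frame, then take every k-th record from a random start.

    Sorting before systematic selection spreads the sample roughly
    proportionally across the values of the sort variables.
    """
    ordered = sorted(records, key=sort_key)
    interval = len(ordered) / n
    start = random.Random(seed).random() * interval
    last = len(ordered) - 1
    # min() guards against a floating-point index one past the end.
    return [ordered[min(int(start + i * interval), last)] for i in range(n)]


# Hypothetical frame within one specialty stratum: 60 physicians in
# region "A" and 40 in region "B".
frame = [{"id": i, "region": "A" if i % 5 < 3 else "B"} for i in range(100)]
sample = systematic_sample(frame, 10, sort_key=lambda r: r["region"], seed=1)
print(sum(r["region"] == "A" for r in sample), "of", len(sample), "from region A")
```

With this frame the sample of 10 splits 6 and 4 between the two regions, matching their 60% and 40% shares, without regions having been declared as explicit strata.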
Measures of size are often useful to have on sample frames for sampling purposes. For example, it might be of interest to stratify physicians by the size of the organization for which they work, perhaps to oversample (sample at a disproportionately high rate) those who work in large practices. Such size measures can also be used for probability proportionate to size (pps) sampling (see Cochran, 1977; Kish, 1965a). If the number of physicians in a practice were available on a sample frame of practices, this would permit larger practices to be selected with increased probability.
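One common way to carry out pps selection is systematic sampling along the cumulated size measures. The sketch below assumes a hypothetical frame of five practices with the number of physicians as the size measure; it is one of several pps schemes, not a prescription.

```python
import random


def selection_probabilities(sizes, n):
    """Probability of selection for each unit under pps sampling of n units."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def pps_systematic(sizes, n, seed=None):
    """Select n unit indices with probability proportional to size.

    Units whose size reaches the sampling interval are certain selections
    and could be hit more than once, so in practice they are usually set
    aside and taken with certainty before the remaining units are sampled.
    """
    interval = sum(sizes) / n
    start = random.Random(seed).random() * interval
    targets = [start + i * interval for i in range(n)]
    selected, cumulative, t = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Select this unit for every target falling in its cumulated range.
        while t < n and targets[t] < cumulative:
            selected.append(index)
            t += 1
    return selected


# Hypothetical practices with 1, 2, 3, 4, and 10 physicians.
sizes = [1, 2, 3, 4, 10]
probs = selection_probabilities(sizes, 2)
selected = pps_systematic(sizes, 2, seed=3)
print(probs, selected)
```

Here the ten-physician practice has a size equal to the sampling interval and is therefore selected with certainty, while the four smaller practices share the remaining selection in proportion to their sizes.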
Of course, to the extent that the auxiliary data are inaccurate, costs, bias, and/or variance may be affected, depending on how the data have been used. Suppose the auxiliary data variable “physician specialty” has been used to form strata, and some physicians classified as FM were really general practitioners (GPs) and vice versa, but 100% of all FMs and GPs were classified as one or the other. If samples are selected from both FMs and GPs and standard survey sample weights (reflecting the probability of selection of each physician) are used for estimation purposes, the misclassification of some physicians does not result in any bias concerns because all FMs and all GPs had a known probability of selection. The penalty paid for such misclassification is reduced precision and power due to variation in the sampling rates for members of the two specialties. If samples were selected only from those identified as FMs, then the FM population would be undercovered because those misclassified as GPs would have no chance of selection for the study, introducing the potential for bias. If, after evaluation, the level of misclassification appears to be small, the potential for bias may not be a critical concern.
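The precision penalty from mixing sampling rates within a specialty can be gauged with Kish's approximate design effect for unequal weights, n Σw² / (Σw)². The sampling rates and respondent counts below are hypothetical.

```python
def kish_design_effect(weights):
    """Approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Hypothetical design: the FM stratum is sampled at 1 in 10 (weight 10) and
# the GP stratum at 1 in 40 (weight 40). Among 100 responding true FMs, 90
# came from the FM stratum and 10 were misclassified into the GP stratum.
fm_weights = [10] * 90 + [40] * 10
deff = kish_design_effect(fm_weights)
print(round(deff, 2))
```

A value above one indicates that the FM estimates are less precise than they would be with the same number of respondents and equal weights; with no misclassification all FM weights would be equal and the value would be exactly one.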
Costs can be an issue if the extent of misclassification is such that sample sizes and corresponding precision levels are lower than targeted and the sample has to be supplemented.
Costs Associated With Use of a Particular File in the Development and Use of a Sample Frame
There are several components of cost associated with the development and use of a sample frame. Of course, there is the initial cost of procuring a file or set of files from which the sample frame is to be established. Some file vendors are prepared to handle the cost of physician sampling in two steps: first, sample selection from a full file with the information needed for sampling available but the name and contact information excluded; then, after sampling, procuring the name and contact information for only the sampled records. Others may not have such a cost structure established in which case it would have to be arranged. One file possibly suitable for sample selection purposes is actually free and can be downloaded from the Internet.
A second cost is the development of the sample frame. This involves putting the file or files in a form suitable for sampling purposes: unduplication; editing the file(s); identifying and dealing with missing data or other potentially problematic records; exclusion of records deemed “not eligible for sample selection purposes”; etc.
Another cost, although not monetary in nature, involves the absence or limited availability of auxiliary data for sampling or weighting purposes. Such information can help reduce the potential for bias or permit oversampling of some subgroups of analytic interest.
A final cost is the price paid due to inaccurate or inadequate data appearing on the sample frame. As discussed earlier, out-of-date contact information results in the need for tracing, adding to survey costs. It can also serve as a source of potential bias due to nonresponse (if a sampled physician cannot be located and his or her eligibility status cannot be ascertained) or undercoverage (if dropped from the sample frame due to inaccurate auxiliary information). Inaccurate data can also result in undesired assignments to sample strata, increasing the variability of sample estimates. If targeted sample sizes are not met in part due to poor quality contact or classification data and thus supplementation of the sample is required, additional cost increases will, of course, be incurred.
Files That Could Serve as a Sample Frame for a Survey of Physicians
The preceding subsections dealt with factors to consider in choosing between files that may serve as the source of a sample frame. Here we consider these factors in assessing alternative files that might serve for studies of physicians.
There are many such files. We focus here on three files that provide records for many different specialties and are national in scope. These are the AMA Masterfile of physicians; the American Medical Information (AMI) file maintained by InfoUSA; and the “NPI” file, a file of individual health care providers (including physicians, nurses, dentists, etc.) maintained by the Centers for Medicare and Medicaid Services (CMS) and containing all individual providers who have a National Provider Identifier (NPI ID number).
Examining Counts of Physicians by Specialty Across the Various Files
We will first examine Table 1, where counts of physician records are found. The counts for the AMA and NPI files are from October 2012, while those for the AMI file are from January 2013. A number of different specialties that may be of interest for research purposes are provided based on many of the specialties for which NAMCS collects data. The counts are provided for the specialties (referred to as taxonomies on the NPI file) based on the file descriptors and do not include subspecialties. For instance, the numbers associated with Internal medicine are those for records with the specific description “internal medicine.” Counts, for instance, of those records described as “Internal Medicine—Gastroenterology” have not been included for internal medicine. The counts for the AMA file were obtained from the Medical Marketing Service (MMS) website, a vendor of the AMA Physician Masterfile. The counts for the AMI file were obtained through a series of interactions on the AMI website. The counts for the NPI file were obtained after downloading the entire NPI file of individual providers and examining a frequency distribution of the taxonomies provided.
Table 1. Distribution of Physicians by Specialty: AMA, NPI, and AMI Files.
Both the AMA and AMI files offer counts by the total number of physicians in a given specialty as well as those flagged as “office based” on the files. There is no office-based flag on the NPI file. Of course, the accuracy of such flags is something that requires evaluation in order to determine its usefulness for sampling purposes.
The AMI counts in Table 1 are presented in two sets. One set represents all records provided for each of the physicians in each specialty as well as all records classified as office based. Multiple records per physician may appear, as many physicians work in multiple offices. The second set consists of “unduplicated” counts, also provided for all physicians in a specialty and for those who were office based. These unduplicated counts were obtained by checking a box indicating “Remove Duplicates by First Name & Last Name” found under the “Office Information/Office Size” capability for refining counts. The extent to which this fully unduplicates the AMI file is uncertain.
In addition to the counts, some ratios between the counts of different files are provided. The ratios involving AMI counts are for those counts after unduplication. Percentages of office-based physicians for the AMA and AMI files are also provided.
To briefly step through part of the table data, we will focus on physicians classified as “family practice/family medicine” (descriptors can vary by file). The number of records representing individual family practice/FM physicians on the AMA file was 100,801, while it was 109,554 on the NPI file, so the ratio of NPI to AMA counts was 1.09. Of those appearing on the AMA file, 79,137 were flagged as office based (other possible classifications include “administration,” “medical teaching,” “hospital staff,” and “research”). For the AMI file, the counts of records associated with those in family practice/FM are 100,916 among all such physicians of which 92,721 were classified as office based. After AMI’s unduplication effort the counts are 88,738 and 82,172, respectively. The ratio of the AMI unduplicated counts to the AMA count is 0.88 for all physicians in the specialty and 1.04 for those classified as office based. The percentage of office-based records was 78.5 for the AMA file and 92.6 among the “unduplicated” records on the AMI file.
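The ratios and percentages in this walk-through can be reproduced directly from the counts quoted above; only the variable names are our own.

```python
# Counts for family practice/family medicine as quoted in the text.
ama_all, ama_office = 100_801, 79_137
npi_all = 109_554
ami_undup_all, ami_undup_office = 88_738, 82_172

ratios = {
    "NPI / AMA, all physicians": npi_all / ama_all,
    "AMI (unduplicated) / AMA, all physicians": ami_undup_all / ama_all,
    "AMI (unduplicated) / AMA, office based": ami_undup_office / ama_office,
}
percent_office_based = {
    "AMA": 100 * ama_office / ama_all,
    "AMI (unduplicated)": 100 * ami_undup_office / ami_undup_all,
}

for label, value in ratios.items():
    print(f"{label}: {value:.2f}")
for label, value in percent_office_based.items():
    print(f"Percent office based, {label}: {value:.1f}")
```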
It is of interest to consider the ratios of counts between files across the range of specialties. The ratios of the NPI to the AMA counts are generally close to one, ranging from about 0.8 to 1.1 except for Oncology and Hematology/Oncology, where the ratios are 0.57 and 1.23, respectively. These departures may indicate differences in classifications for the two files. If these two specialty categories were combined, the resulting ratio would be 0.96. This emphasizes the need for researchers to carefully consider what set of specialty categories found on a file are likely to provide high coverage of the specialty or specialties of interest.
The ratios of AMI to AMA counts are generally lower than one for the full set of physicians in a specialty but generally higher than one for those flagged as office based. As we will discuss in more detail below, the coverage of the population of physicians is expected to be higher for the AMA file than the AMI file which could account for the AMA counts generally being higher than those for the AMI file. However, the AMI file is kept more up to date, so it is plausible that the AMI records better reflect office-based status than the AMA records. In addition, the AMI and AMA files may differ in how specialties are identified on the file, and the AMI unduplication effort may not be fully successful in removing all duplicates. By design the complete AMI file includes duplicates, while the AMA file does not.
The AMA File
The AMA file has long been regarded as the gold standard for establishing sample frames for physicians. Records on the file are established for students entering accredited medical schools in the United States, and provision is made to include physicians from foreign countries who practice in the United States as well. As a result, the AMA file is expected to provide very high coverage of physicians practicing in the United States.
However, a major issue with using the AMA Masterfile is out-of-date information. This includes contact information (address and telephone number) and whether or not a physician works in an office-based setting. For instance, consider survey disposition data from the NAMCS, a federal survey of physicians in office-based practices that is carried out annually across the country. The sample of doctors of medicine (MDs) for the NAMCS is selected from among those physicians flagged as office based on the AMA file. (The AMA file includes doctors of osteopathy (DOs) as well as MDs; to help improve coverage, the NAMCS uses as part of its frame records from an American Osteopathic Association file.) The target population for the NAMCS (those in scope) can basically be described as physicians who are office based, principally engaged in patient care activities, and nonfederally employed. Virtually all specialties are eligible for the survey (only anesthesiologists, pathologists, and radiologists are not). From the 2010 NAMCS Micro-Data File Documentation available on the NAMCS website (Centers for Disease Control and Prevention, 2010), across all sampled records about 32% were classified as out of scope. For the three largest primary care physician (PCP) specialties (pediatrics, internal medicine [IM], and general and family practice [pooled into one stratum for NAMCS sampling purposes]) the out-of-scope percentage was about 37%. The 2010 NAMCS documentation further indicates that, “The most frequent reasons for being out of scope were that the physician was employed in a hospital emergency department, outpatient department, retired or employed in an institutional setting.”
Thus, using the AMA Masterfile can be costly in terms of fielding physicians who are not eligible for a study. For the NAMCS, those classified as office based on the AMA file often turned out to be working in an emergency room (ER) or outpatient department. Additional costs are incurred due to the need to trace a sampled physician whose contact information is out of date.
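A rough sense of the fielding cost follows from the out-of-scope rates quoted above. The sketch below assumes a hypothetical target of 1,000 in-scope physicians and makes no allowance for nonresponse or tracing, both of which would raise the figures further.

```python
import math


def records_to_field(target_in_scope, out_of_scope_rate):
    """Expected number of sampled records needed to yield the target number
    of in-scope physicians, before any allowance for nonresponse."""
    return math.ceil(target_in_scope / (1.0 - out_of_scope_rate))


target = 1000  # hypothetical target, not a NAMCS figure
all_specialties = records_to_field(target, 0.32)  # about 32% out of scope overall
primary_care = records_to_field(target, 0.37)     # about 37% for the PCP specialties
print(all_specialties, primary_care)
```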
Another disadvantage of using the AMA file is that physicians are provided with an opportunity to avoid being contacted and are flagged as such on the file. Thus, using the AMA file means that nonresponse is incurred before the survey has even been fielded. The percentage that opt out can vary by specialty. For some surveys of PCP specialties in the past, “no contact” percentages have ranged from 1% to 5%. There are also records classified as “undeliverable.” Some do represent currently active physicians, so excluding them from fielding adds to overall nonresponse. Unfortunately, when fielded, physicians characterized as “undeliverable” can generally be expected to have lower response rates and higher costs due to the need to undertake tracing.
Those data items found on the AMA Masterfile that are invariant such as date of birth, state of residency, year of medical school graduation, and gender may be useful for sampling or adjusting sample weights for nonresponse. For instance, propensity to respond to a survey often varies by age group, so knowing the age of both respondents and nonrespondents can be useful for such weight adjustments.
At least one vendor of the AMA file (MMS) is prepared to support survey research of physicians in the sense that its cost structure recognizes that the researcher does not need identifying information for all members of the file serving as the sample frame. The vendor will provide the full file of physicians without contact information for sampling purposes for one set of costs per record and then provide contact information for the sampled records using a different set of costs.
The AMI File
The AMI database is maintained by InfoUSA. The AMI website indicates that it currently (early 2013) has records of roughly 575,000 physicians. Available auxiliary variables include specialty, state of licensure, medical school attended, office size, age, gender, as well as others. The website indicates that the database is updated monthly.
The database has been generated based on listings from the Yellow Pages and the Business White Pages directories across the country. They are supplemented with information from trade publications, public records, and professional directories. All records have been telephone verified. Based on conversations with AMI representatives, all records are to be verified annually.
There are some concerns with respect to the use of the AMI database. First, there is the issue of undercoverage. Some physicians work in settings where their phone number may not appear in phone directories and thus may not be included in the AMI listings. Undercovered physicians may include those working for HMOs, hospitals, outpatient clinics, and ERs. Thus, for surveys of PCPs, a source of concern to focus on would be those in HMOs. There may also be coverage issues for physicians who do not see patients (e.g., those engaged full time in research, teaching, or administration). This is speculation based on how the AMI database is constructed.
Another issue to be aware of was previously mentioned, that of duplication. Some physicians work in multiple offices and thus may appear multiple times on the AMI listings. InfoUSA will provide data users with an unduplicated list if requested. The extent to which this unduplication process is successful is unknown. If an AMI file is used as a sample frame, using an AMI “unduplicated” file would permit the removal of additional duplicates. However, to protect against the possibility that the AMI unduplication process inadvertently removes some nonduplicates, one could request the full version of the AMI file for the physician specialties under study and carry out the unduplication process as part of the sample frame construction process. Perhaps a requested AMI file could provide a variable identifying the records that have been characterized as duplicates and the records to which they are linked to permit a full evaluation of the AMI unduplication effort. Having multiple records for some physicians might help for tracing purposes. For instance, removing duplicates from the sample frame but retaining a separate file of these duplicates to help locate a sampled physician could prove useful.
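A minimal sketch of the last idea, removing duplicates from the frame while retaining them in a separate tracing file, might look as follows. The matching key (name plus birth year) and all field names are assumptions made for illustration; actual AMI records may call for a different key or for fuzzy matching.

```python
def split_duplicates(records, key_fields=("last_name", "first_name", "birth_year")):
    """Keep the first record per physician on the frame and set the rest
    aside, indexed by the retained record's id, for later tracing."""
    frame = []      # one record per physician
    tracing = {}    # retained record_id -> list of duplicate records
    seen = {}       # match key -> retained record_id
    for rec in records:
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        if key in seen:
            tracing.setdefault(seen[key], []).append(rec)
        else:
            seen[key] = rec["record_id"]
            frame.append(rec)
    return frame, tracing

# Hypothetical listing: the first two records are the same physician
# listed at two offices.
records = [
    {"record_id": "A1", "last_name": "Jones", "first_name": "Ann",
     "birth_year": 1970, "phone": "555-0101"},
    {"record_id": "A2", "last_name": "JONES ", "first_name": "ann",
     "birth_year": 1970, "phone": "555-0199"},
    {"record_id": "A3", "last_name": "Smith", "first_name": "Raj",
     "birth_year": 1982, "phone": "555-0123"},
]
frame, tracing = split_duplicates(records)
# frame holds A1 and A3; tracing["A1"] holds the second office listing (A2).
```

An exact-match key of this kind will miss some duplicates and could merge distinct physicians who share a name and birth year, so the output would still need review.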
Another thing to note about the AMI file is that InfoUSA is not generally structured to accommodate survey sampling as MMS is (described previously). A researcher planning to use the AMI file would want to work with InfoUSA staff to set up a price structure for sampling and obtaining contact information.
Several years ago, Westat undertook a comparison of the AMI and AMA files for the NCI. The report was unpublished but was alluded to in Klabunde et al. (2012). The evaluation was limited to three primary care specialties working in office-based settings: FM (also known as family practice), IM, and obstetrics and gynecology (Ob/Gyn). The evaluation did provide some evidence that at that time the AMA file could be expected to provide higher coverage than the AMI file, but the extent of undercoverage could vary by specialty. Rates of ineligibility and inaccurate contact information (requiring tracing) were higher for the AMA file as well, serving to add to survey costs.
The NPI File
Some Background
It might be helpful to have some understanding of how NPIs are assigned. The information provided here is based on reading documentation available through the Internet and a number of communications with CMS staff familiar with requirements associated with obtaining NPIs. A useful document with formal, official information about NPIs is “The National Provider Identifier (NPI): What You Need to Know” (U.S. Department of Health and Human Services, 2012).
The purpose of providing this information is basically to indicate what is available and why one of the NPI files is a reasonable candidate to consider for a survey of physicians. It is possible that some of the details presented here are incomplete or could be more accurately portrayed but that should not affect decisions related to choosing files for sample frame use.
The CMS maintains the National Plan and Provider Enumeration System (NPPES). The NPPES collects and maintains identifying information on health care providers and assigns each provider in the system a unique ID known as the National Provider Identifier or NPI.
There are two “NPI” databases, one at the individual level (Entity Type 1) and one at the organization level (Entity Type 2). As described in the document referenced subsequently, “Entity Type 1” providers are individual providers who render health care (e.g., physicians, dentists, and nurses). Sole proprietors and sole proprietorships are classified as Entity Type 1 (individual providers). Organization health care providers (e.g., hospitals, home health agencies, ambulance companies) are considered “Entity Type 2 (organization) providers.” No attempt is made to maintain a link between the two databases. The essence of the detailed discussion that follows is that organizations are “HIPAA covered entities” required to get NPIs and keep their NPI record information up to date (“HIPAA” is the acronym for the “Health Insurance Portability and Accountability Act”); most individuals are not such “covered entities” and, depending on a number of factors, may or may not be required to obtain an NPI and are not required to keep their NPI record information updated.
All Health Care Providers who conduct HIPAA standard transactions (e.g., claims and eligibility inquiries) are HIPAA covered entities and are required to obtain an NPI. By definition, a HIPAA standard transaction is electronic (rather than paper, fax, or phone). Thus, the vast majority of health care organizations (e.g., hospitals, practices, etc.) are required to obtain an NPI. All physicians are eligible to obtain an NPI, but unless they themselves conduct HIPAA standard transactions, they are not required under the provisions of HIPAA to obtain an NPI. That said, it is noted that the Affordable Care Act (ACA) has required all Medicare and Medicaid providers to obtain NPIs if they are eligible for NPIs. This would be expected to help enhance the number of physicians who obtain NPIs. From an examination of counts of physicians by specialty, evaluated through comparison with the AMA Masterfile of Physicians as was done in Table 1, it appears that most physicians in many specialties of interest have obtained NPIs.
An example might help clarify the understanding of these concepts. Suppose Dr. Jones works in a group practice. She is not involved in sending claims to a health plan, and she does not check on patient eligibility with a health plan. That is, she is not engaged in HIPAA standard transactions as described previously. The group practice does send claims to a health plan (this may be with or without the use of a business associate to do so, such as a billing company). Also, the group practice (i.e., the administrative staff) checks patient eligibility. Since Dr. Jones is not conducting either of those transactions (the group practice is conducting them), she is not a covered entity under HIPAA. Thus, Dr. Jones would be “eligible” to obtain an NPI but not necessarily required to. However, if Dr. Jones were a sole proprietor and the sole proprietorship was sending claims to health plans and/or conducting eligibility inquiries (billing company), Dr. Jones would be a covered entity and required to obtain an NPI.
Note that the “NPI Final Rule” does not prohibit Dr. Jones from obtaining an NPI. Rather, it encourages all health care providers to obtain an NPI. Moreover, health plans are free to require that all health care providers identified in all transactions conducted with them—HIPAA transactions or not—be identified by NPI and are not prohibited from requiring all of their enrolled health care providers to obtain NPIs (i.e., to include those like Dr. Jones to be enrolled in the health plan just like the group practice that employs her would be enrolled in the health plan). This goes for organizational health care providers as well, not just those who are practitioners. As mentioned before, the ACA requires all Medicare and Medicaid providers to obtain NPIs (if they are eligible for NPIs) and to report them on all enrollment applications and in all claims.
In terms of updating information appearing on the NPI files, the NPI Final Rule places the updating requirement only on health care providers who are “covered entities” under HIPAA. Since most physicians are not covered entities, they are not required by regulation to keep their NPPES data current, although they are encouraged to do so.
To sum up, it appears that a very high percentage of physicians have applied for NPIs. There is no formal effort to keep their contact information up to date. There is no link provided in the NPPES system between the individual health care providers and the organizations for which they work.
Sample Frame Potential
There are two potential advantages and one definite advantage to using the NPI file of individual providers as the basis for a sample frame for sampling physicians: coverage of the physician population is expected to be high; contact information might be expected to be more up to date than that on the AMA file, at least currently; and the file is free.
The expectation of high coverage stems from two sources: the comparison of physician counts by specialty found on the NPI file with those found on the AMA file, and the ACA requirement that all Medicare and Medicaid health care providers obtain an NPI. Because physicians began obtaining NPIs relatively recently, it is not unreasonable to expect that current contact information is fairly up to date. However, since there is no requirement for most individual providers (sole proprietors would be an exception) to keep such information up to date, over time it may be that contact information will grow more and more out of date.
A major advantage of using the NPI “individual” file to establish a sample frame of physicians is that it is available free of charge. One need only download it from the CMS website. Of course, it will be important to learn about the file and its structure as well as to undertake some routine steps that would be taken whenever establishing a sample frame. For a study of physicians in the United States, this would include determining the taxonomy (specialty) codes eligible for the study at hand (sometimes new ones are added over time), as well as removing records associated with foreign countries (there is available information to permit this).
As discussed earlier, comparison of the NPI file counts for the physician specialties appearing in Table 1 to those from the AMA Masterfile indicates roughly the same number of physicians are found on each. Given this and the purpose and nature of the NPI file, it seems reasonable to expect that coverage will be high for currently active physicians. It would also seem reasonable to expect that the contact information on the NPI file of individual providers (including physicians, nurses, etc.) is currently more up to date than that on the AMA files. This is mainly because the NPI file is relatively new and the AMA updating process appears to be limited in its effectiveness. Since most individual providers with NPIs are not required to keep their contact information up to date, this may not hold true at some point in the future.
There is one definite disadvantage to using the NPI file compared to the AMA or AMI files. The variables available for the purposes of nonresponse adjustments to sample weights and sample stratification are mostly based on contact information: address and telephone numbers. Perhaps it is feasible to geocode frame or sample addresses and extract data from other sources, such as Census files, to help augment such information. The AMA and AMI files both have information related to age and gender that could be useful for nonresponse adjustment or stratification purposes. Propensity to respond often is correlated with age.
It should also be noted that the accuracy of the specialty (taxonomy) codes on the NPI file is uncertain. CMS does not attempt to verify its accuracy, and, to the extent that specialty is inaccurate, there are potential issues of bias. For example, bias would be a concern if a nontrivial proportion of physicians in a given specialty are not included on a sample frame because their NPI taxonomy codes were not recognized as potentially associated with the specialty of interest for a given study. This could arise, for instance, for subspecialties that might appear to be out of scope for a given study but where the description was inadequate and the physician was actually in scope. Such misclassification could also arise for the AMA and AMI files, but they do include updating mechanisms. The closeness in counts between the AMA and NPI files for various specialties suggests this is not necessarily a major concern, but this is not known.
The NPI file may have coverage issues for physicians who do not see patients (those engaged full time in research, teaching, or administration, for example). As with the AMI file, this is speculation based on the nature and purpose of the NPI.
Summary
Unfortunately, there have been no comprehensive methodological studies comparing the three files, so there is some limitation to what can be said definitively. Two exceptions are: the NPI file is free while the AMA and AMI files are not; and the NPI file has far fewer auxiliary variables available for sample design and estimation purposes than the other two. However, while the auxiliary information on the AMA file may be useful for stratification or nonresponse adjustment purposes, its usefulness for the purposes of excluding records to reduce screening costs is uncertain. Many records classified as office based turn out not to be, increasing operational costs, and if a nontrivial proportion of records classified as “nonoffice based” actually represent office-based physicians, excluding them from the sample frame raises the potential for biased estimates. The accuracy of the AMI office-based classification would be expected to be better than that of the AMA. Even so, it is unknown whether it is reliable enough to use to eliminate records from a sample frame without raising concerns about bias.
It is reasonable to expect that the AMA and NPI files provide very high coverage of currently active physicians in the nation while the AMI would provide somewhat less; the degree of AMI undercoverage is unknown and may vary by specialty. On the other hand, the updating process for the AMI file appears to be superior to that employed for the AMA file, and there is no routine updating process for the individual providers with NPIs. Thus, the need to trace sampled physicians should be less and eligibility rates should be higher if the AMI file were used as the source of a sample frame than if either of the other two files were used. Both of these factors would serve to reduce the cost of field operations.
The AMA file has built in nonresponse associated with the “no contacts” in the file that varies by specialty. It also has cases characterized as “undeliverable” that can be expected to have lower response rates and higher tracing costs.
Much is unknown about the NPI file related to the general accuracy of the contact information or the actual coverage it provides. However, studies are beginning to use the NPI as a sample frame, so some things will be learned in the not-too-distant future. If the AMA or AMI files are to serve as the source of a sample frame for a study, NPI information could still be used for tracing purposes. This can be done through a downloaded file or through visiting the associated website where individual names can be entered and contact information obtained. Consideration could also be given to using the NPI file to supplement an AMI-based sample frame to help improve coverage or to using a dual frame approach (see Lohr, 2011) to sampling physicians, perhaps employing the NPI file (which is free) with the AMI file (with its more accurate tracing information).
Another source of uncertainty is the extent to which InfoUSA would be interested in developing a cost model to accommodate survey research for those who would consider using the AMI file.
If a study is focused on all physicians with a medical degree, not just those seeing patients, the AMA file may be the best choice based on the methods used for establishing the three different files. For example, a study may be interested in learning the extent to which those with medical degrees are working as a physician. An uncertainty is the extent to which the information on the AMA file distinguishing between those who are office based, working in research, working as a teacher, and so on, is up to date and thus useful for stratification or screening purposes.
There are other files of physicians that could potentially serve as the source for a sample frame of physicians but that have not been evaluated here. One such file is maintained by Health Market Sciences and another by SK&A.
There are no clear-cut answers to what file might best suit a given study. Trade-offs between factors such as bias, cost, and analytic objectives may be called for in making a decision about which file would be preferable to serve as the source of the sample frame.
Some Sampling Issues for a Survey of Physicians
Some Observations on the Choice of Target Population(s)
One of the more important decisions to be undertaken in developing a study is determining precisely what the target population or populations are to be. This can have implications with respect to costs, variability (precision and power), bias, and analytic objectives. Again, trade-offs may have to be made, for instance, placing less focus on certain subgroups of some analytic interest in order to ensure that sufficient resources are available to permit the most important analytic goals to be addressed.
For example, suppose a study is being contemplated focusing on the physician specialties that generally provide primary care to adults, and the candidates for the specialties to include are GPs, FMs, IMs, and Ob/Gyn. If separate estimates of equal precision for each specialty are desired, then, assuming no design effects arising from clustering or variation in sampling rates, equal sample sizes for each specialty should be targeted. However, IMs and FMs represent the vast majority of PCPs, so if the sample is equally allocated to specialties, IMs and FMs will be considerably undersampled. Moreover, the role of Ob/Gyns as a primary caregiver is primarily focused on women who only see one doctor annually, while GPs are by far the smallest of the PCP specialties and these numbers will continue to diminish.
The decision about which specialties to include is ultimately driven by factors such as costs, the relative importance of various analytic objectives, and policy requirements. If a study is simply focused on PCPs generally, a proportional allocation of the sample across the PCP specialties may suffice. The size of the overall sample might be determined to provide estimates of adequate precision for IMs and FMs but not GPs and Ob/Gyns. If comparisons are to be made between specialties, some oversampling of the smaller specialties may be required unless pooling of specialties is considered sufficient (e.g., GPs might be pooled with FMs for comparison to IMs).
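The contrast between equal and proportional allocation can be made concrete with a small sketch. The frame counts below are hypothetical round numbers chosen to make the arithmetic easy to follow; they are not taken from Table 1 or from any of the files discussed.

```python
def allocate(frame_counts, n, method="proportional"):
    """Split a total sample of n across specialties.

    Rounded allocations may not sum exactly to n for arbitrary counts.
    """
    if method == "equal":
        return {s: round(n / len(frame_counts)) for s in frame_counts}
    total = sum(frame_counts.values())
    return {s: round(n * count / total) for s, count in frame_counts.items()}

# Hypothetical frame counts by specialty (illustrative only).
frame_counts = {"IM": 100_000, "FM": 90_000, "Ob/Gyn": 40_000, "GP": 10_000}

proportional = allocate(frame_counts, 1200)           # IM 500, FM 450, Ob/Gyn 200, GP 50
equal = allocate(frame_counts, 1200, method="equal")  # 300 per specialty

# Sampling rate of each specialty under equal allocation, relative to GPs.
rates = {s: equal[s] / frame_counts[s] for s in frame_counts}
relative = {s: rates[s] / rates["GP"] for s in rates}
```

With these counts, equal allocation samples IMs at about one tenth the rate of GPs, which is the sense in which the larger specialties are undersampled when separate estimates of equal precision are the goal.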
Analytic Objectives and Precision Requirements
A useful initial step when undertaking a study is to formally identify the study’s analytic objectives and precision requirements. For studies of physicians, this includes describing the physician specialties as well as the estimates and comparisons of interest. For instance, are comparisons between specialties or physician types (e.g., PCPs vs. specialists) contemplated? What types of estimates are of interest (e.g., totals, means, or proportions) and for what variables? In helping to make sample size determinations, it is necessary to consider the level of precision desired for the various estimates of interest. For estimated proportions, it might be specified that for an estimated proportion of .5, the corresponding standard error of the estimate should be no more than .03. Alternatively, this could be expressed as targeting a 95% confidence interval for an estimated proportion of .5 to be about ±6%. If comparisons between subgroups are of interest, one might indicate that the power to detect existing differences of .05 for estimated proportions in the range of .6 to .7 should be at least 70%.
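The precision targets just described translate into sample sizes through standard normal-approximation formulas. The sketch below assumes simple random sampling with no design effect and no finite population correction; the two-sided significance level of .05 and the specific pair of proportions (.60 vs. .65) are assumptions chosen from within the range mentioned above.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_for_standard_error(p, se):
    """Completed cases needed so that sqrt(p(1-p)/n) does not exceed se."""
    return ceil(p * (1 - p) / se ** 2)

def ci_half_width(p, n, level=0.95):
    """Half-width of the normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sqrt(p * (1 - p) / n)

def n_per_group_for_power(p1, p2, power, alpha=0.05):
    """Cases per group for a two-sided test of p1 vs. p2 (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

n = n_for_standard_error(0.5, 0.03)   # 278 completed cases
half_width = ci_half_width(0.5, n)    # about 0.059, i.e., roughly ±6%
n_compare = n_per_group_for_power(0.60, 0.65, power=0.70)
```

These figures refer to completed interviews. The number of physicians to sample would be larger once expected ineligibility and nonresponse are taken into account, and larger again if clustering or unequal sampling rates introduce a design effect.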
Which Specialty Categories to Include: Concern About Misclassification
One issue to be aware of when determining what specialties to include on a sample frame is that physicians may be misclassified or assigned to a specialty category where it is not clear whether physicians assigned to that category are members of the specialty targeted for a study or not. If all physician records on a file are to be included on the sample frame, this does not pose a concern about bias. Responding physicians can be assigned for analytic purposes to the specialty that they identify as their own. To the extent that there is misclassification of specialties that have been sampled at different rates, the variance of corresponding sample estimates will increase.
Suppose that only PCPs in the specialty IM are of interest to a study. It is possible that some IMs who serve as PCPs have been assigned to an IM subcategory not ordinarily associated with primary care (e.g., IM–gastroenterology) or they may simply have been inadvertently assigned to a non-PCP specialty (e.g., surgery). There is no obvious answer on how best to address this issue. An approach that attempts to be safe in terms of maintaining high coverage while still somewhat cost efficient would be to include all specialty categories that are considered at least somewhat likely to contain a nontrivial percentage of IM PCPs. Of course, this will increase screening costs to the extent that physicians not of interest to the study have been sampled and have to be screened out. For researchers who plan to undertake a series of studies of physicians, one approach would be to be overly inclusive for an initial study. If it then turns out that few, if any, eligible for a study were found from a given specialty category, such a category could be excluded from future consideration. If such an approach were planned, care should be taken in documenting the number and reasons for ineligibility across the various specialty categories.
This same issue can arise for studies focused on specified geographic areas. Consider a study of physicians working in Maryland and Pennsylvania. Since contact information may be out of date, physicians of interest to the study who have begun working in either of these states will not be covered by the study if the state appearing in the contact information indicates a state other than the two of interest. One way of addressing this might be to include physicians who are found in nearby states (e.g., Virginia, D.C., and New Jersey) or counties in bordering states where some crossover might be expected. Geocoding the contact information to identify addresses in such counties might be helpful in this regard. This would not ensure full coverage of a geographic area but could be expected to enhance coverage to some degree. Again, trade-offs between additional costs and expectations of improved coverage would have to be evaluated in making such decisions.
Surveying Physician Practices
There are a number of challenging issues associated with surveying physician practices and other types of medical groups (see, e.g., Klabunde et al., 2012). The focus of the discussion here will be in terms of practices considering such issues as: the definition of a practice; the information to be gathered; the establishment of a sample frame of practices; and sampling physicians to learn about practices.
Definition of a Practice
A clear definition should be established of the target population of practices. This has several critical functions. The process of developing appropriate screening and main interview questions in order to determine survey eligibility is generally straightforward for a well-defined population. When relevant, the identification of the information to be captured from respondents in order to determine the probability of selection of a given practice is more readily determined. And, of course, similarities and differences with like research can be appropriately identified and evaluated.
In the not so distant past, it was common to define a practice in terms of a group of physicians with common medical and billing records (e.g., Gans, Kralewski, Hammons, & Dowd, 2005; Hing & Burt, 2007). However, recently hospitals and other health-care-related organizations have been purchasing physician practices, resulting in many essentially independent practices sharing the same billing service. Thus, currently, shared medical records might be a more promising approach for definitional purposes, but even there care must be taken in its application. For instance, National Public Radio (NPR) did a story called Hospitals Gamble on Urgent Care Clinics to Keep Patients Healthy (Gold, 2012). The lead in to the story involved the experience of a patient who called his PCP with a swollen finger. The doctor referred him to an urgent health care clinic owned by the same health care system that employed the doctor. The health care clinic was able to take advantage of having access to his medical records, being part of the same health care system, in treating his condition. Hence, if shared medical records are to be used as a key component of the definition of a physician practice, survey questions need to be carefully crafted to make sure there are no ambiguities about what constitutes a practice.
Once an appropriate definition at the practice level has been established, it is important that survey responses, when pertinent, cover the practice as a whole. Many practices have offices at multiple addresses, and physicians may work out of some but not all offices. For some survey topics it may seem desirable operationally to focus on a single office. The choice of office should be well defined, so that the definition of the target population is clear and probabilities of selection readily determined. For instance, an office can be randomly selected and corresponding estimates would be at the physician office level rather than the physician practice level. Since the number of offices varies by practice, this approach will add variation to sample estimates due to varying selection probabilities compared to a corresponding practice level estimate. Another approach would be to focus on a uniquely defined office within a practice such as the largest office of the practice in terms of patient load. Questions could be asked specifically about the largest office in a practice. This, of course, would mean that study estimates and inferences drawn from study data would directly pertain only to the population of “largest offices in practices.”
If a survey has questions at both the practice and office level, it may be the case that different respondents would be required for different aspects of the survey because of the knowledge required to answer them. This can be operationally challenging to implement and increases the opportunity for incurring nonresponse. Some related thoughts are discussed next.
Information to be Collected: What is to be Learned From the Study?
A key consideration when collecting data about practices is what information is desired and why. One might ask the following questions. First, can the data of interest be obtained from a sampled physician or is it critical to obtain the data from those who are particularly knowledgeable about certain characteristics of a practice (e.g., practice policies, finances, etc.)? Even if the data should be obtained from a “knowledgeable respondent” within a practice, are estimates desired and useful at the physician level or is the focus strictly on practice level characteristics? Some may find it surprising to note that estimates presented at the organization level can sometimes be inadvertently misleading. An anecdote provided by Kish (1965a) may help illustrate this: One of the frightening statements made about American education, around 1957, was that half of the high schools offered no physics, a quarter no chemistry, and a quarter no geometry. It was later noted that, although these backward schools were numerous indeed, they accounted for only two percent of all high school students. There were many more small schools than large ones, but the small proportion of large schools accounted for a large proportion of students. Moreover, the curricula and facilities of large and small schools can and do differ drastically. Hence, presenting average school characteristics gives a misleading picture of conditions facing the average student.
Sample Frame Issues
Let us consider sample design issues related to sample frames. Suppose there were a national sample frame of physician practices that provides 100% coverage of the entire target population of practices but no measure of size (e.g., the number of physicians working for a practice) to distinguish them. If all practices are to be selected with the same probability from such a frame (i.e., an equal probability sample of practices is to be selected), then the sample will proportionately reflect the distribution of practices, including size of practice. If most of the practices in the country are small, then most of the sampled practices will be small. As illustrated by the Kish and “Hing and Burt” examples above, a concern with such an approach is that, in many instances, characteristics of smaller practices will differ substantially from those of larger practices and thus a misleading picture may result pertaining to the characteristics of practices where most physicians work. In many cases, estimates at the physician level may be of greater utility in guiding health care decisions.
If useful practice level frames exist for a given study and size of practice is not on a frame, estimates at the physician level can still be produced. For example, an equal probability sample of practices could be selected, the number of physicians in the practice in the specialties of interest could be obtained as part of data collection, and then estimates could be established at the physician rather than the practice level. Thus, for example, an estimated proportion of physicians who work in practices with at least eight physicians could be created as the ratio of the estimated number of physicians in such practices to the estimate of all physicians in the target population of practices. To the extent that larger practices are undersampled with an equal probability sample of practices, such a sample is less efficient than one of physicians for such estimates. If size of practice is available on a practice level frame, pps sampling would permit the same estimate to be produced, but the precision of such an estimate would be greater because larger practices would be included in the sample proportionate to the extent they cover the physician population, not the practice population.
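The ratio estimate described here can be sketched in a few lines. The practice sizes below are hypothetical values standing in for the physician counts that would be collected from each sampled practice; because the sample of practices is equal probability, the common sampling weight cancels from the ratio.

```python
def physician_share(practice_sizes, min_size=8):
    """Estimated proportion of physicians who work in practices with at
    least min_size physicians, from an equal probability sample of practices."""
    in_large = sum(m for m in practice_sizes if m >= min_size)
    return in_large / sum(practice_sizes)

def practice_share(practice_sizes, min_size=8):
    """Proportion of sampled practices with at least min_size physicians."""
    return sum(1 for m in practice_sizes if m >= min_size) / len(practice_sizes)

# Hypothetical physician counts reported by eight sampled practices.
sizes = [1, 1, 2, 3, 1, 10, 2, 20]

share_of_physicians = physician_share(sizes)  # 30 / 40 = 0.75
share_of_practices = practice_share(sizes)    # 2 / 8 = 0.25
```

In this made-up sample only a quarter of practices are large, yet they account for three quarters of the physicians, which mirrors the Kish school example. The estimate also rests on just two large practices, which illustrates why an equal probability sample of practices is inefficient for physician level estimates.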
Unfortunately, establishing sample frames of practices can be problematic. For example, Gans, Kralewski, Hammons, and Dowd (2005) took great pains to construct a national frame of practices that was as complete as possible for a study for the Agency of Healthcare Research and Quality (AHRQ), obtaining practice listings from a number of sources. Even then they estimated that the coverage of practices nationally they were able to achieve was somewhere between 70% and 88%.
Sampling Physicians to Learn About Practices
An alternative sample design that would also produce a pps sample of practices is to select physicians to obtain data about the practices in which they work. This is an example of indirect sampling (Lavallée, 2007), sampling from a population related to the target population rather than directly from that target population. To draw a contrast, if the focus was on the physician’s perspective about the practices in which she or he works, sampling physicians would be a “direct sample” from the target population. Then data would be collected from the sampled physician and physician level estimates about the practices they work in would be produced.
Here we will consider the need to collect the practice level data from a person other than the sampled physician, someone knowledgeable about data important to the study. This might be a “practice head,” an “office manager,” and so on. When this is the case, estimates may still be developed at the physician level. As Kish pointed out, these will often prove of greater value than practice level estimates. They can also be of greater precision.
Even if practice level estimates are to be developed, sampling physicians to get to practices has one tremendous advantage. There are files such as the AMA and NPI that are expected to provide very high coverage of currently active physicians and therefore of the practices in which they work. One need not incur the costs in money and time to put together a useful sample frame to cover the population of interest. It is critical that the coverage of physicians provided by the frame be very high in order to make this approach operationally feasible, as will be discussed subsequently. Thus, the AMI file might not be an optimal choice for this approach.
Listed below is an itemization of how a study might be carried out, following a general outline provided by Kish (1965b).
Sampling and Data Collection
1. Select an equal probability systematic random sample of physicians from a physician frame providing high coverage of the physicians for each specialty of interest to the study.
2. Using the contact information provided on the frame, contact the physician or the office in which she or he works. If the contact information is out of date, trace the physician to his or her new place of employment. If the sampled physician is no longer employed as a physician (e.g., retired or deceased), the physician record is not tied to an existing practice and is therefore characterized as out of scope (ineligible).
3. When the physician’s actual place of employment is found (whether the organization originally contacted or one reached through tracing), establish whether that place meets the study eligibility requirements for the target population of practices. If not, the sampled practice is ineligible for the study.
4. If the practice is eligible, collect all data at the practice level from someone at the practice considered a reliable source of practice information for the study at hand.
5. If physician level estimates are of interest, assign the practice level data to the sampled physician record that led to the practice.
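The first step, an equal probability systematic sample, can be sketched as follows. This is one common way to draw such a sample (a random start followed by a fixed, possibly fractional, interval) and is offered only as an illustration; the function name, the frame of record numbers, and the sampling rate are all hypothetical.

```python
import random


def systematic_sample(frame, rate, rng=None):
    """Draw a systematic sample in which every frame record has the same
    inclusion probability `rate` (random start, fixed interval of 1/rate)."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = rng or random.Random()
    interval = 1.0 / rate
    position = rng.random() * interval  # random start in [0, interval)
    selected = []
    while position < len(frame):
        selected.append(frame[int(position)])
        position += interval
    return selected


# Hypothetical frame of 100 physician record numbers, sampled at a rate of 1 in 4.
frame = list(range(100))
sample = systematic_sample(frame, rate=0.25, rng=random.Random(3))
```

Sorting the frame before selection (for example, by specialty or geography) is a common design choice with systematic sampling, since the fixed interval then spreads the sample across the sort order.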
Estimation
At the Physician Level
Estimates can be developed where the unit of analysis is physicians and the target population is physicians within practices eligible for the study. With equal probability sampling of physicians, physician level estimates within specialty will not incur added variance due to sample rate variation. If sampling rates vary by specialty and physicians are pooled across specialties for some estimates, the variation in rates will contribute to the sampling variability of those estimates. An example of a physician level estimate is the proportion of physicians in practices where the policy is to take course of action A when treating patients with condition B among all physicians in the targeted population of practices.
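A physician level proportion of this kind can be sketched as a weighted ratio. The example below assumes complete response and uses hypothetical sampling rates and responses; when all physicians share one rate the weights cancel, and when rates differ by specialty the unequal weights are what contribute the added variability noted above.

```python
def physician_level_proportion(records):
    """Weighted proportion of physicians whose practice follows a given policy.

    `records` holds one (sampling_rate, practice_has_policy) pair per sampled
    physician; each physician is weighted by the inverse of the sampling rate.
    """
    weighted_yes = 0.0
    weighted_all = 0.0
    for rate, has_policy in records:
        weight = 1.0 / rate
        weighted_all += weight
        if has_policy:
            weighted_yes += weight
    if weighted_all == 0:
        raise ValueError("no records supplied")
    return weighted_yes / weighted_all


# Hypothetical pooled sample: two physicians sampled at 1 in 10 and one
# physician from another specialty sampled at 1 in 20.
records = [(0.10, True), (0.10, False), (0.05, True)]
pooled = physician_level_proportion(records)
```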
At the Practice Level
Estimates at the practice level can also be developed when sampling physicians to reach practices. However, such estimates would be subject to greater variation than estimates at the physician level because of the variation in the sample probabilities of selection of practices. Larger practices will have a larger chance of selection. There is no variation in the probabilities of selection of physicians: they were all selected with the same probability. (For further discussion of the impact of the variation of sample rates or weights on the variation of sample estimates, one can see Heeringa, West, & Berglund, 2010; Kish, 1965a.) To develop estimates at the practice level where an equal probability sample of physicians has been selected from a single specialty or set of specialties, the probability of selection of the practice can be obtained by multiplying a factor $n_i$ by the chance of selection of the physician who led to the practice. Here $n_i$ represents the number of physicians in the specialty (or specialties) appearing on the sample frame that are found in practice $i$. This factor accounts for the fact that sampling any of these physicians would have led to the practice. An example of a practice level estimate comparable to that discussed for physicians is the proportion of practices where the policy is to take course of action A when treating patients with condition B among all eligible practices.
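The practice level weighting just described can be sketched as follows, with each sampled practice weighted by the inverse of the number of frame physicians in the practice times the physician sampling rate. The function name, the rate, and the data are hypothetical, and complete response is assumed.

```python
def practice_level_proportion(practices, physician_rate):
    """Weighted proportion of practices that follow a given policy.

    `practices` holds one (n_i, has_policy) pair per practice reached through
    a sampled physician, where n_i is the number of physicians on the frame
    who work in practice i. The practice weight is 1 / (n_i * physician_rate).
    """
    weighted_yes = 0.0
    weighted_all = 0.0
    for n_i, has_policy in practices:
        weight = 1.0 / (n_i * physician_rate)
        weighted_all += weight
        if has_policy:
            weighted_yes += weight
    if weighted_all == 0:
        raise ValueError("no practices supplied")
    return weighted_yes / weighted_all


# Hypothetical sample: a solo practice, a four-physician practice, and a
# two-physician practice, with physicians sampled at a rate of 1 in 10.
practices = [(1, True), (4, False), (2, True)]
practice_estimate = practice_level_proportion(practices, physician_rate=0.10)
```

Here the weights are 10, 2.5, and 5, so the solo practice counts four times as much as the four-physician practice. That spread in weights is the source of the extra variability of practice level estimates.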
There are a number of things to note. First, practice level estimation is discussed above in terms of an equal probability sample of physicians. If different specialties are sampled at different rates, this can be incorporated into the determination of the overall chance of selection of the practice. Suppose, for instance, that a study involves two specialties, with physicians sampled with probability $r_a$ for specialty a and $r_b$ for specialty b. If a practice has three physicians in specialty a and two in specialty b, then the chance of selection of the practice can be computed as $3r_a + 2r_b$.
In some circumstances, a generalized weight-sharing approach might also be considered (Kalton & Brick, 1995; Lavallée, 2007; Lohr, 2010).
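For the two-specialty example above, the chance of selection of a practice can be sketched by extending the factor $n_i$ to each specialty, that is, by summing the number of physicians times the sampling rate over specialties. The rates below are hypothetical. The second function is an added comparison, not something taken from the text: it gives the probability that at least one physician is selected under the assumption that physicians are selected independently, which does not hold exactly for a systematic sample.

```python
def practice_selection_chance(counts, rates):
    """Sum over specialties of (physicians in the practice) * (sampling rate)."""
    return sum(counts[s] * rates[s] for s in counts)


def chance_at_least_one_selected(counts, rates):
    """Comparison value that ASSUMES independent selection of physicians."""
    none_selected = 1.0
    for s in counts:
        none_selected *= (1.0 - rates[s]) ** counts[s]
    return 1.0 - none_selected


# Three physicians in specialty a and two in specialty b, as in the text;
# the sampling rates are hypothetical.
counts = {"a": 3, "b": 2}
rates = {"a": 0.02, "b": 0.05}
additive = practice_selection_chance(counts, rates)
independent = chance_at_least_one_selected(counts, rates)
```

With small sampling rates the two values are close (0.16 versus roughly 0.151 here), and the gap widens as rates or practice sizes grow.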
Getting the counts of physicians within a specialty from a practice assumes high coverage of the physicians in a specialty provided by the sample frame. If a specialty is not fully covered (e.g., this might arise if the frame is restricted to those flagged on a file as office based and there are some physicians who actually are office based but are not flagged as such), then a more painstaking effort is called for. For instance, identifiers should be collected (e.g., first and last name) of all physicians in a practice and then matched to the frame to determine whether they are on the file or not. Otherwise, the chance of selection of the practice may not be appropriately determined. Solo practices that have been misclassified (e.g., as not office based) and, as a result, do not appear on the sample frame have no chance of selection, a potential source of bias.
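The matching step can be sketched as follows. This is a deliberately simple illustration that matches on normalized first and last name only; a real study would likely need further identifiers (for example, specialty or location) to resolve duplicate names. All names are hypothetical.

```python
def count_on_frame(practice_roster, frame_names):
    """Count the physicians on a practice roster who appear on the frame,
    matching on case-insensitive, whitespace-trimmed (first, last) name."""
    def normalize(first, last):
        return (first.strip().lower(), last.strip().lower())

    frame_keys = {normalize(first, last) for first, last in frame_names}
    return sum(
        1 for first, last in practice_roster
        if normalize(first, last) in frame_keys
    )


# Hypothetical frame extract and roster collected from one practice.
frame_names = [("Ana", "Lopez"), ("Raj", "Patel")]
practice_roster = [("ana ", "LOPEZ"), ("Lee", "Wong"), ("Raj", "Patel")]
n_i = count_on_frame(practice_roster, frame_names)
```

In this example the practice reports three physicians but only two are on the frame, so the factor used for the chance of selection of the practice would be 2 rather than 3.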
Of course, the notion of what constitutes a practice must be clearly communicated to the respondent in order to ensure that the number of physicians in a practice who contribute to the chance of the selection of the practice can be readily ascertained.
Dealing with survey nonresponse is an important consideration with indirect sampling. For one thing, when sampling physicians to gather data at the practice level, the frame information does not pertain directly to the sampled practice except for the address (unless out of date). Moreover, the probabilities of selection of nonresponding practices can be difficult to ascertain, since they are determined by the number of physicians in the practice in the targeted specialties. A follow-up effort to obtain such information from nonparticipating practices or from the Internet may help in this regard but may fall short of providing the needed data or be operationally costly to implement. Methods of dealing with survey nonresponse for indirect sampling involve modeling (see Lavallée, 2001; Xu & Lavallée, 2009). One strategy to aid in the development of such models may be to follow up a randomly selected sample of the nonresponding practices. This would concentrate survey resources on obtaining the necessary information, helping to increase the response rate to such an effort while requiring less time and money.
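Selecting the follow-up subsample can be sketched as below. The sketch assumes a simple random subsample of nonresponding practices, in which case each followed-up practice stands in for the inverse of the subsampling fraction; how that information then enters a nonresponse model is left open. The function name, identifiers, and fraction are hypothetical.

```python
import random


def select_followup(nonrespondent_ids, fraction, seed=None):
    """Draw a simple random subsample of nonrespondents for follow-up.

    Returns the selected identifiers and the factor by which each selected
    case represents the full set of nonrespondents (population / sample).
    """
    ids = list(nonrespondent_ids)
    if not ids:
        return [], 0.0
    rng = random.Random(seed)
    k = max(1, round(len(ids) * fraction))
    chosen = rng.sample(ids, k)
    return chosen, len(ids) / k


# Hypothetical identifiers for 40 nonresponding practices, one in four followed up.
nonrespondents = list(range(40))
followup_ids, followup_factor = select_followup(nonrespondents, 0.25, seed=7)
```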
Related Issues and Topics
Some physicians may work in multiple practices. This could vary by specialty. If this percentage is nonnegligible for the specialties of interest in a given study, sampling practices through the sample selection of physicians and the development of estimates becomes more complicated.
Characterizing multidisciplinary practices for estimation and analytic purposes may be cumbersome. For instance, depending on the specialties of interest to the study and corresponding analytic objectives, the population of practices for which estimates are produced could be characterized as those containing “at least one PCP” or “at least one family practitioner” or “at least half PCPs.”
As mentioned earlier, there is an NPI file at the organization level (Entity Type 2). Because of the requirements associated with obtaining NPIs (HIPAA covered entities), it would be expected that this file would provide very high coverage of physician practices and other medical groups in the country (although distinguishing between types of organization on the NPI file could require substantial screening to help ensure high coverage of a target population). Moreover, they would be required to keep their contact information up to date. To the extent that organizations that are physician practices (or other medical groups of interest) can be readily identified by the descriptions (taxonomies) found on the NPI organization level file, a sample frame of practices may be produced with very high coverage of the target population, but perhaps with a high number of ineligible organizations as well. Because sole proprietorships may represent practices, they should be added to the frame from the individual provider listings (Entity Type 1). There would be a number of limitations and uncertainties with such an approach. These include uncertainty about coverage because the descriptions may not readily identify all practices of interest (this would be a major concern in establishing the sample frame unless further research is done related to the NPI file of organizations); the amount of screening necessary to identify eligible practices (this raises concerns about cost); no size measure, so organizations cannot be sampled pps based on the number of physicians perhaps resulting in higher variances than might be considered desirable for any physician level estimates planned; and limited information for stratification and weighting purposes. Again, there is no link established between the NPI organization and individual provider files.
One may sample physicians to get to practices or clinics to get to particular types of patients or patient visits. This might be of particular interest if the type of visit is relatively rare. For instance, suppose it is of interest to sample women at the time of their first prenatal visit to a physician. A set of physician specialties (e.g., Ob/Gyn, Obstetrics, FM) could be identified and physicians sampled from listings. The practice or clinic for which the physician works will then have been selected with probability proportionate to the number of physicians in the targeted specialties who work there. Prenatal visits to the practice (to any physician in the practice or clinic, not just the sampled physician) could be sampled and those representing the first such visit to a physician would identify a woman asked to participate in the study. It might be of interest to note that for the example of identifying the first prenatal visit, using the NPI file could allow a more expansive set of health care providers. For instance, nurse practitioners and midwives are included in the taxonomies of health care providers. Of course, coverage of the full population of additional health care provider types would warrant investigation.
Conclusion
Choosing the source of a sample frame is part of the process of developing a sample design for carrying out a survey of physicians or practices. Important steps in this process include the following: identification and prioritization of the analytic objectives of the study; definition of the target population(s) of the study; determination of the survey resources available to achieve the analytic objectives; and balancing considerations of variance, bias, cost, and analytic objectives. This balancing includes issues related to the choice of file for the sample frame (population coverage; cost; up-to-date contact and other data; auxiliary data for stratification and weighting; duplication), estimation and analysis plans, sample size, and sample allocation as it relates to subgroup analyses.
Three files that are candidates to serve as sources for a sample frame of physicians have been considered here. All three have distinct advantages and disadvantages related to factors such as coverage, up-to-date information, and cost. The AMA Masterfile has a long track record as the source of physician sample frames while there has been little or no use of the NPI and AMI files. It would be of great interest if methodological studies were to be undertaken evaluating the sample frame potential for these (and other) files. The NPI file is beginning to be used as the sample frame source for some physician surveys, so some things will be learned for the physician specialties surveyed as reports and articles are written. Since the NPI file is free, there may be some potential advantage to using both the NPI and AMI files in a dual frame approach.
The types of research needed to further evaluate files that represent potential sources of sample frames include the following: the identification of eligibility criteria commonly used in studies of physician populations (for example, specific specialties, such as a set of the individual specialties focused on primary care; nonfederal employment; seeing patients at least X percent of the work week; and a patient load not solely restricted to the institutionalized population); the screening of office staff and sampled physicians to establish eligibility for the study at hand; the development of sample weights adjusting for survey nonresponse; the use of weighted cross-tabulations comparing the specialty designated on the sample frame to the specialty reported in response to the survey to determine the extent of misclassification on specialty (other items on the sample frame can be compared to survey responses asking about the same information to assess misclassification on these other items; “working in an office-based practice” could be evaluated for some files); and the summing of sample weights adjusted for nonresponse for various subgroups of common interest, such as specialties, for comparison with other studies using similar eligibility criteria (e.g., NAMCS documentation for each annual public use file provides sample estimates of the number of physicians in selected specialties among those flagged as office based on the AMA file and meeting NAMCS eligibility criteria; standard errors of these NAMCS estimates are not provided, however, which limits the ability to compare estimates to some extent).
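The weighted cross-tabulation of frame specialty against reported specialty can be sketched as follows. The specialty codes, weights, and function names are hypothetical; the weights stand for nonresponse-adjusted sample weights, and the row totals of the table are the weighted subgroup sums mentioned above.

```python
from collections import defaultdict


def weighted_crosstab(respondents):
    """Sum weights by (frame specialty, reported specialty).

    `respondents` holds one (frame_specialty, reported_specialty, weight)
    triple per responding physician.
    """
    table = defaultdict(float)
    for frame_spec, reported_spec, weight in respondents:
        table[(frame_spec, reported_spec)] += weight
    return dict(table)


def misclassification_rate(table, frame_spec):
    """Weighted share of physicians listed under `frame_spec` on the frame
    who report a different specialty in the survey."""
    total = sum(w for (f, _), w in table.items() if f == frame_spec)
    if total == 0:
        raise ValueError("no respondents with frame specialty " + frame_spec)
    mismatched = sum(
        w for (f, r), w in table.items() if f == frame_spec and r != f
    )
    return mismatched / total


# Hypothetical respondents: frame specialty, reported specialty, adjusted weight.
respondents = [("FM", "FM", 20.0), ("FM", "IM", 5.0), ("IM", "IM", 30.0)]
table = weighted_crosstab(respondents)
fm_rate = misclassification_rate(table, "FM")
```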
Physician files can also serve as a sample frame for sampling practices. We have considered sampling physicians to obtain estimates about practice characteristics both at the physician level and at the practice level. There are many advantages (statistical, operational, and interpretative) to producing physician level estimates. However, even if practice level estimates are desired, the indirect sampling approach of sampling physicians to learn about the practices in which they work can help ensure high coverage of practices without the cost and uncertainties associated with attempting to establish a practice-based frame. There are challenges with such an approach, such as dealing with nonresponse. Nevertheless, it appears that this approach would often be preferable to developing a sample frame of practices, given the time, cost, and coverage issues of frame construction as well as the cost and time associated with screening sampled facilities to identify those eligible for a study.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
