Abstract
State databases offer researchers the opportunity to conduct research using data collected by states. These databases contain financial, demographic, and accountability data. Accessing and acquiring data from these repositories, though, can pose challenges for scholars. This brief describes the types of data collected by states, how to acquire these data, and potential limitations when using them. Special consideration is given to concerns regarding acquiring state information on smaller populations of students, especially students identified as gifted and talented.
At times, the greatest difficulty of conducting research is collecting data. By using data collected by state education departments, researchers can let others shoulder much of the hard work. A wealth of education data already exists in state data repositories, often referred to as administrative data. States require school districts to report student enrollments, district spending, and a plethora of other information to state agencies. Commonly, this information sits in spreadsheets housed in a state’s education department.
Researchers in gifted education may be particularly interested in data that are collected by state education departments. For example, a significant amount of research in gifted education focuses on equitable representation in gifted education programs (Goings & Ford, 2018; Hodges, Tay, Lee, & Pereira, 2019; Hodges, Tay, Maeda, & Gentry, 2018; Peters, Gentry, Whiting, & McBee, 2019). Demographic information used to calculate rates of representation is one example of the data collected by state education departments. State mandates and laws govern what data are collected by each state.
In states where gifted education is mandated, information about gifted education programs is collected. Even in states where gifted education is not mandated, useful information is often collected by the state. For example, South Dakota does not mandate gifted education programming and so does not collect any identification, programming, or enrollment information. That said, South Dakota still collects financial information in which spending on gifted education is accounted for. In other words, even though a state might not directly collect the data that a researcher needs, knowledge of what the state does collect can allow a researcher to conduct research.
Furthermore, using state data allows for the evaluation of state policies. Recently, Plucker, Makel, Matthews, Peters, and Rambo-Hernandez (2017) called on researchers to strengthen policy research in gifted education. Researchers have answered this call by using state data to examine gifted education funding (Hodges, 2018; Hodges, Tay, Desmet, Ozturk, & Pereira, 2018), equitable representation (Lamb, Boedeker, & Kettler, 2019), and the efficacy of age-based classroom design (Peters, Rambo-Hernandez, Makel, Matthews, & Plucker, 2017). However, there is still a wealth of state policies to evaluate.
Researchers can use state-collected data to evaluate state policies. Often, states mandate data collection specifically for the purpose of policy evaluation. For example, Florida mandates alternative pathways of gifted identification for children who participate in free and reduced lunch programs. In turn, the state collects data on students participating in free and reduced lunch programs who were identified as gifted (Florida Department of Education [FDOE], 2018). In short, researchers should use state data not only because of their availability but also because doing so fulfills an important purpose of research: Research conducted on state data can be used to inform those who make state policy.
Objective
In this article, we provide a brief overview of the types of data that are publicly available either in online state data repositories or through public information requests. Following this, we discuss common issues and considerations that arise when acquiring and using state-based administrative data sets, with a specific focus on issues likely to arise when conducting research on gifted education.
Data Types
The three most common forms of state data are financial accounting data, demographic data, and accountability data. Financial accounting data include school revenue and expenditures. Demographic data include students and personnel counts. Accountability data include standardized test scores and school evaluations.
Financial Accounting Data
All states make school district financial accounting information public, either readily available on state websites or through request. Financial data are largely aggregated at the state level or the district level. States report total spending, which is then commonly divided into per-pupil spending. District-level data are where researchers find specific program expenditures. For gifted education researchers, how spending for gifted education is accounted for in budgetary records varies across states. In South Dakota, for example, districts do not receive support from the state and must fund gifted programs through their general fund. Researchers can find district spending on gifted education under the general fund category (South Dakota Department of Education, 2018).
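The per-pupil calculation described above can be illustrated with a minimal sketch. The record layout, field names, and dollar figures below are hypothetical and are not drawn from any state's reports; actual column names and fund categories will differ by state.

```python
from dataclasses import dataclass


@dataclass
class DistrictRecord:
    """One row of a district-level financial file (hypothetical layout)."""
    name: str
    enrollment: int
    total_expenditure: float   # all funds, in dollars
    gifted_expenditure: float  # gifted program line item, in dollars


def per_pupil(amount: float, enrollment: int) -> float:
    """Return dollars per enrolled student."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return amount / enrollment


# Illustrative values only.
districts = [
    DistrictRecord("District A", 1200, 12_600_000.0, 48_000.0),
    DistrictRecord("District B", 300, 3_450_000.0, 0.0),
]

for d in districts:
    total_pp = per_pupil(d.total_expenditure, d.enrollment)
    gifted_pp = per_pupil(d.gifted_expenditure, d.enrollment)
    print(f"{d.name}: ${total_pp:,.2f} per pupil overall, "
          f"${gifted_pp:,.2f} per pupil on gifted education")
```

Dividing gifted spending by total enrollment, as done here, is one possible choice; a researcher might instead divide by the number of identified students where that count is available.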
State websites are not the only location for financial information. External websites are available that aggregate data across multiple states. For example, the Davidson Institute provides summary information of gifted education laws and funding mandates across the United States. This repository provides researchers with an initial starting point in determining where to locate state information on funding (Davidson Institute, 2018). A second example is the School Funding Fairness Data System housed at Rutgers University (Baker, Srikanth, & Weber, 2016). This database includes information from the Census, state financial reports, and reported fiscal data from public school districts across the United States. One commonality across state reporting websites and aggregate websites is that financial information is primarily at the district level. This will change due to the requirements of the Every Student Succeeds Act (ESSA; 2018).
Increased Transparency Under the ESSA
Although all states provide financial information at the district level in terms of spending, there is no uniform mandate across states on reporting spending by schools within a district. Under the ESSA (2018), school districts must report spending by school and not just by district. For example, districts currently report the budget allocated to teacher salaries for the district as a whole in their annual financial reports. Under the ESSA, those districts will now report their budget allocated to teachers’ salaries by school.
Demographic Data
Demographic data encompass school enrollment and staffing information. These demographic data vary across states. That said, federal mandates require all states to report a core of demographic data. This information for all public school districts in the United States can be found on the National Center for Education Statistics (NCES; 2018) website. English language learners and students with an individual education plan are included in these demographic data, as both receive federal funding (U.S. Department of Education, 2018).
For researchers looking for demographic information from state departments of education, reporting practices vary by state. Some states, like Florida, provide information on the race/ethnicity and free and reduced lunch status of gifted students in their demographic information (FDOE, 2018), which can be found on the state’s online public data repository, the PK-20 Education Data Warehouse. In contrast, Washington does not provide free and reduced lunch status of identified gifted students, but it does provide grade-level information (Washington Highly Capable Program, 2018).
Masking
Two federal laws can lead to public records being masked in demographic student count data: the Family Educational Rights and Privacy Act (FERPA) and the Protection of Pupil Rights Amendment (PPRA). State interpretation and compliance with FERPA and PPRA vary across states (Greenberg & Goldstein, 2017).
Interpretation of FERPA and PPRA may involve the state mandating data granularity floors. A granularity floor means that all demographic data for groups whose population is below a given threshold are not reported (Iyengar, 2002) because small populations are more easily identifiable (Family Compliance Office, 2012). States can offer exceptions to researchers and provide access to identifiable data (Greenberg & Goldstein, 2017). Again, there is no standard across states as to how state education departments comply with FERPA and PPRA, the threshold at which students are obfuscated, or the process of acquiring identifiable data.
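A granularity floor can be illustrated with a short sketch. The threshold of 10, the group labels, and the counts are assumptions chosen for illustration; as noted above, each state sets its own threshold and rules, and some states treat zero counts differently from other small counts.

```python
from typing import Optional


def apply_floor(counts: dict[str, int], floor: int = 10) -> dict[str, Optional[int]]:
    """Replace any count below `floor` with None to mark it as suppressed."""
    return {group: (n if n >= floor else None) for group, n in counts.items()}


# Hypothetical counts of identified gifted students in one district.
gifted_counts = {"Group A": 142, "Group B": 37, "Group C": 6, "Group D": 0}

published = apply_floor(gifted_counts, floor=10)
for group, value in published.items():
    print(group, "suppressed" if value is None else value)
```

In this sketch, Groups C and D would not appear in the public file, which is why small populations are the ones most often missing from publicly released counts.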
Accountability Data
Finally, all states collect accountability data. Accountability data include information on attendance, discipline, graduation rates, college attendance, and standardized testing information. Accountability transparency was increased across the United States following the implementation of No Child Left Behind. States that did not have accountability measures in place prior to 2002 added them once the provisions of No Child Left Behind were implemented (U.S. Department of Education, 2003).
Standardized testing data can include information on advanced placement course enrollment, testing rates, and passing rates; dual credit course enrollment, testing rates, and passing rates; SAT/ACT enrollment, testing rates, and score averages; and state standardized testing rates, passing rates, and mastery rates. A provision of the ESSA that went into effect in the 2017-2018 academic school year requires all school districts to report students at all proficiency levels. Prior to this, states were only required to report rates of students failing and students meeting proficiency levels (Klein, 2015).
Comparing state standardized test scores across states is challenging. This fact limits the inferences that can be drawn from state to state comparisons. Work done at Stanford University has made comparing student test scores on state standardized tests feasible.
Stanford Education Data Archive
The Stanford Education Data Archive is a project housed at Stanford where researchers are working to normalize disparate state accountability tests to allow for inferences to be drawn across multiple states (Reardon et al., 2016). The researchers in the project used state standardized testing accountability data collected by the U.S. Department of Education to construct a national standardized testing reference cohort. In turn, the state standardized test scores from all reporting school districts in the United States are compared within this reference cohort. In this way, a researcher can compare a school district in Indiana where students take the ISTEP+ to a school district in Iowa where students take the Iowa Assessments.
Data Requests
Not all publicly available data are hosted on the state education agency’s website. Data that are not immediately available can be acquired through data requests submitted to the appropriate state agencies.
Freedom of Information Act
The U.S. Freedom of Information Act is the model upon which all state governments dictate their own state-level public information acts (Cate, Fields, & McBain, 1994). The level of public availability of data, though, varies across states. The Freedom of Information Center at the University of Missouri’s School of Journalism houses one of the most comprehensive repositories for information regarding state and national Freedom of Information Acts (National Freedom of Information Coalition, 2018). This organization can provide information on availability and granularity of requested information. Furthermore, the organization’s repository contains all the requisite forms for each state’s information request process.
Costs
State-level freedom of information laws can make data available, but there is the possibility of incurring costs. Costs are levied on researchers by state organizations when the amount of time necessary to acquire the requested information is above a designated threshold. For example, in Texas, if a request requires more than an hour of programming time by staff members, then a cost is charged to the requestor. These costs are set by legislators; in the case of Texas, this cost is described in the Texas Administrative Code (2018). Other states, like Indiana, state that costs are incurred but do not provide specific guidelines as to what these costs will be and leave the matter of costs to the discretion of state directors (Indiana Access to Public Records Act [APRA], 2018).
Of note, states can also include provisions that waive costs if the request is deemed in the interest of the public. Research by academic scholars in the field of education is often categorized as being within the public interest. For example, under North Carolina’s Public Record Law (G.S. §132-1), fees relating to public education requests can be waived if the requestor states that the information’s use will increase the public’s knowledge (e.g., public’s knowledge of underrepresentation of students who are Black, Latinx, or Native American in gifted programs in North Carolina Schools).
Considerations When Making Data Requests
When requesting data from state agencies, a researcher should acknowledge that limitations exist. States retain the right to refuse requests in cases where individual privacy or safety is undermined.
States have provisions wherein they can refuse a public request for information. Refusals are usually justified as protecting an individual’s privacy or the safety of individuals in the state. In terms of privacy, states must comply with the FERPA of 1974 (20 U.S.C. Sec. 1232g; 34 CFR Part 99). For example, Indiana’s APRA law states that the “scores of tests if the person is identified by name and has not consented to the release of the person’s scores” are exempted from public data requests (APRA 5.14.3.4.4b).
It is also important to consider that fulfillment of data requests is not immediate. States can mandate the length of time required for state agencies to respond to information requests (e.g., Texas mandates that all requests be responded to within 10 business days; Texas Administrative Code, 2018). Determinations for when a request will be fulfilled are influenced by the number of requests currently being fulfilled, the staff available to fulfill the request, and whether the request requires any special considerations (e.g., if the agency must determine if the request will violate individuals’ privacy or security).
Familiarity with the types of data collected by the state is critical in making public information requests to state agencies. State agencies are required to fulfill requests, but they will only provide what is requested by the researcher. For example, Texas has four categories for students who are economically disadvantaged: Not Identified As Economically Disadvantaged, Eligible For Free Meals Under The National School Lunch And Child Nutrition Program, Eligible For Reduced-price Meals Under The National School Lunch And Child Nutrition Program, and Other Economic Disadvantage (Texas Education Agency, 2018b). A researcher who requests information regarding free and reduced lunch participation, in the belief that this is the only proxy for socioeconomic status, would only receive information regarding free and reduced lunch participation. The researcher’s study may be better informed if the researcher realized that the state also collects information regarding poverty and food stamp participation. Being knowledgeable about the types of data available is an important factor when requesting information from any state.
In the same vein as socioeconomic status, information collected on gifted education varies across states. For example, Washington collects annual information on how gifted education programming is implemented within its school districts (Washington Highly Capable Program, 2018), whereas Florida only collects demographic information (FDOE Statistics, 2017). Other states, like California, do not collect any information on gifted education. Because the state of California stopped funding gifted programs in 2013, it has not collected any information on gifted education enrollment since the 2012-2013 academic school year (California Department of Education, 2018).
Memorandums of Understanding
A Memorandum of Understanding is a legally binding agreement between two parties. In most cases, it is an agreement between the researcher (or their institution) and the state where the researcher wishes to acquire data. The two parties enter into a legally binding agreement wherein the researcher describes how they will use, store, and dispose of the requested data (once finished). These formal agreements often describe who will use the data and how it will be used.
Commonly, Memorandums of Understanding are used when the requested data are not publicly available or are only partially publicly available. An example of a data set that would require a memorandum of understanding rather than a public information request would be individual student test scores. Note, institutional review board approval is likely required for any research that requires data obtained from using a Memorandum of Understanding.
Limitations and Methodological Considerations
Federal and State Definitions
Definitions do not always align at the federal and state level. An example of this is locale designations. In Texas, the state formulates locale differently than federal agencies. Whereas the NCES bases locale on population density and distance from major urban centers, Texas determines locale by county population and the rate at which the school district’s population is increasing or decreasing (Texas Education Agency, 2018a). In other words, the group of school districts in Texas that the NCES defines as rural is different from the group of school districts that the state of Texas defines as rural.
Masked Data
Concerns for student privacy have led states to increasingly employ masking in their public data. Masking is a method used to obfuscate or hide data that might lead to infringing on an individual’s privacy (Wang, Wang, Ren, & Lou, 2010). A researcher must consider the likelihood that their requested data will be masked if they wish to use public data.
In the case of public education data, masking is commonly applied to a value that is derived from a group with too few members. A common masking technique is to collapse all values below a certain threshold into a single category. For example, a requested spreadsheet might list <10 for all cells with values less than 10. Or, for testing averages, the spreadsheet might include an “*” for all averages calculated from fewer than 10 students. This can lead to some research questions being unanswerable with data acquired through a public request. For example, consider an individual who wanted to request data on the gifted identification of students who are Asian from rural schools. Due to masking, acquiring these data through a public request would be difficult. In the case of masked data, a Memorandum of Understanding is likely required.
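To illustrate why masked cells can leave a research question unanswerable, the following sketch computes the lowest and highest totals consistent with a column that contains masked values. It assumes that a cell such as <10 stands for a true count anywhere from 0 to 9 and that cell values are hypothetical; a given state may define its masked range differently (e.g., 1 to 9).

```python
def parse_cell(cell: str) -> tuple[int, int]:
    """Return the (lowest, highest) count consistent with a published cell."""
    text = cell.strip()
    if text.startswith("<"):
        # Assumption: "<10" means the true count is between 0 and 9.
        return 0, int(text[1:]) - 1
    value = int(text.replace(",", ""))
    return value, value


def total_bounds(cells: list[str]) -> tuple[int, int]:
    """Sum the lower and upper bounds across a column of cells."""
    bounds = [parse_cell(c) for c in cells]
    return sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds)


# Hypothetical counts of identified students across five rural districts.
rural_counts = ["<10", "14", "<10", "<10", "22"]
low, high = total_bounds(rural_counts)
print(f"True total lies between {low} and {high}")
```

When most cells in a column are masked, the gap between the two bounds can be too wide to support a meaningful rate, which is the situation in which unmasked data obtained through a formal agreement become necessary.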
Using Inferential Statistics
Contemporary scholars are critical of the use of inferential statistics on administrative or census data (Gibbs, Shafer, & Miles, 2017; McBee & Field, 2017). The reasoning offered by the scholars is that administrative data represent population data. In other words, when a researcher is using administrative data to assess the identification rate of children for gifted education programs in Indiana, they are not estimating these rates; they are calculating the actual rates. The extracted coefficients are closer in effect to parameters than estimates (Gibbs et al., 2017). Given this, performing inferential statistics on population parameters is inappropriate (McBee & Field, 2017).
Using administrative data for descriptive analyses where beta coefficients and associated standard errors are reported is an appropriate analytical technique (McBee & Field, 2017). This approach allows a researcher to discuss the effect size of a coefficient and its stability (Gentry & Peters, 2009). I offer a note of caution in interpreting standard errors in administrative data sets: Standard error is a function of n, which means that larger populations are associated with smaller standard errors (Faraway, 2016). In administrative data sets with large numbers of observations (e.g., Texas with its 1,025 school districts), standard errors can be misleading. This can lead to instances where the coefficient can appear to be more stable than it otherwise should be.
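The dependence of standard error on n can be shown with a brief sketch. For simplicity, it uses the standard error of a mean (the standard deviation divided by the square root of n) rather than the standard error of a regression coefficient, although the latter shrinks with n in a similar way. The standard deviation of 15 is a hypothetical value.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


sd = 15.0  # hypothetical standard deviation, held constant across rows
for n in (25, 100, 400, 1025):
    print(f"n = {n:>5}: SE = {standard_error(sd, n):.3f}")
```

With the variability held fixed, quadrupling the number of observations halves the standard error, so a small standard error in a large administrative data set reflects the size of the data set as much as the stability of the coefficient.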
Conclusion
In conclusion, state-collected data offer researchers increased opportunity to conduct research. The data are readily available to researchers and can be used to examine state policy or issues that are pertinent to scholars in gifted education (e.g., equity and outcomes). Scholars in gifted education have called upon the field to conduct policy research to inform decision makers (Plucker et al., 2017). State data are the means with which researchers can assess those policies. That said, using state data is not without difficulties and limitations. Policies vary by state, costs can be prohibitive, and masking can impede the ability of researchers to answer questions. Regardless, why re-invent the wheel, and why re-collect the data if a state has already done it?
Footnotes
Acknowledgements
The author thanks Matthew Makel for his recommendations regarding the structure of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
