Concept Inventory for Fundamentals of Environmental Engineering Courses: Concept Inventory Development and Testing

Abstract

Fundamentals of environmental engineering (FEE) is a common core component of undergraduate curricula in civil engineering, environmental engineering, and environmental resources engineering, and it is increasingly becoming a core course in other disciplines. Conceptual understanding is an important pedagogical goal in FEE instruction. A strategy used to address this need in other fields is the development and implementation of a concept inventory (CI) test, which is an assessment tool that identifies common misconceptions about key concepts. Here, a Delphi process was used to identify FEE concepts that were deemed fundamental, important, and at the same time, prone to misperceptions by students, including (1) reactor theory, (2) the mass balance equation, (3) biochemical oxygen demand, (4) units of measure, and (5) chemical equilibrium and partitioning. A CI was developed that included incorrect answers, or “distractors,” based on student interviews that identified common FEE misconceptions. The CI was beta tested in FEE courses at six universities. Analysis of psychometric data from beta testing revealed which concepts were most difficult and identified concepts that needed further refinement. Being cognizant of student misconceptions is a prerequisite for faculty who strive to improve students' conceptual understanding of FEE concepts.

Introduction

The fundamentals of environmental engineering (FEE) course is a core component of undergraduate curricula in civil engineering, environmental engineering, and environmental resources engineering, and is increasingly becoming a core course in other disciplines, such as chemical engineering and sustainability studies. FEE is often a foundation (prerequisite) course for electives such as water treatment, wastewater treatment, air pollution control, solid waste management, and hazardous waste management.

Conceptual understanding is an important pedagogical goal in FEE instruction, or, indeed, in any engineering instruction. The celebrated book How People Learn emphasizes the importance of conceptual understanding (Bransford et al., 2000). The authors assert that students who organize facts and ideas within a conceptual framework are more likely to learn new information quickly and will also be able to apply what they have learned to new situations. By contrast, students new to a field of study tend to make superficial connections or shoehorn memorized formulae rather than organizing their knowledge around fundamental concepts and general principles. Studies of learners' abilities to apply their knowledge to novel problems, a process often referred to as transfer of learning, have clearly shown that for knowledge to be transferred it must be based upon general principles (Litzinger et al., 2011). However, presently there is no readily available resource that lists the general principles needed by a student to understand core concepts in FEE classes. This lack results in at least two important challenges for FEE instructors. First, there is a lack of knowledge of which fundamental, underlying concepts give students the most difficulty and prevent them from mastering course material at the desired level. Second, there is no quantitative way to assess whether different teaching approaches, such as laboratory projects, different classroom formats, or curricular ordering improve students' conceptual understanding of FEE material.

A strategy that has been used to address this problem is the development and implementation of a concept inventory (CI). The pioneering work in CI development was the Force CI, a brainchild of Hestenes et al. at Arizona State University (Hestenes and Wells, 1992; Hestenes et al., 1992). CIs have been shown to be a powerful and accessible tool to support iterative improvement in faculty teaching and to enhance the scientific literacy of students (Smith and Tanner, 2010). They have been used to catalyze curriculum reform (Hake, 1998; Smith et al., 2008), and to identify student weak spots (Garvin-Doxas et al., 2007). The CI model has been applied by educators in many science and engineering disciplines; at present CIs exist for chemistry (Krause et al., 2004), biology (Klymkowsky et al., 2010), statistics (Stone et al., 2003; Allen et al., 2004; Allen, 2006; Stone, 2006), electromagnetics (Notaros, 2002), electromagnetic waves (Roedel et al., 1998; Rhoads and Roedel, 1999), circuits (Herman et al., 2011), signals and systems (Wage and Buck, 2001), strength of materials (Richardson et al., 2003), thermodynamics (Midkiff et al., 2001), materials science (Krause et al., 2003), dynamics (Gray et al., 2003, 2005), fluid mechanics (Martin et al., 2003, 2004), and statics (Steif and Dantzler, 2005).

Although many CIs have been developed as noted above, the process of generating a CI has not been very well defined. The primary input is typically the experience of one or more faculty members who have taught the course for many years and thus have a strong sense of what concepts students are having difficulty with. In the case of FEE, this situation is more complex. Environmental engineering itself is interdisciplinary in nature; moreover, FEE is required of many majors and with a wide range of prerequisites. Because of this diversity of ways in which FEE courses are incorporated into curricula, there is likely to be a similar diversity of faculty opinions on concepts that belong to a fundamentals of environmental engineering concept inventory (FEECI). Therefore, the development of a CI for FEE courses represents a particular challenge.

The overall goal of this work was to produce an instrument for assessing conceptual understanding in a core curriculum course for civil and/or environmental engineering. Such an instrument has the potential to provide a needed technique for formative assessment of pedagogical frameworks and instructional methods in the FEE curriculum, and can play an important role in assessment for programmatic accreditation under the ABET standards. Results of the CI could also be used to indicate the effectiveness of different sequences of prerequisites or other programmatic decisions, which might be particularly beneficial for environmental engineering and science programs housed in nontraditional departments. The specific objectives were to (1) identify key concepts in FEE that are both important and difficult to understand, (2) develop a FEECI that can be used to quantify students' conceptual understanding of these key FEE concepts, (3) administer a “beta” version of the FEECI at U.S. universities with required undergraduate FEE courses, and (4) conduct a psychometric analysis of the results of its initial administration, thereby assessing the effectiveness of questions on the FEECI and identifying trends in conceptual understanding of key FEE concepts. Although the authors previously published a brief conference paper that outlined a general approach to developing a FEECI (Sengupta et al., 2013), the current article provides greater detail on the methods used, results from FEECI beta testing, psychometric analysis, and discussion of the implications of the results.

Methods

A FEECI Development Team was formed that consisted of professors from the following 12 universities: University of Colorado Denver, Florida A&M University–Florida State University Joint College of Engineering, Humboldt State University, Lehigh University, University of Massachusetts Amherst, University of Massachusetts Dartmouth, Missouri University of Science & Technology, Northeastern University, University of South Carolina, University of South Florida, University of Utah Salt Lake City, and Utah State University. The institutions were selected to represent a range of sizes, to include both public and private schools, and to include both teaching-focused and research-focused universities. Faculty were identified who had taught FEE courses for multiple years.

Topic identification

Development team faculty initially submitted their syllabi to the authors to identify common topics among the courses. In CI development, a topic is a course unit, which may include multiple concepts. For example, the topic of chemical equilibrium includes multiple concepts, such as precipitation-dissolution equilibrium, acid-base chemistry, and gas-liquid partitioning. Following the identification of common topics, the FEECI was developed following the general procedure outlined by Adams and Wieman (2011) using the specific steps described below.

Delphi study

The rationale of the Delphi method is that it iteratively moves a team of experts toward a consensus (Dalkey and Helmer, 1963; Streveler et al., 2003). The process recognizes that expert judgment is necessary to draw conclusions in the absence of full scientific knowledge. The method avoids relying on the opinion of a single expert or merely averaging the opinions of multiple experts (Goldman et al., 2008). Instead, experts share ideas and beliefs based on their expertise in teaching (in this case, FEE), so that each can make a more informed decision, but in a structured way. This can prevent the situation in which, for example, in round-table discussions a few panelists have excessive influence (Pill, 1971).

In the present study, an online Delphi study was used to identify concepts in FEE courses that are critical but prone to difficulty among students. Each member of the FEECI Development Team completed an online survey as the first step of the Delphi study (stage 1). The survey asked members of the Development Team to identify, in an open-ended format, up to five key concepts within each of the identified topics. For each concept listed, the Development Team members also rated how important they perceived that concept to be, and how difficult it is for students, using a five-point Likert scale. The members were also asked to state their rationale for including that particular concept in the list. After the authors identified the most common concepts for each topic, members of the Development Team were resurveyed (stage 2). In this round, members were informed of the frequency with which each concept was reported in stage 1, were informed of the means and standard deviations for the team's Likert scale values in stage 1, and were then asked again to rate the importance and the difficulty of each concept.

Surveys were also administered to 10 students at each of three universities: University of Massachusetts Dartmouth, University of South Florida, and University of Utah Salt Lake City. The 30 students were volunteers from a pool of students who had recently completed an FEE course. Students were given the results of the stage 1 Delphi study described above and asked to rank the importance and the difficulty of each concept using a five-point Likert scale.

Generation of FEECI questions and distractors

Based on the Delphi study, eight concepts (discussed in detail in the Results of CI Development section) were initially identified as being both important and difficult for students in FEE courses. For each of these eight concepts, questions were formulated for possible inclusion in the FEECI test following the heuristic propounded by Simoni et al. (2004). Simoni's heuristic consists of the following guidelines:

(1) Each question should cover only a single concept.

(2) The wording of the problem statement and the choice of answers should be such that little computation is required for solving the problem.

(3) The answers should be framed such that the correct answer identifies proper understanding of the concept, and the incorrect answers accurately represent the students' misconceptions.

(4) Each problem should not be dependent upon a nonstandard term or definition.

Each CI question consists of a “stem” (i.e., the prompt or question to be answered) along with the correct answer and a set of “distractors” (incorrect answers that represent common student misconceptions). To develop the distractors, the stems were administered to student volunteers in an open-ended or “free response” fashion, that is, without any choices for selection. Thirty student volunteers were chosen, 10 each from University of Massachusetts Dartmouth, University of South Florida, and University of Utah. Necessary IRB approvals to interact with these students were obtained from each campus (Approval No. 10.069 and its extension No. 12.032 under the “Expedited Category No. 6 & 7” at UMass Dartmouth; approval No. Pro00003722 at USF; and approval No. 00048622 at University of Utah). These student volunteers, who were either registered in the FEE course at their campus at that time or had recently completed it, were asked to describe (both verbally and in writing) their rationale or thought process for each question as they worked through the answers. The study authors took notes during these sessions. If multiple student volunteers indicated a particular misconception or incorrect approach, that misconception was used to formulate a distractor for the multiple-choice version of the FEECI. Sengupta et al. (2013) provide additional details on the development of FEECI questions and distractors.

FEECI beta tests

A beta version of the FEECI was administered to faculty volunteers during the Association of Environmental Engineering and Science Professors (AEESP) bi-annual Education and Research Conference, which was held in Golden, CO, in 2013. Faculty were asked to take the test and provide their comments or questions. They were also solicited for suggestions for additional questions to be incorporated into the FEECI. The results of the faculty responses were used to revise the CI. Subsequently, a 21-question CI was administered to 320 students at the following six campuses during the Fall 2013 or Spring 2014 semesters: University of Massachusetts Amherst, University of Massachusetts Dartmouth, University of South Carolina, University of South Florida, University of Utah, and Northeastern University. The 1-h, multiple-choice test was typically administered near the end of the semester, that is, after the concepts tested had been covered in the course during the semester. Some researchers have experimented with administering this instrument twice, at the beginning of the semester (pretest) and again at the end of the semester (posttest) and computing “normalized gain,” which is defined as (Hake, 1998):

\[ \langle g \rangle = \frac{\text{posttest} - \text{pretest}}{100 - \text{pretest}} \tag{1} \]

that is, the fraction of the available improvement in score that was obtained during the course. Thus, the higher the gain, the more effective the course was in instilling fundamental concepts. We did not follow this procedure because we felt students in general would be unfamiliar with environmental engineering terminology at the beginning of the semester even though they may understand the basic principles involved. In such a situation, the pretest would not be able to provide any meaningful information.
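As an illustration only, Equation (1) can be evaluated with a few lines of Python. This is a minimal sketch, assuming that both scores are class averages expressed in percent (0 to 100); the function name and the example scores are hypothetical and are not data from this study.

```python
def normalized_gain(pretest_pct: float, posttest_pct: float) -> float:
    """Hake's normalized gain <g> for scores given in percent (0-100)."""
    if not 0 <= pretest_pct < 100:
        # The gain is undefined when no improvement is available (pretest = 100).
        raise ValueError("pretest must be in the range [0, 100)")
    return (posttest_pct - pretest_pct) / (100.0 - pretest_pct)


# Hypothetical class averages: 40% before the course, 70% after it.
example_gain = normalized_gain(pretest_pct=40.0, posttest_pct=70.0)
print(f"normalized gain = {example_gain:.2f}")
```

In this hypothetical case, 30 of the 60 available percentage points were gained, so half of the possible improvement was realized.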

Data analysis

Item analysis was conducted based on classical test theory (CTT) to examine the validity and reliability of the FEECI using the data from the beta tests on the six campuses. Several statistical parameters were computed to indicate the quality of individual questions and the whole test. All item and distractor analyses were conducted using the IA_CTT SAS macro developed by Yi-Hsin and Li (2015).

Item difficulty index

The item difficulty index is the proportion of students who answer a question correctly, and ranges from 1 (i.e., all students answer a question correctly) to 0 (i.e., all students answer a question incorrectly). Items with difficulty >0.90 (too easy) or <0.10 (too hard) need to be revised or eliminated from the test because they provide little information in terms of students' performance on the test.
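The calculation can be sketched as follows. This is an illustrative Python example, not the IA_CTT SAS macro used in the study; it assumes responses are already scored as 1 (correct) or 0 (incorrect), and the function names and toy data are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    responses: list of rows, one per student; each row is a list of 0/1 scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]


def flag_for_revision(difficulties, too_easy=0.90, too_hard=0.10):
    """Indices of items whose difficulty is >0.90 (too easy) or <0.10 (too hard)."""
    return [j for j, p in enumerate(difficulties) if p > too_easy or p < too_hard]


# Toy data: 4 students x 3 items (hypothetical, not FEECI responses).
toy_responses = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]
toy_difficulty = item_difficulty(toy_responses)
toy_flags = flag_for_revision(toy_difficulty)
print(toy_difficulty, toy_flags)
```

In the toy data, every student answers the first item correctly, so it is flagged as too easy to provide information about performance.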

Item discrimination indices

Item discrimination indices are used to measure how well a question distinguishes students with different performance levels on the scale of interest. Three item discrimination indices were computed in this study: (1) discrimination from two extreme groups, (2) item-total correlation, and (3) item-concept correlation.

We classified the top 25% of students (Q1; ≥12 of 21 items correct) and the bottom 25% of students (Q4; <7 items correct) into the highest and lowest performing groups, respectively. Then, discrimination from the two extreme groups was computed as the fraction of Q1 answering the item correctly minus the fraction of Q4 answering correctly; this metric ranges from 1 (all students in the high performing group answer a question correctly and all students in the low performing group answer incorrectly) to −1 (the reverse). A large positive value indicates a good-quality item, whereas items with values <0.20 are considered poor.
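The extreme-groups index described above can be sketched in Python as follows. This is an illustrative example under stated assumptions rather than the study's SAS implementation: responses are scored 0/1, the function name is hypothetical, and the cutoffs in the toy example are chosen for a three-item test (the study itself used ≥12 and <7 of 21 items).

```python
def extreme_group_discrimination(responses, high_cut, low_cut):
    """Fraction correct in the high group minus fraction correct in the low group.

    High group: total score >= high_cut. Low group: total score < low_cut.
    Returns one value per item, each in the range [-1, 1].
    """
    totals = [sum(row) for row in responses]
    high = [row for row, t in zip(responses, totals) if t >= high_cut]
    low = [row for row, t in zip(responses, totals) if t < low_cut]
    if not high or not low:
        raise ValueError("both extreme groups must contain at least one student")
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        p_high = sum(row[j] for row in high) / len(high)
        p_low = sum(row[j] for row in low) / len(low)
        result.append(p_high - p_low)
    return result


# Toy data: 5 students x 3 items (hypothetical); totals are 3, 3, 2, 1, 0.
toy_responses = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_discrimination = extreme_group_discrimination(toy_responses, high_cut=3, low_cut=2)
print(toy_discrimination)
```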

The item-total correlation is the correlation between the item response (1 for the correct answer and 0 for the incorrect answer) and the total score for all remaining items on the CI. A large positive item-total correlation indicates consistency between the response to an individual item and overall performance on the test. Questions with an item-total correlation <0.10 were deemed candidates for revision or elimination (Nunnally, 1967; Stone, 2006; Revelle, 2014).

The item-concept correlation was used to examine whether a response to a question was consistent with a student's performance on the concept to which that question belongs. Thus, the item-concept correlation was obtained by computing the correlation between the response to a question and the total score for the remaining questions within the same concept. A large positive item-concept correlation value for a question indicates that an individual question measures the same construct that the test or concept is designed to measure.
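Both correlation indices follow the same pattern: the 0/1 response to one item is correlated with the summed score on the remaining items, taken either over the whole test or over the item's own concept. The Python sketch below illustrates this under the assumption that an ordinary Pearson correlation is used; the function names, the grouping of items into concepts, and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlation(responses, item, scale_items=None):
    """Correlate one item with the summed score of the other items in a scale.

    scale_items=None uses every item (item-total correlation); passing the
    indices of one concept gives the item-concept correlation.
    """
    if scale_items is None:
        scale_items = range(len(responses[0]))
    others = [j for j in scale_items if j != item]
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row[j] for j in others) for row in responses]
    return pearson(item_scores, rest_scores)


# Toy data: 4 students x 4 items; items 0-1 and 2-3 form two hypothetical concepts.
toy_responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
r_total = item_rest_correlation(toy_responses, item=0)
r_concept = item_rest_correlation(toy_responses, item=0, scale_items=[0, 1])
print(r_total, r_concept)
```

In this toy case the first item is uncorrelated with the rest of the test but perfectly correlated with the other item in its own concept, which shows why the two indices can disagree for the same question.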

Cronbach's alpha

Cronbach's alpha (Cronbach, 1951) has become a standard reliability measure under CTT. The value of alpha represents the smallest fraction of total score variance that is due to true score variance (rather than errors in measurement). The minimum acceptable value for alpha is typically set at 0.70 (Litwin, 1995), such that less than 30% of the score variance is due to measurement errors (Bardar et al., 2007). For the beta test of the FEECI, the Cronbach's alpha was computed using the IA_CTT macro in SAS® (Yi-Hsin and Li, 2015).
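For reference, the standard textbook form of Cronbach's alpha for a test of k items is given below. The notation is an assumption introduced here for illustration and does not appear in the original analysis: \sigma_i^2 denotes the variance of scores on item i, and \sigma_X^2 denotes the variance of students' total scores.

```latex
\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}}\right)
\]
```

When items covary strongly, the total-score variance is large relative to the sum of the item variances, and alpha approaches 1.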

Alpha-if-item-deleted

The alpha-if-item-deleted statistic is based on the value of Cronbach's alpha calculated if an individual item is removed from the test. A decrease in alpha means that removing a particular question makes the whole test worse, indicating that that question is a good question, whereas questions exhibiting an increase in alpha are poor questions.
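The statistic can be sketched in Python as follows. This is a minimal illustration that assumes the standard variance-based formula for Cronbach's alpha and 0/1 scored responses; it is not the IA_CTT macro, and the function names and toy data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha from a students x items matrix of 0/1 scores."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(responses):
    """Alpha recomputed with each item removed in turn (one value per item)."""
    k = len(responses[0])
    values = []
    for drop in range(k):
        reduced = [[x for j, x in enumerate(row) if j != drop] for row in responses]
        values.append(cronbach_alpha(reduced))
    return values


# Toy data: 4 students x 4 items (hypothetical). The first three items are
# consistent with one another; the fourth is unrelated to them.
toy_responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
alpha_full = cronbach_alpha(toy_responses)
alpha_deleted = alpha_if_item_deleted(toy_responses)
print(round(alpha_full, 3), [round(a, 3) for a in alpha_deleted])
```

In the toy data, deleting the fourth item raises alpha from about 0.17 to 0.75, which is the pattern that marks a poor question.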

Distractor analysis

We examined the frequency with which each distractor was chosen to gain insight into which misconceptions were most common for each concept. We also drew trace lines for each correct item and distractor by classifying the students into five different performing groups based on their raw scores (Table 1) and computing correct item and distractor frequency percentages for these groups. A distractor trace line is expected to decrease with increasing performing levels (i.e., higher-performing students select any incorrect answer less frequently), while a trace line for the correct option, or item characteristic line (ICL), is expected to increase with increasing performing level (i.e., higher-performing students select the correct answer more frequently).

Table 1.

Division of Students into Five Groups of Different Performance Levels (n = 320 Total Students)

Group	Range of raw scores (out of 21 total questions)	Number of students in the group	Percentage of students in the group
Group 1	2–6	70	21.9
Group 2	7–8	72	22.5
Group 3	9	45	14.1
Group 4	10–11	68	21.2
Group 5	12–19	65	20.3
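The trace-line calculation can be illustrated with the following Python sketch, which assigns each student to a performance group using the raw-score ranges in Table 1 and then computes the percentage of each group selecting each answer option. The option labels, function names, and toy responses are hypothetical and are not FEECI data.

```python
from collections import Counter

# Inclusive raw-score ranges for Groups 1-5, taken from Table 1.
GROUP_BOUNDS = [(2, 6), (7, 8), (9, 9), (10, 11), (12, 19)]


def performance_group(raw_score):
    """Return the group number (1-5) whose range contains raw_score."""
    for group, (low, high) in enumerate(GROUP_BOUNDS, start=1):
        if low <= raw_score <= high:
            return group
    raise ValueError(f"raw score {raw_score} is outside the ranges in Table 1")


def trace_lines(raw_scores, choices, options="ABCD"):
    """Percentage of each group choosing each option for a single question.

    Returns {option: [pct_group1, ..., pct_group5]}; empty groups give NaN.
    """
    counts = {g: Counter() for g in range(1, 6)}
    for score, choice in zip(raw_scores, choices):
        counts[performance_group(score)][choice] += 1
    lines = {}
    for option in options:
        lines[option] = [
            100.0 * counts[g][option] / sum(counts[g].values())
            if counts[g] else float("nan")
            for g in range(1, 6)
        ]
    return lines


# Toy data for one question whose correct answer is assumed to be "A".
toy_scores = [3, 5, 7, 8, 9, 10, 12, 15]
toy_choices = ["B", "C", "B", "A", "A", "A", "A", "A"]
toy_lines = trace_lines(toy_scores, toy_choices)
print(toy_lines["A"], toy_lines["B"])
```

In this toy case the line for the correct option rises across the groups while the line for distractor "B" falls, which is the expected pattern for a well-functioning question.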

Results of CI Development

Topic identification

A summary of the topics covered at participating universities was developed based on the syllabi provided by the faculty and is shown in Table 2. Five topics were chosen (mass balances, environmental chemistry, measurements and units, risk assessment, and environmental biology) for use in the Delphi study because they were common to FEE courses for at least half of the participating universities. It may be noted that many of the concepts that challenge students in topics such as water treatment or wastewater treatment would also fall under the heading of one (or more) of the five topics identified here. For example, “Monod growth kinetics” or “biochemical oxygen demand (BOD)” are both concepts that are applicable to the topic of wastewater treatment, but here were listed under the topic of environmental biology.

Table 2.

Topics Identified Through Analysis of Syllabi at 10 Participating Universities

Topic	Number of syllabi that include the topic
Mass balances	10
Water quality	9
Wastewater treatment	9
Air pollution	9
Environmental chemistry	8
Solid & hazardous waste management	8
Measurements & units	7
Drinking water treatment	7
Risk assessment	6
Energy balances	5
Environmental biology	5
Population growth & predictions	4
Environmental regulations	3
Global element cycles	2
Sustainability	2
Groundwater & contamination remediation	2
Hydrology	2
Ionizing radiation	1
Nanotechnology	1
Renewable energy systems	1

Delphi study and selection of concepts for inclusion in the CI

In stage 1 of the Delphi study, members of the FEECI Development Team were asked to identify fundamental concepts within each topic area, indicate their rationale for concept inclusion, and rate the degree of importance and difficulty of each concept using a five-point Likert scale. The initial identification of concepts by the Development Team members was “open-ended”; in other words, members were not prompted to select a concept from a menu. Instead, members were able to phrase their responses in any manner desired. Because of this, some interpretation of members' responses was required to determine whether two members had identified the same concept. For example, if one member wrote “oxygen demand,” one wrote “BOD,” and one wrote “biochemical oxygen demand,” these were all considered to be the same concept.

In stage 2 of the Delphi study, the FEECI Development Team was given a summary of the data from stage 1 and resurveyed. The data from stage 2 are shown in Fig. 1, which provides detailed information on the concepts identified, their degree of importance, and degree of difficulty. Student volunteers were given a similar survey. Based on these surveys, eight concepts were identified as both important and difficult: (1) reactor theory (RT), (2) the mass balance (MB) equation, (3) BOD, (4) units of measure (UOM), (5) chemical equilibrium, (6) partitioning, (7) reaction kinetics, and (8) definition of risk. The primary criterion for this list of concepts was the frequency of responses of the development team and the student volunteers.

FIG. 1.

Concepts, degree of importance, and degree of difficulty identified by the FEECI Development Team within the following topics: (a) mass balances, (b) environmental biology, (c) environmental chemistry, (d) risk assessment, (e) units of measurement. “Frequency of Responses” indicates the number of times that a particular concept was listed by participants taking the survey. The total number of survey participants was 12, but the “Frequency of Responses” can exceed 12 if a single participant listed a concept more than once (e.g., a survey respondent could have listed both “solubility product” and “chemical equilibrium,” which would thus count as 2 responses under the concept of “chemical equilibrium, partitioning, and equilibrium constants”). “Importance” and “Difficulty” were both scored on a 5-point scale by the survey participants; for the purposes of this figure, scores have been rescaled to a 10-point scale. The figure shows the arithmetic mean of the (rescaled) values reported by survey participants, and error bars represent the standard deviations of the participants' responses. FEECI, fundamentals of environmental engineering concept inventory.

Following the question-writing sessions, concepts (7) “reaction kinetics” and (8) “definition of risk” were dropped from the CI to reduce the number of concepts tested to the practical limit. The practical limit is the number of concepts that can be covered on a CI test, as the test is expected to consist of no more than 30 questions (preferably fewer), and each concept must be represented by multiple questions to ensure reliability. Although these concepts were identified as both important and difficult, they did not readily lend themselves to CI questions that followed the heuristic described above. In addition, it was found that the concepts of (5) “chemical equilibrium” and (6) “partitioning” could be understood as a single concept, chemical equilibrium and partitioning (CEP). The final list of concepts, along with a discussion of their importance and difficulty, is provided in Table 3. Because of the trade-off between the number of concepts that can be covered and the number of questions that can be included for each concept, we deemed that the maximum number of concepts included was five, with at least four questions included for each concept.

Table 3.

List of Concepts Tested in Fundamentals of Environmental Engineering Concept Inventory, Along with the Reasons That They Are Important and Difficult Identified Through the Delphi Process

Concept	Importance	Difficulty
RT	Many environmental systems can be treated as reactors, into and out of which streams (e.g., water, air) flow, and within which chemical reactions may take place.	Students not knowing or comprehending the assumptions or simplifications inherent in each of the ideal reactor models (e.g., what does it mean to be “completely mixed” or “plug flow”).
	Treating an environmental system as an ideal reactor facilitates appropriate analysis or design, even though no real system behaves exactly like one of the three idealizations.
The MB equation	Powerful problem-solving technique, based on conservation of mass.	Students not knowing or comprehending that the equation is an expression of the principle of conservation of mass.
	Can be applied to almost any environmental engineering problem that can be represented as a system with inputs, outputs, sources, and/or sinks.	The meaning of each individual term in the equation. How to apply it to solve environmental engineering problems in different situations.
BOD	Needed to understand wastewater, treatment process design and performance, impact of pollutants on natural systems and use of water quality models, such as the Streeter-Phelps equation. Historical significance: first proposed by Royal Commission on Sewage Disposal for setting discharge standards in 1912. Familiarizes students with concepts in organic chemistry and microbiology and laboratory measurements used for water quality parameters.	Aggregate water quality parameter rather than measurement of an individual constituent. Students have little knowledge of organic chemistry or microbiology needed to master this concept. Measures potential impact (DO depletion) of a pollution source rather than measurement of the pollutant itself. Knowing and comprehending the operationally defined parameter and bioassay that may be difficult for students to understand if they have not taken laboratory courses.
UOM	Required for problem solving in MBs, environmental chemistry and biology, risk assessment, environmental engineering system design, and others.	*Faculty assume students master this concept in high school and/or introductory college chemistry and physics classes, therefore do not devote time to this during FEE courses.
	Required for understanding regulations and standards for soil, water and air pollution.	Carrying out calculations when concentrations are expressed in equivalent units, such as when ammonia or nitrate concentrations are expressed “as N” or methane “as CO₂ equivalents.”
		Problems with UOMs hinder students' ability to solve more complex problems.
CEP	Deals with interactions of chemical compounds that may affect human health and the environment.	*Students not knowing or comprehending fundamental relationships between reaction stoichiometry and equilibrium constant.
	Critical for aspects of environmental engineering related to water, wastewater and hazardous waste treatment, and issues in water sustainability and climate change. Enables students to establish relationships between diverse practices and observations.	Students fail to remember when molar concentrations need to be used. *Equilibrium equation establishes relationship between equilibrium concentration of reactants and products, not initial concentration or change in concentration.
		Equilibrium equation must be combined with MB to establish relationship among constituent concentrations.

*Difficulties related to the students not having the knowledge and skills from previous coursework or experiences that faculty expect that they ought to have when enrolled in an FEE course.

BOD, biochemical oxygen demand; CEP, chemical equilibrium and partitioning; DO, dissolved oxygen; FEE, fundamentals of environmental engineering; MB, mass balance; RT, reactor theory; UOM, Units of measure.

Specific difficulties listed in Table 3 can be grouped into five fundamental categories: lack of (1) basic knowledge and skills; (2) understanding of constitutive concepts; (3) knowledge of methods of measurement; (4) knowledge of basic scientific laws; and (5) ability to transfer knowledge to new situations. For example, difficulties shown with an asterisk (*) in Table 3 were mainly related to students not having the knowledge and skills from previous coursework or experiences that faculty expect that they ought to have when enrolled in an FEE course. Difficulties 2 and 4, although each appears only once in Table 3, may be of importance in other concepts not measured in the FEECI. Another issue is that the concepts identified by the Delphi study are themselves constituted by other concepts, such as the ability to use and apply equations properly, the ability to discern between what an instrument is measuring and what the measurement indicates, or an understanding of the equivalency of different measures of concentration.

Generation of FEECI questions and distractors

Distractors were developed for each question based on misconceptions that became apparent when students responded to stems in an open-ended format. Distractors were designed to distinguish between a correct conceptual understanding and the most commonly identified misconceptions. For example, Fig. 2 shows an example stem for the RT concept, along with the correct answer and three distractors developed from student responses. Additional details, including samples of student responses, are provided in Sengupta et al. (2013).

FIG. 2.

Example of a FEECI test question with stem, correct answer, and distractors.

Results of CI Testing

The 21-question beta version of the FEECI was administered to 320 students on six campuses. Table 4 presents the results of the statistical analysis of the FEECI questions.

Table 4.

Item Analysis Statistics Based on Classical Test Theory for Fundamentals of Environmental Engineering Concept Inventory Questions

Content	Item	Difficulty	Discrim.	Item-total correl	Item-concept correlation	Alpha-item-deleted	Alpha-deleted-rank
CEP	Item 1	0.188	0.098	0.028	0.041	0.584	2
CEP	Item 9	0.684	0.337	0.160	−0.034	0.569	9
CEP	Item 11	0.150	0.089	0.034	0.069	0.582	3
CEP	Item 14	0.319	0.111	−0.037	−0.039	0.596	1
UOM	Item 2	0.763	0.310	0.210	0.218	0.562	13
UOM	Item 6	0.603	0.308	0.137	0.135	0.572	6
UOM	Item 16	0.350	0.207	0.080	0.078	0.580	4
UOM	Item 21	0.438	0.350	0.198	0.196	0.563	12
MB	Item 3	0.613	0.488	0.279	0.007	0.551	18
MB	Item 7	0.378	0.321	0.177	0.073	0.566	10
MB	Item 12	0.316	0.340	0.182	0.132	0.565	11
MB	Item 20	0.225	0.212	0.105	0.150	0.575	5
BOD	Item 4	0.581	0.325	0.153	0.225	0.570	7
BOD	Item 10	0.472	0.580	0.351	0.351	0.539	21
BOD	Item 13	0.359	0.474	0.271	0.322	0.552	17
BOD	Item 17	0.569	0.554	0.305	0.317	0.546	19
BOD	Item 18	0.178	0.269	0.255	0.157	0.557	15
RT	Item 5	0.469	0.448	0.263	0.196	0.553	16
RT	Item 8	0.297	0.291	0.167	0.189	0.568	8
RT	Item 15	0.509	0.419	0.208	0.109	0.562	14
RT	Item 19	0.481	0.538	0.335	0.224	0.541	20
Mean		0.426	0.337	0.184

Discrim. = item discrimination from two extreme groups; Item-total correl = the correlation between the response to a question and the total score for the remaining questions; Item-concept correlation = the correlation between the response to a question and the total score for the remaining questions in the same concept; Alpha-item-deleted = the test reliability when a question is removed from the test; Alpha-deleted-rank = the order in which questions would be deleted from the test based on alpha-item-deleted.

Test alpha = 0.577.
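The per-concept averages discussed later in the article can be recomputed directly from the Difficulty column of Table 4. The short Python sketch below does only that arithmetic; the values are transcribed from the table, and the function and variable names are illustrative rather than part of the authors' analysis.

```python
# Difficulty values (proportion answering correctly) transcribed from Table 4,
# grouped by concept. Names below are illustrative, not the authors' code.
DIFFICULTY_BY_CONCEPT = {
    "CEP": [0.188, 0.684, 0.150, 0.319],
    "UOM": [0.763, 0.603, 0.350, 0.438],
    "MB": [0.613, 0.378, 0.316, 0.225],
    "BOD": [0.581, 0.472, 0.359, 0.569, 0.178],
    "RT": [0.469, 0.297, 0.509, 0.481],
}


def concept_means(table):
    """Return the mean proportion correct for each concept."""
    return {concept: sum(vals) / len(vals) for concept, vals in table.items()}


means = concept_means(DIFFICULTY_BY_CONCEPT)
for concept, mean in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"{concept}: {mean:.2f}")
```

Sorting by the mean makes it easy to see which concept group was hardest and which was easiest for the beta-test cohort.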

Item difficulty index

The item difficulty for the FEECI ranged from 0.150 (question 11; the most difficult question) to 0.763 (question 2; the easiest question). Most questions had a moderate difficulty level (0.30 to 0.70). Five of the 21 questions were difficult (<0.30; questions 1, 8, 11, 18, and 20). Only question 2 was an easy question (>0.70). The average difficulty for the FEECI test was 0.426, indicating that, on average, 42.6% of students answered a question correctly. For the purpose of diagnosing misconceptions, the FEECI questions had an appropriate item difficulty level.
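The banding described above can be sketched in a few lines of Python. The difficulty values are transcribed from Table 4 and the 0.30 and 0.70 thresholds are the ones stated in the text; the function name and the handling of values exactly on a threshold are assumptions made for illustration.

```python
# Item difficulty (proportion answering correctly), transcribed from Table 4.
DIFFICULTY = {
    1: 0.188, 2: 0.763, 3: 0.613, 4: 0.581, 5: 0.469, 6: 0.603, 7: 0.378,
    8: 0.297, 9: 0.684, 10: 0.472, 11: 0.150, 12: 0.316, 13: 0.359,
    14: 0.319, 15: 0.509, 16: 0.350, 17: 0.569, 18: 0.178, 19: 0.481,
    20: 0.225, 21: 0.438,
}


def classify(p, low=0.30, high=0.70):
    """Band an item by its difficulty index (assumed: boundaries count as moderate)."""
    if p < low:
        return "difficult"
    if p > high:
        return "easy"
    return "moderate"


bands = {"difficult": [], "moderate": [], "easy": []}
for item in sorted(DIFFICULTY):
    bands[classify(DIFFICULTY[item])].append(item)

mean_difficulty = sum(DIFFICULTY.values()) / len(DIFFICULTY)

print(bands)
print(round(mean_difficulty, 3))
```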

Item discrimination indices

The cutoff values for discrimination from two extreme groups and item-total correlation were 0.20 and 0.10, respectively. Questions with indices less than the cutoff value were considered to be poorly discriminating questions. Questions 1, 11, 14, and 16 were identified as poorly discriminating questions based on multiple metrics. In contrast, questions 10, 17, and 19 had high discriminating power (discrimination >0.5 and item-total correlation >0.3). As for the concepts, the BOD and RT questions tended to have both higher discrimination and higher item-concept correlation, followed by the questions for MB and UOM. Except for the CEP, average discrimination values for all other concepts were well above the cutoff criteria.
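For readers who want to compute these two indices on their own data, the following is a minimal sketch, not the authors' procedure. It assumes a students-by-items matrix of 0/1 scores and assumes the extreme groups are the top and bottom 27% of students by total score (a common convention; the fraction used in this study is not stated here). The toy data are invented purely to exercise the functions.

```python
import numpy as np


def extreme_group_discrimination(scores, fraction=0.27):
    """Proportion correct in the top group minus the bottom group, per item.

    `fraction` is an assumed group size; adjust it to match your protocol.
    """
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    order = np.argsort(totals, kind="stable")  # lowest total first
    k = max(1, int(round(fraction * len(totals))))
    low, high = scores[order[:k]], scores[order[-k:]]
    return high.mean(axis=0) - low.mean(axis=0)


def corrected_item_total(scores):
    """Correlation of each item with the total score of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    result = []
    for j in range(scores.shape[1]):
        rest = totals - scores[:, j]  # exclude the item itself
        result.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return np.array(result)


# Invented toy data: six students, four items; the last item behaves like noise.
scores = np.array([
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
])

disc = extreme_group_discrimination(scores)
itc = corrected_item_total(scores)
print(disc)
print(itc.round(3))
```

In the toy data, the last item has zero extreme-group discrimination and a negative item-total correlation, which is the pattern that would flag it against cutoffs such as 0.20 and 0.10.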

Cronbach's alpha

Cronbach's alpha was calculated for the FEECI and was found to have a value of 0.58 with a 95% confidence interval of (0.51, 0.65). Because the alpha value was below 0.70, the result suggests that the beta version of the FEECI does not have sufficient reliability to be usable to assess students' conceptual understanding of FEE. However, there are some questions about what alpha actually measures (Sijtsma, 2009; Tavakol and Dennick, 2011). Cronbach (1951) developed alpha to measure the internal consistency of a test where “internal consistency describes the extent to which all the items in a test measure the same concept or construct and hence it is connected to the inter-relatedness of the items within the test” (Tavakol and Dennick, 2011). Given that the FEECI by design measures multiple concepts across items, an alpha value of 0.58 does not mean that the instrument should be discarded. Rather, it indicates that there may be some items that need to be revised or eliminated.
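The point about multi-concept instruments can be illustrated with the standard formula for coefficient alpha, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The sketch below is illustrative only: the function name and the two tiny score matrices are invented, and they are not FEECI data.

```python
import numpy as np


def cronbach_alpha(scores):
    """Coefficient alpha for a students-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variance_sum = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


# Invented example 1: three items that all track a single construct.
one_concept = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])

# Invented example 2: items 0-1 track one concept, items 2-3 an unrelated one.
two_concepts = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
])

alpha_one = cronbach_alpha(one_concept)
alpha_two = cronbach_alpha(two_concepts)
alpha_within = cronbach_alpha(two_concepts[:, :2])
print(alpha_one, alpha_two, alpha_within)
```

In the second example each pair of items is perfectly consistent within its own concept, yet alpha for the combined four-item test is lower, which mirrors the argument that a test spanning several concepts can show a modest overall alpha.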

Alpha-if-item-deleted

In Table 4, four questions would increase alpha if deleted (questions 1, 11, 14, and 16), with question 14 producing the largest increase. In contrast, deleting question 10, 17, or 19 would produce the largest decreases in alpha. The last column in Table 4 ranks the items by alpha-item-deleted to indicate the order in which items should be considered for revision or elimination.
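The comparison against the overall test alpha can be reproduced from the Alpha-item-deleted column of Table 4. The Python sketch below uses the tabulated values and the reported test alpha of 0.577; the variable names are illustrative.

```python
# Overall test alpha and alpha-if-item-deleted values, transcribed from Table 4.
TEST_ALPHA = 0.577
ALPHA_IF_DELETED = {
    1: 0.584, 2: 0.562, 3: 0.551, 4: 0.570, 5: 0.553, 6: 0.572, 7: 0.566,
    8: 0.568, 9: 0.569, 10: 0.539, 11: 0.582, 12: 0.565, 13: 0.552,
    14: 0.596, 15: 0.562, 16: 0.580, 17: 0.546, 18: 0.557, 19: 0.541,
    20: 0.575, 21: 0.563,
}

# Items whose removal would raise alpha above the full-test value.
raises_alpha = sorted(i for i, a in ALPHA_IF_DELETED.items() if a > TEST_ALPHA)

# Item whose removal gives the largest gain, and the three with the largest loss.
largest_gain = max(ALPHA_IF_DELETED, key=ALPHA_IF_DELETED.get)
largest_losses = sorted(ALPHA_IF_DELETED, key=ALPHA_IF_DELETED.get)[:3]

print(raises_alpha, largest_gain, largest_losses)
```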

Distractor analysis

ICLs and distractor characteristic lines (DCLs) were used to further explore the quality of items and distractors (Fig. 3). For questions 10 and 17 (best questions), the item lines for the correct answers increase from low to high-performing groups, showing that they have high discriminating power: lower-performing groups tend to select an incorrect distractor while higher-performing groups tend to select the correct answer. However, ICLs for questions 14 and 1 (least discriminating) showed fluctuations when moving from the low to high scoring groups, indicating low discriminating power: higher-performing groups are not necessarily more likely to select the correct answer. DCLs also show discrimination power for each distractor. For question 17, distractor lines for options D and E showed better discriminating power than those for options B and C. Figure 3 suggests that options D and E for question 17 were well-operating distractors but options B and C might be implausible distractors.
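The data behind such trace lines are simply the proportions of each performance group choosing each answer option. The sketch below shows one way to tabulate them; it assumes five equal-sized groups formed by ranking students on total score (how the groups were formed in this study is not stated here), and the responses are invented for illustration.

```python
import numpy as np


def option_proportions_by_group(choices, totals, options, n_groups=5):
    """Return an (n_groups x n_options) table of selection proportions.

    Row 0 is the lowest-scoring group; the last row is the highest.
    """
    choices = np.asarray(choices)
    totals = np.asarray(totals)
    order = np.argsort(totals, kind="stable")  # lowest total first
    groups = np.array_split(order, n_groups)
    table = np.zeros((n_groups, len(options)))
    for g, idx in enumerate(groups):
        for o, option in enumerate(options):
            table[g, o] = np.mean(choices[idx] == option)
    return table


# Invented responses to one item for ten students; "A" is the keyed answer.
options = ["A", "B", "C", "D"]
totals = [2, 4, 6, 7, 9, 11, 13, 15, 17, 19]
choices = ["B", "B", "B", "A", "D", "A", "A", "A", "A", "A"]

table = option_proportions_by_group(choices, totals, options)
print(table)
```

A rising column for the keyed answer and falling columns for the distractors indicate good discrimination, whereas a distractor column that stays near zero in every group (option C in the toy data) suggests an implausible distractor.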

FIG. 3.

Item trace lines and distractor trace lines for questions 10 and 17, which show good discrimination, and questions 14 and 1, which show poor discrimination. Note that group 1 was the lowest performing group and group 5 was the highest performing group. The correct options are shown with an asterisk.

Discussion: Students' Conceptual Understanding of Fundamental Concepts

Reactor theory

The difficulty level for the four RT questions (percentage of students answering correctly) ranged from 30% to 51%, with an average of 44%. Students generally performed well (48–51% correct) on two questions that posed hypothetical physical situations and asked students to choose which type of ideal reactor best described that situation. On questions that related more specifically to aspects of completely mixed flow reactors and plug flow reactors, students did not perform as well (30–47% correct). An advantage of the FEECI is that it enables faculty to identify such weak areas and to plan corrective actions that can be administered either as part of the Fundamentals course or as part of a subsequent course later in the curriculum. For instance, to help students develop a better conceptual understanding of these different types of reactors, one possible solution would be to incorporate demonstrations or laboratory activities involving flow-through reactors, for example, tracer tests with non-reactive dyes. Such an exercise might be beyond the scope of a Fundamentals course, but administering the FEECI provides a diagnostic assessment of which concepts must be given additional attention at some point in the curriculum.

The MB equation

On the four questions related to the MB, 22–61% of students answered correctly, with an average of 38% correct. Compared to the other four topics, the questions related to the MB appear to have moderate difficulty, moderate ability to discriminate between high- and low-scoring students, and moderate correlation with overall student performance. MB questions were able to reveal important student misconceptions. When asked to select a material balance equation that described a hypothetical situation, 40% of students selected distractors that were not actually MB equations, but rather recognizable formulae that they may have seen during their FEE course. This suggests that many students may not have a fundamental understanding of the balance equation, its meaning, or how the balance equation can be used to describe a particular physical problem; instead, these students may try to solve MB problems by “finding the right formula.” Another pattern of misconception was that students do not properly understand that the balance equation must be applied to the mass of a species or constituent, and that each term in the equation must apply to the same species or constituent, as stated in the “Difficulty” column in Table 3.
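For reference, a generic statement of the balance for a single constituent i in a control volume can be written as follows. The notation is an assumption introduced here for illustration (it is not taken from the FEECI): m_i is the mass of i in the system, the ṁ terms are mass flow rates of i across the boundary, r_i is the net rate of production of i per unit volume, and V is the system volume.

```latex
\underbrace{\frac{dm_i}{dt}}_{\text{accumulation of } i}
= \underbrace{\dot{m}_{i,\mathrm{in}}}_{\text{input of } i}
- \underbrace{\dot{m}_{i,\mathrm{out}}}_{\text{output of } i}
+ \underbrace{r_i V}_{\text{net reaction of } i}
```

Every term carries the same subscript i, which is the requirement that the misconception described above violates.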

As was observed previously, identification of such difficulties via administration of the FEECI allows faculty to plan corrective actions. For instance, laboratory exercises could be easily designed that involve mass input to a system and mass output from the system; asking students to apply the balance equation to a simple physical system might help students develop the proper conceptual understanding of what the balance equation “means.” Another advantage of the FEECI is that it could be used to diagnose whether such intervention activities do, in fact, lead to enhanced conceptual understanding by the students.

Biochemical oxygen demand

In general, students did well on the BOD concept questions, with 43% answering correctly on average. On one of the five BOD questions, only 18% of students answered correctly; on the other four questions, the difficulty score ranged from 0.36 to 0.58, indicating good conceptual understanding by the students. Furthermore, BOD questions tended to score relatively high on the discrimination between high- and low-performing groups, on the correlation between individual items and the overall exam, and on the correlation between individual items and the rest of the concept group. This indicates that the BOD questions are probably reliable indicators of conceptual understanding. An analysis of the answers to the FEECI questions on BOD shows some of the common misconceptions related to this concept, which included the ideas that BOD is a measure of the dissolved oxygen (DO) concentration in a water sample, that the concentration of soluble organic substrate increases over time during a BOD test, and that wastewater containing BOD “contains bacteria that enter the water and consume oxygen.” Only 18% of students could determine the ultimate BOD concentration given a simple sketch of DO concentration versus time during a BOD test. The results point to a need to incorporate more laboratory activities and demonstrations of the BOD test into FEE classes. On the positive side, few students chose distractors that suggested that BOD provided a measure of the hazardous or toxic nature of wastewater effluents.

Units of measure

The average percentage of correct scores on the four UOM questions was 54%, the highest of any of the five concepts. However, the discrimination index and the correlation with overall results were relatively low. This probably indicates that, because most students were able to perform so well on this set of questions, the questions were not as effective at distinguishing between high- and low-scoring students. Despite the overall high scores on the UOM questions, some misconceptions are apparent from the incorrect student responses. When explicitly given concentrations of ammonium and nitrate and asked what calculation would be needed to express those concentrations “as nitrogen,” 76% of students selected the correct answer. However, when students were given a more complex stem in which ammonium is converted to nitrate, and were then asked for the conversion necessary to express the nitrate concentration “as nitrate-nitrogen (NO₃⁻-N),” only 35% answered correctly. This contrast may be indicative of students memorizing a procedure (i.e., how to convert given concentrations of NH₄⁺ and NO₃⁻ to concentrations “as nitrogen”) rather than developing a true conceptual understanding of this system of units. The results indicate that faculty should not assume that students have gained required skills for addressing UOMs in prerequisite courses, such as chemistry and physics, particularly when dealing with units particular to environmental engineering.
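The conversion at issue rests on the mass fraction of nitrogen in each species. The sketch below works through that arithmetic in Python; the 10 mg/L concentrations are illustrative values, not taken from the FEECI questions, and the atomic masses are rounded standard values.

```python
# Approximate standard atomic masses in g/mol.
ATOMIC_MASS = {"N": 14.007, "H": 1.008, "O": 15.999}

# Elemental composition of the two species (atoms per ion).
AMMONIUM = {"N": 1, "H": 4}  # NH4+
NITRATE = {"N": 1, "O": 3}   # NO3-


def molar_mass(composition):
    """Molar mass in g/mol from an element-count dictionary."""
    return sum(ATOMIC_MASS[el] * n for el, n in composition.items())


def as_nitrogen(conc_mg_per_l, composition):
    """Convert mg/L of a species to mg/L expressed as nitrogen (mg N/L)."""
    n_fraction = composition["N"] * ATOMIC_MASS["N"] / molar_mass(composition)
    return conc_mg_per_l * n_fraction


# Illustrative concentrations only.
nh4_as_n = as_nitrogen(10.0, AMMONIUM)  # 10 mg/L NH4+ expressed as NH4+-N
no3_as_n = as_nitrogen(10.0, NITRATE)   # 10 mg/L NO3- expressed as NO3--N
print(round(nh4_as_n, 2), round(no3_as_n, 2))
```

Because nitrogen is conserved when ammonium is converted to nitrate, concentrations expressed "as nitrogen" can be compared directly before and after the conversion, whereas concentrations expressed as mass of the whole ion cannot.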

Chemical equilibrium and partitioning

Students found three of the four CEP questions very difficult, with difficulty scores ranging from 0.15 to 0.32. These three questions all had discrimination indices of 0.11 or lower and item-total correlations below 0.10 (Table 4), both below the respective cutoff values. This suggests that, as a whole, the CEP questions were “too difficult” and therefore were not able to distinguish between high- and low-scoring students. Of these three “too difficult” questions, two will be revised for future versions of the FEECI, and one will be replaced altogether because further analysis suggested that the question was conflating multiple concepts and therefore was not truly testing students' conceptual understanding of chemical equilibrium. Nevertheless, an analysis of the incorrect answers selected by students still reveals some common misconceptions related to this topic. In particular, one of the most common misconceptions is that the solubility product (K_sp) for precipitation-dissolution equilibrium quantifies a relationship for the change in the concentration of the ions involved, rather than a relationship between the equilibrium concentrations.
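To make the distinction concrete, the solubility product for a generic sparingly soluble salt can be written as below. The salt M_aX_b is hypothetical and chosen only for illustration, and the expression assumes a dilute solution in which activities can be approximated by molar concentrations.

```latex
\mathrm{M}_a\mathrm{X}_b(s) \rightleftharpoons a\,\mathrm{M}^{m+}(aq) + b\,\mathrm{X}^{n-}(aq),
\qquad
K_{sp} = [\mathrm{M}^{m+}]_{\mathrm{eq}}^{\,a}\,[\mathrm{X}^{n-}]_{\mathrm{eq}}^{\,b}
```

The bracketed quantities are the concentrations present at equilibrium, not the amounts by which the concentrations changed on the way to equilibrium.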

As with the other concepts discussed above, identification of difficulties in CEP via administration of the FEECI allows faculty to plan corrective actions. For instance, Visual MINTEQ, a freeware chemical equilibrium modeling program, can be incorporated into an FEE course. Students can be presented with scenarios of aqueous chemical systems, asked to apply the governing equations in Visual MINTEQ, and asked to check the results generated by the software. This exercise might help students develop a proper conceptual understanding of the relationships among species under equilibrium conditions. As with the MB interventions, the FEECI could then be used to diagnose whether such activities do, in fact, lead to enhanced conceptual understanding by the students.

Implications and Limitations of Analysis

Although FEE courses are taken by students in many engineering disciplines and even by nonengineering majors, and although prerequisites for these courses vary widely, the FEECI was able to distill this wide spectrum to some universal concepts that can be tested in a 1-h instrument. Beta testing provided the authors with a list of misconceptions commonly held by students. Nationwide online testing will provide more granular data, but beta testing has already set into motion a discussion about how to remedy these misconceptions: can new technologies (e.g., computer animation, visualization software, chemical equilibrium software) help in this effort? Should the laboratory experiments that usually accompany an FEE lecture be modified? Would tighter integration of some concepts (e.g., BOD with chemical equilibrium) make the subject more holistic for the student?

Summary

FEE is a foundational course for the engineers and scientists who will shape the future of our national and global environmental and public-health infrastructure. An analysis of syllabi from representative U.S. universities and colleges was carried out to identify common topics in FEE. A Delphi process with both faculty experts and students was used to identify concepts that are both important and difficult to understand for FEE students. A beta version of the CI was developed through detailed question writing sessions and feedback from faculty at a major environmental engineering education conference. The beta test was given to FEE students at six universities and a detailed psychometric analysis of the results was undertaken. Although the psychometric analysis revealed that several of the FEECI questions need to be reevaluated, the results revealed many misconceptions related to the concepts of RT, the MB equation, BOD, UOM, and CEP. By understanding these misconceptions, environmental science and engineering faculty can improve students' conceptual understanding of FEE concepts. The FEECI will be made available to interested FEE faculty members provided that the instrument is not compromised.

Acknowledgments

This material is based upon work supported by the National Science Foundation, under grant number DUE 1044063. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors would like to thank the following faculty members for their contributions to this research: Caitlyn Butler, Eileen Cashman, Amy Chan Hilton, Ryan Dupont, Elizabeth Eschenbach, Joseph Flora, Kristen Jellison, Laurie McNeill, Dan Oerther, Annalisa Onnis-Hayden, Chul Park, Jason Ren, Maya Trotz, and Yeomin Yoon. Andrea Stone and Dilek Özalp assisted with data analysis.

Author Disclosure Statement

No competing financial interests exist.

References

Adams W.K., and Wieman C.E. (2011). Development and validation of instruments to measure learning of expert-like thinking. Int. J. Sci. Educ., 33, 1289.

Allen (2006). The statistics concept inventory: Development and analysis of a cognitive assessment instrument in statistics [dissertation]. University of Oklahoma.

Allen, Stone, Rhoads T.R., and Murphy T.J. (2004). The statistics concepts inventory: Developing a valid and reliable instrument. In Proceedings of the 2004 American Society for Engineering Education Annual Conference and Exposition (pp. 1–15).

Bardar E.M., Prather E.E., Brecher, and Slater T.F. (2007). Development and validation of the light and spectroscopy concept inventory. Astronomy Educ. Rev., 5, 103.

Bransford J.D., Brown A.L., and Cocking R.R. (2000). How People Learn: Brain, Mind, Experience, and School (expanded edition). Washington, DC: National Academy Press.

Cronbach L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297.

Dalkey, and Helmer (1963). An experimental application of the Delphi method to the use of experts. Manage. Sci., 9, 458.

Garvin-Doxas, Klymkowsky, and Elrod (2007). Building, using, and maximizing the impact of concept inventories in the biological sciences: Report on a National Science Foundation–sponsored conference on the construction of concept inventories in the biological sciences. CBE Life Sci. Educ., 6, 277.

Goldman, Gross, Heeren, Herman, Kaczmarczyk, Loui M.C., and Zilles (2008). Identifying important and difficult concepts in introductory computing courses using a Delphi process. ACM SIGCSE Bull., 40, 256.

Gray G.L., Costanzo, Evans, Cornwell, Self, and Lane J.L. (2005). The dynamics concept inventory assessment test: A progress report and some results. In American Society for Engineering Education Annual Conference & Exposition, Portland, OR.

Gray G.L., Evans, Cornwell, Costanzo, and Self (2003). Toward a nationwide dynamics concept inventory assessment test. Proceedings of the 2003 American Society for Engineering Education Annual Conference and Exposition. 8, 1.

Hake R.R. (1998). Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Phys., 66, 64.

Herman G.L., Loui M.C., and Zilles (2011). Students' misconceptions about medium-scale integrated circuits. IEEE Transact. Educ., 54, 637.

Hestenes, and Wells (1992). A mechanics baseline test. Phys. Teach., 30, 159.

Hestenes, Wells, and Swackhamer (1992). Force concept inventory. Phys. Teach., 30, 141.

Klymkowsky M.W., Underwood S.M., and Garvin-Doxas R.K. (2010). Biological Concepts Instrument (BCI): A diagnostic tool for revealing student thinking. arXiv Preprint arXiv:1012.4501.

Krause, Birk, Bauer, Jenkins, and Pavelich M.J. (2004). Development, testing, and application of a chemistry concept inventory. In 34th Annual Frontiers in Education: Expanding Educational Opportunities Through Partnerships and Distance Learning - Conference Proceedings, FIE, Savannah, GA.

Krause, Decker J.C., and Griffin (2003). Using a materials concept inventory to assess conceptual gain in introductory materials engineering courses. In Frontiers in Education, 2003. FIE 2003 33rd Annual (Vol. 1, pp. T3D-7). IEEE.

Litwin M.S. (1995). How to Measure Survey Reliability and Validity (Vol. 7). Thousand Oaks, CA: Sage Publications.

Litzinger, Lattuca L.R., Hadgraft, and Newstetter (2011). Engineering education and the development of expertise. J. Eng. Educ., 100, 123.

Martin, Mitchell, and Newell (2003). Development of a concept inventory for fluid mechanics. In Frontiers in Education, 2003. FIE 2003 33rd Annual (Vol. 1, pp. T3D-23). IEEE.

Martin J.K., Mitchell, and Newell (2004). Work in progress: Analysis of reliability of the Fluid Mechanics Concept Inventory. In Frontiers in Education, 2004. FIE 2004. 34th Annual (pp. F1F-3). IEEE.

Midkiff K.C., Litzinger T.A., and Evans D.L. (2001). Development of engineering thermodynamics concept inventory instruments. In Frontiers in Education Conference, 2001. 31st Annual (Vol. 2, pp. F2A-F23). IEEE.

Notaros B.M. (2002). Concept inventory assessment instruments for electromagnetics education. In Antennas and Propagation Society International Symposium, 2002. IEEE (Vol. 1, pp. 684–687). IEEE.

Nunnally J.C. (1967). Psychometric Theory. New York: McGraw-Hill.

Pill (1971). The Delphi method: Substance, context, a critique and an annotated bibliography. Socio Econ. Plann. Sci., 5, 57.

Revelle (2014). Psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1. Available at: https://cran.r-project.org/package=psych

Rhoads T.R., and Roedel R.J. (1999). The wave concept inventory-a cognitive instrument based on Bloom's taxonomy. In Frontiers in Education Conference, 1999. FIE’99. 29th Annual (Vol. 3, pp. 13C1-14). IEEE.

Richardson, Steif, Morgan, and Dantzler (2003). Development of a concept inventory for strength of materials. In Frontiers in Education, 2003. FIE 2003 33rd Annual (Vol. 1, pp. T3D-29). IEEE.

Roedel R.J., El-Ghazaly, Rhoads T.R., and El-Sharawy (1998). The wave concepts inventory-an assessment tool for courses in electromagnetic engineering. In Frontiers in Education Conference, 1998. FIE’98. 28th Annual (Vol. 2, pp. 647–653). IEEE.

Sengupta, Cunningham J.A., Ergas S.J., Goel R.K., Ozalp, and Reed-Rhoads (2013). Development of a Concept Inventory for Introductory Environmental Engineering Courses. In 120th American Society for Engineering Education Annual Conference & Exposition, Atlanta.

Sijtsma (2009). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74, 107.

Simoni M.F., Herniter M.E., and Ferguson B.A. (2004). Concepts to questions: Creating an electronics concept inventory exam. In Proceedings of the 2004 American Society for Engineering Education Annual Conference & Exposition, Salt Lake City, UT.

Smith J.I., and Tanner (2010). The problem of revealing how students think: Concept inventories and beyond. CBE Life Sci. Educ., 9, 1.

Smith M.K., Wood W.B., and Knight J.K. (2008). The genetics concept assessment: A new concept inventory for gauging student understanding of genetics. CBE Life Sci. Educ., 7, 422.

Steif P.S., and Dantzler J.A. (2005). A statics concept inventory: Development and psychometric analysis. J. Eng. Educ., 94, 363.

Stone (2006). A Psychometric Analysis of the Statistics Concept Inventory (Doctoral dissertation, University of Oklahoma).

Stone, Allen, Rhoads T.R., Murphy T.J., Shehab R.L., and Saha (2003). The statistics concept inventory: A pilot study. In Frontiers in Education, 2003. FIE 2003 33rd Annual (Vol. 1, pp. T3D-1). IEEE.

Streveler R.A., Olds B.M., Miller R.L., and Nelson M.A. (2003). Using a Delphi study to identify the most difficult concepts for students to master in thermal and transport science. In Proceedings of the Annual Conference of the American Society for Engineering Education, Nashville, TN.

Tavakol, and Dennick (2011). Making sense of Cronbach's alpha. Int. J. Med. Educ., 2, 53.

Wage K.E., and Buck J.R. (2001). Development of the Signals and Systems Concept Inventory (SSCI) assessment instrument. In Frontiers in Education Conference (Vol. 2, pp. F2A-2).

Yi-Hsin, and Li (2015). IA_CTT: A SAS® Macro for conducting item analysis based on classical test theory. SouthEast SAS Users Group 2015 Proceedings. Cary, NC: SAS Institute.