Abstract
While universities routinely use student data to monitor and predict student performance, there has been limited engagement with student and staff views, social and ethical issues, policy development, and ethical guidance. We reviewed peer-reviewed and grey-literature articles published from 2007 to 2018 describing the perspectives of staff and students in tertiary education on the use of student-generated data in data analytics, including learning analytics. We used an ethics framework to categorize the findings. There was considerable variation but generally low awareness and understanding amongst students and staff about the nature and extent of data collection, data analytics, and use of predictive analytics. Staff and students identified potential benefits but also expressed concerns about misinterpretation of data, constant surveillance, poor transparency, inadequate support, and potential to impede active learning. This review supports the contention that consideration of ethical issues has failed to keep pace with the development of predictive analytics in the tertiary sector.
“Big data,” a catchall phrase that encompasses the collection, linkage, and analysis of very large data sets using automated processes such as machine learning, provide a powerful vehicle through which to explore elements of human behavior, evaluate services, and answer research questions. However, the sheer volume of sensitive information that can be linked and attributed to individuals can also lead to discrimination, loss of autonomy, infringements on privacy, reputational damage or embarrassment, identity fraud, and commercial misuse of data. These concerns are prominent in the consciousness of publics internationally (Ipsos MORI, 2016; Polonetsky & Tene, 2014; Presser et al., 2015; Productivity Commission, 2017).
A number of review papers have recently been published that identify, describe, and categorize the ethical issues that arise in big data collection and use. Some reviews focused on ethical issues in data science generally (Richards & King, 2014; Salz & Dewar, 2019). Others addressed ethical issues within specific fields, particularly health and biomedicine (Lipworth et al., 2017; Xafis et al., 2019). For example, Lipworth et al. (2017) described the growing literature (Ioannidis, 2013; Larson, 2013; Metcalf & Crawford, 2016) on the implications of big data in health research and practice for “consent, privacy, data sharing, return of results, benefit sharing, ownership, trust and custodianship” (Lipworth et al., 2017, p. 491). In this same field, scholars have also explored the extent to which data subjects understand and accept how information about them will be collected and used (Berry et al., 2012; Hill et al., 2013; Metcalfe et al., 2008; Papoutsi et al., 2015; Riordan et al., 2015) and the nature of the relationship between data subjects, custodians, and users (Kaplan, 2016; Vayena & Blasimme, 2017).
In other fields, understanding of the social implications of and stakeholder views about the use of big data is less well developed. One field to which relatively little attention has been paid is the tertiary education sector. This is despite rapid growth over the past decade in the collection of data about students in this sector. The first open-source learning management systems (LMS) were released in 2002, but the first cloud-based open-source LMS was not released until 2008 (Oxagile, 2016). Over the next few years, improvements in LMS and access to cloud-based technology allowed courses to be run without classrooms and without the need for a supporting mainframe. Universities now collect a wide array of data about students: demographic data (e.g., age, gender, location, place of birth, residency, nationality), financial data (e.g., employment status, income status, support grants), prior academic record data (e.g., grade point average, standardized test scores or national exam results, record of school subjects), performance data (e.g., course grades within and across courses, academic transcript data), behavioral data from an LMS (e.g., interaction data with online sources and forums), assessment content (e.g., submitted material, exams, plagiarism monitoring data), historical progression data (e.g., information about students after graduation), library data (interaction with library facilities), and regulatory compliance data (e.g., information that demonstrates that an international student is fulfilling the requirements of their visa; Adejo & Connolly, 2017; Arnold & Pistilli, 2012; Jantti, 2016; Shacklock, 2016).
In addition, universities hold some health data, if only to underpin disability waivers and academic support, and some higher education institutions are collecting data from other sources such as face recognition data (NEC Media Release, 2019), Wi-Fi tracking of students across campus (The World Today, 2016), and digital footprints of prospective students (Selingo, 2017).
As the volume and variety of student data held by universities have grown, so has the use of these data. Universities have begun to use routinely collected data to monitor and predict student activity and performance (Colvin et al., 2015; Culnan & Carlin, 2009; Heath, 2014; Newland & Trueman, 2017; Waller & Fawcett, 2013; van Barneveld et al., 2012) with specialist societies (see https://www.solaresearch.org/about/) and journals (Gasevic et al., 2014) established over the past decade to support and extend these activities. In particular, learning analytics are being used internationally in quality assurance and quality improvement, to enable and support student retention and success initiatives, and to target students thought to be at risk of poor academic progress based on known demographic or behavioral characteristics (Sclater et al., 2016). The use of learning analytics by instructors to shape adaptive and personalized learning is a rapidly emerging area (Sclater et al., 2016).
Awareness of the potential ethical issues associated with the use of big data and predictive data analytics in the tertiary education sector is developing, albeit in a patchy way (Adejo & Connolly, 2017; Jones, 2016; Pardo & Siemens, 2014; Roberts, Chang, & Gibson, 2017; Rubel & Jones, 2016; Scholes, 2016; West et al., 2016; Wintrup, 2017). There are clear benefits for students and staff but also a range of potential harms and risks (Daniel, 2019). Roberts, Chang, and Gibson (2017) noted that even if concerns about data security, privacy, and informed consent are overcome, academic staff and university administrators must use the data collected in an ethical manner to improve teaching and support student success or there is a risk such analyses could harm rather than help students. Ethical use of big data by tertiary institutions will require exemplary data governance arrangements, enabling systems and organizational structures, and clear policy guidance for faculty and staff.
To date, student and staff input into data governance and policy has been limited. We know very little about what students and staff think about the use of big data and learning analytics, including whether they are concerned about issues such as privacy and consent in this context. Such views are important for a range of reasons. First, historically higher education has enjoyed a high level of trust amongst publics internationally and between students, institutions, and teachers and has only recently seen this confidence challenged (Baert & Shipman, 2005; Enders, 2013; Woelert & Yates, 2015). If data usage, linkage, and analytics occur without permission and usage results in harm, there is potential for loss of trust in tertiary institutions. A number of recent examples of data breaches in other sectors highlighted the precarious nature of trust in public institutions (Ipsos MORI, 2016; Sterckx et al., 2016). At a time when trust in public institutions is falling, it is important that universities work hard to retain trust with staff, students, and the public more generally.
Second, understanding student and staff views about the uses of big data in the tertiary sector can contribute to better design and use of learning analytics. For example, students who are concerned about privacy violations may need to be reassured before they will fully engage with LMS, and additional safeguards may need to be put in place. Staff may also be more willing to engage if concerns about the use of learning analytics for performance management are addressed explicitly. The academic literature on this topic that does exist is scattered across journals in fields as disparate as higher education, computing, ethics, and the social sciences. It is not surprising, then, that there have been no papers that have sought to systematically collect and analyze the academic literature concerning staff and student views on the use of big data in the tertiary education sector.
Drachsler and Greller (2012) characterized the primary stakeholders in learning analytics as data clients (e.g., teachers) and data subjects (e.g., students): Both sets of views are important in understanding the ethical issues associated with data analytics in a tertiary education setting. Accordingly, we sought to undertake a systematic scoping review of the literature to address the questions: What are the views of students and staff in the university sector on the use of student information in the tertiary sector? What ethical issues do these views highlight?
Method
We conducted a scoping review of empirical studies and public reports to identify articles describing the views and perspectives of staff and students in the university sector on the use of student-generated data through data analytics, including learning analytics. Although most papers conceptualized staff as teaching staff, the role of teaching staff often includes administrative and governance functions. Therefore, we also included those papers that canvassed the views of administrators, information technology professionals, and learning designers. We focused particularly on staff and student understanding of the ethical issues that such usage raises.
Scoping reviews are useful in mapping the literature for a particular topic or field in terms of its nature, features, and volume while drawing on a diverse range of sources and study designs (Arksey & O’Malley, 2005). Scoping reviews aim to be as comprehensive as possible and therefore may draw on both peer-reviewed and grey literature, the websites of relevant organizations, and the reference lists of identified articles.
A logic grid was developed for the study (Table 1) by three of the authors (XF, JS, and ABM). We used a standard population/problem-exposure-outcome/theme grid (Munn et al., 2018), where the population was staff and students in higher education settings, the “exposure” concerned engagement with big data, and the thematic areas we wished to explore were ethical issues and views and perspectives. These terms and relevant synonyms were included in the searches.
Table 1. Logic grid of terms describing population, exposure, and outcomes with respect to the research question
Documents were sourced for the years 2007 to 2018, with this window of time reflecting the period during which data analytics has developed, and also the rapidly changing nature of this area. Developments in technology mean that articles published outside this window have little relevance to present technology use within universities. The databases Scopus, ProQuest (ERIC), Education Research Complete, Web of Science, Academic Research Complete, Informit (multiple databases), and PsycINFO (Ovid) were selected for their coverage of education- or technology-oriented research, as well as ethical, social, and legal perspectives.
For the database searches, we initially used the logic grid described above. These searches picked up only a small number of papers, and we could see in the reference lists of these articles that the list was not exhaustive. We therefore moved to an iterative approach, in which we searched the reference lists of all included papers and used this to refine the focus of the search and the search terms. As a consequence, we added the phrases (student*) AND (learning analytic*) to the search string and additional terms to describe academic staff. Our final search strategy used the search string: ((universit* OR professor* OR expert* OR student* OR academic* OR staff OR lecturer*) AND (learning analytics OR Big Data OR data mining OR data analytics OR data security) AND (view* OR concern* OR perspective OR opinion* OR feedback)) for the period January 1, 2007, to June 15, 2018, the date on which we finalized the logic grid.
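The Boolean structure of the final search string can be illustrated with a short sketch. The term groups below are copied from the string reported above; the function names and the regular-expression matching are illustrative assumptions only and do not reproduce how any of the databases evaluated the query (for example, some databases apply automatic stemming or search indexed fields rather than raw text).

```python
import re

# Term groups copied from the final search string reported in the text.
# A trailing * is treated as a truncation wildcard (universit* -> university).
POPULATION = ["universit*", "professor*", "expert*", "student*",
              "academic*", "staff", "lecturer*"]
EXPOSURE = ["learning analytics", "Big Data", "data mining",
            "data analytics", "data security"]
OUTCOME = ["view*", "concern*", "perspective", "opinion*", "feedback"]


def term_to_regex(term):
    """Hypothetical helper: compile one term into a whole-word, case-insensitive pattern."""
    stem = term.rstrip("*")
    # Allow any run of whitespace between the words of a phrase.
    body = r"\s+".join(re.escape(word) for word in stem.split())
    if term.endswith("*"):
        body += r"\w*"  # truncation: any word ending is accepted
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches_any(text, terms):
    """True if at least one term in the group occurs in the text (the OR part)."""
    return any(term_to_regex(term).search(text) for term in terms)


def matches_search_string(text):
    """True if every group has at least one hit (the AND part)."""
    return all(matches_any(text, group)
               for group in (POPULATION, EXPOSURE, OUTCOME))


if __name__ == "__main__":
    title = "Student concerns about learning analytics in higher education"
    print(matches_search_string(title))
```

Under this literal reading, a term without a truncation symbol such as "perspective" would not match the plural "perspectives", which is one reason a reference-list search is a useful complement to a database query.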
Initial searches also indicated that a large proportion of the published work, including peer-reviewed work, emanating from the information technology sector is available in conference proceedings and reports. These articles are not included in the common databases used for academic peer-reviewed literature. Therefore, we also searched Google Scholar, using the search string (learning analytics and ethic*) and looked at the first 1,000 hits. We supplemented this with searches of specific websites, namely, Jisc, a U.K. provider of digital solutions (historically, JISC stood for Joint Information Systems Committee), and Educause, a U.S. association, both of which are not-for-profit organizations working to support the use of digital technologies in higher education.
The reference lists of identified relevant papers and all papers referenced in this article were searched for further relevant articles.
Inclusion/Exclusion Criteria
Inclusion and exclusion criteria were developed (Table 2) by three researchers (JS, XF, and KSG) for the research question and used to select relevant articles.
Table 2. Criteria to select material for analysis
Selection of Studies
Two authors (XF and KSG) conducted the initial searches, screened titles and abstracts against the inclusion criteria, and retrieved full-text articles that met the criteria or whose eligibility was uncertain. This search was overseen by a third author (JS) who also acted as a third reviewer for articles in the “uncertain” category. As noted above, this search strategy identified only seven relevant articles, and it was clear from searching the reference lists of these articles that the list was not exhaustive.
The study selection process shown in Figure 1 is for the final search with additional search terms as described above. This search was conducted by one author (JS) in discussion with authors ABM and RT to ensure complete collection of relevant papers. The articles were screened based on the inclusion criteria of document type and relevance of title and abstract. Duplicates, non-English articles, and articles published prior to 2007 were excluded as were papers that described the technical, legal or ethical aspects of the use of data in the tertiary sector without canvassing student and staff views or that described public perspectives in other sectors such as health care or other levels of education. Search results were combined into a single EndNote X7 database.

Figure 1. Study flow diagram.
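The exclusion rules described for the final search can be sketched as a simple screening pass. This is a minimal illustration under stated assumptions, not the procedure the reviewers ran: the `Record` fields and the `screen` function are hypothetical, and the two Boolean flags stand in for judgments that a human reviewer made from the title and abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; field names are assumptions."""
    title: str
    year: int
    language: str
    tertiary_sector: bool   # reviewer judgment: set in tertiary education
    canvasses_views: bool   # reviewer judgment: reports staff or student views


def screen(records):
    """Apply the exclusion rules in order; return included records and reason counts."""
    seen_titles = set()
    included = []
    excluded = Counter()
    for record in records:
        # Normalize case and spacing so trivially different copies count as duplicates.
        key = " ".join(record.title.lower().split())
        if key in seen_titles:
            excluded["duplicate"] += 1
            continue
        seen_titles.add(key)
        if record.language != "English":
            excluded["non-English"] += 1
        elif record.year < 2007:
            excluded["published before 2007"] += 1
        elif not record.tertiary_sector:
            excluded["other sector or level of education"] += 1
        elif not record.canvasses_views:
            excluded["no staff or student views"] += 1
        else:
            included.append(record)
    return included, excluded
```

Counting exclusions by reason mirrors the kind of tally a study flow diagram reports at each screening step.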
Conceptual Framework and Analytical Approach
We used the conceptual framework offered by Beauchamp and Childress (2008) in their text Principles of Biomedical Ethics to support our analysis. We arrived at using this framework in the following way. First, we searched for theoretical accounts of ethical issues in big data in higher education. In the absence of any developed theoretical account that was directly relevant to higher education, we took as the starting point for our analysis a number of recent accounts of ethical issues in big data from a range of fields. We reviewed papers on ethics in big data in education research (Daniel, 2019), big health data (Ienca et al., 2018; Mittelstadt & Floridi, 2016; Xafis et al., 2019), big data research (Lipworth et al., 2017; Richards & King, 2014; Ioannidis, 2013), and data science in information technology (Salz & Dewar, 2019). These papers all highlighted a very similar array of ethical issues, related chiefly to respect for persons (including privacy, confidentiality, identity, consent, human rights, and data ownership) and harms and benefits (including data misuse, vulnerabilities, group harms, discrimination, and safety).
It is not surprising that these papers cohered around a common set of ethical issues because they all, explicitly or implicitly, referenced a conceptual framework that has underpinned most codes of research ethics in the Western tradition since the 1980s (Canadian Institutes of Health Research et al., 2014; Economic and Social Research Council, 2015; National Health and Medical Research Council, 2018). That framework has its origins in the U.S. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research that spawned the Belmont Report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1978), which, in turn, aligns closely with Beauchamp and Childress’s (2008) Principles of Biomedical Ethics. Despite stringent critique, almost since its inception (Arras, 1991; Clouser & Gert, 1990; Toulmin, 1981), the principles articulated in Beauchamp and Childress’s book, and the ethical issues they foreground, continue to be the predominant framework in research ethics.
The Beauchamp and Childress framework is built around four principles: respect for autonomy, beneficence, nonmaleficence, and justice (Beauchamp & Childress, 2008). These four moral principles are intended to provide “an integrated framework of principles through which diverse moral problems can be handled” (Beauchamp & Childress, 2008, p. viii). Although how the principles relate to each other and are to be applied has varied across the wide range of settings in which they are now used, in general they are guides for decisions about whether actions are prohibited, required or permitted (Solomon, 1978). Contemporary accounts of these principles tend to focus on balancing principles against each other in moral judgment, which requires attention to both the meaning of each principle and its relative weight in a particular situation. In the paragraphs that follow, we provide a very brief outline of how the four principles are defined and related to each other; relevant parts of the results begin with a brief discussion of each principle’s relevance to big data in higher education.
The principle of respect for autonomy as an action guide is a product of the modernist era. The concept, and debates about its role, has dominated medical ethics (Gómez-Vírseda et al., 2019), but it also appears in fields as diverse as education theory and policy (Gutmann, 1987), political and legal philosophy (Feinberg, 1986), and research practice (Kirchhoffer & Richards, 2019). The principle of respect for autonomy refers to the idea that a person should be allowed to be self-determining, to “live one’s life according to reasons and motives that are taken as one’s own and not the product of manipulative or distorting external forces” (Christman, 2018). Respect for autonomy is, therefore, closely related to informed consent, since people ought not to be coerced into certain decisions by external forces and without understanding and accepting the reasons for those decisions; to privacy, confidentiality, and data ownership, since self-determination includes sovereignty over information pertaining to oneself; and to identity, since the notion of control over one’s being is central to autonomy.
If the principle of respect for autonomy has been ascendant in ethics over the past 40 years, then the principle of beneficence has been on the other end of the see-saw. The principle that we are morally required to act for the benefit of others is clearly enunciated in the professions and in civic society. However, particularly when it comes to actions concerning competent adults who may not wish us to help them, there is considerable debate about getting the balance right between doing good for others and allowing people to make their own choices (Pellegrino & Thomasma, 1988). These debates have contributed to greater clarity about the scope of beneficence, through exploring questions such as how much are we required to do for others and whether there is a general duty of beneficence, or only role-related duties (Engelhardt, 1986).
The principle of nonmaleficence requires that practices do not cause injury or harm. In Beauchamp and Childress’s (2008) work, as in the work of other influential theorists (Hart, 1961; Ross, 1930), beneficence was conceptually distinguished from nonmaleficence. However, in research ethics the distinction is often blurred, with the focus also including getting the balance between risks and benefits right (Beauchamp, 2019). The range of risks to be considered is broad, including physical and psychological injury, reputational harm, discrimination, and safety, and both individual and group harms need to be considered.
The principle of justice requires that we are to give people what they are due. Distributive justice, which was the principal focus of Beauchamp and Childress’s (2008) chapter on justice, focuses on ensuring that people who are equal have equal access to goods and services, allowing inequalities between people to be addressed with different levels of access. The criteria for distributing these goods and services (the material principles of justice) therefore depend on characteristics of persons, such as their level of need, effort, or contribution. The inclusion of justice in an otherwise more clinically oriented Principles of Biomedical Ethics reflected its key role for research ethics in the Belmont Report. There, and in subsequent writing about justice in research practices, the principle addresses concerns about the overrepresentation of vulnerable populations and is a vehicle for ensuring that the benefits and burdens of research are distributed fairly and that participants, particularly those who are vulnerable, are not exploited (Pieper & Thomson, 2013).
We thematically analyzed content in the papers and then mapped our analysis against the four principles and their accompanying ethical issues as noted above, taking account of the different framing of these principles that a focus on big data creates. We also noted ethical issues that have been highlighted in recent review papers but were not addressed in the papers included in this scoping review. Definitions of “big data” including “learning analytics” were extracted, where available, from the included papers (see Supplemental Table S1 in the online version of the journal).
Results
We identified 30 papers, describing 26 separate studies, which discussed student (n = 22 papers) and staff (n = 11 papers) perspectives on the use of student data in universities. Four studies, reported in seven papers, examined both staff and student responses. Nineteen papers discussed student views only, and eight papers focused only on staff views. Papers on staff perspectives reported on the views of teaching staff (5 papers, although the type of “teaching staff” was not always clear); mixed academic groups, including managers and teaching staff (3 papers); and managers (2 papers, although seniority was unclear in one paper and only 28% of managers were at Head of Department level or above in the other), with academic research and technology professionals’ views reported as part of one paper and two papers providing no indication of staff type (see Supplemental Table S2 in the online version of the journal). Four papers included the views of nonteaching staff, for example, information technology professionals, learning designers, researchers, administrators, senior managers, and directors of teaching and learning (Bischel, 2012; Drachsler & Greller, 2012; Howell et al., 2018; West et al., 2016). Lack of clarity in most papers about the specific category of staff included meant that we could not analyze staff views by type of staff.
The findings in this review draw on descriptions of data analytics and/or the views of staff and students in tertiary settings in 10 different countries. All but one of these countries has a Western cultural tradition (as defined by Huntington, 1991): United Kingdom, Canada, United States, Australia, New Zealand, Germany, France, Austria, and Netherlands. Two studies set in Malaysia were identified (Wook et al., 2015, 2017). The papers described the views of students at Australian, German and Austrian, German and French, and Malaysian higher education institutions and the perspectives of staff from the United Kingdom, United States, Australia, and New Zealand, and internationally.
Most papers (n = 24) focused on student and staff knowledge, preferences, and concerns about the use of learning analytics in universities (Adejo & Connolly, 2017; Arnold & Pistilli, 2012; Arnold & Sclater, 2017; Atif et al., 2015; Bischel, 2012; Brooker et al., 2017; Corrin & De Barba, 2014; Corrin et al., 2013; Dobozy, 2017; Drachsler & Greller, 2012; Havergal, 2016; Heath & Fulcher, 2017; Heath & Leinonen, 2016; Ifenthaler & Schumacher, 2016; Jones, 2016; Kammer, 2015; Khan, 2017; Knight et al., 2016; Maghrabi, 2014; Mahroeian et al., 2017; May et al., 2017; Newman & Beetham, 2017; Reimers & Neovesky, 2015; Wook et al., 2015, 2017). Although a number of papers (n = 22) acknowledged these concerns as ethical issues, only a small number (n = 8) discussed the issues in depth and/or explicitly used an ethical framework to analyze them.
In the sections that follow we describe the content of the papers in this review under four headings. First, we provide an account of how “big data” are conceptualized in the higher education sector, noting that most papers align the term with learning analytics. In the next two sections, we describe awareness of and preferences among staff and students in relation to the use of student data in higher education. The final, and most substantial, section focuses on ethical issues.
We have included staff and student perspectives together in this article because the ethical issues addressed by each group, their views about those issues, and the ways in which they described the issues are very similar (see Table 3). This was not obvious within any one paper, even if that paper reported both student and staff perspectives. However, focusing on issues identified across papers, students and staff held quite similar views about ethical issues in the use of big data in tertiary education. The similarities were strongest in those domains that focused on the benefits of big data with respect to the student experience (not surprisingly) and weakest in relation to risks for staff, a topic on which students were generally silent. Concerns about privacy, confidentiality, identity, and consent were also identified by both groups and discussed in similar ways.
Table 3. Summary of ethical issues
Defining “Big Data” in the Tertiary Sector
Most papers in this review conceptualized the use of “big data” in the tertiary sector in relation to learning analytics. Ten articles (Adejo & Connolly, 2017; Corrin & De Barba, 2014; Corrin et al., 2013; Howell et al., 2018; Mahroeian et al., 2017; Reimers & Neovesky, 2015; Roberts et al., 2016; Roberts, Howell, & Seaman, 2017; Slade & Prinsloo, 2014; West et al., 2016) drew on a definition of learning analytics from the call for the First Conference on Learning Analytics and Knowledge and the work of George Siemens (2011), namely, “the measurement, collection, analysis and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs” (p. 1). Other definitions emphasized that analysis took place in real time (Arnold & Pistilli, 2012; Ifenthaler & Schumacher, 2016), that learning analytics provided the predictive power (Arnold & Pistilli, 2012; Kammer, 2015) to identify at-risk students, that taking action in response to this predictive power was important (Jones, 2016), and that learning analytics “makes use of pre-existing machine readable data” (West et al., 2016, p. 904), permits the development of “useful and meaningful patterns from large data sets” (Khan, 2017, p. 267), and uses techniques capable of handling big data which could not be processed manually (West et al., 2016).
The papers that did not focus specifically on learning analytics were more diverse in orientation. One article (Maghrabi, 2014) focused on student perceptions of the threats to data security in the cloud, and a second described student views on data storage and privacy protection (Newman & Beetham, 2017). Brooker et al. (2017) asked students about their understanding of the types of data that might be included in data analytics conducted in the tertiary sector. Two articles by Wook et al. (2015, 2017) differentiated between educational data mining, defined as “application of data mining techniques in analysing specific types of datasets in the educational setting” (Wook et al., 2017, p. 1196), and learning analytics. They defined the focus of learning analytics more narrowly on “students and educators . . . understanding and optimizing their learning environment” (Wook et al., 2017, p. 1196) but, drawing on Ferguson (2012), acknowledged that the differences between data mining and learning analytics were “vague and overlapping” (Wook et al., 2017, p. 1196).
Regardless of whether a definition of “big data” was given or not, most papers (n = 17) provided no description of the types of data considered within the paper or they briefly described data collection (behavioral data) within an LMS. In most cases it was not clear if students were aware of the types of data that were collected in their institution (see Supplemental Table S2 in the online version of the journal). Adejo and Connolly (2017), Arnold and Pistilli (2012), and Ifenthaler and Schumacher (2016, p. 923) categorized the types of data collected within a university environment as behavioral data, demographic data, assessment or performance data, financial data, historical progression data, and prior academic history data. Brooker et al. (2017) included regulatory compliance data, namely, “information we are authorised to collect from other organisations (e.g. government agencies).”
Knowledge of Universities’ Use of Data
There was considerable variation in student and staff awareness and understanding of how student data are stored and used, how learning analytics is used in tertiary institutions, and data policies in universities. Some students knew very little about how student data were managed. For example, a survey of 209 students in Scotland conducted in 2016 (Adejo & Connolly, 2017) showed that 40% of students were unaware of the data policy at their university. This was supported by an Australian qualitative study that showed that most students were “unaware or unsure about what big data and learning analytics were at the start of the focus groups” (Roberts et al., 2016, p. 5). Similarly, recent studies conducted in the United Kingdom (Khan, 2017; Slade & Prinsloo, 2014) and two Australian universities (Brooker et al., 2017) demonstrated that most students knew very little about learning analytics and how it works. The students were unaware that the university collected a wide range of data. A large study of 22,593 U.K. students (Newman & Beetham, 2017) found that students were about equally divided among three responses: indicating that they had been told how their personal data were stored in the university they attended, saying that they had not been told, and being “neutral,” suggesting that they did not know if they had been told or not. Interestingly, the students were not asked if they knew how personal data were stored. In another U.K. study, 20% of 50 students from a range of disciplines did not understand anything about cloud computing and how it worked (Maghrabi, 2014).
Staff were little better informed: 39% of staff in a global online survey of 121 academics (Drachsler & Greller, 2012) did not know or were unsure about whether they had an ethics committee and guidelines in place for the use of student data for research and 12% of senior staff at New Zealand universities (Mahroeian et al., 2017) did not know if their university was using learning analytics or not. A survey of 22 Australian universities (West et al., 2016) showed that only four universities had informed all relevant parties (by their own definition) about the use and impact of learning analytics.
Preferences
Several studies assessed student preferences for the use of data and learning analytics. In general, students supported the use of learning analytics to enhance their learning experience. A Macquarie University study (Atif et al., 2015) found that students recruited from first-year math, physics, and computing studies wanted early alerts, and to be contacted about low scores (90%), missing work (67%), lack of participation (52%), frequent absences (37%), and in-class behavioral problems (30%). The same students, however, were less supportive of alerts for lecture content not viewed (27%), other uptake issues (21%–22%) and poor forum participation (3%–4%). The reasons for these differences were not explored. A similar study by Corrin and De Barba (2014) from the University of Melbourne suggested that students saw learning analytics as helpful in study planning. Reimers and Neovesky (2015), in a survey of German and Austrian students, found that most students wanted all relevant information in a central place with an overview of deadlines.
In general, students in the studies included in this review were positive about the use of their data to improve their performance. Arnold and Sclater (2017) explored these attitudes in detail and found that students at both U.S. and U.K. universities were willing to have their data used to improve their grades, but U.S. students were more accepting (91%–94%; 2% said no) than U.K. students (71%–77%; 8%–12% said no). Similarly, U.S. students were more willing to have their data leveraged so they do not drop out (72%–76%; 5% said no) than U.K. students (53%–54%; 20%–23% said no; Arnold & Sclater, 2017). Explanations as to the ways that data might be leveraged to achieve these ends were not provided to the participants, and the reasons for the differences in response were not explored.
Students were less positively disposed toward the use of their data for comparative purposes. In Reimers and Neovesky’s (2015) study, students were supportive of learning analytics for personal use (60%), but only 34% of participants wanted to compare their performance with other students. Another study that compared U.S. and U.K. students (Arnold & Sclater, 2017) indicated that U.S. students (60%–61%) were more likely than U.K. students (25%–26%) to support data visualization in an app that all students could see. Students in an Australian study (Corrin & De Barba, 2014) found graphic representation of data particularly valuable but thought it could be demotivating if they were above the class average. Students from a different Australian university suggested that messages about poor performance might be similarly demotivating (Roberts, Howell, & Seaman, 2017). The students supported a feature through which they could compare themselves with their peers, but they wanted a customizable nuanced dashboard: Presentation of performance through “traffic lights” was not considered sufficient (Roberts, Howell, & Seaman, 2017). All of this suggests that there may be national and/or institutional differences in how students view data use.
In some cases, the data presented in papers did not appear to support claims about student support for data analytics. Ifenthaler and Schumacher (2016, p. 933) claimed that their survey data supported the contention that students would be willing to share more extensive data if, in return, the learning analytics system “provided rich and meaningful information,” but the data presented were insufficiently nuanced to support this contention. Similarly, in an article in the Times Higher Education Supplement, drawing on the findings of a report for Jisc, also reported by Arnold and Sclater (2017), Jisc’s Chief Executive Officer suggested, with no supporting data, that students would support learning analytics more if they understood it better (Havergal, 2016).
Ethical Issues
Across the tertiary education sector, awareness of ethical issues in the use of data analytics appears to be low. A 2014 survey by West et al. (2016) of registered Australian universities (n = 22 from 40 contacted) and Australian and New Zealand academic staff (n = 341) was the only article we found that explicitly asked for institutional responses on ethical issues in the use of big data in the tertiary sector. This article found that ethical problems associated with the use of data analytics were identified as a concern by 64% of institutions but by only 29% of academic staff (34% were unsure). Eight universities indicated that they had not yet considered the issues. Institutional responses focused on privacy and ethical use of data, whereas staff identified issues associated with the use of learning analytics for performance management, transparency, and profiling of students. From an open-ended question (n = 112 staff responses) West et al. identified those ethical principles that participants believed should guide learning analytics. These were autonomy (42%), privacy (29%), beneficence (23%), and justice (17%), with less attention to the ability to use own data (1%), nonmaleficence (6%), transparency (7%), and duty of care (2%).
Despite the lack of explicit attention at an institutional level to ethical issues, staff and students in the papers we reviewed did clearly think about big data and its ethical implications. In the following sections, drawing on Beauchamp and Childress’s (2008) framework, we have focused on those views that describe how participants in the included studies thought about three key areas of ethical significance: beneficence, nonmaleficence, and respect for autonomy (focused around privacy, confidentiality, identity, consent, and transparency). We have not included a section on justice, because there was very little attention to this concept in the papers we reviewed. Only one article (West et al., 2016) explicitly identified justice as an ethical issue.
At the beginning of each section, drawing particularly on Beauchamp and Childress’s (2008) and Richards and King’s (2014) work, we define the area of ethical concern. Using findings from this review of the literature, we then illustrate how the views of staff and students can inform our understanding of these areas. Each section concludes, if appropriate, with a brief comparison with broader literature on ethical issues in the use of big data. In the discussion we comment on those issues that were not addressed in the papers included in this review but that have received more attention in other review papers.
Beneficence (Benefits)
Students and staff identified a range of benefits for students that they perceived to be associated with the use of data analytics in university settings. Students acknowledged the potential for data analytics to improve the student experience, including through minimizing isolation, helping students who find it difficult to ask for help, and reaching students with different learning styles (Heath & Leinonen, 2016; Roberts et al., 2016). Both staff and students recognized that data analytics can help students learn more effectively (Corrin et al., 2013; Heath & Leinonen, 2016; Howell et al., 2018; Kammer, 2015; Knight et al., 2016; Mahroeian et al., 2017), provide helpful feedback to which students can respond (Corrin et al., 2013; Heath & Leinonen, 2016; Howell et al., 2018; Kammer, 2015; Roberts et al., 2016), and identify “at-risk” students (Corrin et al., 2013; Heath & Leinonen, 2016; Howell et al., 2018; Knight et al., 2016; Mahroeian et al., 2017; Roberts et al., 2016).
Staff participants in several studies identified benefits from data analytics for staff performance, including the ability to map strategic initiatives, develop predictive models, make teaching more proactive, increase evidence-based decision-making, and potentially free up time for other activities (Corrin et al., 2013; Howell et al., 2018; Knight et al., 2016; Mahroeian et al., 2017). Staff from New Zealand universities described how data analytics could help to optimize resources, reduce administrative costs, and improve administration services (Mahroeian et al., 2017).
Nonmaleficence (Harms)
Students identified a range of harms associated with the use of data analytics. They expressed anxiety about surveillance, the potential intrusion on privacy (Heath & Fulcher, 2017; Heath & Leinonen, 2016; Roberts et al., 2016; Slade & Prinsloo, 2014), and the constant flow of negative, unnecessary, and duplicated messages (Arnold & Pistilli, 2012; Roberts et al., 2016; Roberts, Howell, & Seaman, 2017; Slade & Prinsloo, 2014). Staff were concerned about the possibility that learning analytics could set unrealistic expectations for students and encourage students to remain in courses to which they were ill-suited (Howell et al., 2018; Roberts et al., 2016). The feedback systems were criticized by staff as inadequate to the task (Corrin et al., 2013; Howell et al., 2018), redundant given the existing services (Howell et al., 2018), and lacking in mechanisms for effective follow-up (Howell et al., 2018; Jones, 2016). A survey of staff (n = 92; 17% response) at a regional Australian university suggested that although many staff failed to use learning analytics because of inadequate support and training, the primary barrier was lack of time (Jones, 2016). Participants in two studies with staff (Arnold & Pistilli, 2012; Howell et al., 2018) and one with students (Roberts et al., 2016) expressed the concern that learning analytics could impede personal responsibility. One participant suggested that learning analytics could cause teaching to move “away from active learning principles” and shift the organization towards becoming “a helicopter university” (Howell et al., 2018, p. 12).
Academic staff participants were also concerned that data analytics might contribute to inappropriate key performance indicators for staff and organizations and would increase staff workload (Arnold & Pistilli, 2012; Bischel, 2012; Howell et al., 2018; Jones, 2016; Knight et al., 2016; West et al., 2016). The high cost of implementing effective systems for data analytics and the failure to address lack of expertise to support sound analysis and use of data were also seen as barriers to ethical use of data analytics (Bischel, 2012; Corrin et al., 2013; Howell et al., 2018). Finally, allowing some students to withdraw their personal data, assuming that this is possible, was perceived to increase the risk that these students would fail because of lesser levels of monitoring and support. It could be argued that universities “have a duty of care and a responsibility to optimize student success” and permitting students to withdraw from learning analytics would “undermine these core University responsibilities” (Heath & Leinonen, 2016, p. 81).
Respect for Autonomy
Being autonomous is usually taken to mean that people shape or direct their own lives and make their own decisions (Beauchamp & Childress, 2008, Chap. 4). The concept underpins many aspects of contemporary research and professional practice: We recognize the importance of autonomy by respecting the privacy of others, keeping information entrusted to us confidential, and allowing people to create their own identities. In research settings, informed consent is a mechanism for people to know about and control what happens to them and to the information collected about them. In this article, we use the umbrella principle of respect for autonomy to collect up the range of ethical issues of privacy, confidentiality, identity, informed consent, and transparency.
Privacy
Privacy has often been defined as being about exercising control over who has information about us and under what circumstances (Parent, 1983). In contrast, in an era of big data usage, Richards and King (2014) framed privacy as concerned with how information flows are governed. They focused on the practices of notice (the idea that data processors should disclose how they are collecting and using personal data) and choice (the idea that people should be able to choose which uses of their data they will allow; p. 412). Their framing of privacy is echoed in how students and staff in the included studies discussed data use in tertiary education settings.
Both staff and students held general concerns about the impact of data analytics on privacy, particularly that individual privacy rights would be affected and that data analytics could constitute an invasion of privacy (Bischel, 2012; May et al., 2017; Roberts et al., 2016; Slade & Prinsloo, 2014). For example, although a forum of 50 students at the Open University (United Kingdom) recognized the “positive intentions associated with a learning analytics approach” (Slade & Prinsloo, 2014, p. 296), many did not think that the university needed to collect all the information that it did, viewing “some questions as impertinent and intrusive” (Slade & Prinsloo, 2014, p. 295). More explicitly, students wanted to control access to their data so that they would be able to selectively or completely opt out of the use of their data for analytics (Heath & Fulcher, 2017; Khan, 2017; Knight et al., 2016; May et al., 2017; Slade & Prinsloo, 2014). For example, nearly half of all students surveyed at a Scottish university wanted all their data to be removed on graduation (Adejo & Connolly, 2017). In three studies students expressed the belief that some data, such as personal information about marital status, social media use, income, and health, should never be included as part of an analytics panel (Ifenthaler & Schumacher, 2016; Knight et al., 2016; Slade & Prinsloo, 2014). Students were also concerned about information flows between contexts, although there were fewer objections to anonymized data being used in this way (Heath & Fulcher, 2017; Heath & Leinonen, 2016; Kammer, 2015).
The views of academic staff on privacy centered on the belief that access to data should be on a need-to-know basis (Drachsler & Greller, 2012; Heath & Fulcher, 2017; Heath & Leinonen, 2016; Knight et al., 2016; Slade & Prinsloo, 2014). In a study by West et al. (2016), which examined institutional responses to learning analytics, a staff participant indicated that institutional conversations often revolved around “a real lack of clarity around clearance to access certain data” (p. 911). Both student and staff participants called for good data handling protocols and expressed a lack of trust in organizations’ mechanisms for data security (Adejo & Connolly, 2017; Maghrabi, 2014; Slade & Prinsloo, 2014; West et al., 2016). Very few papers addressed the challenge that predictive analysis can create for privacy, despite Baruh and Popescu’s (2017, p. 579) concern that predictive analysis will generate “a new social organization of knowledge which normalizes a climate of privacy loss.”
Confidentiality
Confidentiality as a principle assumes that other people do have information about us but will keep it private (Richards & King, 2014). As Richards and King (2014) noted, “Virtually all information exists in intermediate states between completely public and completely private . . . [and] much of the information in intermediate states that we share is private data that we share in trust, expecting them to remain confidential” (p. 413).
There were mixed views among staff and students in the included studies about how well confidentiality is protected in universities. For example, a 2017 survey of Scottish students (n = 209; Adejo & Connolly, 2017) found high levels of trust in their institution’s ability to protect student data: 80% trusted the university to protect collected data from hacking although only 62% thought that their personal information was adequately protected within the institution. With a broader reach and reflecting a general distrust of cloud storage, in a study of English students (n = 40), 62% said they did not trust that their data were safe in the cloud although they regularly used Dropbox to share work (Maghrabi, 2014). Anonymization was not necessarily seen as the answer to confidentiality issues. In a global survey of 121 staff members from tertiary institutions (Drachsler & Greller, 2012), approximately half trusted anonymization methods, but a similar number did not trust the methods or did not know if the methods were reliable. However, in two studies, students differentiated between the inherently confidential nature of “anonymised data to observe/monitor large scale trends” and “the ‘snooping’ variety of data collection tracking the individual” (Slade & Prinsloo, 2014, p. 296).
Identity
If privacy concerns our right to limit others’ access to information about us, identity concerns our right to define ourselves. Richards and King (2014, p. 422) suggested that we think about identity not as a “specific name for a specific person” or as those properties or qualities of something that make it that thing but, following Erikson and Cohen, as “the right to define who I am” (Richards & King, 2014, p. 423). Data analytics by its very nature removes self-definition from individuals since it seeks to categorize students on the basis of characteristics and subsequent behaviors and outcomes observed at a population level.
Even so, some student participants thought that personalized learning analytics may be useful in promoting individual identity by reducing feelings of isolation at university (Roberts et al., 2016) and of “being just a number” (Arnold & Pistilli, 2012, p. 269). However, students and staff also expressed concern that categorizing students in certain ways could alter the behavior of teachers toward individuals and groups of students (Knight et al., 2016; Roberts et al., 2016; Slade & Prinsloo, 2014; West et al., 2016). A major concern was that a change in how they were profiled could affect future study opportunities. As one student stated, “There could be preconceived judgement about my abilities to be able to complete or do something which may inadvertently make me singled out from being available to do something” (Roberts et al., 2016, p. 8).
Several of the studies included in this review highlighted that students and staff doubted the accuracy of the data collected (Adejo & Connolly, 2017; Bischel, 2012; Howell et al., 2018; Slade & Prinsloo, 2014) or the ability of learning analytics to meaningfully track student behavior (Drachsler & Greller, 2012; Howell et al., 2018; Roberts et al., 2016). Students in four focus groups at an Australian university (Roberts et al., 2016) suggested that logging in to a teaching platform did not mean engagement, that each student is unique, and that current cohorts may behave differently from past cohorts. They also thought that inaccurate information and inappropriate data algorithms could falsely label a student in particular ways, which could prejudice the staff-student relationship (Slade & Prinsloo, 2014). Some students pointed to what they saw as inaccuracies in their records or in mistargeted emails as evidence of misidentification (Adejo & Connolly, 2017; Slade & Prinsloo, 2014). On the other hand, one student suggested that the mistargeting was almost certainly due to poverty of the data held and that this issue could be overcome only by the university holding more personal data (Slade & Prinsloo, 2014). Some students indicated that if they knew their instructors were looking at their data, they might manipulate their data to influence how they were perceived by staff (Brooker et al., 2017; Kammer, 2015).
The problem of identification goes further than incorrect data or misinterpretation of data. Data analytics “provides probabilities about what someone in a particular category is likely to do” (Andrejevic, 2014, p. 65). Labelling of specific groups of students as a “problem” through the use of data analytics may be problematic, particularly if it is used to entrench existing inequalities in educational opportunity. For example, staff from an Australian university were concerned that algorithms categorizing which sorts of students complete successfully may change university recruitment strategies so that they may “not bother advertising [to] them or going to their schools” (Howell et al., 2018, p. 9); this could exacerbate current issues of higher university rejection rates for school students coming from low-socioeconomic-status areas.
Consent
The issue of consent was clearly identified by students and staff as problematic for data analytics. A lack of awareness, among students, of what learning analytics is (Brooker et al., 2017; Corrin et al., 2013; Heath & Leinonen, 2016; Roberts et al., 2016), what data are collected (Brooker et al., 2017; Heath & Leinonen, 2016; Slade & Prinsloo, 2014), and what institutional data protection policies are in place (Adejo & Connolly, 2017) precludes informed consent for data usage. The general lack of awareness and, in some studies, the specific ethical issues raised by student and staff participants suggested that effective communication and consultation with students and staff, as key stakeholders, were not occurring and that this was seen as an issue by some participants (Heath & Leinonen, 2016; Howell et al., 2018; Knight et al., 2016; Roberts et al., 2016). Participants did recognize that informed consent was difficult given the problems associated with using opt-out clauses (West et al., 2016) and the long time spans and shifting environment in which information might be used (Roberts et al., 2016; West et al., 2016).
Students wanted the ability to choose how data analytics operates for them in a tertiary education setting to include the opportunity to be well informed, provide consent (May et al., 2017; Roberts et al., 2016), and potentially opt out from data use (Heath & Fulcher, 2017; Knight et al., 2016; Slade & Prinsloo, 2014). For example, one Australian student saw data analytics as a sort of forced participation and expressed the desire to completely opt out from learning analytics: “Ignorance is bliss, I—just take me out of the equation, like I don’t want to know anything about it” (Roberts et al., 2016, p. 8). This view was reinforced by an American student in another study who felt under pressure to share data since he believed, “If I block my instructor from seeing everything I do, he might have a personal issue with that. Right? That could influence how he graded me possibly” (Kammer, 2015, p. 3).
Although not clearly articulated in the studies included in this review, participants’ views on consent did reflect the standard conditions for consent outlined in the professional and research ethics literature: Participation should be voluntary, based on adequate information, and there should be no coercion to contribute data (Faden & Beauchamp, 1986).
Transparency
Many of the ideas discussed above turn, in part, on the concept of transparency, which concerns openness about how data are collected, analyzed, used, and reported; who does this; and where and why. This information is essential to underpin consent and trust in the privacy and confidentiality of student data, but it is clearly a fraught area of policy for institutions.
The views expressed by students and staff in the included studies reflect beliefs that universities have a responsibility to inform students and staff about the type of data collected, the ways in which data are used, and the protections in place to ensure privacy and confidentiality. In particular, students and staff advocated for students to be fully informed about how data will be used and how and why learning analytics is being used (Heath & Fulcher, 2017; Heath & Leinonen, 2016; Kammer, 2015; Slade & Prinsloo, 2014; West et al., 2016) and for transparency about how long data will be retained by the university (Slade & Prinsloo, 2014). Methods noted by students, staff, and institutional representatives that may improve transparency included periodic data audits (Slade & Prinsloo, 2014), institutional ethics boards governing data analytics (Drachsler & Greller, 2012; West et al., 2016), clarity around data access (West et al., 2016), and policy/guidelines on ethical use of student data (Jones, 2016; Slade & Prinsloo, 2014).
The concerns about trust in institutional use of learning analytics expressed by some participants in the included studies reflect broader concerns about data security and the growing lack of confidence in data anonymization (Andrejevic, 2014; Barocas & Nissenbaum, 2014). However, it is also clear that the descriptions of ethical issues in the identified studies did not display the level of understanding and sophistication among participants that could help elucidate issues of trust and acceptance. This was true of all the ethical issues discussed in the identified papers: Understanding of the nature of data analytics amongst the participants was generally insufficient to fully engage with and discuss the ethical issues associated with the use of learning analytics in tertiary settings.
Discussion
The aim of this review was to identify and describe what is already known about student and staff knowledge, attitudes, and concerns about the collection and analysis of large volumes of student information in universities. What is clear from this review is that many staff and students are unaware of or know little about the use of student data in data analytics and, in particular, the use of learning analytics in higher education settings. This was true across studies, disciplinary areas, institutions, and countries covered by the studies that explored this area. Many of the studies were small, but the universality of this finding suggests that understanding of student data collection, analytics, and use in higher education is low generally. This gap may reflect a lack of transparency in tertiary institutions and/or a lack of engagement with students and staff. It is also possible that restricting the review to English-language papers excluded studies that could have drawn attention to regional differences in knowledge and use of learning analytics.
Across studies, both students and staff recognized the potential for data analytics to improve the well-being of students. Staff and students saw benefit from the use of learning analytics in the early identification of at-risk students, in enhanced support for effective learning behaviors, and in the provision of personalized learning experiences (Greller & Drachsler, 2012; Pardo & Siemens, 2014).
The use of learning analytics also has the potential to cause harm. Papers describing student perspectives indicated that learning analytics has the potential to (a) reinforce stereotyping and discrimination based on group characteristics (Slade & Prinsloo, 2013), (b) result in a loss of autonomy and/or privacy (Howell et al., 2018), (c) exacerbate adverse impacts on student well-being due to negative feedback and/or misinterpretation of results (Howell et al., 2018), and (d) increase perceptions of intense pressure to perform and compete (Roberts et al., 2016). Not surprisingly, therefore, in identifying concerns, students focused more on what happens to student data than on other issues, for example, that some types of data should not be collected at all (Ifenthaler & Schumacher, 2016; Knight et al., 2016; Slade & Prinsloo, 2014). In contrast, staff focused on the needs of teachers and the impact of data analytics on both institutional resourcing and the universities’ expectations of staff performance (Arnold & Pistilli, 2012; Dobozy, 2017; Jones, 2016; Knight et al., 2016; West et al., 2016). The small number of papers that explicitly involved nonteaching staff primarily focused on pragmatic implementation and governance issues such as poor communication across university sectors, data silos, recruitment issues in acquiring expert staff, and underdeveloped governance structures (Bischel, 2012; West et al., 2016).
Staff and students also agreed on a broader range of concerns, including potential loss of privacy (Bischel, 2012; Roberts et al., 2016; Slade & Prinsloo, 2014); doubts about the data collected, particularly that the metrics may not reflect behaviors (Adejo & Connolly, 2017; Howell et al., 2018; Roberts et al., 2016; Slade & Prinsloo, 2014); potential misuse of data to the detriment of students and staff, including undermining the development of behaviors for independent learning (Arnold & Pistilli, 2012; Howell et al., 2018; Roberts et al., 2016); and the risk that categorizing students may lead to stigmatization, adverse changes in behaviors, and loss of privacy. In general, staff were more skeptical than students about the value of data analytics (Corrin et al., 2013; Dobozy, 2017; Drachsler & Greller, 2012; Howell et al., 2018; Roberts et al., 2016) and identified a number of barriers that would need to be overcome to develop data systems that are effective in supporting improved learning and teaching. For example, staff described inadequate institutional support for teaching staff to adequately access, interpret, and use data analytics (Corrin et al., 2013; Dobozy, 2017; Jones, 2016).
Some studies described the potential for data analytics to free up time for other activities (Corrin et al., 2013; Mahroeian et al., 2017), whereas in others, participants warned of increased workload (Corrin et al., 2013; Dobozy, 2017; Jones, 2016). These conflicting findings probably reflect, in part, the lack of appropriate support systems but also differences between early and late adopters of technologies (Davis, 1989). For some staff the burden of negotiating an understanding of the new technology (perceived ease of use) outweighs any benefit they might gain in the long term (perceived usefulness). In addition, as this review suggests, some teaching academics question the value of learning analytics in their teaching practice, and therefore any time devoted to its use may be seen as wasteful.
The findings from this review indicate that there has been little effort to engage students and staff to explore the ethical issues associated with learning analytics. This almost certainly reflects the fast-moving nature of the field and the generally poor understanding and exploration of what those issues might be. It also may reflect a general lack of recognition in the university sector of the importance of including diverse voices in policy decision-making on contentious issues. This review also supports the contention by Roberts, Chang, and Gibson (2017, p. 90) that “the systematic consideration of ethical issues has failed to keep pace with the rapid development and implementation of learning analytics in higher education.” In the studies included in this review, students and staff identified a range of ethical issues underpinned by inadequate governance and a failure to “establish an environment of trust and a culture of ethical data use” (Polonetsky & Tene, 2014, p. 27).
The low levels of student and staff engagement with data analytics do not suggest that these stakeholders are uninterested. Rather, the findings indicate some awareness among staff and students of the benefits and risks associated with the use of learning analytics, and they imply a willingness to engage with these issues. An ethical approach to learning analytics will incorporate a focus on building a culture of active learning and engagement rather than simply adopting passive surveillance and intervention. Increased engagement with staff and students can help determine how to balance the ethical principles at stake. For example, staff and students are well placed to assess matters such as the extent of universities’ duty of care for students.
International differences in regulation and law with respect to data sharing as well as cultural norms may explain some of the variations in practice observed. These differences may affect the potential for changes in policy and practice in response to the findings in this review. However, as Polonetsky and Tene (2014), reflecting on the response to the collapse of the U.S. education technology company inBloom, suggested, “The escalating drumbeat of calls for legislative reform, including more than 100 education privacy bills pending in state legislatures as well as for tighter contractual obligations, while important will not solve the problem . . . social norms are rarely established by regulatory fiat, and laws that fail to reflect techno-social reality do not fare well in the real world.”
Put another way, an emphasis on regulation and compliance will not protect organizations against perceptions of malfeasance. This review indicates that although there are perceptions of potential benefit from the use of student data analytics in tertiary education settings, the essential groundwork to ensure sustained and effective realization of this potential is, in many cases, missing.
The findings in this review point toward four sets of conditions that need to be satisfied to ensure effective use of data analytics in the tertiary sector. First, data analytics, and in particular learning analytics, must be tied closely to providing benefits for students, including improving the student experience, optimizing learning, and developing independent learners. Although it can be argued that “students suffer most from the inequities of the system” (Korn, 1969; Roberts et al., 2016), student leadership in curriculum development is undergoing a transformation, albeit slowly (Darwin, 2017), and student satisfaction is a strong driver in academic policy and practice (Thomas et al., 2017). The risk, as Williams van Rooij (2009) commented on the early evangelism of information technology staff in universities, is that technological capacity, rather than student learning, will drive data analytics in the tertiary sector.
Second, institutions need to develop and maintain a high level of trust in the processes of data collection, linkage, analysis, and use. This review suggests that trust can be supported by a range of strategies, including investment in data security and in systems of support for staff to access, interpret, and use data analytics effectively. Student-facing strategies can also increase trust. Such strategies can include disclosure of the amount and type of data collected, transparency about data usage and length of retention, positive stories about the impact of data analytics, allowing students to opt out of certain types of tracking, ease of access for each student to their own stored data (Andrejevic, 2014), and an appeals process through which students can challenge the accuracy of data.
Third, if tertiary institutions are to establish good processes and governance around data analytics and address the concerns raised in the studies included in this review, they will need a variety of ways to engage with staff and students. An ethical student-centered program of learning analytics will include clear channels for student and teacher voices in the development, implementation, and modification of learning analytics to promote and maintain a collegial culture (Heath & Leinonen, 2016; Wintrup, 2017). The Jisc Code of Practice for Learning Analytics 2015 (Sclater & Bailey, 2015, p. 4) recommends, “Student representatives and key staff groups at institutions should be consulted around the objectives, design, development, roll-out and monitoring of learning analytics.” Processes to support this approach include student and staff representation (a) on committees providing oversight of data use and (b) in the development and implementation of codes of conduct and policies that set boundaries on data use and provide guarantees of transparency.
Fourth, such engagement requires an informed university community: one that understands data analytics and the ethical issues associated with its use. One of the problems with many of the studies included in this review is that staff and student participants have had limited understanding of the ways in which data are collected, linked, analyzed, and used in their institutions and even less of the potential for their use going forward.
This review reveals considerable gaps in the literature, beginning with the lack of robust studies with students and staff on the use of data analytics in tertiary education settings. Such studies should encompass the full range of ways in which data are collected, linked, analyzed, and used in institutions including for recruitment, retention, financial aid, marketing, resource management, and campus facility management.
There is also considerable potential for researchers to extend the methods used in this field beyond the survey-based and small, qualitative studies that dominated the studies we reviewed. In the health sphere, the literature on professional, patient, and public attitudes to sharing health data has burgeoned in the last decade with studies using a wide variety of methods, including surveys, deliberative events, dialogue groups, workshops, and online surveys and interviews (Aitken et al., 2016; Kalkman et al., 2019). Deliberative methods would be particularly appropriate here, as they address one of the principal weaknesses of surveys: their inability to provide nuanced interpretations of the interplay between participants’ understandings of a complex topic and their values about that topic. Deliberative approaches, by contrast, both allow participants to engage in an informed manner with demanding topics and support participants to think beyond their own personal values to consider collective needs (Degeling et al., 2017; Street et al., 2014).
We included the views of both staff and students in this article and found that, by and large, these two groups identified similar issues and held similar concerns about them. There were, however, points of difference between student and staff views: Unsurprisingly, staff were concerned about the implications of big data and learning analytics for their own work and were more likely to consider organizational issues such as the lack of attention by institutional ethics boards to learning analytics. This finding should be treated with caution given that the students and staff were mainly in different countries. Further research could focus on these issues.
Ethical analysis of the responsible use of big data is well established in the health sphere (Allen et al., 2013). Three topics of debate in the health and biomedical literature on big data that did not appear in papers included in this review are particularly relevant: justice and fairness, the epistemology of big data, and the external use of data by third parties. First, the papers we reviewed were relatively quiet on the “big data divide” (Xafis et al., 2019): the fact that the benefits arising from collecting and using big data are not evenly distributed. Related to this, some groups of data subjects are much more likely to be harmed by the use and misuse of data. The papers included in this review that do focus on the potential for learning analytics to entrench existing inequalities in educational opportunity treat it as a matter of inaccurate identification of students (Andrejevic, 2014; Howell et al., 2018), rather than a matter of justice. As Xafis et al. (2019) noted, “[A]ddressing the big data divide therefore requires that we engage with questions about justice, and about solidarity—particularly with those who are most vulnerable” (p. 241). This would be a fruitful area for further analysis.
Second, we found no papers that explicitly addressed the epistemology of big data (Lipworth et al., 2017; Mittelstadt & Floridi, 2016). For example, the idea that bigger is not always better when it comes to data was not questioned in the papers we reviewed. The observational nature of most big data studies exposes them to the deficiencies of all observational research (e.g., inferring causation from correlation; confounding bias); such deficiencies can be magnified in studies with very large data sets of variable quality. Big data studies can be overpowered, leading to the possibility of acting on false signals and inappropriate inference. Here again, there is potential for future studies to explore the relationship between epistemology and the higher education setting.
Third, little attention has been paid, to date, to the potential for data from learning analytics to be used externally by software providers or by third parties to whom vendors might sell data. Again drawing on literature from the health sphere, there is evidence that patients and publics are much more concerned about private sector access to their health data than they are about use by researchers or government (Aitken et al., 2016). Exploring staff and student support for such alternative uses before these begin to occur on a large scale would be helpful for universities contemplating a move in this direction.
Universities stand to gain a great deal from engaging with these matters if they are to provide the best possible learning environment for all students. They may also gain from adopting the ethical review and governance frameworks that are widely used to review big data research in the health sector. In addition, a cross-country comparison of data analytics use and governance systems would assist in developing a full understanding of the complexity of the issues and the lessons to be learned from prior experience in institutions that have been early adopters of data analytics.
Supplemental Material
Supplemental material, Table_S1__Definitions_of_Learning_Analytics_15_May_2020 for Student and Staff Perspectives on the Use of Big Data in the Tertiary Education Sector: A Scoping Review and Reflection on the Ethical Issues by Annette J. Braunack-Mayer, Jackie M. Street, Rebecca Tooher, Xiaolin Feng and Katrine Scharling-Gamba in Review of Educational Research
Supplemental material, Table_S2_Data_summary_student_and_staff_type_May_15_2020 for Student and Staff Perspectives on the Use of Big Data in the Tertiary Education Sector: A Scoping Review and Reflection on the Ethical Issues by Annette J. Braunack-Mayer, Jackie M. Street, Rebecca Tooher, Xiaolin Feng and Katrine Scharling-Gamba in Review of Educational Research
Authors
ANNETTE J. BRAUNACK-MAYER is a bioethicist with considerable experience in the use of qualitative research methods to explore social and public health issues: University of Wollongong, Northfields Avenue, Keiraville, New South Wales 2500, Australia.
JACKIE M. STREET has particular expertise in the use of deliberative inclusive methods, qualitative methods, and scoping and systematic reviews to research community views on contentious policy issues: University of Wollongong, Northfields Avenue, Keiraville, New South Wales 2500, Australia.
REBECCA TOOHER is director Education Strategy and Teaching Excellence at the University of Adelaide, Division of Academic and Student Engagement, Level 7, Wills Building, North Terrace Campus, Adelaide, SA 5000, Australia.
XIAOLIN FENG graduated in 2019 with a bachelor of health and medical sciences (advanced) from the University of Adelaide, School of Public Health, 57 North Terrace, Adelaide, SA 5000, Australia.
KATRINE SCHARLING-GAMBA is a 2018 graduate from the University of Adelaide with a double degree in psychology and health promotion: University of Adelaide, School of Public Health, 57 North Terrace, Adelaide, SA 5000, Australia.
