Abstract
Recent advancements in Generative AI (GenAI) demonstrate strong potential for enhancing student evaluation and assessment practices through complementary and symmetric human–AI collaboration. This study examines the dispositions, perceived benefits, and challenges that teachers experience when they and GenAI complement one another in providing formative assessment and feedback. Partnering with a leading GenAI-powered EdTech platform, we conducted a case study in a Norwegian secondary school using Learny, a GenAI-supported feedback tool. Qualitative data were collected from teachers who employed Learny in their feedback practices. The findings indicate that teachers were generally curious and open to using GenAI, while emphasizing the need to maintain professional responsibility. They valued GenAI’s capacity to save time, offer inspiration, and deliver timely feedback, yet raised concerns about feedback quality, lack of relational and contextual awareness, and the risk of student over-reliance on such feedback. The study contributes to the growing literature on hybrid intelligence in education by identifying conditions under which GenAI can effectively support, rather than replace, human educators. It underscores the centrality of teacher agency in AI-supported feedback and assessment practices and outlines key implications for pedagogy, future GenAI system design, and sustainable implementation.
Keywords
Introduction
Generative artificial intelligence (GenAI) has rapidly expanded across sectors, including education, where it holds promise for supporting teaching and learning by automating tasks such as content and feedback generation (Giannakos et al., 2025). Recent research emphasizes the potential of hybrid intelligence (HI), in which humans and GenAI complement one another to achieve outcomes neither could reach alone (Järvelä et al., 2025; Nguyen et al., 2024; Sung et al., 2025). In the context of education, this means that GenAI should operate alongside teachers, augmenting their capabilities rather than replacing their role. One promising application lies in formative feedback: teachers often struggle to provide individualized assessment and feedback due to time and workload constraints (Burner et al., 2025; Coenen & Pfenninger, 2025), even though feedback remains essential for clarifying achievement and guiding improvement (Hattie & Timperley, 2007).
At the same time, GenAI systems risk creating inherently asymmetrical interactions and power dynamics (Woodruff et al., 2024), potentially undermining human agency and reinforcing social processes of deskilling. Furthermore, the integration of GenAI in education raises significant concerns about misinformation, bias, and ethical issues, including data privacy (Bahroun et al., 2023; Giannakos et al., 2025), as well as risks to teachers’ control and of increased cognitive laziness (Fan et al., 2025; Giannakos et al., 2025). Such qualities can undermine the complementarity at the heart of HI and significantly hinder teaching and learning. It is therefore essential to critically examine and proactively manage the opportunities, challenges, and risks of using GenAI to complement teachers in core teaching practices such as feedback and evaluation.
Despite the growing importance of AI, teachers still bring most of the insight into students’ unique needs and adapt the technology to support their pedagogical designs and decisions (Ley et al., 2025; Topali et al., 2025). In an HI sociotechnical system, understanding teachers’ perceptions of and attitudes toward GenAI tools in educational contexts is therefore crucial for developing complementary HI and for using GenAI in the right contexts. Because teachers implement the technology in practice, their understanding, concerns, and sense of value influence whether and how GenAI becomes a meaningful part of teaching and learning. This study investigates how secondary school teachers collaborate with GenAI when providing formative feedback. Specifically, it addresses the following research question:
• RQ1: What dispositions, perceived benefits, and challenges do teachers experience when they and GenAI complement one another in providing formative assessment and feedback?
To investigate this question, we conducted a case study involving 24 secondary school teachers who initially agreed to use a GenAI-supported feedback tool called Learny. The study was carried out in collaboration with Learnlab, the EdTech provider that developed the tool, and a secondary school that integrated it into classroom practice. Of these 24 teachers, 21 completed the study and provided full data. We collected qualitative data on teachers’ experiences of using the GenAI-supported feedback tool and their perceived benefits and challenges of such a human–AI collaboration. Additionally, we conducted several classroom observations to gain deeper insights into how teachers collaborated with GenAI to provide feedback to their students. In particular, this study contributes to the growing field of human–AI collaboration and HI in education by providing empirical evidence on how secondary school teachers engage with GenAI tools to support formative assessment and feedback. It identifies key dispositions, benefits, and challenges shaping teachers’ adoption of GenAI, highlighting the importance of maintaining professional agency and ethical responsibility in hybrid feedback systems. The paper advances the concept of HI in educational contexts, offering concrete design principles for balancing automation with teacher control. Finally, it provides practical implications for system designers, teacher education, and school leadership seeking to integrate GenAI responsibly and sustainably into feedback practices.
Background and Related Work
The Importance of Formative Assessment and Feedback in Education
Formative assessment and feedback are fundamental components of effective learning. Both have received significant attention in recent years, with scholars linking assessment and feedback, and associated strategies, to a range of educational, social, and psychological benefits (Gaynor, 2020). At the practical level, there is a broad consensus that formative assessment and feedback should play a substantial role in course design and delivery (Baughan, 2020). However, beyond this general agreement, evidence remains mixed regarding where the strongest effects lie and which specific approaches and elements are most effective for enhancing student learning (Morris et al., 2021), which makes it particularly challenging to identify approaches for properly integrating human and GenAI feedback.
Although there is no single, universally accepted definition of either formative assessment or feedback, scholars generally agree that feedback constitutes an integral component of the broader framework of formative assessment (Wiliam, 2018). Both concepts center on gathering and providing information about a student’s current performance or understanding with the aim of improving learning outcomes. Black and Wiliam (1998), for example, define formative assessment as encompassing “all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they [the students] are engaged” (p. 8). Importantly, this exchange of information occurs not only between teachers and students but also through other assessment types (e.g., peer, automated, and self-assessment), which serve as critical mechanisms for generating feedback that guides learners’ progress.
Different types of feedback address distinct aspects of learning, including (a) the task, (b) the process, (c) self-regulation, and (d) the individual (Hattie & Timperley, 2007). Each serves a different purpose, carries varying impacts on learning, and therefore requires different strategies for effective implementation. Feedback is typically delivered either verbally or in written form. Verbal feedback is often embedded within instructional dialogue and is viewed as a “move” within dialogic teaching and learning (Perry et al., 2020). Written feedback, by contrast, often includes more “assessment qualities” such as corrections, marks, comments, questions, or targets designed to stimulate reflection and guide improvement. Recent works have explored the value of GenAI assessment and feedback (Coenen & Pfenninger, 2025; Corbin et al., 2025) without, however, considering the potential of hybrid teacher–GenAI feedback creation.
GenAI in Teaching and Learning
Recent studies have shown that many teachers are open to, and positive about, adopting GenAI and AI-based EdTech in their classrooms (Chiu et al., 2023; Han et al., 2024; Nazaretsky et al., 2022; Shankar et al., 2025). Teachers generally acknowledge AI as a tool that can help them become more efficient and improve their teaching competency (Chiu et al., 2023). At the same time, they have concerns related to a lack of understanding of how AI systems work, ethical issues such as bias and privacy, limited adaptability in teaching methods, and frustration over the lack of transparency in AI decision-making. Some K-12 teachers also believe that the growing presence of GenAI tools in education will increase their workload, add responsibilities, and require adjustments to their teaching practices (Kumar & Sharma, 2025). Teachers describe the need to revise planning, curricula, and assessments, underscoring that GenAI use requires not just technical understanding but also pedagogical reflection on how to assess and evaluate students’ work (Kumar & Sharma, 2025). Such transitions might include placing more emphasis on students’ process over product, and evaluating students’ choices and justifications for when to use GenAI and for what tasks. Moreover, many teachers still have generally low familiarity with GenAI tools and lack knowledge and understanding of AI, its capabilities and limitations, as well as its potential benefits and challenges in educational contexts (Moura & Carvalho, 2024). To use GenAI tools safely and effectively in education, many studies underscore the need for AI literacy (Brandão et al., 2024; Han et al., 2024; Kumar & Sharma, 2025; Shankar et al., 2025).
GenAI to Support Formative Assessment and Feedback in Education
Research comparing GenAI- and teacher-generated feedback shows that GenAI can approach human quality under certain conditions, particularly when feedback is criteria-based or process-oriented (Banihashem et al., 2024; Dai et al., 2023; Steiss et al., 2024). While human instructors generally outperform GenAI in assessing student performance, GenAI tends to provide more detailed and coherent summaries and can effectively highlight both strengths and areas for improvement. Its key advantage lies in timeliness and accessibility as it can deliver instant, personalized feedback that allows students to reflect and revise while learning is still in progress (Coenen & Pfenninger, 2025; Han et al., 2024; Lai et al., 2024). This immediacy not only supports self-regulated learning but also alleviates teacher workload by automating repetitive feedback tasks, freeing time for relational and higher-order pedagogical interactions. Moreover, several studies indicate that engaging with GenAI-generated drafts can prompt teachers to reflect on their own feedback practices, thereby enhancing professional awareness and improving the quality of teacher-authored feedback (Rüdian et al., 2025; Sung et al., 2025). In this way, GenAI may serve not only as an efficiency tool but also as a catalyst for teacher reflection and professional learning.
Despite these benefits, the integration of GenAI into formative assessment remains constrained by several limitations. Studies point to recurring issues of accuracy, specificity, and pedagogical relevance: GenAI feedback can be inconsistent (Xu et al., 2025), contain false or misleading information (Han et al., 2024; Yan et al., 2024), or fail to provide actionable guidance for improvement (Lai et al., 2024). It can also overemphasize minor details, recognize issues in students’ work without explaining how to address them, or provide excessive praise that undermines credibility (Rüdian et al., 2025; Sung et al., 2025; Xu et al., 2025). Furthermore, GenAI cannot replicate the relational and emotional dimensions of teacher feedback that foster trust, recognition, and motivation (Corbin et al., 2025). Even when students perceive GenAI and human feedback as equally useful, the absence of genuine empathy and contextual understanding limits its pedagogical authenticity. Consequently, GenAI should be viewed as a complementary aid rather than a substitute for teacher judgment, effective primarily when integrated within hybrid systems that preserve human oversight, contextualization, and relational care.
Hybrid Intelligence as a Lens to Understand Teacher–GenAI Collaboration for Formative Feedback and Assessment
Understanding how teachers and GenAI complement one another in student assessment and feedback is essential for identifying the competencies needed to use these tools responsibly and for advancing this emerging hybrid practice. Through the lens of HI, the co-evolution, co-learning, and mutual reinforcement of humans and AI can be understood as a sociotechnical process aimed at achieving outcomes neither could accomplish alone (Akata et al., 2020; Dellermann et al., 2019). While prior studies have explored the integration of teacher and automated feedback, few have examined teachers’ dispositions (beliefs, attitudes, and intentions) when teachers and GenAI complement one another to provide feedback, or how teachers perceive the associated benefits and challenges. Although some research has explored teachers’ perspectives on GenAI integration, much of it has focused on higher education or on teachers who have not yet used GenAI as a complementary tool. As a result, there is limited empirical understanding of how teachers perceive the core strength of HI, namely that humans and GenAI complement one another to achieve outcomes neither could attain alone, and of how they experience, interpret, and adapt GenAI within everyday feedback practices. To address this gap, this study offers qualitative evidence from interviews with secondary school teachers who complemented GenAI in authentic feedback scenarios, providing novel insights into how teachers’ dispositions shape hybrid feedback practices and into the opportunities and challenges that emerge when AI enters formative assessment. Using HI as a lens thus allows us to investigate the ethical, technological, pedagogical, and practical issues that emerge when teachers engage with GenAI to provide formative assessment and feedback to their students.
Formative Feedback With Learny
To address the research objectives of this study, we conducted a case study on Learny, a GenAI tool currently used by teachers to provide formative assessment and feedback to students. The choice of this particular technology has implications for our findings; therefore, we briefly describe its main functions and design principles.
Learny is an AI assistant designed to support formative assessment and feedback in educational settings. Its overarching goals are to increase the quantity and quality of feedback, enhance learning outcomes, and optimize teachers’ time for higher-value tasks such as mentoring, evaluation, and individualized instruction. Rather than replacing teacher assessment, Learny is intended to augment it by helping teachers provide more feedback, and more consistent (criteria-based) feedback, while preserving professional judgment and pedagogical intent. Learny integrates four complementary knowledge bases to tailor and contextualize the feedback process for both teachers and students: (1) (2) (3) (4)
To achieve this, Learnlab employs six specialized AI models that interpret multimodal student submissions and contextual teacher inputs from mainstream office platforms (e.g., Microsoft 365, Google Workspace). These models are continually refined through retrieval-augmented generation (RAG) techniques using a combination of high-quality human-assessed and synthetic data. Figure 1 illustrates Learnlab’s overall architecture and its feedback interface.
Figure 1. Visualization of Learnlab’s architecture (left) and teacher-facing feedback platform (right).
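The text does not specify how retrieval is performed. As a minimal sketch of the general RAG idea only, assuming a small store of human-assessed (work, feedback) exemplars and a bag-of-words cosine similarity standing in for learned embeddings, retrieval-grounded prompt construction might look like the following. All names (`retrieve`, `build_prompt`, the exemplar format) are hypothetical and do not describe Learnlab’s models.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical exemplar format: (assessed student work, human-written feedback).
Exemplar = Tuple[str, str]


def _vec(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, exemplars: List[Exemplar], k: int = 2) -> List[Exemplar]:
    """Return the k exemplars whose student work is most similar to the query."""
    q = _vec(query)
    ranked = sorted(exemplars, key=lambda ex: cosine(q, _vec(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(task: str, submission: str, exemplars: List[Exemplar], k: int = 2) -> str:
    """Ground a feedback request in retrieved, human-assessed examples."""
    parts = [f"Task: {task}", "Assessed examples:"]
    for work, feedback in retrieve(submission, exemplars, k):
        parts.append(f"- Work: {work}\n  Feedback: {feedback}")
    parts.append(f"Student submission: {submission}")
    parts.append("Draft formative feedback for the teacher to review.")
    return "\n".join(parts)
```

The retrieved exemplars only shape the prompt; in this sketch nothing about the underlying model changes, which is the usual distinction between retrieval grounding and model fine-tuning.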
In practice, Learny enables teachers to generate GenAI feedback drafts that can be reviewed, edited, and personalized before being shared with students. Drafts are generated from three main sources: (1) the teacher’s task description, (2) the curriculum goals linked to the assignment, and (3) the student’s submitted work. In this way, the quality of AI-generated feedback is influenced by the teacher’s pedagogical design and evaluative intent. Teachers can also specify focus areas (e.g., improvement suggestions, reflective prompts, or affective comments), and the system produces editable feedback drafts within a chat-like interface based on the input. This design supports interactive formative assessment by allowing teachers to refine, contextualize, or retract feedback as needed, while maintaining a continuous dialogic exchange with the student.
In addition to supporting hybrid teacher–AI collaboration, Learny also allows students to request immediate, automatically generated feedback on their own work within the integrated learning tools in Learnlab. This student-facing feature provides real-time formative guidance without prior teacher review, which can enhance autonomy but also raises questions about teachers’ awareness of, and control over, the feedback students receive.
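The two feedback paths described above (teacher-reviewed drafts and student-requested instant feedback) can be sketched as a small state model. This is a minimal illustration under stated assumptions, not Learny’s implementation: the class and function names (`FeedbackRequest`, `generate_draft`, `teacher_review`, `student_self_request`, `needs_teacher_attention`) are hypothetical, and `generate_draft` is a placeholder that echoes its inputs rather than calling a real model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DRAFT = "draft"          # generated, not yet visible to the student
    SHARED = "shared"        # visible to the student
    RETRACTED = "retracted"  # withdrawn by the teacher


@dataclass
class FeedbackRequest:
    # The three input sources named in the text.
    task_description: str
    curriculum_goals: List[str]
    student_work: str
    # Optional teacher-chosen focus areas, e.g. "reflective prompts".
    focus_areas: List[str] = field(default_factory=list)


@dataclass
class Feedback:
    text: str
    status: Status = Status.DRAFT
    teacher_reviewed: bool = False


def generate_draft(req: FeedbackRequest) -> Feedback:
    """Hypothetical stand-in for the model call: it only echoes the inputs."""
    focus = ", ".join(req.focus_areas) or "general feedback"
    goals = "; ".join(req.curriculum_goals)
    words = len(req.student_work.split())
    text = (f"[{focus}] Draft for '{req.task_description}' "
            f"({words} words submitted), judged against: {goals}")
    return Feedback(text=text)


def teacher_review(draft: Feedback, edited_text: Optional[str] = None,
                   approve: bool = True) -> Feedback:
    """Teacher path: optionally edit the draft, then share or retract it."""
    text = draft.text if edited_text is None else edited_text
    status = Status.SHARED if approve else Status.RETRACTED
    return Feedback(text=text, status=status, teacher_reviewed=True)


def student_self_request(req: FeedbackRequest) -> Feedback:
    """Student path: the draft is shared immediately, with no teacher review."""
    fb = generate_draft(req)
    fb.status = Status.SHARED
    return fb


def visible_to_student(fb: Feedback) -> bool:
    return fb.status is Status.SHARED


def needs_teacher_attention(fb: Feedback) -> bool:
    """Feedback the student has seen but the teacher has not reviewed."""
    return visible_to_student(fb) and not fb.teacher_reviewed
```

The `needs_teacher_attention` check makes the awareness question raised above concrete: in this sketch, anything delivered through the student path is flagged because no teacher has reviewed it, whereas retracted or still-draft feedback never reaches the student.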
Learny thus represents a concrete example of hybrid formative assessment and feedback, where teachers and GenAI collaboratively generate, refine, and deliver feedback to students. Through this co-adaptive process, the technology supports teachers’ evaluative judgment while enabling students to receive timely, personalized guidance. In the following section, we outline the methodological approach used to examine how secondary school teachers engaged with Learny in authentic classroom settings, focusing on how they perceived, adapted, and assessed its role in their feedback practices.
Method
Participants
Participants in this study were 21 secondary school teachers from a school in Norway who actively engaged with the GenAI feedback tool Learny during the project period. Initially, 24 teachers consented to participate. However, not all teachers engaged equally with Learny. Of the three teachers excluded from the final sample, two did not start using the system before the end of the project period, and one taught a subject (arts) where written feedback via digital platforms played a less central role. These teachers were therefore excluded from analysis as they did not meet the study’s engagement criterion of active use of Learny for student feedback.
Teacher profiles
All participation was voluntary, and all teachers reviewed the study information and provided written informed consent for data collection. Ethical approval for the study was granted by the Norwegian Agency for Shared Services in Education and Research (SIKT), which oversees human research ethics in Norway. Regarding students’ privacy and security, Learnlab (the EdTech provider) ensured a fully GDPR-compliant solution with secure EU-based data storage, supported by appropriate data processing agreements and compliance with the regulations governing Norwegian public schools and with the EU AI Act. For more details, see https://info.learnlab.net/privacy-policy/ and https://info.learnlab.net/eco-system/.
Settings and Procedure
The study was conducted at a Norwegian secondary school that participated in a professional development initiative exploring GenAI for formative assessment. It began with an introductory online focus group on October 30, 2024, organized by the technology provider in collaboration with the researcher. During this session, teachers were introduced to Learny and its core functionalities. Following this, a series of in-person workshops was held at the school, led jointly by the researcher and the provider’s trainers. These workshops offered hands-on guidance, allowing teachers to explore the platform, pose questions, and discuss possible pedagogical applications. They also served as collaborative arenas where participants exchanged strategies and reflected on potential classroom applications. Throughout this phase, the researcher acted as a facilitator and support resource, assisting teachers with both technical and pedagogical inquiries to ensure a smooth onboarding process and a shared understanding of the tool.
The main implementation phase took place during January and February 2025, when teachers actively integrated Learny into their classroom practice through four core activities: (1) (2) (3) (4)
Data Collection
To gain deeper insight into how teachers experienced and evaluated GenAI in their feedback practices, we conducted semi-structured interviews. This qualitative approach allowed for systematic coverage of key themes while maintaining the flexibility for participants to elaborate on their experiences and reflections (Oates et al., 2022). The interview protocol (see Appendix) was designed to capture teachers’ perceptions of Learny’s pedagogical value, its impact on feedback processes, and the challenges of integrating GenAI into formative assessment. It was informed by questionnaire data collected during the initial focus group, which provided a basis for tailoring questions to participants’ prior exposure and attitudes.
Interviews were conducted individually and lasted approximately 30 minutes each. Most were carried out in person at the school, while a few were conducted online for practical reasons such as scheduling or availability. All interviews were audio recorded using both Nettskjema Diktafon (a voice recorder app approved for collecting research data) and Microsoft Teams to ensure data integrity and provide a backup in case of technical issues.
Data Analysis
The interview data were analyzed using thematic analysis (Braun & Clarke, 2006), which provides a flexible framework for identifying and interpreting patterns across qualitative datasets. Given the exploratory nature of the study and the large corpus of interview material (21 interviews), the analysis combined inductive and deductive strategies to balance openness to emerging insights with analytical coherence.
The process began inductively, with open coding of six semi-structured interviews (approximately 29% of the total text corpus) selected to represent variation in gender, age, and subject area. During this initial phase, the first author independently coded interviews and developed initial codes directly from the data without imposing predefined categories. The two authors then met to compare interpretations, discuss preliminary categories, and refine the coding scheme. Through this process, additional codes were added, and overlapping ones merged, leading to an agreed-upon set of coding categories. The remaining interviews were then coded deductively by the first author according to these agreed categories, allowing for consistency while leaving room for minor refinements when new nuances appeared.
The analytic process followed the six phases outlined by Braun and Clarke (2006): (1) Familiarization with the data through iterative reading of transcripts while verifying automated transcriptions generated via Nettskjema Diktafon (OpenAI Whisper V3). (2) Generating initial codes by identifying meaningful segments of text relevant to teachers’ experiences, perceptions, and practices. (3) Searching for themes by clustering related codes into potential themes and sub-themes that captured patterns across participants. (4) Reviewing themes by checking consistency and representativeness of coded excerpts across the dataset and refining thematic boundaries. (5) Defining and naming themes to ensure conceptual clarity and distinctiveness. (6) Producing the report by synthesizing each theme and illustrating them with representative quotations.
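Phase (4) involves checking how widely each candidate sub-theme is represented across participants. The interpretive judgment itself cannot be automated, but the bookkeeping behind it can be sketched. The following minimal sketch assumes each coded excerpt is stored as a (participant, sub-theme) pair, similar in spirit to a matrix coding query in qualitative analysis software; the function names and the 20% threshold are hypothetical choices for illustration, not part of the study’s procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One (participant_id, sub_theme) pair per coded excerpt.
Excerpt = Tuple[str, str]


def participant_coverage(excerpts: Iterable[Excerpt]) -> Dict[str, Set[str]]:
    """Map each sub-theme to the set of participants with at least one excerpt."""
    coverage: Dict[str, Set[str]] = defaultdict(set)
    for participant, sub_theme in excerpts:
        coverage[sub_theme].add(participant)
    return dict(coverage)


def thinly_supported(excerpts: Iterable[Excerpt], total_participants: int,
                     min_share: float = 0.2) -> List[str]:
    """Sub-themes voiced by fewer than `min_share` of participants.

    These are flagged for re-reading against the transcripts, not dropped:
    a rare view can still be analytically important.
    """
    coverage = participant_coverage(excerpts)
    return sorted(theme for theme, people in coverage.items()
                  if len(people) / total_participants < min_share)
```

Counting distinct participants rather than excerpts keeps one talkative interviewee from making a sub-theme look more widespread than it is. For example, with ten participants and the default threshold, a sub-theme raised by only one participant would be flagged.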
All transcripts and codes were managed in NVivo to support systematic organization and transparency. The final analysis resulted in three overarching themes and fourteen sub-themes that described teachers’ dispositions toward GenAI and the perceived benefits and challenges of integrating Learny into formative feedback practices.
Findings
The thematic analysis identified three overarching themes and fourteen sub-themes that capture how teachers perceived and engaged with Learny when providing formative feedback. These themes reflect teachers’ dispositions toward GenAI, the perceived benefits of hybrid feedback practices, and the challenges and tensions that emerged during implementation. Each theme is presented below with illustrative quotations to highlight key patterns and variations across participants.
Theme 1: Dispositions to and Strategic Embeddedness of GenAI to Support Formative Assessment and Feedback
This theme comprises three sub-themes that capture teachers’ underlying attitudes and reflections on using GenAI in their formative assessment and feedback practices: trust; teacher autonomy and agency; and adaptation, integration, and adoption of GenAI. Together, these sub-themes reveal how teachers position themselves in relation to GenAI, balancing curiosity with caution and maintaining a strong sense of professional responsibility.
Trust
Several teachers described GenAI as a helpful tool, but emphasized the need to always verify the output before using it with students. “I don’t fully trust it, no. But it’s a bit like, we shouldn’t blindly trust it either, but actually use it as a tool.” (T1)
This reflects a healthy skepticism toward a newly introduced technological system. Because they wanted to verify the content of generated feedback, teachers felt more confident using GenAI in areas they knew well and expressed uncertainty when the content went beyond their expertise, since they could not verify it themselves. Trust in GenAI thus appears to be conditional. “I trust it to the extent that it’s things I already know. But if it starts including things I don’t know, then I can’t give that feedback to the students. I have to kind of verify what it says first. If I can’t do that, then I can’t use it either, I think.” (T10)
The sense of responsibility is closely tied to teachers’ professional role and their obligation to justify and explain feedback to students. The need to verify content may, however, become problematic when introducing semi-autonomous systems, as their use presupposes a certain level of trust and a willingness to relinquish some degree of control. Because teachers ultimately remain accountable for the feedback and assessments provided to students, their threshold for trusting GenAI systems may be higher than for less critical applications, such as administrative reporting or real-time student recommendation systems, particularly when the uncertainty of such recommendations is clearly communicated.
Teacher Autonomy and Agency
A strong recurring theme in the interviews was the importance of maintaining professional autonomy and a sense of ownership when using GenAI. Teachers consistently expressed that their role is not only to deliver content but also to actively create and validate the feedback they provide to students and to ensure its quality. “I’ve heard some people say, just click and send, but I really disagree. You don’t have ownership over it. […] I think it’s important that I look at it and agree and know what feedback I’m sending to my students.” (T20)
This shows teachers’ desire to be involved in the final feedback by deciding what content should and should not be included. However, the quote also suggests that this is not equally important to all teachers, and that some seem more interested in achieving efficiency than in maintaining control. At the same time, some teachers worried that transferring control to GenAI might undermine the teacher’s role in the learning process, as well as introduce anchoring bias, where the AI’s initial interpretation shapes the teacher’s final judgment. These teachers stressed that GenAI should be seen as a supplement and not a substitute, and emphasized the importance of teacher–student interaction and of following the student’s development throughout. “I’m a bit afraid that if students only receive feedback from the AI, I lose an important part of my role as a teacher regarding the interaction with the students. I’d miss out on the process and maybe just be a product. For us, it’s important to follow the student’s process and be there with them along the way.” (T10)
Teachers view GenAI as a useful assistant but stress the importance of professional autonomy. They emphasize actively evaluating and refining AI-generated feedback to align with their judgment and knowledge of individual students. Autonomy is considered essential for maintaining educational quality, teacher identity, and meaningful teacher–student interactions. Teachers also note that initial GenAI feedback may bias their assessments, reinforcing the need for teacher ownership and agency. While GenAI support is valued, teachers firmly reject the idea of relinquishing some degree of control to AI.
Adaptation, Integration, and Adoption of GenAI
Several teachers expressed that learning to use GenAI tools came with a steep learning curve and some initial frustration. This was related both to the fact that learning the new GenAI platform came on top of an already busy teaching schedule and to the fact that the decision to adopt it came from management rather than from the teachers themselves. The need for collaboration, training, and collective structures was therefore a recurring topic, and teachers suggested allocating time at the start of the school year to experiment with GenAI tools as a team, trying them out in class before coming back to discuss what did and did not work. Others emphasized the importance of actually using the tools to become more confident. “Regular courses, and maybe that we have some common program, discuss and share experiences. That we don’t sit by ourselves, because it’s a lot of that around here.” (T19)
As noted earlier, teachers adopt diverse strategies and have varying needs; therefore, a one-size-fits-all approach is unlikely to effectively support teacher–GenAI complementarity. At the same time, several teachers pointed to concrete technical barriers and a lack of platform integration that made adoption harder than necessary. Because many teachers were concerned about managing their time effectively, such frustrations might negatively impact the integration and adoption of GenAI tools. Nevertheless, most teachers expressed a fundamentally positive attitude toward the potential of GenAI, especially when framed as a long-term and human-centered shift in education. This openness was often coupled with curiosity and a desire to explore its usefulness, suggesting that teachers’ resistance to the integration may be more closely linked to a lack of time than to the technology itself. “I think it will definitely help us a lot in society. Seeing patterns, helping with everyday things. […] If we can use it to help plan teaching, then great. Use it as a model and then adapt. That would be perfect.” (T14)
Overall, the findings indicate that adopting GenAI tools requires time, effort, and institutional support for teachers to gain confidence as users. Navigating multiple platforms can be challenging, particularly for older teachers. The data also reveal a sense of limited agency and a need to better understand and learn how to use GenAI effectively. Teachers emphasize collaboration as essential and value opportunities to share experiences through courses or workshops. Despite these challenges, they remain largely positive about GenAI’s potential and stress that successful adoption depends on the ability to make it part of their professional practice (including the needed training and support).
Theme 2: The Important Qualities of GenAI for Feedback and Assessment
This theme encompasses five sub-themes highlighting the benefits teachers experienced when using GenAI in their feedback practices. These are GenAI’s ability to generate timely and high-quality feedback, its potential to save time, its role in inspiring and scaffolding teachers’ feedback writing, and its capacity to enhance student motivation.
Timely Feedback
One of the most appreciated benefits of GenAI feedback was its immediacy. Teachers frequently highlighted how difficult and time-consuming it is to help all their students in the classroom, and that GenAI enables students to receive feedback exactly when they need it. The following comments focus on the part of the system where students receive generated feedback directly from GenAI and not the version where teachers review the feedback before giving it to them. “That students have access to it and can ask Learny for some feedback, instead of me running around. It helps me a lot. […] It saves me time, and they don’t have to sit and wait.” (T20)
Some teachers highlighted how this helps students to improve their work while their ideas are still fresh, rather than waiting days or weeks for teacher comments. Others pointed to how this timely support enhanced students’ independence and learning agency, while also allowing teachers to support students who needed more in-depth help. “It can be a tool to support them to maybe become more independent and less dependent on asking the teachers about little things all the time.” (T13)
While some teachers held this view, a critical question is whether students actually become more independent. If students use GenAI to handle many of their smaller questions (and questions in general), their dependency might partly shift to AI. Teachers would then perceive them as more independent. This may be a trade-off, but it could still free up time for teachers to support students with greater needs and to answer more cognitively demanding questions.
High-Quality Feedback
Several teachers emphasized that the overall quality of GenAI feedback was surprisingly good and described it as well-structured, clear, and aligned with pedagogical principles. Others were impressed by how specific and rich the feedback was compared to what they would typically have time to write, and reflected on how combining human and GenAI feedback could result in feedback of high quality. “I think it’s very good at giving feedback that follows what we’ve learned about formative assessment: two or three positives, followed by a suggestion for improvement. […] It’s mostly focused on what the student does well. […] I think it seems very qualitatively good.” (T15)
Some teachers pointed out that GenAI often highlighted the same issues they would have addressed themselves, and even phrased things in ways similar to their own: “It’s like I wrote it myself. […] I have experienced that AI has quite a similar mindset to mine. Sees the same things and stuff. And then it formulates it a bit like I might have said it myself. […] It’s actually kind of amazing how often it hits the mark based on what the assignment asks.” (T19)
This shows how GenAI feedback often identifies the same strengths and areas for improvement that teachers would identify themselves, leading them to perceive it as both accurate and specific. Some teachers also found that the GenAI tool can provide more extensive feedback than they typically have time to write, which suggests that complementarity between teachers and GenAI might be beneficial. In this way, some teachers viewed GenAI as a “second teacher” that enhanced the overall quality of feedback given to students.
Time-Saving Potential
Several teachers emphasized how receiving GenAI feedback drafts could save them time. While most teachers wanted to personalize the feedback, they described the drafts as useful starting points that can be adjusted rather than writing the feedback from scratch. Others described how the time-consuming part of assessment was often not grading itself, but formulating meaningful feedback in a way that students could understand. Here, GenAI offered a valuable shortcut. “I feel like I save some time, it’s a draft for my feedback. Then I go in and adjust and edit it a bit, maybe add something more personal. So I think it does something, I’m probably saving some time on this.” (T11)
Some teachers also stated that this kind of support could reduce the cognitive load of repetitive feedback tasks, freeing up time for other teaching tasks. “For me, it’s about efficiency. If this tool helps me become more efficient at my job and takes away some work, then I’ll use it, without a doubt. That means I get more time for the things GenAI can’t do, […] like calling parents or resolving conflicts, more interpersonal things.” (T8)
Taken together, these findings indicate that GenAI can meaningfully enhance teachers’ efficiency in feedback processes, especially when used to generate initial drafts. Teachers state that formulating clear and constructive explanations for students is often time-consuming. They also state that GenAI could free time for the relational and pedagogical tasks it cannot replace. However, an important open question concerns how teachers work with these drafts. Specifically, do teachers tend to use the drafts largely as generated, or do they invest additional time in revising, personalizing, and contextualizing them? A related question is how this balance will evolve as teachers gain experience with the system.
Inspiration and Scaffolding
Another frequently mentioned quality of GenAI was its role as a cognitive and linguistic support tool. Many teachers described how the GenAI drafts helped them get started more easily when writing feedback. Some teachers also emphasized how this starting point supported reflection and gave them ideas for how to formulate better feedback. “It’s great to have a little pointer and then build on that yourself. It gets me started much more easily than starting with a blank page.” (T9)
This indicates that GenAI feedback drafts not only have the potential to increase teachers’ efficiency but can also create opportunities for critical reflection. This support was primarily discussed from a teacher perspective. However, a few teachers also reflected that the tool might help students discover new angles or gain inspiration for what to revise or improve. “The strong students loved Learny. They got feedback all the time. […] ‘Oh, I can write about that!’ So it’s a good starter.” (T21)
In this way, GenAI can also be viewed as a source of inspiration and scaffolding for students. By generating initial feedback drafts, it offers ideas that both teachers and students can draw on. It also encourages teachers to reflect on and improve their feedback practices.
Motivation
Several teachers noted that GenAI feedback can enhance student motivation by aligning with formative assessment principles. Rather than emphasizing only shortcomings, it highlights strengths and offers constructive suggestions for improvement. Teachers saw this positive framing as especially beneficial for students with lower self-confidence and self-esteem. “And then there’s both what you’re going to do to improve it, and what you’ve done that’s good that you have to continue with. Because that’s also just as important, I think, especially for someone who may not be that super strong academically.” (T12)
Teachers also viewed the uplifting tone of GenAI feedback and its constant availability as strengths in themselves, because these made students feel seen and encouraged. Some teachers pointed to how this positivity could reinforce a sense of competence and success. “There’s something good here, someone actually bothers to praise the students, it is perfect!” (T16)
Overall, teachers believe that GenAI feedback enhances student motivation. Its positive tone and accessibility make students feel recognized and supported, especially those who may struggle with self-confidence. Teachers thus describe GenAI as a kind of personal motivator that provides encouragement they may not always have time to give. They also observe that this support can improve the quality of students’ work, fostering a greater sense of achievement.
Theme 3: The Challenges of GenAI for Feedback and Assessment, and the Remedial Role of HI
This theme encompasses six sub-themes that shed light on the challenges teachers encountered when integrating GenAI into their feedback and assessment practices. The first is the quality of AI-generated feedback, which can be vague or linguistically complex. The second is GenAI’s lack of relational and contextual awareness, and the third is its limited understanding of students’ current state. The fourth is teachers’ limited visibility into the immediate feedback students receive from GenAI. The fifth concerns challenges with student self-regulation, and the sixth is the risk of fostering cognitive laziness.
Challenges in Feedback Quality
While several teachers praised the overall structure and usefulness of GenAI-generated feedback, others raised concerns about variations in clarity, relevance, and accuracy. The perceived quality of the feedback often depended on the specific assignment, the individual student, and the teacher’s expectations. Moreover, several teachers found the language used by GenAI too advanced or unnecessarily complicated for many students. “It’s an okay draft, but I have to be honest, I wouldn’t press send right away, it remains quite much work. Firstly, there’s a lot of feedback on quite little work. […]. I can’t always understand how the student is supposed to use that feedback well. You manage to pick out some things that are correct and such, but there’s a lot of text, there’s difficult language, and there’s some imprecise wording.” (T5)
This finding indicates that some teachers experience GenAI feedback drafts as misaligned with task goals or student needs, requiring substantial revision before they can be shared with students. Some teachers also pointed out how feedback could be too vague, long, repetitive, or misleadingly positive. “A weakness can be that it is very positive, maybe too positive. I think it can trick the students a little into thinking that they’ve done better than they actually have.” (T2)
Overall, teachers noticed several shortcomings when assessing the quality of the GenAI feedback, such as vague or generic content, unnecessarily difficult language, and misleadingly positive phrases. These qualities of generated feedback can make it less approachable and actionable for students, making teacher revisions necessary.
Lack of Relational and Contextual Awareness
One of the most frequently mentioned limitations of GenAI feedback was its lack of relational and contextual awareness. Teachers described how GenAI did not possess knowledge about their students and their individual needs, and feared that feedback, therefore, could feel distant or depersonalized if they did not contribute to it. “I have to adjust it so that the student recognizes me and what we have worked on. […] You have to write a little to make it a little more personal, so that the students do not feel that it is just a machine that is assessing their work and such.” (T19)
Others emphasized that meaningful feedback must motivate students and support them in reaching long-term goals and underscored how their knowledge about their students was necessary to achieve this. “Learny doesn’t know the student the way I do. […] When it comes to motivation, only I know how to motivate the student based on the long-term goal I’ve set. […] We won’t reach it unless the feedback is targeted.” (T4)
Overall, teachers underscored the importance of relational and contextual awareness, including familiarity with the student’s learning history, emotional state, language proficiency, motivation, and personal goals. Teachers also draw on their understanding of students’ cognitive and metacognitive capabilities when giving feedback. GenAI’s lack of this awareness can therefore result in inappropriate feedback. Teachers emphasized that effective feedback is not only about correcting content but also about delivering it in a way that supports the individual student. Because students might experience GenAI feedback as distant and depersonalized, teachers highlighted the need to adjust it to make it more personal and human-like.
Limited Understanding of Students’ State
Teachers were also concerned about GenAI’s limited understanding of students’ states, and that GenAI feedback might confuse, mislead, or even stress students. One common issue was the lack of consistency and connection across repeated feedback generations, resulting in feedback containing different comments on the same work even when nothing had changed, which caused confusion and uncertainty. “I think you will get students who will ask for feedback very often. And as I understood, it gives different feedback, regardless of whether a single improvement has been made or not. That you just stand in the same place and press feedback, and then you get five different feedback on the same work. […] That will only confuse the student.” (T2)
This finding indicates that teachers worry the endless stream of suggestions might make students feel their work is never good enough. Teachers reported that some students, especially high-achieving ones, kept requesting new feedback and became overwhelmed. Others highlighted the risk of students misinterpreting overly positive feedback as an indicator of high academic achievement, which could cause confusion later in the formal assessment process. “It’s never good enough. […] It doesn’t work, it’s awful for the high-performing kids who are never satisfied, and they keep getting told that you can do this and that better. […] They ask more and more questions, and in the end, they are almost outside of the task. […] They need to learn to say: ‘What I did is good enough. Now I can stop’.” (T21)
Therefore, teachers found GenAI’s limited understanding of the student’s current state problematic. Inconsistencies in generated feedback could confuse students, and an overload of suggestions could cause stress among high-performing students. Overly positive feedback could also give students a false impression of their academic level.
Teachers’ Lack of Insight Into GenAI Feedback Students Receive
Several teachers reflected on the implications of not knowing the content of feedback provided to students by GenAI (when using the immediate feedback version). One teacher described the discomfort of being responsible for evaluations that were influenced by unseen GenAI feedback, while others noted how this made it difficult to assess students’ independent capabilities. “The ethical part of it I find difficult. One thing is that the responsibility lies with the school owner, but I am the one who is assigned the task of doing it. […] Let’s say that a student gets a bad grade. Then the student, and maybe parents, can come back later and say ‘Yeah, but I did what the feedback told me’.” (T14)
This raises a larger question about teacher responsibility when integrating technological tools into education. Some teachers also pointed out that conflicting feedback from GenAI and the teacher could lead to confusion about whom to trust, and make it difficult for teachers to build on or align with the GenAI input. “It can be a challenge for the teacher to continue that guidance afterwards, if you don’t know what the student has been guided on beforehand. Because then I think the student will probably be a bit unsure too, should I listen to what the teacher gives in feedback, or should I listen to Learny? […] at least if there is a bit of a discrepancy […] It can actually hinder learning.” (T15)
In more serious cases, teachers raised concerns about the potential for harmful or inappropriate suggestions in generated feedback, especially in sensitive subjects such as religion or politics: “Let’s say that for some strange reason, the student has received feedback about something that is ethically unacceptable. It could be a political message, it could be things that are not within the freedom of speech, and then, in a way, the student is stimulated through AI to just continue [working on] their product. […] What kind of control do we have on the quality [of feedback] in these cases?” (T10)
Taken together, teachers experience discomfort in not knowing what kind of feedback students receive. Because teachers are ultimately responsible for evaluating and guiding their students, they worry about being questioned about assessments they cannot fully justify. Teachers express that unseen GenAI feedback makes it harder to evaluate students’ work and that conflicting advice from teachers and GenAI can confuse students. Moreover, unfiltered or ethically problematic feedback on sensitive subjects poses serious concerns about the appropriateness and integrity of the learning process. All these concerns about the automated feedback highlight the potential remedial role the teacher can have by incorporating an HI approach.
Challenges With Student Self-Regulation
Several teachers voiced concern about students’ ability to self-regulate their use of GenAI feedback tools. In contrast to traditional classroom dynamics, where teachers control the timing, amount, and frequency of feedback, the GenAI system allows students to generate feedback whenever and however much they want. While this can promote greater independence and autonomy, it also places increased responsibility on students to manage their learning process effectively. “There should be a way to limit how often they can press ‘Help me’ […] It should be a maximum of 2-3 times. […] Feedback should be available after working for at least an hour with a task, not just five minutes. […] If you want to train them to think critically, you can’t give them too much help.” (T18)
Others noted that frequent feedback requests could reduce opportunities for students to work independently, reflect, and develop endurance over time, making them dependent on constant input. “I’m not sure, in the long run, if students learn to have perseverance in their work or not. That you can then kind of get feedback all the time about what needs to happen. Because there is something in learning to keep working on a task over time, alone, without getting feedback. Which is also valuable.” (T14)
Overall, teachers expressed ambivalence about giving students full autonomy in how they engage with GenAI tools. On one hand, they saw potential for these tools to strengthen students’ independence and ownership of learning; on the other, they worried that too much freedom could lead to over-reliance or superficial engagement. Some described how students struggled to regulate their use of GenAI and tended to seek feedback repeatedly rather than reflecting on their work. These concerns reveal a tension between promoting agency and ensuring productive regulation, highlighting the need for scaffolds that help students use GenAI feedback purposefully and critically.
Risk of Cognitive Laziness
Teachers expressed concern that while GenAI provides valuable support, it may also reduce the critical engagement and cognitive effort of both students and teachers. Although the tool can scaffold learning and streamline feedback, several teachers feared that excessive reliance on it could lead to passivity and diminish reflection and critical thinking. “If someone always tells you how to move forward, then you’ve lost a part of your critical sense on what you are doing.” (T18)
For students, this risk was associated with over-scaffolding: receiving so much ready-made guidance that their ability to think independently could weaken. For teachers, the concern was that habitual use of GenAI could lead them to stop thinking for themselves. GenAI would then replace their professional judgment and creativity in feedback practices. “We have to be careful not to fall into the trap of thinking that this was easy […] thinking that we can just press a button and then our job is done and that we don’t need the teacher anymore.” (T21)
Overall, teachers emphasized that while GenAI can enhance efficiency, it should not replace human reflection or professional responsibility. Maintaining teacher oversight and pedagogical involvement was seen as essential not only for ensuring the accuracy and appropriateness of feedback but also for preserving cognitive engagement and agency in the feedback process.
Summary
Themes and sub-themes from the analysis with brief descriptions and example quotes
Discussion
The thematic analysis of the semi-structured interviews revealed a wide range of teacher experiences, attitudes, and reflections concerning the complementary and general use of GenAI in assessment and feedback practices. While the teachers had different backgrounds and taught different subjects, several common dispositions, perceived benefits, and challenges emerged through the analysis. These offer a nuanced picture of how teachers can integrate GenAI feedback into their practice.
Main Dispositions When Using GenAI in Feedback Practices
Earlier studies have found that teachers are open to adopting GenAI and AI-based EdTech in their teaching practices (Chiu et al., 2023; Han et al., 2024; Nazaretsky et al., 2022; Shankar et al., 2025). The interviews in this study support this finding: most teachers expressed a generally positive and open-minded attitude toward the adoption of GenAI tools. Many were also curious and motivated to explore its potential, even when they felt unsure or had little experience with the technology. At the same time, several teachers in this study said they were somewhat skeptical at first. They also reported that learning to use GenAI tools came with a steep learning curve and some initial frustration. They therefore highlighted the importance of actually using the tools to become more confident. They also valued collaborating with other teachers through team workshops or regular reflection sessions. This aligns closely with findings from Brandão et al. (2024) and underscores the importance of hands-on activities and collaboration between teachers for building confidence and practical competence.
Another central finding across the interviews was teachers’ emphasis on maintaining professional autonomy and responsibility when using GenAI. Many teachers appreciated the efficiency and support offered by the tool. However, none viewed it as a substitute for their pedagogical judgment, and they consistently emphasized the importance of reviewing, adapting, and validating GenAI feedback before sharing it with students. This shows that teachers feel accountable for the quality and accuracy of the final feedback, regardless of whether GenAI contributed to it. This is a reassuring finding. Earlier studies have stressed the significance of letting teachers review and override automated AI decisions so that they keep a certain level of ownership, agency, and control in GenAI-teacher collaboration (Banihashem et al., 2024; Coenen & Pfenninger, 2025; Giannakos et al., 2025; Nazaretsky et al., 2022). Furthermore, teachers’ trust in GenAI was often conditional. It depended on their existing knowledge, their familiarity with the content, and their ability to justify the feedback to students. This aspect has received little attention in previous studies and suggests that trust in GenAI feedback is deeply intertwined with teachers’ professional confidence. In line with this, teachers also mentioned the discomfort of not knowing what feedback students had received directly from GenAI. This made it difficult to assess what students had achieved without GenAI. It could also confuse students if teachers then gave conflicting advice.
More broadly, teachers acknowledged that GenAI is likely to become increasingly central in education and expressed a professional responsibility to stay updated and prepare students accordingly. However, this transition brings challenges. Earlier studies report teachers’ concerns about increased workload (Kumar & Sharma, 2025; Nazaretsky et al., 2022) and ethical issues such as bias and data privacy (The-Open-Innovation-Team, 2024). They also report teachers’ limited understanding of how AI systems function and how to use them responsibly (Chiu et al., 2023; The-Open-Innovation-Team, 2024). The initial introduction of the GenAI tool revealed that teachers had little prior experience with such technologies and lacked the knowledge needed for appropriate integration. This underscores the need to strengthen teachers’ AI literacy (the ability to understand, use, and critically evaluate AI tools). Many studies identify AI literacy as essential for the safe and effective adoption of GenAI (Brandão et al., 2024; Han et al., 2024; Kumar & Sharma, 2025; Ley et al., 2025; Shankar et al., 2025). After integrating GenAI into their feedback practices, teachers also reported feeling overwhelmed by the multitude of platforms and the challenge of navigating them. Kumar and Sharma (2025) report similar findings, with secondary school teachers feeling left to manage GenAI integration on their own. Collectively, these results emphasize the importance of leadership support, clear expectations, and institutional guidelines to enable responsible and sustainable GenAI use in education.
Balancing Efficiency With Teachers’ Judgment and Responsibility
This study identifies several concrete benefits of using GenAI to support feedback practices. The interviews showed that a common advantage was the overall quality of the AI-generated feedback. Several teachers described the feedback as well-structured, pedagogically sound, and sometimes even formulated in a way that resembled how they would have written it themselves. This gave teachers confidence in using the tool to draft starting points for their own feedback. It also gave them confidence in letting students receive instant feedback directly from GenAI. Consistent with earlier findings, our results indicate that immediacy is one of the most valued benefits of GenAI feedback. It provides students with guidance exactly when they need it, reducing waiting times and enabling timely revisions (Burner et al., 2025; Coenen & Pfenninger, 2025; Han et al., 2024; Lai et al., 2024). Teachers in our study also explained that such timeliness helped them manage classrooms more effectively and freed time for individualized support.
At the same time, the interviews revealed several concerns about using generated feedback. Teachers pointed to variation in the clarity, relevance, and accuracy of GenAI feedback and worried that overuse might overwhelm students or foster over-reliance and low reflection. Several also expressed discomfort in being responsible for evaluations without knowing what feedback the GenAI had provided to students. Consequently, most emphasized the need to review and adjust GenAI feedback before sharing it. This aligns with earlier research showing that while teachers often approve large parts of GenAI feedback—particularly on grammar and style—they still revise it to ensure appropriateness, accuracy, and a personal touch (Sung et al., 2025; Xu et al., 2025).
Moreover, because dialogue and tailored guidance remain central to formative feedback (Perry et al., 2020), it is essential that teachers use the time saved to follow up with students. Otherwise, students may be left to interpret and act on the AI-generated feedback on their own. This increases the risk of misunderstandings and inequitable learning outcomes, especially for students who struggle to self-regulate their learning. The results show that GenAI can produce feedback that aligns with teachers’ professional standards, and that teachers want to use real-time GenAI feedback in their classrooms. Even so, there is a trade-off between efficiency and teachers’ judgment and responsibility. This challenge is particularly pronounced for real-time GenAI feedback. In that case, teachers have neither contributed to nor verified the content and may not even be aware of it, which limits their ability to align subsequent instruction. In contrast, when teachers generate and revise GenAI feedback drafts, they can exercise professional judgment and maintain a stronger sense of control and agency, albeit at the cost of additional time.
A key concern with both approaches, and with real-time GenAI feedback especially, is that GenAI itself performs much of the interpretive work of assessment. Even when teachers revise GenAI feedback, this can create GenAI-teacher dynamics that undermine feedback and assessment, for example by fostering cognitive laziness and hindering teachers’ agency (Fan et al., 2025; Giannakos et al., 2025). In particular, this workflow could inadvertently shift the teacher’s role from proactive assessor to reactive editor. A proactive assessor reflects, crafts feedback, and then possibly refines it with GenAI assistance. A reactive editor polishes feedback generated entirely from GenAI’s assessment. This shift creates the potential for anchoring bias, where GenAI’s initial interpretation shapes the teacher’s final judgment. In this sense, the system may pre-empt teachers’ evaluative reasoning, positioning educators more as validators than originators of assessment insight. Teachers themselves emphasized the importance of sustaining professional judgment and creativity. This raises a central question: how can teacher–AI (HI) sociotechnical ensembles and dynamics be designed to balance efficiency with teachers’ judgment and responsibility?
Reframing Hybrid Intelligence: Trade-Offs, Automation, and Teacher Agency
The findings reveal that teacher–AI collaboration involves inherent trade-offs, and sometimes even tensions, rather than a straightforward improvement in efficiency. While GenAI enables faster and more frequent feedback, increased automation may shift teachers’ roles from evaluators toward validators of GenAI output. This redistribution of cognitive work creates tension between scalability and professional responsibility. Providing timely feedback at scale often requires delegating evaluative tasks to GenAI, yet doing so may weaken teachers’ sense of ownership and reduce opportunities for pedagogical reflection. Our results therefore suggest that efficiency gains in HI systems are not neutral. They reshape professional practice by redistributing judgment, accountability, and creative labor. The more critical question is not whether GenAI can match human feedback quality. It is how much evaluative authority teachers are willing or able to delegate without compromising their professional identity and educational responsibility.
Viewed through the Six Levels of Automation framework (Molenaar, 2022), our findings refine HI by showing that effective collaboration does not occur at a single optimal level of automation but through dynamic movement between levels. Conditional automation, where teachers review or adapt AI-generated drafts, preserved professional agency but demanded additional time, whereas higher automation enabled immediacy and scalability at the risk of reduced teacher involvement. Importantly, teachers did not reject automation itself; rather, they sought mechanisms that allowed retrospective oversight, intervention, and contextual adjustment. This indicates that HI should be understood less as keeping humans “in the loop” at all times and more as enabling meaningful opportunities for human influence across the workflow, even when direct participation is intermittent. In this sense, HI becomes a matter of designing sociotechnical arrangements that support flexible control rather than fixed divisions of labor.
Our findings further extend the concept of HI by highlighting relational and contextual limitations of AI in educational settings. Although GenAI can approximate the structure and tone of human feedback, teachers emphasized that meaningful feedback depends on knowledge of students’ emotional, motivational, and developmental contexts. These dimensions remain difficult to automate. This challenges the implicit assumption in some HI formulations that complementary strengths alone guarantee effective complementarity and collaboration. Instead, our results suggest that HI in education must treat professional judgment and human presence as constitutive elements of learning rather than optional safeguards. HI should thus be conceptualized not merely as performance optimization through human–AI combination. It is better understood as a negotiated balance between efficiency, trust, and pedagogical responsibility, in which human agency is the coordinating principle guiding GenAI’s contribution.
Implications
Design Implications for HI-Aligned Feedback Systems
In this study, design principles refer to actionable sociotechnical configurations that shape how responsibility, decision-making, and expertise are distributed between teachers and GenAI across the feedback workflow. Rather than evaluating GenAI performance in isolation, these principles foreground complementarity: how GenAI augments, supports, or temporarily substitutes for specific instructional tasks while preserving teachers’ professional judgment. For example, context anchoring through teacher rubrics and examples ensures that GenAI outputs reflect pedagogical intent, while configurable learner-level settings and transparency dashboards make GenAI behavior interpretable and adjustable under real classroom conditions. In this way, HI becomes an analytical lens for understanding how automation reshapes teacher cognition and decision-making, highlighting that effective collaboration depends less on technological capability than on how systems are designed to sustain human agency and instructional reasoning.
Aligned with emerging research priorities, these principles also outline concrete directions for the development of educational AI systems. First, HI-aligned tools require AI affordances that explicitly support educational complementarity, including transparent models, editable feedback histories, and uncertainty signaling that allow teachers to calibrate trust without relinquishing control. Second, effective systems must be co-designed with stakeholders, embedding teachers’, students’, and school leaders’ perspectives into system functionality so that GenAI tools reflect the realities and constraints of classroom practice rather than functioning as external technical add-ons. Third, the principles highlight the need for professional learning models for hybrid systems that move beyond general AI literacy toward pedagogical reasoning with AI-generated data—supporting teachers in interpreting feedback analytics, monitoring student regulation, and making instructional decisions grounded in learning theory. Finally, the design implications acknowledge broader shifts in professional roles and identity: as GenAI redistributes cognitive and creative labor, systems must be intentionally designed to reinforce teachers’ roles as pedagogical orchestrators rather than passive overseers. Taken together, these implications position HI not merely as a conceptual framing but as a design-oriented research agenda in which educational AI evolves through iterative human–AI co-learning, sustained teacher involvement, and carefully balanced automation that enhances efficiency without undermining professional responsibility.
Teachers’ Professional Practice
The integration of GenAI into feedback practices presents both opportunities and challenges for teachers’ formative assessment and feedback. Our findings indicate that GenAI should not replace teachers’ feedback; at the same time, they highlight the benefits of responsibly incorporating it as a supplementary tool that requires active engagement, adaptation, and pedagogical oversight. Teachers must therefore be empowered to maintain their professional autonomy and their responsibility to critically assess and revise AI-generated feedback before sharing it with students. Protecting and supporting teacher agency when introducing GenAI into educational contexts includes giving teachers adequate time and space to experiment with GenAI, reflect on its role, and learn when and how it can best support student learning. Additionally, professional development should focus not only on technical and practical skills but also on AI literacy and agency, as teachers need an ethical, pedagogical, and critical understanding of GenAI’s role. To facilitate meaningful and responsible use of GenAI, both pre-service and in-service teacher education should therefore include structured training and exposure to GenAI tools, providing teachers with safe opportunities to experiment, develop the necessary competencies, and integrate such tools into their practice.
School Development and Professional Learning
The findings also highlight a strong need for structured collaboration and shared professional learning around the integration of GenAI tools. Teachers expressed a desire for dedicated time to try out tools together, reflect as a team, and share both challenges and best practices. School leaders and decision-makers should therefore consider facilitating this through common and safe spaces (e.g., workshops) and collaborative pilot projects, and by integrating reflection on GenAI into existing professional development structures. Such measures are essential to prevent isolated practice and uneven adoption, both of which teachers raised as concerns in the interviews. Furthermore, schools and school governing bodies (municipalities, in our case) should recognize the additional time demands placed on teachers during the early phases of GenAI adoption and allocate resources accordingly. When GenAI tools are introduced without support or structure, the risk increases that teachers will either reject the technology or use it uncritically.
Technology Design and Implementation
The challenges identified in this study point to several directions for improving the design of GenAI feedback tools. First, GenAI systems must be flexible enough to accommodate diverse subjects, educational goals, and classroom contexts. In this study, GenAI was used to support writing, language learning, and subject-specific feedback in areas such as math and science. These varied applications highlight the need for either broadly capable models or smaller, fine-tuned systems tailored to educational domains. Future systems should avoid “one-size-fits-all” approaches, for example by fine-tuning models on teacher edits or by enabling customizable feedback profiles by subject, level, or tone. Moreover, teachers expressed concern about the lack of transparency and explainability regarding the feedback students received independently. Improving the traceability, transparency, and explainability of GenAI feedback, through dashboards, logs, or interfaces that show why particular suggestions were made, would strengthen teacher oversight and trust. Integrating explainable AI (XAI) techniques, such as SHAP or LIME, could also help visualize GenAI reasoning and enhance teacher–AI collaboration.
Further, teachers emphasized the need to manage students’ use of GenAI to prevent over-reliance. Feedback systems should incorporate mechanisms that encourage reflection and responsible use, such as limits on feedback iterations or reflective prompts between revisions. Involving teachers in the design process, via participatory design approaches, is key to ensuring classroom relevance, pedagogical soundness, and sustainable use. Finally, feedback quality can be enhanced through improved prompting and model refinement. Research shows that prompts specifying role, tone, and feedback style can substantially improve output (Steiss et al., 2024). Leveraging teacher-edited feedback as training data and incorporating student feedback ratings on clarity and usefulness could create adaptive systems that better emulate teachers’ pedagogical standards.
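The mechanisms discussed above can be made concrete in a short sketch. The Python code below is a hypothetical illustration, not Learny’s implementation or the prompts used by Steiss et al. (2024): the class and function names, the three-request cap, and the wording of the reflective prompt are all assumptions. It shows how a teacher-configurable feedback profile (subject, level, tone, style) could be turned into a prompt that states role, tone, and feedback style explicitly, and how a session object could cap feedback iterations and require a brief student reflection before each new round. The prompt string would be passed to whichever language model a platform uses; no model call is made here.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackProfile:
    """Hypothetical teacher-configurable profile (subject, level, tone, style)."""
    subject: str
    level: str
    tone: str = "encouraging"
    style: str = "two strengths, then one concrete next step"
    max_requests: int = 3  # assumed cap on GenAI feedback rounds per draft


def build_prompt(profile: FeedbackProfile, rubric: str, student_text: str) -> str:
    """Assemble a prompt that states role, tone, and feedback style explicitly."""
    return (
        f"You are a teacher of {profile.subject} giving formative feedback "
        f"to a {profile.level} student.\n"
        f"Tone: {profile.tone}. Format: {profile.style}.\n"
        f"Assess only against this rubric:\n{rubric}\n\n"
        f"Student text:\n{student_text}"
    )


@dataclass
class FeedbackSession:
    """Tracks one student's feedback requests on one draft (hypothetical)."""
    profile: FeedbackProfile
    requests: int = 0
    reflections: list = field(default_factory=list)

    def request(self, reflection: str = "") -> str:
        # Stop once the teacher-set number of feedback rounds is used up.
        if self.requests >= self.profile.max_requests:
            return "LIMIT: ask your teacher before requesting more feedback."
        # After the first round, require a short reflection before new feedback.
        if self.requests > 0 and not reflection.strip():
            return "REFLECT: what did you change since the last feedback, and why?"
        if reflection:
            self.reflections.append(reflection)  # kept for teacher review
        self.requests += 1
        return "OK"
```

Keeping the cap and the reflection requirement in the teacher-owned profile, rather than hard-coding them, is one way to let teachers tune how much independent GenAI feedback students receive.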
Balancing Automation and Control: A Call for HI Sociotechnical Ensembles to Support Feedback and Assessment
This study underscores the need to carefully balance automation and teacher control in GenAI-supported feedback and assessment systems. Fully automated feedback, in which students independently request and receive GenAI suggestions, has clear limitations: teachers have reduced visibility into the generated content and limited opportunities to intervene, and the feedback may lack personalization or contextual relevance. Conversely, systems that require teachers to manually prompt and approve every feedback instance can slow down workflows and undermine the timeliness that makes GenAI tools valuable in the first place. While the teachers appreciated being able to revise GenAI drafts to maintain pedagogical ownership and responsibility, they also found the manual generation process time-consuming and questioned whether it ultimately improved efficiency. These concerns reflect not only usability challenges and the need to develop new feedback habits but also broader design questions about how GenAI systems can support meaningful teacher oversight without adding to teachers’ workload. A promising direction lies in hybrid configurations, in which teachers can monitor and intervene in student–GenAI interactions without approving every feedback instance. Such systems could preserve teacher flexibility and pedagogical integrity while enabling efficient workflows. This approach aligns with Human–AI Interaction principles in which control and responsibility are shared between humans and GenAI systems. It also calls for a next generation of HI systems that develop the necessary social (e.g., AI literacy, agency) and technical (e.g., explainability, transparency) capabilities and dynamically strike the required balance between automation and teacher control.
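One way to picture the hybrid configuration described above is as a feedback log in which GenAI feedback is released to students by default, preserving timeliness, while every item remains visible to the teacher, who can revise or withdraw it afterwards and can require pre-approval only for selected students. The Python sketch below is hypothetical: the class names, statuses, and the per-student hold setting are assumptions introduced for illustration and do not describe the platform studied here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RELEASED = "released"    # delivered to the student immediately
    HELD = "held"            # waiting for teacher approval
    REVISED = "revised"      # edited by the teacher, shown in edited form
    WITHDRAWN = "withdrawn"  # retracted by the teacher


@dataclass
class FeedbackItem:
    student: str
    text: str
    status: Status


@dataclass
class HybridFeedbackLog:
    """Hypothetical hybrid workflow: release by default, teacher oversight on demand."""
    held_students: set = field(default_factory=set)  # learner-level setting
    items: list = field(default_factory=list)

    def submit(self, student: str, ai_text: str) -> FeedbackItem:
        # Only students the teacher has opted in require pre-approval.
        status = Status.HELD if student in self.held_students else Status.RELEASED
        item = FeedbackItem(student, ai_text, status)
        self.items.append(item)
        return item

    def review_queue(self) -> list:
        # All non-withdrawn items stay visible to the teacher, released or not.
        return [i for i in self.items if i.status is not Status.WITHDRAWN]

    def approve(self, item: FeedbackItem) -> None:
        if item.status is Status.HELD:
            item.status = Status.RELEASED

    def revise(self, item: FeedbackItem, new_text: str) -> None:
        # A teacher edit also releases a held item in its edited form.
        item.text = new_text
        item.status = Status.REVISED

    def withdraw(self, item: FeedbackItem) -> None:
        item.status = Status.WITHDRAWN

    def visible_to_student(self, student: str) -> list:
        shown = (Status.RELEASED, Status.REVISED)
        return [i.text for i in self.items
                if i.student == student and i.status in shown]
```

Here approval is the exception rather than the rule: oversight is retrospective by default and prospective only where the teacher opts in, which mirrors the flexible movement between automation levels discussed earlier.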
Limitations
Despite the novelty of this case study, several limitations should be acknowledged. First, as a single case study conducted in one secondary school using a specific GenAI platform, the findings may reflect the particular design of that tool (its model, interface, and workflow), which likely shaped teachers’ experiences and perceptions. Second, the small sample, drawn from a Scandinavian context with distinct cultural characteristics, may limit the transferability of the findings to other contexts. Teachers’ backgrounds may also have influenced the results, as several had limited prior experience with GenAI, reflecting early adoption rather than established practice. Third, aspects of the pilot design itself may have affected the outcomes. Fourth, the teacher–AI complementarity in this case was somewhat asymmetrical: GenAI produced the initial draft, whereas approaches in which the teacher, or both parties, contribute to the initial draft might yield different results. Finally, the analysis was conducted by one researcher, and although systematic coding procedures were applied, inter-coder reliability was not assessed. Overall, the findings represent a single case shaped by one tool and one group of teachers. Future research should refine GenAI tools (e.g., models, interfaces), collect richer data such as teacher trace analytics, and conduct large-scale, longitudinal studies to explore how educators and GenAI systems co-evolve in feedback practices over time.
Conclusion and Future Work
This study has shown that GenAI-supported feedback systems in educational settings must carefully balance automation and teacher oversight. Fully automated feedback, while efficient, risks diminishing teacher insight and producing less contextual or personalized feedback, whereas requiring manual prompting for every feedback instance ensures control but can erode the immediacy and workflow efficiency that make GenAI tools attractive. Our findings suggest that hybrid configurations, in which teachers can monitor and intervene in GenAI–student interactions without needing to approve every instance, offer a promising middle ground. To realize this potential, systems must improve the pedagogical fidelity and relevance of GenAI feedback and provide intuitive tools and practices that allow teachers to review and refine feedback with minimal overhead.
For future work, it will be essential to (a) measure the impact of different levels of teacher involvement on student outcomes and teacher satisfaction; (b) explore how system design can support seamless teacher intervention without disrupting feedback flow; and (c) investigate how these configurations scale across different subject matters, class sizes, and institutional contexts. This line of enquiry will be critical if GenAI tools are to augment, rather than replace, human judgment in feedback processes—preserving both efficiency and pedagogical integrity.
Acknowledgments
This work is funded by the Research Council of Norway (RCN) through the AI Centre for the Empowerment of Human Learning (AI LEARN), project number 357493.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
No conflicting interests to report for this submission.
