Explainable Agentic AI for Big Data–Driven Evaluation and Visual Analytics of Digital Literacy in Higher Vocational Teacher Education

Abstract

The large-scale, heterogeneous data produced by the digital transformation of higher vocational teacher colleges challenge traditional methods for evaluating digital literacy. Current analytics pipelines and black-box artificial intelligence (AI) models are of limited reliability for educational decision-making because they frequently lack autonomy and transparency. This study proposes an Explainable Agentic AI framework for assessing digital literacy at higher vocational teacher colleges using big data and visual analytics. The framework combines autonomous agentic intelligence with explainable AI (XAI) to support adaptive data exploration, competency evaluation, and insight generation across multimodal educational data such as learning behavior logs, assessment records, and digital engagement indicators. Agentic components dynamically handle data processing, feature reasoning, and model selection, while XAI methods provide clear explanations of literacy dimensions, decision rationale, and uncertainty. An interactive visual analytics layer supports effective human–AI collaboration by enabling layered investigation of learner patterns, temporal dynamics, and cohort heterogeneity. Experimental results on a large-scale open learning analytics dataset, used as a proxy for higher vocational teacher college data, show better assessment accuracy, robustness, and interpretability than traditional machine learning techniques. By combining agentic autonomy, explainability, and visual analytics within a scalable big data paradigm, this work demonstrates the promise of agentic AI for explainable big data exploration and promotes reliable instructional intelligence.

Keywords

agentic AI; digital literacy; explainable AI (XAI); higher vocational education; visual analytics

Introduction

The incorporation of cutting-edge digital technologies into higher vocational education is now a basic operational necessity rather than an idealistic objective. As the global industrial landscape moves toward Industry 4.0 and 5.0 paradigms, demand has grown for a workforce skilled in sophisticated cyber-physical systems.¹ This shift places vocational instructors under unprecedented pressure: they must not only become proficient in these technologies but also possess the pedagogical digital competence to teach these skills to students effectively. As a result, vocational teachers’ digital literacy has emerged as a crucial quality indicator for educational institutions worldwide.

In the workplace, digital literacy goes beyond basic operational skills to encompass a complex matrix of competences such as data awareness, technopedagogical design, ethical application, and the capacity to navigate increasingly autonomous digital ecosystems.^2,3 However, the evaluation of these competences lags significantly behind. Although the digitization of vocational schools has generated enormous, diverse datasets, ranging from professional development records to unstructured engagement metrics and Learning Management System (LMS) interaction logs, the analytical frameworks used to interpret this “Big Data” remain largely static and opaque.⁴ Relying on sporadic, self-reported surveys that fail to capture the dynamic, behavioral reality of digital competence, educational administrators and policymakers are frequently drowning in data while starved for knowledge.⁵

The central issue is the disparity between the amount of accessible data and the interpretability of the conclusions drawn from it. To forecast teacher performance or pinpoint professional development needs, traditional educational data mining (EDM) and learning analytics (LA) techniques frequently rely on “black-box” machine learning models, such as deep neural networks or complex ensemble methods.⁶ Although these models can achieve high predictive accuracy, they usually lack the transparency needed for critical educational decisions. Educators receive ratings or classifications without a clear account of the underlying behavioral factors; this “black box” situation undermines confidence and prevents effective intervention.⁷ In addition, typical analytics pipelines are rigid: they require manual configuration and cannot adapt to the dynamic, ever-changing nature of digital learning environments.⁸

This article proposes a paradigm shift from passive predictive analytics to explainable agentic AI to close this gap. In contrast to classical AI systems, which operate as tools awaiting human instruction, agentic AI systems are distinguished by autonomy, reasoning, and the capacity to pursue complex goals through iterative planning and tool use.⁹ In the context of digital literacy evaluation, an agentic system can independently traverse the large data lakes of a vocational institution, detecting pertinent features, choosing suitable analytical models, and producing justifications for its conclusions without continuous human oversight.¹⁰ This capability turns evaluation from a recurring, static event into an ongoing, adaptive dialogue between the teacher and the intelligent system.

Autonomy, however, raises new accountability issues. An autonomous agent that assesses teacher competence must be strictly accountable, follow transparent decision-making processes, and draw evidence-based conclusions.¹¹ This study therefore incorporates explainable AI (XAI) techniques directly into the agentic process. To make the agents “show their work,” we use model-agnostic methods such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), which provide fine-grained feature attribution for each evaluation.¹² Importantly, these explanations are not delivered as raw data but are synthesized by an interactive visual analytics (VA) layer.¹³ This layer is intended to facilitate “Visual LA,” enabling stakeholders to investigate learner trends, verify agent reasoning, and participate in “human-in-the-loop” refinement of the evaluation standards.

This article describes the design and validation of the Educational Competency Assessment via Hierarchical Agents (ECA-HA) framework. To analyze educational big data, we define a novel Hierarchical Multi-Agent System (HMAS) architecture in which specialized agents responsible for data ingestion, analysis, explanation, and orchestration work together.¹⁴ We empirically assess the performance, robustness, and interpretability of the framework using the Open University Learning Analytics Dataset (OULAD) as a high-fidelity proxy for vocational data.¹⁵ Our findings show that combining visual explainability with agentic autonomy substantially increases the reliability of educational intelligence, opening the door to more equitable and responsive teacher development programs.

Theoretical Background and Related Work

The crisis of digital literacy assessment in vocational education

The concept of digital literacy in higher vocational education is intricate and multifaceted. Developing industry-relevant skills requires more than just software proficiency; it also requires the ability to incorporate digital resources into instructional practices.¹⁶ Professional involvement, digital resources, teaching and learning, assessment, empowering learners, and promoting learners’ digital competence are among the dimensions identified by the European Framework for the Digital Competence of Educators (DigCompEdu).¹⁷ The requirement for industry-specific digital competences, such as running virtual CNC machines, utilizing digital twin technology, or navigating specialist health care informatics systems, further complicates this in the vocational sector.

Although these theoretical frameworks are extensive, actual evaluation still relies on outdated techniques. Many institutions use self-reported surveys, which are prone to bias (e.g., the Dunning–Kruger effect) and do not accurately reflect the behavioral realities of digital participation.¹⁸ As vocational colleges undergo digital transition, they produce “Big Data” with high volume, velocity, and variety. Interaction logs from Virtual Learning Environments (VLEs), submission timestamps, forum discussions, and resource usage patterns together provide a detailed “digital footprint” of a teacher’s proficiency. However, traditional statistical methods frequently fail to capture the nonlinear relationships between engagement behaviors and literacy outcomes, and the sheer volume and sparsity of these data make manual analysis difficult.⁴

The interpretability gap in EDM

EDM and LA have emerged to make use of these data. They apply methods ranging from logistic regression to deep learning to classify competences and forecast learning outcomes.⁵ Recent studies have shown the advantage of sophisticated models such as artificial neural networks (ANN) and gradient boosting machines (e.g., XGBoost) in handling the high-dimensional character of educational data.⁶

However, the use of these “black-box” models has opened an interpretability gap. Predictive accuracy alone is insufficient in high-stakes settings such as teacher evaluations; stakeholders want to know why a particular judgment was reached.⁷ A system that labels a teacher as “at-risk” for low digital literacy without indicating whether “low diversity of resource utilization” or “infrequent login activity” drove that judgment offers no practical remediation strategy. Beyond undermining trust, this opacity may mask biases in the training data and lead to unfair evaluations. This concern aligns with the broader push for “Responsible AI” in education, which prioritizes accountability, transparency, and fairness.¹¹

The agentic turn: from automation to autonomy

The field of AI is moving from static models toward agentic AI. Traditional AI operates as a sophisticated function approximator mapping inputs to outputs; agentic AI systems, in contrast, have a degree of agency: the capacity to observe their environment, reason about objectives, and take actions to achieve those objectives over long time horizons.⁹

Key characteristics of agentic AI relevant to educational assessment include:

Goal orientation: Agents can accept high-level instructions (such as “Assess the digital literacy of the engineering faculty”) and independently decompose them into smaller tasks (such as data retrieval, cleaning, modeling, and reporting).¹⁴

Tool use: Agents can perform tasks beyond the scope of their own training data by calling external resources such as statistical libraries, visualization engines, and APIs.¹⁹

Reflection and adaptation: When early attempts fail or produce low-confidence outcomes, advanced agents use feedback loops to evaluate their own outputs and improve their techniques.²⁰

In Big Data research, agentic workflows such as the ReAct (Reasoning and Acting) pattern enable dynamic hypothesis generation and testing. An agent can iterate over the data and discover important patterns and anomalies on its own, sparing a human analyst from manually specifying each feature interaction.
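The Thought-Action-Observation cycle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ECA-HA implementation: the `think` function is a hard-coded rule standing in for LLM reasoning, and the tools `load_logs`, `fit_model`, and `explain` are hypothetical stubs operating on toy data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools; in a real system these would wrap data and ML libraries.
def load_logs(state: Dict) -> str:
    state["logs"] = [3, 5, 0, 7]  # toy daily click counts
    return "loaded 4 log rows"

def fit_model(state: Dict) -> str:
    logs = state["logs"]
    state["score"] = sum(logs) / len(logs)  # stand-in for a trained model
    return f"model score={state['score']:.2f}"

def explain(state: Dict) -> str:
    state["explanation"] = "activity level drives the score"
    return "explanation attached"

TOOLS: Dict[str, Callable[[Dict], str]] = {
    "load_logs": load_logs, "fit_model": fit_model, "explain": explain,
}

def think(state: Dict) -> Optional[str]:
    """Rule-based stand-in for LLM reasoning: choose the next action or stop."""
    if "logs" not in state:
        return "load_logs"
    if "score" not in state:
        return "fit_model"
    if "explanation" not in state:
        return "explain"
    return None  # goal reached

def react(max_steps: int = 10) -> Tuple[Dict, List[Tuple[str, str]]]:
    """Run Thought -> Action -> Observation until the goal is met or steps run out."""
    state: Dict = {}
    trace: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action = think(state)                # Thought
        if action is None:
            break
        observation = TOOLS[action](state)   # Action, then Observation
        trace.append((action, observation))
    return state, trace
```

Each iteration records an (action, observation) pair, which is the kind of trace a dashboard could display to expose the agent’s reasoning.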

VA and XAI

Agentic insights must be understandable to human users before they can be operationalized, and XAI provides the mathematical basis for this intelligibility. Techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), which have become the norm for post-hoc interpretability, allow researchers to attribute model predictions to particular input features.^12,21 LIME fits local surrogate approximations that are helpful for explaining specific cases, whereas SHAP, which is grounded in cooperative game theory, produces consistent per-instance attributions that can be aggregated into global feature importance.
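To make the game-theoretic definition concrete, the sketch below computes exact Shapley values by enumerating every feature coalition, using one common convention in which a feature is “absent” when it is set to a baseline value. The scoring function `toy_model` and its coefficients are hypothetical. Enumeration cost grows exponentially with the number of features; SHAP’s TreeExplainer instead uses a polynomial-time algorithm specialized for tree ensembles.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attributions of f(x) relative to f(baseline)."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features outside the coalition are replaced by their baseline value.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution
    return phi

# Hypothetical toy scorer with an interaction between features 0 and 1.
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[1] + 0.3 * z[2]
```

For `x = [1, 2, 3]` and an all-zero baseline, the attributions are approximately `[2.5, 2.5, 0.9]`: they sum to f(x) minus f(baseline) (the efficiency property), and the interaction term is split equally between the two interacting features.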

VA connects XAI to the end user. It is defined as “the science of analytical reasoning facilitated by interactive visual interfaces.”¹³ In education, VA appears mainly as dashboards that display learner data. However, these dashboards must be designed with the “Visualization Literacy” of their users in mind.²² On a poorly designed dashboard, complicated charts may overwhelm teachers and lead to misunderstandings. Effective VA for XAI therefore depends on “Visual LA,” the integration of EDM, LA, and visualization to support sense-making. By visualizing both the data and the explanations (such as force plots of SHAP values), VA tools allow users to calibrate their trust in the AI agent and confirm that its reasoning is consistent with educational domain knowledge.²³

Explainable Agentic AI Framework

We propose the ECA-HA architecture to address the problems of scale, opacity, and rigidity in existing assessment systems. Its multi-agent design handles heterogeneous big data from vocational education settings and delivers understandable insights through an interactive VA dashboard.

Architectural philosophy: HMAS

Architecture selection is crucial for scalability and dependability. We adopt the HMAS model.^14,24 Decentralized or “swarm” agent systems are resilient, but they frequently lack the accountability and clear chain of command needed for educational evaluation. In a hierarchical system, a high-level “Supervisor” or “Orchestrator” agent oversees a group of specialized subordinate agents. This structure mirrors the organizational hierarchy of educational institutions and ensures that consequential outputs (such as teacher evaluations) pass through centralized supervision before being made available to the user.

The ECA-HA Hierarchical Multi-Agent Architecture is shown in Figure 1. As the main coordinator, the Supervisor Agent breaks user queries down into smaller assignments for specialist agents (Ingestion, Analysis, and Explainability). Data flow upward from the Data Layer through the Agentic Layer, where they are processed and enriched with explanations, before being displayed in the VA Layer.

FIG. 1.

ECA-HA hierarchical multi-agent architecture.

FIG. 2.

Agentic reasoning loop.

The proposed system follows a bounded (weak) agency paradigm: agents carry out analytical tasks independently but remain subject to human supervision and predetermined assessment rules. This design distinguishes the framework from strong-agency systems that set their own goals, and it preserves institutional accountability.

The agentic layer: Roles and responsibilities

The Agentic Layer assigns computational tasks to specialized Python-based tools while using large language models (LLMs) as the cognitive foundation for reasoning. A structured Model Context Protocol (MCP) is used by the agents to transmit commands and data in a consistent manner.²⁵

LLMs serve only as reasoning and orchestration engines; they do not make evaluative decisions. Structured prompting, deterministic temperature settings, tool-verified execution, and rule-based output validation are used to reduce hallucinations and keep every evaluative result grounded in data.
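One way to implement the rule-based output validation step is to treat the LLM’s draft as untrusted JSON and check it against what the tools actually computed. The field names (`label`, `score`, `cited_features`) and the rules below are assumptions for illustration, not a documented ECA-HA schema.

```python
import json
from typing import Any, Dict, List

ALLOWED_LABELS = {"High Competency", "Needs Improvement"}

def validate_llm_output(raw: str, tool_result: Dict[str, Any]) -> List[str]:
    """Check an LLM-drafted evaluation against the tool outputs it must restate.

    Returns a list of rule violations; an empty list means the draft passes.
    """
    try:
        out = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(out, dict):
        return ["top-level JSON value must be an object"]
    errors: List[str] = []
    if out.get("label") not in ALLOWED_LABELS:
        errors.append(f"unknown label: {out.get('label')!r}")
    # The LLM may only restate the score computed by the model tool, never invent one.
    if out.get("score") != tool_result["score"]:
        errors.append("score does not match tool output")
    # Every feature cited in the narrative must appear in the attribution output.
    for feat in out.get("cited_features", []):
        if feat not in tool_result["attributions"]:
            errors.append(f"cited feature not in attributions: {feat}")
    return errors
```

A draft that fails any rule would be regenerated or escalated rather than shown to the user.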

Inter-agent communication is governed by a structured MCP that defines standardized message schemas, error propagation procedures, and acknowledgment semantics. A sequence diagram illustrating agent interactions has been incorporated to improve architectural clarity.
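The published Model Context Protocol defines its messages on top of JSON-RPC 2.0; the sketch below does not reproduce that specification. It is a hypothetical, simplified envelope showing the three properties named above: a standardized message schema (`AgentMessage`), acknowledgment semantics (`Ack`), and error propagation, where a failing handler returns a structured error to the sender instead of raising. All class and field names are illustrative.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    task: str
    payload: Dict[str, Any]
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_id: Optional[str] = None  # links a reply or error to the request it answers

@dataclass
class Ack:
    msg_id: str  # id of the message being acknowledged
    ok: bool
    error: Optional[str] = None
    result: Any = None

Handler = Callable[[Dict[str, Any]], Any]

def handle(msg: AgentMessage, handlers: Dict[str, Handler]) -> Ack:
    """Dispatch a message to the receiving agent's handler and always return an Ack."""
    handler = handlers.get(msg.task)
    if handler is None:
        return Ack(msg.msg_id, ok=False, error=f"unsupported task: {msg.task}")
    try:
        result = handler(msg.payload)
    except Exception as exc:
        # Failures travel back to the sender as structured data instead of crashing it.
        return Ack(msg.msg_id, ok=False, error=f"{type(exc).__name__}: {exc}")
    return Ack(msg.msg_id, ok=True, result=result)
```

Because every request yields an `Ack`, the supervisor can decide from `ack.error` whether to retry, impute, or escalate.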

The supervisor agent (The orchestrator)

The supervisor agent serves as both the main human interface and the brain of the system. Task decomposition and orchestration are under its purview.²⁴

Function: The Supervisor divides a user request such as “Evaluate the digital literacy of the Mechanical Engineering department based on last semester’s VLE activity” into smaller tasks: (1) obtain VLE logs for the designated department and period; (2) preprocess the data to compute engagement metrics; (3) run the evaluation model; (4) generate justifications for the outcomes; and (5) compile the report.
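The five-step decomposition can be pictured as an ordered plan in which each step reads the results of earlier steps from a shared context. In the framework the plan is produced by LLM reasoning; here it is hard-coded, and the step functions, the toy click counts, and the threshold of 10 clicks are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

# (step name, responsible agent, function reading the shared context)
Step = Tuple[str, str, Callable[[Dict[str, Any]], Any]]

def build_plan(department: str, period: str) -> List[Step]:
    """Hypothetical fixed plan mirroring steps (1) to (5) in the text."""
    return [
        ("fetch_logs", "ingestion",
         lambda ctx: [{"dept": department, "period": period, "clicks": c} for c in (12, 30, 7)]),
        ("engagement_metrics", "ingestion",
         lambda ctx: [row["clicks"] for row in ctx["fetch_logs"]]),
        ("evaluate", "analysis",
         lambda ctx: ["High Competency" if c >= 10 else "Needs Improvement"
                      for c in ctx["engagement_metrics"]]),
        ("explain", "explainability",
         lambda ctx: [f"clicks={c}" for c in ctx["engagement_metrics"]]),
        ("report", "supervisor",
         lambda ctx: list(zip(ctx["evaluate"], ctx["explain"]))),
    ]

def run_plan(plan: List[Step]) -> Dict[str, Any]:
    """Execute steps in order; each step's output is stored under its name."""
    ctx: Dict[str, Any] = {"_log": []}
    for name, agent, fn in plan:
        ctx["_log"].append(f"{agent}:{name}")  # audit trail of which agent ran which step
        ctx[name] = fn(ctx)
    return ctx
```

Keeping the plan as data rather than hard-wired calls is what lets a supervisor insert, skip, or retry individual steps.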

Reasoning loop: The Supervisor uses a “Thought-Action-Observation” loop (ReAct pattern) to track the progress of subordinate agents.²⁰ If a subordinate fails (e.g., because of missing data), the Supervisor decides whether to retry, impute the data (with flags), or escalate to the user.

If explanation fidelity or prediction confidence falls below predefined thresholds, the supervisor agent automatically escalates the case for human review, particularly for negative evaluations (Fig. 2).
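The recovery and escalation rules can be expressed as two small policy functions. The thresholds are assumptions: the 0.85 fidelity floor echoes the robustness cut-off mentioned under the evaluation metrics, while the retry limit, the 20% imputation ceiling, the 0.7 confidence floor, and the stricter margin for negative labels are illustrative values, not parameters reported for ECA-HA.

```python
from enum import Enum

class Decision(Enum):
    RETRY = "retry"
    IMPUTE = "impute_with_flag"
    ESCALATE = "escalate_to_human"
    RELEASE = "release"

def on_failure(error: str, attempts: int, missing_fraction: float,
               max_retries: int = 2, max_impute: float = 0.2) -> Decision:
    """Hypothetical recovery policy for a failed subordinate agent."""
    if error == "transient" and attempts < max_retries:
        return Decision.RETRY
    if error == "missing_data" and missing_fraction <= max_impute:
        return Decision.IMPUTE  # imputed values are flagged in the final report
    return Decision.ESCALATE

def review_gate(label: str, confidence: float, fidelity: float,
                min_conf: float = 0.7, min_fid: float = 0.85,
                negative_margin: float = 0.1) -> Decision:
    """Escalate low-confidence or low-fidelity results; stricter for negative labels."""
    if label == "Needs Improvement":
        min_conf += negative_margin  # negative evaluations require more certainty
    if confidence < min_conf or fidelity < min_fid:
        return Decision.ESCALATE
    return Decision.RELEASE
```

With these defaults, a “Needs Improvement” result at 0.75 confidence is escalated, while a “High Competency” result at the same confidence is released.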

The data ingestion agent

Big Data’s “Variety” and “Volume” issues are addressed by this agent.

Function: It communicates directly with the raw data sources (API endpoints, NoSQL logs, and SQL databases).

Autonomy: Unlike static ETL scripts, this agent can independently detect schema changes or data drift. For example, if the LMS modifies its logging format, the agent uses semantic reasoning to map new field names (such as “user_interaction_time”) to the canonical schema (such as “duration”) without breaking the pipeline.
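The agent is described as using semantic reasoning for this mapping; the sketch below swaps in a simpler, deterministic stand-in: an explicit alias table followed by fuzzy string matching with Python’s `difflib.get_close_matches`. The canonical field list and the aliases are hypothetical. Fields that cannot be resolved return `None` so that the supervisor can flag them instead of silently guessing.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical canonical schema and alias table.
CANONICAL: List[str] = ["id_student", "date", "sum_click", "duration", "activity_type"]
ALIASES: Dict[str, str] = {
    "user_interaction_time": "duration",
    "time_on_task": "duration",
    "student_id": "id_student",
    "clicks": "sum_click",
}

def map_field(name: str, cutoff: float = 0.75) -> Optional[str]:
    """Map an incoming field name to the canonical schema, or None if unresolved."""
    key = name.strip().lower()
    if key in CANONICAL:
        return key
    if key in ALIASES:
        return ALIASES[key]
    # Fallback: closest canonical name by string similarity, if similar enough.
    match = difflib.get_close_matches(key, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else None

def remap_schema(columns: List[str]) -> Dict[str, Optional[str]]:
    """Build a column rename map; None marks fields to report upstream."""
    return {c: map_field(c) for c in columns}
```

For example, `map_field("sum_clicks")` resolves to `"sum_click"` through fuzzy matching, while an unrelated name such as `"xyz"` stays unresolved.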

The analysis agent (The data scientist)

The primary computational evaluation is carried out by the analysis agent.

Function: It manages the machine learning lifecycle, choosing the most suitable algorithm from its registry (e.g., Random Forest, XGBoost, LSTM) based on the characteristics of the data and the particular query.⁶

Feature reasoning: It performs feature engineering autonomously. For instance, by evaluating the variance of login timestamps, it may compute a “Digital Consistency Score,” on the premise that consistent participation indicates greater literacy than occasional bursts of activity.
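The exact formula for the Digital Consistency Score is not specified, so the following is one plausible definition, stated as an assumption: the inverse of one plus the coefficient of variation of the gaps between active days. Perfectly regular activity scores 1.0, and bursty activity scores lower.

```python
from statistics import mean, pstdev
from typing import Sequence

def digital_consistency_score(active_days: Sequence[int]) -> float:
    """Hypothetical regularity score in (0, 1]; 1.0 means perfectly even spacing."""
    days = sorted(set(active_days))
    if len(days) < 3:
        return 0.0  # too few sessions to judge regularity
    gaps = [b - a for a, b in zip(days, days[1:])]
    cv = pstdev(gaps) / mean(gaps)  # coefficient of variation of inter-login gaps
    return 1.0 / (1.0 + cv)
```

For example, weekly logins on days 0, 7, 14, 21, and 28 score 1.0, whereas the bursty pattern 0, 1, 2, 30, 31, 60 scores about 0.47.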

Model selection: It trains several candidate models using AutoML approaches and chooses the one that best balances accuracy and generalizability.¹⁴

The explainability agent (The critic)

This agent makes the analysis agent’s “black box” transparent.

Function: After the Analysis Agent makes a prediction, the Explainability Agent interrogates the model using post-hoc XAI techniques.¹²

Capabilities:

Global explanation: It identifies the systemic determinants of digital literacy across the cohort by calculating global feature relevance (such as mean absolute SHAP values).

Local explanation: For each teacher, it produces instance-specific explanations (LIME/SHAP) showing why a given profile was marked as “At-Risk” or “Proficient.”

Faithfulness check: It assesses the reliability of its own explanations by computing metrics such as fidelity ($R^2$) and stability, and it flags explanations that are statistically uncertain.²⁶
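Global importance is usually obtained by averaging the absolute values of local attributions over the cohort, which is what the sketch below does; a local explanation is simply the strongest signed attributions for one profile. The attribution matrix used in the checks is invented for illustration; in practice it would come from an explainer such as SHAP.

```python
from typing import List, Sequence, Tuple

def global_importance(local_attributions: Sequence[Sequence[float]],
                      feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute attribution across all profiles in the cohort."""
    n = len(local_attributions)
    totals = [0.0] * len(feature_names)
    for row in local_attributions:
        for j, phi in enumerate(row):
            totals[j] += abs(phi)  # sign is dropped so opposing effects do not cancel
    ranking = [(name, totals[j] / n) for j, name in enumerate(feature_names)]
    return sorted(ranking, key=lambda t: t[1], reverse=True)

def local_top_k(row: Sequence[float], feature_names: Sequence[str],
                k: int = 2) -> List[Tuple[str, float]]:
    """Signed top-k drivers of a single profile's evaluation."""
    pairs = list(zip(feature_names, row))
    return sorted(pairs, key=lambda t: abs(t[1]), reverse=True)[:k]
```

Averaging absolute values matters: a feature that pushes some profiles up and others down would otherwise appear unimportant.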

The VA layer

The Interaction Layer, a web-based dashboard, converts the structured outputs of the Agentic Layer into a visual story. It is intended to facilitate “Visual LA,” which combines interactive visuals with data mining findings.¹³

Macro-view (cohort level): This view supports administrative decision-making. It displays the distribution of digital literacy scores across departments or populations using density plots and heatmaps, and it highlights structural patterns and anomalies found by the Analysis Agent (e.g., “Department A shows high engagement but low assessment completion”).

Micro-view (individual level): This view encourages teachers to reflect on their own practice. The “Digital Competency Profile” is displayed using:

Activity radar charts: mapping abilities across dimensions (assessment, communication, and content creation).

XAI force plots: visualizing SHAP values to show the “push and pull” of various behaviors on the final score.¹²

Natural language narratives: the supervisor agent synthesizes the XAI data into readable English, for example: “Your high forum participation positively impacted your score, but low interaction with diverse resource types limited your overall rating.”
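The framework describes the supervisor agent synthesizing these narratives. A deterministic template is a simple alternative, or a fallback for checking generated text, and is sketched below as an assumption; the feature-to-phrase table is hypothetical. Because a positive attribution does not by itself reveal whether a feature value was high or low, the template avoids words such as “high” and “low.”

```python
from typing import Dict

# Hypothetical display names for model features.
PHRASES: Dict[str, str] = {
    "forumng": "forum participation",
    "resource_diversity": "interaction with diverse resource types",
    "sum_click": "overall platform activity",
}

def narrative(attributions: Dict[str, float], k: int = 2) -> str:
    """Turn the k strongest signed attributions into one feedback sentence."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    pos = [PHRASES.get(f, f) for f, v in ranked if v > 0]
    neg = [PHRASES.get(f, f) for f, v in ranked if v < 0]
    parts = []
    if pos:
        parts.append(f"your {' and '.join(pos)} positively impacted your score")
    if neg:
        parts.append(f"your {' and '.join(neg)} limited your overall rating")
    if not parts:
        return "No single behavior strongly influenced your score."
    sentence = ", but ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."
```

With attributions of +0.21 for `forumng` and -0.15 for `resource_diversity`, this produces “Your forum participation positively impacted your score, but your interaction with diverse resource types limited your overall rating.”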

Methodology and Experimental Setup

We conducted a thorough experimental investigation to validate the ECA-HA framework. Because of privacy constraints and the lack of a public, large-scale dataset tailored to vocational teachers, we used OULAD¹⁵ as a high-fidelity proxy.

Dataset justification and preparation

OULAD is well established in EDM research. It contains demographics, assessment outcomes, and over 10 million clickstream interactions with the VLE for 32,593 students across 22 course presentations.

Proxy validity: The interaction patterns recorded in OULAD, including accessing digital resources (pdf, html), submitting assessments (quiz, tma), and participating in forums (forumng), closely resemble the digital behaviors expected of vocational teachers who take part in online professional development or manage digital courses.²⁷ Its high dimensionality and volume make the dataset a well-suited testbed for assessing our agentic framework’s “Big Data” capabilities.

Data ingestion and feature engineering

The data ingestion agent processed the raw CSV files (studentVle.csv, assessments.csv, and studentInfo.csv), autonomously executing the following pipeline:

1. Cleaning: Withdrawn students are removed, and missing values in the age_band and imd_band (Index of Multiple Deprivation) columns are handled.

2. Aggregation: The data tables are joined using id_student as the primary key.

3. Feature construction: The agent builds a feature vector for every profile based on the mapping in Table 1.

Table 1.

Mapping OULAD features to teacher digital literacy dimensions

DigCompEdu dimension	OULAD feature proxy	Description of behavior
Professional engagement	forumng (Interaction Count)	Frequency of communication with peers/students via digital platforms.
Digital resources	resource_diversity (Derived)	The number of unique content types accessed (PDF, HTML, URL).
Teaching and learning	sum_click (Total Activity)	Overall intensity of digital platform usage for educational purposes.
Assessment	quiz/questionnaire activity	Engagement with digital assessment tools.
Empowering learners	temporal_consistency (Derived)	Regularity of access, indicating sustained digital support capability.
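A minimal sketch of the Table 1 feature construction is shown below, using only the standard library. It assumes that clickstream rows have already been joined with each VLE item’s activity type (in OULAD these types are stored in a separate VLE metadata table) and flattened to `(id_student, date, activity_type, sum_click)` tuples. The exact definitions, such as summing clicks for `forumng` and approximating `temporal_consistency` as the share of active weeks within the observed span, are assumptions, since the text does not spell them out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (id_student, date, activity_type, sum_click); date is a day offset from module start
Row = Tuple[int, int, str, int]

def build_features(rows: Iterable[Row]) -> Dict[int, Dict[str, float]]:
    """Aggregate clickstream rows into one Table 1 feature vector per student."""
    per_student: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        per_student[row[0]].append(row)

    features: Dict[int, Dict[str, float]] = {}
    for sid, recs in per_student.items():
        weeks = {day // 7 for _, day, _, _ in recs}
        span_weeks = max(weeks) - min(weeks) + 1
        features[sid] = {
            # Professional engagement: clicks on forum items
            "forumng": sum(c for _, _, a, c in recs if a == "forumng"),
            # Digital resources: number of distinct activity types used
            "resource_diversity": len({a for _, _, a, _ in recs}),
            # Teaching and learning: total click volume
            "sum_click": sum(c for _, _, _, c in recs),
            # Assessment: clicks on quiz and questionnaire items
            "quiz_activity": sum(c for _, _, a, c in recs if a in ("quiz", "questionnaire")),
            # Empowering learners: share of weeks in the observed span with any activity
            "temporal_consistency": len(weeks) / span_weeks,
        }
    return features
```

A profile active in weeks 0, 1, and 3 of a four-week span, for instance, receives a `temporal_consistency` of 0.75.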

Target variable

To simulate a digital literacy evaluation, we defined a binary classification target: High Competency versus Needs Improvement. Profiles with a final result of “Distinction” or “Pass” were labeled High Competency, and those with “Fail” or “Withdrawn” were labeled Needs Improvement.

Proxy validation and feature-behavior equivalency analysis

To strengthen the construct validity of OULAD as a stand-in for vocational teachers’ digital literacy, we present a feature-behavior equivalency mapping based on the DigCompEdu framework. DigCompEdu competency domains, including digital resources, assessment, and professional engagement, were systematically aligned with LMS-derived indicators such as resource diversity, assessment punctuality, and forum engagement depth. In addition, a sensitivity analysis was carried out by adding controlled Gaussian noise (±5%–10%) to the behavioral data; SHAP attributions and competency predictions remained stable across perturbations. These findings suggest that the LMS-derived indicators act as consistent proxies for professional digital practices rather than as dataset-specific artifacts.
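The perturbation check can be sketched as follows. It assumes that “±5%–10%” means multiplicative Gaussian noise with a standard deviation of 5% or 10% of each feature value, and it measures the share of profiles whose predicted class is unchanged; `toy_predict` is a hypothetical linear scorer standing in for the trained model. The same loop could compare SHAP vectors instead of labels.

```python
import random
from typing import Callable, List, Sequence

def prediction_agreement(predict: Callable[[Sequence[float]], int],
                         X: List[List[float]], noise: float,
                         seed: int = 0) -> float:
    """Share of profiles whose predicted class survives multiplicative Gaussian noise."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    same = 0
    for x in X:
        x_noisy = [v * (1.0 + rng.gauss(0.0, noise)) for v in x]
        same += int(predict(x) == predict(x_noisy))
    return same / len(X)

def toy_predict(x: Sequence[float]) -> int:
    """Hypothetical linear scorer: 1 = High Competency, 0 = Needs Improvement."""
    return int(0.02 * x[0] + 0.5 * x[1] - 3.0 > 0)
```

Profiles far from the decision boundary keep their label under both noise levels; profiles near the boundary are the ones such a check is designed to surface.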

Although binary classification was used for benchmarking, the ECA-HA framework is naturally compatible with ordinal or multi-level skill stratification. In particular, ordinal regression or multi-class gradient boosting can represent DigCompEdu-aligned skill tiers (e.g., Foundation, Intermediate, Advanced, and Expert) without architectural changes. Preserving competency gradients rather than collapsing them into binary labels would improve ecological validity.
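As an illustration of the ordinal extension, the sketch below implements a cumulative-logit (proportional-odds) model, one standard form of ordinal regression: a latent score $\eta$ is compared with increasing cutpoints $\theta_1 < \theta_2 < \theta_3$, and $P(Y \le k) = \sigma(\theta_k - \eta)$. The cutpoints and scores used in the checks are hypothetical, not fitted values.

```python
import math
from typing import Dict, Sequence

TIERS = ["Foundation", "Intermediate", "Advanced", "Expert"]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def tier_probabilities(eta: float, cutpoints: Sequence[float]) -> Dict[str, float]:
    """Cumulative-logit model: P(Y <= k) = sigmoid(theta_k - eta).

    eta is the latent competency score from any scoring model; a larger eta
    shifts probability mass toward higher tiers.
    """
    if len(cutpoints) != len(TIERS) - 1 or list(cutpoints) != sorted(cutpoints):
        raise ValueError("need len(TIERS) - 1 increasing cutpoints")
    cum = [sigmoid(t - eta) for t in cutpoints] + [1.0]  # P(Y <= k) for each tier
    probs = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
    return dict(zip(TIERS, probs))
```

With cutpoints (-2, 0, 2), a latent score of 0 yields a symmetric distribution over the four tiers, and raising the score to 3 puts most of the mass on “Expert.”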

Model selection and training protocols

The analysis agent was configured to choose the optimal prediction model using an AutoML technique. Four candidate architectures were assessed:

1. Logistic regression (LR): a linear baseline that is easy to interpret but limited in capturing complex, nonlinear relationships.

2. Random forest (RF): a robust ensemble approach.

3. XGBoost: a gradient boosting framework optimized for performance and speed on tabular data.⁶

4. ANN: a multi-layer perceptron, representative of “black-box” deep learning methods.

Eighty percent of the dataset was used for training and the remaining 20% for testing. Stratified K-fold cross-validation (K = 5) was used to ensure that the performance measures were reliable.

The analysis agent uses a utility-based decision policy when choosing models, defined as

$$U(M) = \alpha \cdot \mathrm{Acc} + \beta \cdot \mathrm{Fid} + \gamma \cdot \mathrm{Stab},$$

where Acc denotes predictive accuracy, Fid explanation fidelity, and Stab attribution stability. The weight coefficients ($\alpha = 0.4$, $\beta = 0.3$, $\gamma = 0.3$) were selected empirically to reflect the high-stakes nature of educational evaluation, prioritizing interpretability and robustness alongside performance.
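The selection rule reduces to a weighted sum followed by an argmax. In the sketch below the weights are the ones stated above, but the candidate names and their metric values are invented to show how a slightly less accurate model can win on utility when its explanations are more faithful and stable; they are not results from this study.

```python
from typing import Dict, Tuple

# alpha, beta, gamma as stated in the text
WEIGHTS: Dict[str, float] = {"acc": 0.4, "fid": 0.3, "stab": 0.3}

def utility(metrics: Dict[str, float], w: Dict[str, float] = WEIGHTS) -> float:
    """U(M) = alpha * Acc + beta * Fid + gamma * Stab."""
    return w["acc"] * metrics["acc"] + w["fid"] * metrics["fid"] + w["stab"] * metrics["stab"]

def select_model(candidates: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Return the candidate with the highest utility and its score."""
    best = max(candidates, key=lambda name: utility(candidates[name]))
    return best, utility(candidates[best])

# Illustrative inputs only; these are not measurements from the study.
CANDIDATES = {
    "model_a": {"acc": 0.91, "fid": 0.95, "stab": 0.90},
    "model_b": {"acc": 0.93, "fid": 0.80, "stab": 0.75},
    "model_c": {"acc": 0.82, "fid": 0.99, "stab": 0.95},
}
```

Here `model_b` has the highest accuracy (0.93) but the lowest utility (0.837), while `model_a` wins with 0.919.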

To reduce the temporal leakage inherent in longitudinal LMS data, stratified K-fold cross-validation was carried out under a temporal constraint: all training samples preceded the corresponding test samples in time. This chronological partitioning prevents future behavioral signals from influencing earlier predictions and thus yields a more realistic performance estimate.
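A time-ordered split can be sketched as an expanding window: samples are sorted by timestamp, cut into contiguous blocks, and each fold trains on all earlier blocks and tests on the next one. This mirrors the idea behind scikit-learn’s `TimeSeriesSplit`; the sketch does not enforce class stratification, which would need an additional step, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def forward_chaining_splits(timestamps: Sequence[float], n_splits: int = 5
                            ) -> List[Tuple[List[int], List[int]]]:
    """Expanding-window folds: every training index precedes every test index in time.

    Ties in timestamps can straddle a fold boundary; group by time period first
    if that matters for the data at hand.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_blocks = n_splits + 1
    size = len(order) // n_blocks
    if size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    splits = []
    for k in range(1, n_blocks):
        train = order[: k * size]
        # The last fold absorbs any remainder so that no sample is dropped.
        test = order[k * size: (k + 1) * size] if k < n_splits else order[k * size:]
        splits.append((train, test))
    return splits
```

The first block is only ever used for training, which is the usual price of never letting a model see the future.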

The selected XGBoost configuration used 300 boosting rounds, a maximum tree depth of 6, a learning rate of 0.05, and a subsample ratio of 0.8. SHAP explanations were computed with 1000 background samples to balance accuracy against computational cost.
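For reproducibility, the reported settings can be captured in one place. The dict keys below follow the parameter names of xgboost’s scikit-learn wrapper (`XGBClassifier`); the seeded background sampler and its seed of 42 are assumptions. The commented lines show roughly how the objects would be wired together if `xgboost`, `shap`, and `numpy` are installed; they are not executed here.

```python
import random
from typing import Any, List, Sequence

# Reported settings, keyed by xgboost's scikit-learn wrapper parameter names.
XGB_PARAMS = {
    "n_estimators": 300,   # boosting rounds
    "max_depth": 6,
    "learning_rate": 0.05,
    "subsample": 0.8,      # row subsample ratio per tree
}
SHAP_BACKGROUND_SIZE = 1000

def sample_background(X: Sequence[Any], k: int = SHAP_BACKGROUND_SIZE,
                      seed: int = 42) -> List[Any]:
    """Reproducible background set for SHAP (the seed value is an assumption)."""
    rows = list(X)
    if len(rows) <= k:
        return rows
    return random.Random(seed).sample(rows, k)

# With the libraries installed, the pieces would fit together roughly as:
#   model = xgboost.XGBClassifier(**XGB_PARAMS).fit(X_train, y_train)
#   explainer = shap.TreeExplainer(model, data=numpy.asarray(sample_background(X_train)))
#   shap_values = explainer.shap_values(X_test)
```

Fixing the background sample matters because SHAP values computed against different backgrounds are not directly comparable across runs.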

Evaluation metrics

The evaluation of the ECA-HA framework focused on two dimensions: performance and explainability/trust.

Performance metrics

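Accuracy: the proportion of profiles classified correctly.

Precision and recall: precision is the proportion of profiles predicted to belong to a class that actually belong to it; recall is the proportion of profiles in a class that the model correctly identifies. The F1-score reported in Table 2 is their harmonic mean.

AUC-ROC: the area under the receiver operating characteristic curve, which measures the model’s capacity for discrimination.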
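These definitions translate directly into code. The sketch treats label 1 as the positive class (which class the study treats as positive is not stated) and computes AUC-ROC through its probabilistic interpretation: the chance that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half.

```python
from typing import Dict, Sequence

def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of profiles; rank-based formulas give the same value more efficiently for large test sets.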

Explainability and trust metrics

Assessing the quality of XAI explanations is notoriously challenging. We used the following quantitative metrics:^26,28

Fidelity ($R^{2}$): evaluates how closely the XAI explanation (such as LIME’s local linear approximation) reproduces the black-box model’s actual predictions. High fidelity indicates that the explanation faithfully reflects the model’s behavior.

Stability: evaluates the consistency of explanations for similar cases. We introduced small perturbations (Gaussian noise) to the input features and computed the cosine similarity between the original and perturbed explanations. High stability indicates a robust explanation.

Computation time: the latency required to produce an explanation, which is essential for real-time dashboard interaction.

Fidelity was quantified as the coefficient of determination ($R^{2}$) between model predictions and the outputs of the explanation-based surrogate, with values above 0.9 indicating high faithfulness; following previous XAI benchmarking research, scores above 0.85 were regarded as robust explanations. Stability was assessed using the cosine similarity between SHAP vectors under minor input perturbations.
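Both trust metrics are short computations. In this sketch, `model_out` holds the black-box model’s predictions, `surrogate_out` the predictions of the explanation’s surrogate (for example, LIME’s local linear model), and the two attribution vectors come from the original and the perturbed input. The function names are illustrative.

```python
import math
from typing import Sequence

def fidelity_r2(model_out: Sequence[float], surrogate_out: Sequence[float]) -> float:
    """R^2 of the surrogate's predictions against the black-box model's predictions."""
    mean_y = sum(model_out) / len(model_out)
    ss_res = sum((y - s) ** 2 for y, s in zip(model_out, surrogate_out))
    ss_tot = sum((y - mean_y) ** 2 for y in model_out)
    if ss_tot == 0:
        raise ValueError("model outputs are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

def cosine_stability(phi: Sequence[float], phi_perturbed: Sequence[float]) -> float:
    """Cosine similarity between attribution vectors before and after perturbation."""
    dot = sum(a * b for a, b in zip(phi, phi_perturbed))
    norm = math.sqrt(sum(a * a for a in phi)) * math.sqrt(sum(b * b for b in phi_perturbed))
    return dot / norm if norm else 0.0
```

A surrogate that reproduces the model exactly scores $R^2 = 1$, and identical attribution vectors have a cosine similarity of 1.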

Results and Analysis

Predictive performance of the analysis agent

The analysis agent trained and assessed the candidate models autonomously. Table 2 summarizes the results on the OULAD test set.

Table 2.

Comparative performance of models selected by the analysis agent

Model	Accuracy	Precision	Recall	F1-Score	AUC-ROC
Logistic regression	0.82	0.79	0.76	0.77	0.85
Random forest	0.88	0.86	0.84	0.85	0.92
XGBoost	0.91	0.89	0.88	0.89	0.95
ANN (Deep learning)	0.89	0.87	0.86	0.86	0.93

Table 3.

Quantitative evaluation of XAI methods (SHAP vs. LIME)

Metric	SHAP (TreeExplainer)	LIME (tabular)	Interpretation
Fidelity ($R^2$)	0.99	0.89	SHAP provided a near-perfect approximation of the XGBoost model’s behavior, owing to its exact calculation capabilities for tree ensembles.¹²
Stability (Cosine Sim.)	0.94	0.76	SHAP explanations were highly stable. LIME explanations varied significantly with slight data perturbations, indicating lower reliability for consistent assessment.
Computation Time	0.05 s/instance	0.02 s/instance	LIME was faster, making it potentially suitable for high-frequency real-time queries, but at the cost of reliability.

Analysis

With an accuracy of 91% and an AUC-ROC of 0.95, the XGBoost model outperformed the other models on every metric. The ANN also performed well, although it required considerably more computation to train. The lag of the logistic regression baseline is consistent with a complex, nonlinear association between digital behaviors and competency, supporting the need for more sophisticated machine learning models in this domain.⁶

Based on these findings, the analysis agent independently selected XGBoost as the production model for the subsequent explainability phase. This choice is consistent with research indicating that tree-based ensembles frequently outperform neural networks on structured tabular data while offering greater potential for interpretability.

Performance differences between the baseline and XGBoost models were evaluated using paired bootstrap resampling (n = 1000). The AUC improvements were statistically significant (p < 0.01), indicating that the observed advantages are unlikely to be explained by sampling variance.
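The exact bootstrap variant is not described, so the sketch below shows one common form, stated as an assumption: resample test cases with replacement, recompute both models’ AUC on each resample, and report the share of resamples in which the candidate model is not better as a one-sided p-value. With 1000 resamples, p < 0.01 means that fewer than about 10 resamples showed no improvement. Resamples containing only one class are skipped because AUC is undefined for them.

```python
import random
from typing import Sequence, Tuple

def _auc(y: Sequence[int], s: Sequence[float]) -> float:
    """Pairwise AUC: share of positive/negative pairs ranked correctly (ties = 1/2)."""
    pos = [v for t, v in zip(y, s) if t == 1]
    neg = [v for t, v in zip(y, s) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc(y: Sequence[int], scores_a: Sequence[float],
                         scores_b: Sequence[float], n_boot: int = 1000,
                         seed: int = 0) -> Tuple[float, float]:
    """Return (observed AUC_b - AUC_a, one-sided p-value that model b is not better)."""
    rng = random.Random(seed)
    n = len(y)
    observed = _auc(y, scores_b) - _auc(y, scores_a)
    not_better = valid = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both models (paired)
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC undefined without both classes in the resample
        diff = _auc(yb, [scores_b[i] for i in idx]) - _auc(yb, [scores_a[i] for i in idx])
        valid += 1
        not_better += int(diff <= 0)
    return observed, not_better / valid
```

Pairing matters: resampling the same cases for both models cancels case-level difficulty, which makes the comparison considerably more sensitive than two independent bootstraps.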

Evaluation of explainability mechanisms

The explainability agent applied both SHAP and LIME to the XGBoost predictions. We used the fidelity and stability measures to assess the quality of these explanations.

Agentic decision

After analyzing these metrics, the explainability agent, which was designed to maximize “Trustworthiness,” independently configured the system to use SHAP for all high-stakes reporting (such as “Digital Competency Profiles”). LIME was reserved for fast, exploratory “what-if” dashboard interactions, where speed matters more than maximal fidelity. This dynamic selection of XAI techniques illustrates the framework’s “Reflection” capability: the agent assesses whether a tool is appropriate for the situation rather than applying it mechanically.
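The routing decision can be written as a small rule over the measurements in Table 3. The rule itself, including the 0.9 stability floor for high-stakes reports and the averaging of fidelity and stability, is a hypothetical reconstruction of the behavior described, not the agent’s actual logic.

```python
from typing import Dict

# Per-method measurements as reported in Table 3 (latency in seconds per instance).
XAI_PROFILE: Dict[str, Dict[str, float]] = {
    "shap": {"fidelity": 0.99, "stability": 0.94, "latency_s": 0.05},
    "lime": {"fidelity": 0.89, "stability": 0.76, "latency_s": 0.02},
}

def choose_explainer(context: str,
                     profile: Dict[str, Dict[str, float]] = XAI_PROFILE,
                     min_stability_high_stakes: float = 0.9) -> str:
    """Reliability first for high-stakes reports, latency first for exploratory queries."""
    if context == "high_stakes":
        eligible = {m: p for m, p in profile.items()
                    if p["stability"] >= min_stability_high_stakes}
        pool = eligible or profile  # fall back to all methods if none qualifies
        return max(pool, key=lambda m: (pool[m]["fidelity"] + pool[m]["stability"]) / 2)
    # Exploratory "what-if" interactions: pick the fastest method.
    return min(profile, key=lambda m: profile[m]["latency_s"])
```

Keeping the measurements in a table rather than in the rule means that re-benchmarking an explainer changes the routing without code changes.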

The agentic design adds modest overhead because of SHAP computation and inter-agent communication. Compared with a monolithic pipeline, end-to-end inference latency rose by 18%–22% on average, and memory use scaled linearly with feature dimensionality. Batch SHAP approximation and parallel agent execution nonetheless keep the framework feasible for institutional-scale deployment with tens of thousands of profiles; per-instance explanation costs are reported in Table 3.
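Some back-of-the-envelope arithmetic shows why batching and parallelism matter: at the 0.05 s/instance SHAP cost in Table 3, a hypothetical cohort of 30,000 profiles needs about 1,500 s (25 minutes) sequentially, or roughly 190 s if the work splits evenly across 8 workers. The sketch below batches profiles and runs batches concurrently while preserving order; `explain_batch` is a hypothetical callable. A thread pool suits I/O-bound work such as calls to a remote explainer service, whereas CPU-bound Python explainers would need a process pool or a library’s own batched, native implementation to benefit from parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

def batched(items: Sequence[Any], size: int) -> List[Sequence[Any]]:
    """Split a sequence into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def explain_all(profiles: Sequence[Any],
                explain_batch: Callable[[Sequence[Any]], List[Any]],
                batch_size: int = 256, workers: int = 8) -> List[Any]:
    """Explain profiles batch by batch, running batches concurrently; order is preserved."""
    batches = batched(profiles, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return [r for batch_result in pool.map(explain_batch, batches) for r in batch_result]
```

Batching also amortizes per-call overhead such as model loading or network round trips, which is often a larger saving than the parallelism itself.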

VA and Human–AI collaboration

The VA layer is where the ECA-HA framework’s outputs become actionable for human users.

Global cohort analysis

The dashboard produced a SHAP summary plot for the complete dataset. In this visualization, sum_click (total interactions) and homepage_content access emerged as the most significant predictors of high competency. A “second-order insight” also surfaced: the agent discovered a nonlinear interaction effect between quiz performance and forum (social) engagement. Teachers with high forum activity but little content access were often marked “At-Risk,” suggesting “help-seeking without studying” rather than “collaborative learning.” A basic linear model would probably have misread high forum activity as a wholly positive signal.

The dashboard visualizes the supervisor agent’s reasoning process

Observation: “The user requests a competency report for Department X.”

Thought: “Data for Department X in October 2024 are sparse. I must flag the gap or impute.”

Action: “Run the Imputation Tool. Call the XGBoost model. Call SHAP.”

Reflection: “‘Resource Diversity’ has abnormally high SHAP values. Check for data leakage.”

This transparency lets the user trust the final report not only because of its high score but also because the process that produced it was sound.
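The Observation, Thought, Action, Reflection cycle above can be sketched as a rule-based stand-in for the supervisor agent. The function name, the 20% missing-data threshold, and the 50% attribution-dominance heuristic for suspected leakage are hypothetical assumptions; in ECA-HA this reasoning would come from an LLM-based agent rather than fixed rules.

```python
def supervise(department, missing_ratio, mean_abs_shap,
              missing_threshold=0.2, dominance_threshold=0.5):
    """Return a (kind, text) reasoning trace for one report request.

    mean_abs_shap maps feature names to mean |SHAP| for the requested cohort.
    """
    trace = [("Observation", f"The user requests a competency report for {department}.")]
    actions = []
    if missing_ratio > missing_threshold:
        trace.append(("Thought", f"{missing_ratio:.0%} of records are missing; warn and impute."))
        actions.append("impute")
    else:
        trace.append(("Thought", "Data coverage is sufficient."))
    actions += ["predict", "explain"]
    trace.append(("Action", " -> ".join(actions)))

    # Heuristic leakage check: one feature carrying most of the attribution mass.
    total = sum(mean_abs_shap.values())
    feature, value = max(mean_abs_shap.items(), key=lambda item: item[1])
    if total > 0 and value / total > dominance_threshold:
        trace.append(("Reflection", f"'{feature}' dominates the attributions; check for data leakage."))
    else:
        trace.append(("Reflection", "No single feature dominates; the report can be released."))
    return trace


for kind, text in supervise("Department X", 0.35,
                            {"Resource Diversity": 0.9, "sum_click": 0.2, "forum_posts": 0.1}):
    print(f"{kind}: {text}")
```

Because each step is recorded as data, the dashboard can render the trace directly.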

Discussion

The move from static data mining to Explainable Agentic AI marks an important structural change in how educational institutions might approach digital literacy. The experimental results of the ECA-HA framework offer several insights into the feasibility, reliability, and pedagogical implications of this change, particularly in vocational education.

The superiority of agentic orchestration over static pipelines

The most important finding of this study is not only the XGBoost model’s 91% prediction accuracy but also the way it was achieved. In traditional EDM, the pipeline of data cleaning, feature selection, and model tuning is a labor-intensive manual process that is prone to human bias and error.⁴,⁵ The ECA-HA framework showed that an HMAS can automate these operations with a flexibility that static scripts cannot match. When faced with data anomalies (such as missing assessment scores in particular OULAD modules), the “Supervisor” agent acted as a meta-reasoner, modifying the workflow dynamically rather than halting it. This reflects what contemporary surveys describe as agentic AI moving from “Copilot” (assisted) to “Autopilot” (autonomous) modes.⁹ The capability is especially valuable for vocational colleges, which often lack dedicated data science teams. By letting administrators pose complex queries (“Show me the literacy trends in the Nursing faculty compared to Engineering”) without writing SQL or Python code, it democratizes access to advanced analytics.
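The difference between halting and adapting can be shown in a few lines. In this minimal sketch (all names hypothetical), each pipeline stage may register a fallback. When a stage fails, for example because assessment scores are missing, the orchestrator logs the failure and runs the fallback instead of stopping. A real supervisor agent would choose the fallback by reasoning rather than by table lookup.

```python
def run_pipeline(data, stages, fallbacks, log):
    """Run named stages in order, substituting a registered fallback on failure."""
    for name, stage in stages:
        try:
            data = stage(data)
            log.append(f"{name}: ok")
        except Exception as exc:  # broad catch for illustration only
            if name not in fallbacks:
                raise
            log.append(f"{name}: {type(exc).__name__}; using fallback")
            data = fallbacks[name](data)
    return data


def mean_score(scores):
    """Strict stage: fails if any score is missing (None)."""
    return sum(scores) / len(scores)


def mean_of_observed(scores):
    """Fallback stage: average only the scores that are present."""
    observed = [s for s in scores if s is not None]
    return sum(observed) / len(observed)


log = []
result = run_pipeline([80, None, 70], [("mean_score", mean_score)],
                      {"mean_score": mean_of_observed}, log)
print(result, log)  # 75.0 ['mean_score: TypeError; using fallback']
```

Dropping missing scores is only one possible fallback; imputation is another.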

The “black box” dilemma and the necessity of fidelity

Although agentic automation increases productivity, it aggravates the “black box” problem. When an autonomous agent labels a teacher “At-Risk,” the first question is “Why?” For this domain, our comparison of XAI approaches gives a clear answer: for high-stakes assessments, SHAP is non-negotiable. Although LIME’s speed and model-agnosticism have made it popular in the literature,²¹ our stability measurements (0.76 for LIME vs. 0.94 for SHAP) reveal a serious weakness. A stability score of 0.76 means that a small change in the input data could produce a substantially different explanation in roughly one out of every four cases. Such inconsistency is unacceptable when evaluating vocational teachers: two educators with nearly identical behavior could receive contradictory feedback, one told to “increase forum activity” and the other to “focus on content access.” Despite their higher computational cost, Shapley values are therefore strongly preferred, because they better support the accountability and fairness demanded by ethical AI frameworks.¹¹,²⁹ This is consistent with research in other high-stakes domains such as health care, where rigorous monitoring layers or “fear modules” have been proposed to reduce AI errors and hallucinations.³⁰
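One way to operationalize such a stability score is the fraction of slightly perturbed inputs whose top-k attributed features match those of the original explanation. Under that reading, 0.76 means that about one perturbed run in four changes the explanation. This sketch assumes that definition, Gaussian input noise, and a toy linear explainer, none of which are taken from the study's protocol.

```python
import random


def top_k(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attributions)), key=lambda i: -abs(attributions[i]))
    return frozenset(order[:k])


def stability_score(explain, x, k=2, noise=0.05, trials=200, seed=0):
    """Share of perturbed inputs whose top-k feature set matches the original."""
    rng = random.Random(seed)
    reference = top_k(explain(x), k)
    agreements = 0
    for _ in range(trials):
        perturbed = [v + rng.gauss(0.0, noise) for v in x]
        if top_k(explain(perturbed), k) == reference:
            agreements += 1
    return agreements / trials


def linear_explainer(coefficients):
    """Toy explainer: attribution = coefficient * feature value."""
    return lambda x: [c * v for c, v in zip(coefficients, x)]


clear = linear_explainer([0.8, -0.5, 0.1])  # well-separated attributions
tied = linear_explainer([1.0, 1.0, 1.0])    # tied attributions
x = [1.0, 1.0, 1.0]
print(stability_score(clear, x), stability_score(tied, x))
```

Well-separated attributions keep their ranking under small noise, whereas tied attributions reshuffle, which drives the score well below 1.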

VA as the interface of trust

For most educators, SHAP’s raw output, a vector of feature-attribution values, is hard to interpret. The VA layer is the translation mechanism that turns these mathematical attributions into instructional narratives. The dashboard supports visual learning analytics (“Visual LA”) by displaying cohort trends with radar charts and the “push and pull” of individual features with force plots.¹³ This enables what we call “Pedagogical Calibration,” in which a human expert checks the AI’s logic against their own domain expertise. For example, if the agent flags “high forum activity” as a poor predictor of success (as observed in some “gamer” profiles in OULAD), visual inspection lets the educator distinguish “productive collaboration” from “distracted socialization.” This human-in-the-loop verification is crucial for avoiding “automation bias,” in which users accept algorithmic judgments without question.²⁰ Recent studies also suggest that teachers with low visualization literacy may struggle to interpret sophisticated dashboards.²² The agentic system’s ability to generate natural language summaries alongside the charts is therefore a crucial accessibility feature.
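A force plot and a natural language summary can be generated from the same SHAP output, because SHAP's local accuracy property makes the prediction equal to the base value plus the sum of the feature contributions. The sketch below turns one profile's contributions into a short sentence. The feature names, values, and wording template are hypothetical, and the agentic system's own summaries may be LLM-generated rather than templated.

```python
def explain_in_words(base_value, contributions, top_n=2):
    """Summarize one profile's SHAP contributions as 'push' and 'pull' factors."""
    # Local accuracy: prediction = base value + sum of SHAP contributions.
    prediction = base_value + sum(contributions.values())
    pushes_up = sorted(((f, v) for f, v in contributions.items() if v > 0),
                       key=lambda item: -item[1])
    pulls_down = sorted(((f, v) for f, v in contributions.items() if v < 0),
                        key=lambda item: item[1])
    parts = [f"Predicted score: {prediction:.2f} (cohort baseline {base_value:.2f})."]
    if pushes_up:
        parts.append("Raised by: " + ", ".join(f"{f} (+{v:.2f})" for f, v in pushes_up[:top_n]) + ".")
    if pulls_down:
        parts.append("Lowered by: " + ", ".join(f"{f} ({v:.2f})" for f, v in pulls_down[:top_n]) + ".")
    return " ".join(parts)


print(explain_in_words(0.50, {"sum_click": 0.15, "homepage_content": 0.08, "forum_posts": -0.05}))
```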

Implications for vocational teacher development

The vocational education context gives these results particular significance. As dual professionals, vocational instructors need digital literacy not only for teaching but also to model the norms of their profession.¹,¹⁷ The ECA-HA framework supports this because it accepts heterogeneous data. Future versions might incorporate data from digital simulators (such as virtual welding rigs) or industry certification systems in addition to the LMS, allowing the “Digital Resources” and “Professional Engagement” competences of the DigCompEdu framework to be monitored dynamically.¹⁷ In place of an annual review, the agentic system offers a “Continuous Digital Pulse” that triggers just-in-time micro-credentials or interventions when a teacher’s digital practice drifts from evolving industry standards. SHAP can dilute attributions across highly correlated features; this limitation was mitigated by feature grouping and correlation-aware interpretation. Explanations were also treated as directional indicators rather than as causal claims.
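Feature grouping works because SHAP values are additive: summing the attributions of correlated features gives the contribution of the group as a whole, so credit split between near-duplicate features is recombined. The sketch below assumes disjoint, hand-specified groups; the group and feature names are hypothetical.

```python
import math


def group_attributions(shap_row, groups):
    """Sum per-feature SHAP values into named groups of correlated features.

    Features not listed in any group are passed through unchanged.
    """
    grouped, assigned = {}, set()
    for group_name, features in groups.items():
        overlap = assigned.intersection(features)
        if overlap:
            raise ValueError(f"features assigned to more than one group: {sorted(overlap)}")
        grouped[group_name] = sum(shap_row[f] for f in features)
        assigned.update(features)
    for feature, value in shap_row.items():
        if feature not in assigned:
            grouped[feature] = value
    return grouped


row = {"sum_click": 0.10, "homepage_content": 0.06, "resource_views": 0.04, "forum_posts": -0.03}
grouped = group_attributions(row, {"content_engagement": ["homepage_content", "resource_views"]})
print(grouped)
# Grouping preserves the total attribution, so the explanation still sums to the prediction.
assert math.isclose(sum(grouped.values()), sum(row.values()))
```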

Although the current implementation supports reflective reasoning at the workflow level, quantifying error reduction through self-correction is left for future research. We therefore restrict claims of adaptive intelligence to procedural robustness rather than autonomous learning.

Limitations and future directions

Despite these encouraging findings, this study uses the OULAD dataset as a proxy. Although OULAD is a standard benchmark for EDM research, it records student rather than teacher behavior. The digital footprints (posting in forums, accessing resources) are structurally similar, but the underlying purpose differs. Future studies must validate this methodology on datasets from practicing vocational teachers. Furthermore, running several LLM-based agents for real-time inference carries a nontrivial computational cost, and scalable deployment will require optimizations such as model distillation or Small Language Models for specific subtasks.²⁴ Lastly, to ensure that the “Digital Pulse” empowers teachers rather than policing them, the ethical implications of surveillance must be addressed through strict privacy-preserving strategies such as Federated Learning.³¹ To prevent institutional misuse, the framework keeps decision-making human-in-the-loop, limits automated labels to advisory status, and requires informed consent. In line with responsible AI principles in education, the system is designed for developmental support rather than punitive evaluation.
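Federated Learning keeps raw behavioral logs inside each college and shares only model parameters. Its best-known aggregation rule, federated averaging (FedAvg), weights each client's parameters by its number of training samples. The sketch below shows only that aggregation step, with hypothetical college data. It illustrates the general technique and is not a component of ECA-HA; shared updates may still need further protection such as secure aggregation or differential privacy.

```python
def federated_average(client_updates):
    """FedAvg aggregation step.

    client_updates: list of (num_samples, parameter_vector) pairs, one per client.
    Returns the sample-weighted average parameter vector.
    """
    total = sum(n for n, _ in client_updates)
    if total <= 0:
        raise ValueError("clients must report a positive number of samples")
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, params in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, value in enumerate(params):
            averaged[i] += (n / total) * value
    return averaged


# Two hypothetical colleges with 100 and 300 local teacher profiles.
print(federated_average([(100, [1.0, 2.0]), (300, [3.0, 4.0])]))  # [2.5, 3.5]
```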

Conclusion

This study introduced ECA-HA, a comprehensive framework for Explainable Agentic AI in Higher Vocational Teacher Education. By combining HMAS, big data analytics, and visual XAI, we have proposed a solution to the “black box” problem in educational evaluation. Our results show that agentic architectures can manage the volume and variety of educational big data, while rigorous XAI techniques (particularly SHAP) can provide the transparency and trust needed for adoption. As vocational education confronts the combined disruptions of AI and Industry 5.0, such intelligent, explainable solutions will be crucial allies in developing a digitally robust teaching workforce.

Footnotes

Acknowledgments

This article is an outcome of the 2025 Hunan Province Education Science Planning Project “Research on the evaluation and improvement path of digital literacy of higher vocational teachers and majors in the era of digital intelligence” (Project approval number: XJK25BXX009).

Author’s Contributions

W.S.: Conceptualization, data curation, formal analysis, and drafting of article.

Data Availability

Data availability information is provided at the following link:

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This work was supported by the 2025 Hunan Province Education Science Planning Project “Research on the evaluation and improvement path of digital literacy of higher vocational teachers and majors in the era of digital intelligence” (Project approval number: XJK25BXX009).

Abbreviations Used

References

1. Akman, Erdirençelebi. A human-centered digital transformation: A bibliometric analysis of society 5.0 and industry 5.0. IMJ 2024;0(96):1–16.

2. He. Construction and practice of English multi-modal teaching system in the process of digital transformation of vocational education. ER 2025;9(4):466–469.

3. Zhao, Li. Big data–Artificial intelligence fusion technology in education in the context of the new crown epidemic. Big Data 2022;10(3):262–276.

4. Junqué de Fortuny, Martens, Provost. Predictive modeling with big data: Is bigger really better? Big Data 2013;1(4):215–226.

5. Romero, Ventura. Educational data mining and learning analytics: An updated survey. WIREs Data Min Knowl 2020;10(3):e1355.

6. Chen, Guestrin. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM: New York, NY; 2016; pp. 785–794.

7. Chaushi, Selimi, Chaushi, et al. Explainable artificial intelligence in education: A comprehensive review. In: World Conference on Explainable Artificial Intelligence. Springer Nature Switzerland: Cham; 2023; pp. 48–71.

8. Dhar, Nilekani, Maruwada, et al. Big data as an enabler of primary education. Big Data 2016;4(3):137–140.

9. Xi, Chen, Guo, et al. The rise and potential of large language model based agents: A survey. Sci China Inf Sci 2025;68(2):121101.

10. Han, Zhang, Yao, et al. LLM multi-agent systems: Challenges and open problems. arXiv Preprint 2024. arXiv:2402.03578.

11. Dignum. Responsible artificial intelligence: Designing AI for human values. ITU J 2017;1(1).

12. Lundberg, Lee. A unified approach to interpreting model predictions. In: NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems. ACM; 2017; pp. 4768–4777.

13. Vieira, Parsons, Byrd. Visual learning analytics of educational data: A systematic literature review and research agenda. Comput Educ 2018;122:119–135.

14. Herrera, Pérez-Hernández, Kumar Parlikad, et al. Multi-agent systems and complex networks: Review and applications in systems engineering. Processes 2020;8(3):312.

15. Kuzilek, Hlosta, Zdrahal. Open University learning analytics dataset. Sci Data 2017;4(1):170171–170178.

16. Falloon. From digital literacy to digital competence: The teacher digital competency (TDC) framework. Education Tech Research Dev 2020;68(5):2449–2472.

17. Punie, eds. European framework for the digital competence of educators: DigCompEdu. Publications Office of the European Union; 2017.

18. Tomczyk. Digital literacy and e-learning experiences among the pre-service teachers data. Data Brief 2020;32:106052.

19. Schick, Dwivedi-Yu, Dessì, et al. Toolformer: Language models can teach themselves to use tools. Adv Neural Inf Process Syst 2023;36:68539–68551.

20. Shinn, Cassano, Gopinath, et al. Reflexion: Language agents with verbal reinforcement learning. Adv Neural Inf Process Syst 2023;36:8634–8652.

21. Ribeiro, Singh, Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016; pp. 1135–1144.

22. Pozdniakov, Martinez-Maldonado, Tsai, et al. How do teachers use dashboards enhanced with data storytelling elements according to their data visualisation literacy skills? In: LAK23: 13th International Learning Analytics and Knowledge Conference. ACM; 2023; pp. 89–99.

23. Spinner, Schlegel, Schäfer, et al. explAIner: A visual analytics framework for interactive and explainable machine learning. IEEE Trans Vis Comput Graph 2020;26(1):1064–1074.

24. Kim, Jeong, Park, et al. Tiered agentic oversight: A hierarchical multi-agent system for AI safety in healthcare. arXiv Preprint 2025. arXiv:2506.12482.

25. Labrou. Standardizing agent communication. In: Multi-Agent Systems and Applications. ECCAI Advanced Course on Artificial Intelligence. Springer Berlin Heidelberg: Berlin, Heidelberg; 2001; pp. 74–97.

26. Bodria, Giannotti, Guidotti, et al. Benchmarking and survey of explanation methods for black box models. Data Min Knowl Disc 2023;37(5):1719–1778.

27. Hlosta, Zdrahal, Zendulka. Ouroboros: Early identification of at-risk students without models based on legacy data. In: Proceedings of the Seventh International Learning Analytics & Knowledge Conference. ACM; 2017; pp. 6–15.

28. Czerwinska. Interpretability of machine learning models: How can one explain machine learning models? In: Applied Data Science in Tourism: Interdisciplinary Approaches, Methodologies, and Applications. Springer International Publishing: Cham; 2022; pp. 275–303.

29. Polyxeni ‘PN’. Building trust in AI education: Addressing transparency and ensuring trustworthiness. In: Trust and Inclusion in AI-Mediated Education: Where Human Learning Meets Learning Machines. Springer: Cham; 2024; pp. 73–90.

30. Thurzo, Thurzo. Embedding fear in medical AI: A risk-averse framework for safety and ethics. AI 2025;6(5):101.

31. Babar. Agentic AI for personalized education and adaptive learning environments. IJCE 2025;7(12):1–10.