Abstract
Today, immersive technologies like Virtual Reality (VR) are regarded as disruptive tools in many domains, including education. While the body of literature in the field is growing, studies that present contrasting findings are not uncommon. In fact, although there is evidence of the benefits brought by VR to educational processes, in some cases a trade-off between learning effectiveness and quality of the learning experience (QoLE) may be observed. The two dimensions are difficult to disentangle since, besides learning effectiveness, other factors like motivation, technology acceptability, workload, presence, immersion, engagement, and usability come into play. This paper investigates the above scenario by focusing on the QoLE of immersive VR-based learning and comparing it with that of two conventional approaches (a physical prop-based one and a 3D desktop application). Separation of the two dimensions is pursued by imposing equality of the learning performance achieved with the three approaches, with the aim of removing possible confounding factors. From the results of the user study performed in the context of a STEM-related laboratory activity, the VR-based approach appeared to be generally superior to the prop-based approach and showed several advantages over the 3D desktop application.
Introduction
Nowadays, technologies are revolutionizing the field of education (Delgado et al., 2015). According to Raja and Nagasubramani (2018), the reasons behind this revolution include the fact that technologies (i) are being integrated in a growing number of curricula, (ii) can be leveraged as instructional delivery systems and (iii) as tools for aiding instruction, and (iv) offer the opportunity to enhance the entire learning process.
In the context of technology-aided education, a prominent role is played by cutting-edge immersive media like Virtual Reality (VR) (Allcoat et al., 2021), Augmented Reality (AR) (Zumbach et al., 2022), and Mixed Reality (MR) (Weng et al., 2019), and efforts are oriented towards the investigation of the advantages that immersive experiences could bring with respect to real, physical activities (Steffen et al., 2019). VR, in particular, is attracting the attention of researchers and practitioners in a variety of domains, encompassing entertainment (Nilsson et al., 2022), simulation (Pratticò et al., 2021), marketing (Dahane et al., 2022), support to disability awareness (Pivik et al., 2002), etc. Education is also strongly affected by this revolution, since VR enables the delivery of unprecedented learning experiences, for example, letting learners virtually visit places and times that are impractical or impossible to reach physically (Ng et al., 2023), access learning material in novel ways (Greenwald et al., 2018), etc. These opportunities are enabled by the features of the medium itself, which facilitates the exploration of complex contents in a way that could hardly be achieved by traditional methods (Hamilton et al., 2021). This is the case, for instance, of educational subjects that require the learners to apply abstract knowledge to solve problems and exercises, which is quite typical of science labs and similar activities (Weymuth & Reiher, 2021).
Researchers and educators are continuously pushing the boundaries of the VR medium with the aim of exploiting its potential for educational purposes, leveraging its claimed advantages in terms of interaction, immersion, and engagement (Hamilton et al., 2021), or resorting to the unconventional pedagogical models it can enable (De Lorenzis et al., 2023). Their goal is to make VR-based learning more effective than traditional approaches. Nevertheless, the existing literature shows contrasting findings on the effectiveness of VR-based education, reporting both instances in which the use of VR leads to increased performance compared to traditional approaches (Allcoat and von Mühlenen, 2018; Shi et al., 2022) and studies in which it underperforms them (Makransky et al., 2019; Mills et al., 2019; Petersen et al., 2022; Richards & Taylor, 2015).
As discussed in the works by Chun (2007) and Jacobsen (2000), a possible limitation to the adoption of VR-based learning tools as cogent alternatives to traditional approaches in daily practice may not be related only to their effectiveness. There are also other factors, for example, motivation, technology acceptability, cognitive load, presence, immersion, engagement, and usability, just to name a few, that could influence the experience with these tools.
This paper specifically focuses on the above factors, which relate more to the quality of the learning experience (in the following abbreviated QoLE) than to learning performance. These factors, and particularly, their interrelation with performance aspects, will be key for the design of next-generation VR-based learning tools. For instance, even considering the scenario in which VR-based learning tools are providing learning performance on par with traditional methods, the former would still represent a valid and compelling option should these tools be proven to deliver a higher QoLE.
Unfortunately, even though many studies that focused on the application of VR in educational contexts already analyzed both learning performance and QoLE (Evens et al., 2023; Jost et al., 2020; Kartiko et al., 2010; Lee et al., 2022; Pirker et al., 2017), it is rather difficult to clearly observe the two dimensions independently. In other words, the attempt to get the best from VR-based tools comes with a bias (in terms of dissimilarity from traditional approaches), which makes it problematic to assess the effects on QoLE that are intrinsically provided by the VR medium itself.
Moving from the above considerations, the objective of this paper is to analyze the QoLE of immersive VR-based learning experiences by comparing it against that of two conventional approaches via a user study. The considered educational context concerns helping learners to familiarize themselves with and apply the concepts taught in a theoretical lesson through a practical exercise in a science lab experience. To arrange the study, a use case in the context of physics education was selected. The use case concerns General Relativity, and specifically focuses on the calculation of the space curvature by means of practical operations. Learners were invited to practice these operations using three different instructional tools: an immersive VR application, a non-immersive 3D desktop application, and physical props. The three tools were implemented so as to avoid introducing biases in terms of learning performance; the aim was to make the comparison as fair as possible, thus making it possible to observe the effects on the QoLE by isolating the contribution of the medium employed.
Related Work
In the last decade, the extensively explored field of technology-aided education has included VR among the learning instruments that are worth investigation (Petersen et al., 2022). In fact, the literature already reports many usages of VR-based learning tools at different education levels and in a variety of domains (Calandra et al., 2022; Hamilton et al., 2021).
Considering, for example, formal education, considerable effort has been devoted to studying the possible ways to exploit VR-based learning from K-12 (primary and secondary) (Pellas et al., 2020), to high school (Cho et al., 2024; Radianti et al., 2020) and university level (Calvert & Abadia, 2020). Because of the ability of VR to reproduce practical operations, one of the most promising applications appears to be in the context of laboratory experiences (Pirker et al., 2019), especially for STEM-related classes (Chang et al., 2020; Lee et al., 2022).
Despite the growing body of literature, a broad consensus on the effectiveness and the QoLE of VR tools against other learning approaches is still missing (Grassini et al., 2020; Makransky et al., 2019). According to Hamilton et al. (2021), this situation is due to the fact that, in many cases, the methods used to assess the outcomes of the learning process are not completely adequate, thus affecting the correct interpretation of VR effectiveness. One of the issues pertains to the fact that, in different works, comparative analyses are often conducted against different media (e.g., VR vs. printed material, VR vs. videos, etc.). In these cases, if the fairness of the comparison is not fully guaranteed, it may be hard to distinguish the actual contribution of technological features on learning outcomes (Petersen et al., 2022). Another issue concerns the fact that numerous studies are more focused on getting the most out of a given technology rather than considering the process from a learner’s perspective, either in terms of learning effectiveness or QoLE (Chandler, 2009; De Lorenzis et al., 2023).
The work reported in this paper specifically aims at investigating the effects of immersive VR-based tools on the QoLE, disentangling the latter from learning performance. There are indeed many studies that investigated these two dimensions for VR-based education, but the above separation has been difficult to achieve since performance-related aspects often act as confounding factors in observing the QoLE dimension.
For instance, in the work by Allcoat and von Mühlenen (2018), 99 learners were involved in a user study aimed to compare a traditional learning method based on a textbook against the use of videos and VR. The study assessed aspects regarding both learning performance and QoLE. The experimental results showed the superiority of the VR-based and the traditional method with respect to videos in terms of knowledge acquisition and understanding, as well as an improved retention of the learned concepts with the VR-based method. Concerning the QoLE, the study focused on engagement and emotional self-rating before and after the experiment. For both the dimensions, the VR-based method obtained higher preferences from the learners. Another example comes from the work by Shi et al. (2022), in which a user study was run to evaluate the effectiveness of an immersive VR serious game with respect to traditional methods. The study involved 100 learners and was conducted contrasting a group that experienced the VR game and took an evaluation against a control group that took only the evaluation. Results showed significant improvements in learning achievements and motivation to learn between the pre- and post-test conditions for learners in the VR group. Similarly, the work by Makransky et al. (2019) investigated the consequences of introducing VR to simulate science lab activities and the possibility to extend the principles of multimedia learning to immersive VR. A user study was conducted with 52 university students who were taught concepts via a desktop display or a VR headset. The results indicated that the use of VR made the students feel more present, but they also learned less and had a higher cognitive load than with the desktop display. Comparable conclusions were obtained by Allcoat et al. (2021). 
In this case, a user study involving 75 learners was performed with the aim to determine whether learning methods based on VR or Mixed Reality (MR) could be a valid alternative to traditional approaches. From the analysis of the experimental results, it was not possible to spot strong evidence that VR or MR were more effective than traditional methods. Nevertheless, higher levels of engagement and positive emotions were reported by the learners in the VR and MR conditions compared to the traditional learning condition. Greenwald et al. (2018) compared the use of a 2D application and of a VR simulation for the purpose of learning theoretical concepts. Although the learners perceived many advantages in the use of VR over the 2D approach, the results of an assessment performed on the two groups did not demonstrate any significant difference in terms of learning outcomes. In these works, however, the purpose was not to identify the sole benefits of VR in terms of QoLE. Thus, the traditional tools to which the VR ones were compared could not be assumed to guarantee the same level of learning effectiveness and, hence, a completely fair comparison.
Some other insights regarding the QoLE and its relations with learning performance can be obtained by looking outside the formal education context. As an example, the study by Grassini et al. (2020) aimed to assess the capability of VR technology in improving the acquisition of practical skills compared to a less immersive 2D video. A user study was run, in which 30 participants were requested to carry out an assembly task after having experienced one of the two training methods. Based on the experimental results, it was not possible to identify statistically significant differences in performance metrics; nonetheless, the authors noted that the participants in the VR group who reported a higher sense of presence performed better than the others, as they were more engaged in the learning activities. These findings led the authors to conclude that the sense of presence, and not the sole use of VR, contributes to the improvement of skills acquisition. The equality between the two approaches, however, was not tested statistically (the lack of a significant difference cannot be taken as proof of comparable performance).
The works seen so far analyzed both learning performance and QoLE. There are also examples of studies that focused only on a single component, though; in these cases, the lack of an analysis regarding one of the dimensions could make it difficult to validate the results and clearly determine whether possible sources of bias were considered in the experiment. This is the case of works like that by Pirker et al. (2017), where the authors conducted two user studies to evaluate a high-end VR application and a mobile one for educational purposes. Both the applications were contrasted with an equivalent desktop-based tool, and the experimental results showed that the high-end VR experience was deemed as more immersive and engaging than the desktop one, whereas no significant differences were found between the mobile VR application and the desktop one. No discussions on the learning effectiveness of the devised applications were provided. Richards and Taylor (2015) conducted a different user study involving 129 students with the aim to compare the knowledge gain associated with three learning methods, that is, a traditional lesson, a 2D application, and a VR-based 3D environment. The experimental results showed that the 2D approach delivered better learning outcomes with respect to the 3D one, probably due to the additional cognitive load and distractions associated with the use of the latter method. Finally, Alhalabi (2016) conducted a user study to investigate the training effectiveness of three immersive VR applications against a non-VR approach in the context of engineering education. Based on the obtained results, the author concluded that the use of any of the immersive VR applications was associated with a significant advantage in terms of training effectiveness over the non-VR approach and that the application leveraging the headset was superior to the other VR-based configurations.
The partial inconsistency of the findings reported in the literature on the topics addressed in this paper is confirmed by the contrasting conclusions drawn in the works reviewed so far. On the one hand, there are works like those by Allcoat and von Mühlenen (2018), Shi et al. (2022), Pirker et al. (2017), and Alhalabi (2016) that acknowledged the various advantages brought by VR in the learning process. On the other hand, there are works like those by Makransky et al. (2019), Allcoat et al. (2021), Richards and Taylor (2015), Grassini et al. (2020), and Greenwald et al. (2018) that showed that it is not possible to clearly isolate the benefits of VR over traditional approaches and warned about the possible limitations of this technology.
Building on the above scenario, this work presents a new study that analyzes the effects of immersive VR on QoLE while keeping the learning performance component under control. In particular, the VR-based tool and the two conventional instructional approaches, which leverage a 3D desktop application and physical props, respectively, were implemented by making sure they deliver the same learning contents. By analyzing the learning gains and verifying that no learning bias has been introduced, it is possible to gain insights about the QoLE offered by the three considered media, separating it from the learning performance dimension. In order to determine the actual factors to consider for investigating the QoLE dimension, a number of works were analyzed (Al-Adwan et al., 2023; Araiza-Alba et al., 2021; Klingenberg et al., 2020; Lowell & Tagare, 2023; Song et al., 2021). The model proposed by Makransky and Petersen (2019) was ultimately chosen, as it encompasses a number of factors that have been considered separately or in an integrated way in other works (Klingenberg et al., 2020; Lee et al., 2010; Makransky & Lilleholt, 2018; Makransky & Petersen, 2019; Yang et al., 2023).
Case Study
The considered case study is centered on a STEM-related laboratory activity, that is, a practical learning experience that well represents a typical application field for VR-based education. The focus is on General Relativity, and the experience requires applying a procedure for the calculation of space and time curvature at a point close to a black hole, based on the approach described by Zahn and Kraus (2014). Typically, this topic is not included in secondary education programs, since it can be difficult for the students to grasp the fundamental concepts of General Relativity when the teacher adheres to rigorous theoretical formalism and presents complex mathematical formulas. However, it is possible to simplify the explanation and involve students in exercises by leveraging a schematization based on simple geometrical shapes to represent the space around a black hole. The schematization used in this work is illustrated in the following.
2D Curvature and Sector Model
To introduce the concept of space and time curvature around a celestial body (one of the principles of General Relativity), it is possible to take inspiration from the Regge Calculus, a formalism for approximating the solutions of the Einstein field equations. The fundamental idea behind this approach is to substitute the considered curved space with an appropriately designed Sector Model (SM) that will be used to calculate the curvature. Considering a 2D scenario, to create the SM of a curved surface (e.g., a spherical cap) it is first necessary to cover it with a series of quadrilateral polygons whose sides are segments of straight lines of the curved space (as shown in Figure 1(a)). Once the surface is completely covered, the SM is built by using planar polygons, called sectors, with the same proportions and symmetries as the original polygons (as shown in Figure 1(b)). The set of sectors is the SM of the initial curved surface.
Figure 1. SM of a spherical cap: (a) original curved surface, (b) sectors of the SM, and (c) spherical cap and its approximation reconstructed by joining the sectors of the SM.
By joining the sectors of the SM so that the shared segments are connected, the resulting model is curved and represents the approximation of the spherical cap (as depicted in Figure 1(c)). Since the sectors are flat, the information about the curvature of the SM is contained only in the vertices of the model. By selecting a vertex P on the SM reconstructed in 3D by connecting the corresponding edges, it is possible to identify four sectors that share the considered point and wrap it without gaps (i.e., those highlighted in red in Figure 1(c)); however, if the four sectors are flattened on a plane, it can be noticed that the sectors sharing the vertex P are no longer sufficient to completely surround it. In fact, when comparing the area surrounding P on the approximated spherical cap with the area surrounding the same point on a flat surface, a positive deficit angle can be identified, and this denotes that the considered space is a curved surface in P. The deficit angle would be negative in the presence of an excess of area, meaning that the sectors surrounding P overlap, or equal to zero if the considered space is not curved. The Gaussian Curvature K of a surface in P is proportional to the deficit angle and, in this case, can be estimated by dividing it by a quarter of the area of the four sectors sharing the vertex P (Figure 2).
Figure 2. Area considered in the computation of the Gaussian Curvature (a quarter of the areas surrounding P).
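The estimate described above can be written compactly. The following is a minimal sketch, assuming that the deficit angle is obtained as 2π minus the sum of the flat corner angles that the sectors have at P (the usual definition in Regge Calculus); the function names and the numeric values are hypothetical and are not taken from the original material.

```python
import math

def deficit_angle(corner_angles_rad):
    """Deficit angle at a vertex: 2*pi minus the sum of the flat corner angles meeting there."""
    return 2.0 * math.pi - sum(corner_angles_rad)

def gaussian_curvature(deficit_rad, sector_areas):
    """Estimate K as the deficit angle divided by a quarter of the total area
    of the sectors sharing the vertex."""
    return deficit_rad / (sum(sector_areas) / 4.0)

# Hypothetical example: four sectors, each with an 85-degree corner at P and area 2.0
corners = [math.radians(85.0)] * 4
areas = [2.0] * 4
delta = deficit_angle(corners)        # a 20-degree gap, expressed in radians
K = gaussian_curvature(delta, areas)  # positive value: sphere-like curvature at P
```

A positive result corresponds to a gap between the flattened sectors, a negative one to an overlap, and zero to a flat surface.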
ViSeMo
Exercises on 2D curvature and SM in a laboratory activity are usually paper-based, but the intrinsic problems of the medium (e.g., the time needed to prepare, print/draw and cut the sectors) often limit their use. An alternative was proposed by Weissenborn et al. (2018) and Zahn and Kraus (2019), who developed ViSeMo, a toolkit for teaching General Relativity in which the users have the possibility to explore the concept of 2D curvature of the equatorial plane surrounding a black hole by analyzing geodesic lines. In particular, the users can draw lines and compare how their behavior in the curved plane differs from the Euclidean behavior observed in non-curved spaces. ViSeMo can be considered an alternative, indirect method to analyze the effect of curvature with respect to the SM method presented in Section 2D Curvature and Sector Model. Moreover, it offers the possibility to calculate deficit angles and areas to estimate K (although this feature currently works for spheres and saddles, while it is not implemented for the case of the black hole considered herein).
3D Curvature and Sector Model
The concept of SM can be easily extended to a 3D scenario by passing from 2D curved surfaces to 3D curved spaces, where vertices become segments, segments become faces, and faces (the 2D sectors) become polyhedra (also called 3D sectors). The 3D sectors share faces, and the curvature is calculated around a segment shared, for example, among four polyhedra (as shown in Figure 3 for segment L).
Figure 3. SM of a 3D curved space computed for a segment L surrounded by four 3D sectors: (a) non-curved space; curved space characterized by (b) positive and (c) negative deficit angle.
As a consequence of the introduction of a third dimension, the curvature at a point in the 3D space has three components according to the orientation of the segment (for instance, a radial, a longitudinal and a latitudinal component). Figure 4 shows the 3D SM around a black hole and the three possible orientations of the segment.
Figure 4. 3D SM around a black hole and the three orientations of a segment, that is, (a) radial, (b) latitudinal, and (c) longitudinal. Working with the 3D sectors surrounding each segment, a different component of the curvature can be estimated.
The process to obtain the values of the curvature’s components is similar to the 2D case:
• the curved space is divided into sectors, and the corresponding flat 3D SMs are built by keeping the proportions and symmetries of the original curved sectors;
• the volume of the sectors that wrapped a segment L in the original space is compared to the volume of the SM sectors that wrap L in the Euclidean space;
• a lack or a surplus of volume surrounding the chosen segment will result in a deficit angle (positive or negative).
If there is a lack of volume (deficit angle greater than zero between the faces of two sectors), the space has a positive curvature along the direction defined by L; if there is an excess of volume (deficit angle smaller than zero between the faces of two sectors), the space has a negative curvature along the direction of L; if the sectors wrap perfectly around L, the space is not curved along that direction.
Overall, the procedure to calculate the three-dimensional K at a certain point P is as follows:
• select the eight sectors that share P; although more sectors could be considered, a tiled 3D space composed of eight polyhedra is enough to compute the three curvature’s components;
• select a segment that contains P and is oriented along one of the three directions of the SM (for instance, for the radial direction it is possible to consider segment $s_1$ in Figure 5);
• this segment is shared by four of the eight sectors; these four sectors wrap around P and identify a deficit angle (e.g., angle $\delta_1$ in Figure 5, which is considered positive for the radial direction), which is associated to the whole segment;
• select another segment that contains P and is oriented along the same direction of the previous segment (e.g., segment $s_2$ in Figure 5); this second segment is shared by the other four sectors that are yet to be used, which wrap around P and identify a second deficit angle (e.g., angle $\delta_2$ in Figure 5) associated to the second segment;
• the average of these two angles is the deficit angle associated with P, which can be used to calculate the value of this component of the curvature;
• the area needed in the formula is a quarter of the areas of the four faces that wrap around (and share) P (for instance, the areas named $A_{1-4}$ in Figure 5 should be considered to compute $\delta_1$);
• repeat these actions for the other two directions (e.g., longitudinal, and latitudinal) to find the other two components.
Figure 5. Computing the 3D curvature along the radial direction.
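The steps above can be sketched in a few lines of code. This is only an illustrative reading of the procedure, assuming that one curvature component is obtained by dividing the average of the two deficit angles by a quarter of the total area of the four faces sharing P; the function names, the tolerance, and the numeric values are hypothetical.

```python
def curvature_component(delta_1, delta_2, face_areas):
    """One curvature component at P: mean of the two deficit angles (radians)
    divided by a quarter of the total area of the four faces sharing P."""
    mean_deficit = (delta_1 + delta_2) / 2.0
    return mean_deficit / (sum(face_areas) / 4.0)

def curvature_sign(deficit, tol=1e-9):
    """Classify the curvature along a direction from the sign of the deficit angle."""
    if deficit > tol:
        return "positive"   # lack of volume around the segment
    if deficit < -tol:
        return "negative"   # excess of volume (overlapping sectors)
    return "flat"           # sectors wrap the segment perfectly

# Hypothetical readings for the radial direction (angles in radians, areas in arbitrary units)
K_radial = curvature_component(0.10, 0.14, [1.0, 1.0, 1.5, 1.5])
```

The same two functions would be called again with the readings taken along the longitudinal and latitudinal directions to obtain the remaining components.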

Learning Tools
The study reported in this paper focused on the procedure for evaluating the 3D curvature at a point close to a black hole. To let the students practice this procedure, three learning tools were arranged. Each tool exploits 3D sectors designed following the metric described in the work by Hoyng (2006) and created using the Blender modeling tool. As anticipated, particular attention was paid to ensuring the delivery of the same contents in the three tools.
Desktop Application
The first tool consists of a desktop application inspired by ViSeMo, designed to automate the calculation of the deficit angles and areas required to estimate the curvature K in the considered 3D space.
The application was developed as a web tool that allows the users to:
• translate and rotate 3D sectors;
• attach or detach 3D sectors to form groups of sectors, thus easing manipulation;
• activate or deactivate the opacity of the 3D sectors to simplify the identification of faces and overlapping sections (Figure 6(a) and (b));
• select faces to calculate deficit angles and areas (Figure 6(c)); this feature is accessible only if the sectors are combined in a correct way along one of the three directions considered in the procedure.
Figure 6. Desktop application: sectors visibility can be switched from (a) opaque to (b) transparent, and (c) visualization of the calculation.

The controls provided by the application are inspired by those of common 3D graphics tools. In particular, the user can pan or rotate the camera by clicking on the background of the 3D scene with the right and left mouse button, respectively, and use the mouse wheel to adjust the level of zoom. Moreover, by clicking on a 3D sector (or group of sectors) with the left mouse button, he or she can move it in the scene. When the sector is released over another sector with which it shares a face, the two combine to form a group. To assist the user during these operations, visual cues (like bounding boxes and highlights to show when two sectors can be attached) are provided.
To calculate a deficit angle, it is first necessary to assemble four adjacent sectors along one of the directions, then activate the Calculation Mode from a sidebar. This will disable the possibility of moving the sectors in the scene. In this mode, the user can select the two faces that share the segment and have a different orientation. The angle between the two faces (i.e., the supplement of the angle between their normal vectors) is automatically computed and represents the deficit angle. To ease operations, the user can also activate or deactivate the transparency of the sectors, adjust the zoom, and visualize the name of the sectors using the corresponding controls in the sidebar. Once the angle is computed, the value is shown in the scene along with the areas of the faces that can be used, together with the angle, to determine the curvature K along the chosen direction.
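A minimal sketch of how such an angle could be derived from two face normals is given below. It is not the implementation used in the application: it assumes outward-pointing normals, so that two faces in full contact have opposite normals, and under that assumption the gap is π minus the angle between the normals. The function name and the example vectors are hypothetical, and only the case of a gap (lack of volume) is covered.

```python
import math

def gap_angle(normal_a, normal_b):
    """Angle (radians) of the gap between two faces sharing an edge,
    given their outward normals (assumed convention, not from the source)."""
    dot = sum(a * b for a, b in zip(normal_a, normal_b))
    norm_a = math.sqrt(sum(a * a for a in normal_a))
    norm_b = math.sqrt(sum(b * b for b in normal_b))
    # Clamp to [-1, 1] to guard acos against floating-point round-off
    cos_between = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.pi - math.acos(cos_between)

# Hypothetical example: normals 170 degrees apart leave a 10-degree gap
n_a = (1.0, 0.0, 0.0)
n_b = (math.cos(math.radians(170.0)), math.sin(math.radians(170.0)), 0.0)
gap = gap_angle(n_a, n_b)
```

With exactly opposite normals the function returns zero, which corresponds to sectors that wrap the segment without any gap.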
Immersive VR Application
The second tool consists of an immersive VR application that was designed to offer the same functionalities as the desktop version but using a standard VR kit. The application was developed using the Unity game engine and the WebXR framework. In this application, the user is completely immersed in the learning experience, and can interact with the virtual elements (the 3D sectors and a menu panel) using his or her hands, which are tracked by the cameras embedded in the headset. Hand tracking is managed using the Mixed Reality Toolkit (MRTK), an open-source suite of Unity components provided by Microsoft, which was modified to work together with WebXR. This solution was chosen to replicate, using the VR technology, the natural interactions that would be enabled by physical props. Thus, the user can freely manipulate the sectors by grabbing, moving, and rotating them in the virtual environment, and can assemble them along one direction by bringing two adjacent sectors close together (Figure 7(a)).
Figure 7. Immersive VR application: the user can (a) manipulate a group of connected sectors, and (b) visualize the results of a correct calculation.
To calculate a deficit angle, the user can leverage a series of features that can be accessed through a menu panel in the scene. From this menu, the user can make the sectors transparent, show their name, and enter the Calculation Mode; in this mode, he or she can select two faces by touching them with the index finger of either hand to calculate, if possible, the corresponding deficit angle. If two suitable faces are selected, the value of the angle and the faces perpendicular to the chosen direction will be automatically shown, together with the areas required for calculating K (Figure 7(b)). To facilitate operations, a series of visual cues are shown by the application: for instance, if two adjacent sectors are brought together, they will change color to highlight the possible interaction.
Printed Physical Sectors
The third tool consists of a set of physical props that the user can manipulate to find the data required for the calculation of the curvature. Typically, in a laboratory activity, paper-based sectors are used for this purpose, which are printed, cut, and then assembled using glue; paper-based sectors, however, can be difficult to handle and can easily deform or collapse during operations. To overcome these limitations, it was decided to leverage the 3D sectors modeled in Blender and print them using additive technology. Furthermore, to ease manipulation of these props and make it comparable with that of the other two tools, magnets were added to the sector faces to keep them in place.
To calculate the deficit angle, the user can physically assemble the adjacent sectors along one of the directions (Figure 8(a)), then use a protractor to measure the deficit angle. If the curvature determines an excess of volume and, consequently, two sectors are overlapping, it is possible to leverage trigonometry laws to find the correct angle by connecting the two sectors that should overlap and sliding one of them on the shared face; the resulting gap corresponds to the deficit angle of the overlapping volume (Figure 8(b)).
Figure 8. Printed physical sectors: (a) group of sectors showing a lack of volume, and (b) procedure to find the angle required for determining the curvature in presence of an excess of volume.
In order to enable a fair comparison between the three tools, when using the printed sectors the measurement step was replaced with a simpler operation: once the user has assembled the sectors to form a correct configuration, he or she can indicate two faces and ask an operator for the value of the corresponding deficit angle and the areas to be used for the calculation.
Experiment
This section describes the user study that was arranged to compare the three learning tools using a between-subjects approach. The sample size was set in line with recently published related works that assessed the effect of VR on learning experiences (Bracq et al., 2019; Grassini et al., 2020; Zhou et al., 2018). Thus, 36 volunteers (28 males and 8 females), aged between 19 and 55 (
Procedure
The procedure included the steps illustrated in Figure 9. First, the participants were requested to attend a brief theoretical lesson (30 minutes), in which concepts regarding General Relativity and the calculation of the curvature were presented; to homogenize the delivery of these concepts among the participants, a video-lecture was used. After the video, the participants were allowed to ask the administrator for any clarifications on its content.
Figure 9. Organization of the experimental procedure for the three groups.
Subsequently, the participants were invited to fill in a pre-test questionnaire, which included two sections aimed to assess, through a quiz, the level of knowledge of the participants before the experience, and to collect a number of subjective measurements regarding perceived motivation and self-efficacy.
The participants were then randomly split in three groups of the same size (12 participants each). Since the study followed a between-subjects approach, a different learning tool was assigned to each group. In the following, the three groups will be referred to as Printed Physical Sectors (PPS), Desktop Application (DA), and VR Application (VRA) according to the learning tool assigned to them.
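The random split into three equally sized groups can be illustrated with a minimal sketch. The function name `assign_groups`, the numeric participant identifiers, and the fixed seed are assumptions made for illustration only; the paper does not state how the randomization was actually carried out.

```python
import random


def assign_groups(participant_ids, group_names, seed=None):
    """Shuffle the participants and deal them into equally sized groups.

    Hypothetical helper: not the procedure used by the authors.
    """
    ids = list(participant_ids)
    if len(ids) % len(group_names) != 0:
        raise ValueError("participants cannot be split into equal groups")
    rng = random.Random(seed)  # a seed only makes the sketch reproducible
    rng.shuffle(ids)
    size = len(ids) // len(group_names)
    return {
        name: ids[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }


# 36 participants, three tools, 12 participants per group.
groups = assign_groups(range(1, 37), ["PPS", "DA", "VRA"], seed=42)
for name, members in groups.items():
    print(name, len(members))
```

Shuffling once and slicing guarantees groups of the same size, which independent per-participant random draws would not.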
The possibility to perform an analysis also between subgroups of different genders and/or ages was considered but had to be discarded due to the limited heterogeneity of the sample (and, consequently, to the limited validity of potential results). Only eight participants identified themselves as female and were randomly assigned to different groups (two to the PPS group, three to the DA group, and three to the VRA group); moreover, only six participants were outside the [20, 30] age range and were randomly assigned to different groups (one participant, 31 y. o., to the PPS group, four participants, 19, 32, 54, and 55 y. o., to the DA group, and one participant, 32 y. o., to the VRA group).
A new video (15 minutes) was administered to instruct the participants on how they could use the assigned tool to identify the angles and areas needed for the calculation of the curvature. Three videos were used, one for the PPS group, one for the DA group, and one for the VRA group. After having watched the video, the participants were invited to work on an exercise asking them to compute the components of the 3D curvature along the radial, latitudinal and longitudinal directions; to complete the exercise, they had to collect the needed data (i.e., areas and deficit angles) by using the assigned tools.
Finally, a post-test questionnaire was administered that, like the pre-test, included both a quiz and a number of statements to collect subjective measurements (more details are provided in Section Evaluation Metrics). All the videos used in the experience are available for download (see footnote 1).
Apparatus
In order to ensure fairness in comparing the three groups, presentation and interaction were designed to be as similar as possible. Videos and questionnaires were presented on the same hardware and were in Italian to avoid linguistic barriers. The PPS group used printed props with an average size of 2.5 cm × 3 cm × 3.5 cm. The DA group used a laptop (equipped with an Intel Core i7-8750H 2.20 GHz CPU, 8 GB RAM, and Nvidia GeForce GTX 1060) with a 1920 × 1080 pixels display. Interaction with the application was enabled through mouse and keyboard. The VRA group used an Oculus Quest 2, whose headset weighs approximately 500 g and is endowed with two fast-switch LCD displays (1832 × 1920 pixels per eye at 72–120 Hz) with a field of view of about 90°. The participants were allowed to move in the virtual environment by real walking and, as anticipated, used free hands to interact with it.
Evaluation Metrics
The metrics used to measure the learning performance and QoLE are described below.
Learning Performance
The learning performance was measured considering the following components:
1. Quiz score: the score obtained in the quiz administered before and after the use of the learning tool. The quiz contained five multiple-choice questions with only one correct answer per question. The maximum score that could be obtained was 10. The quiz focused on aspects that were particularly stressed during the exercise, for example, sectors, areas, and angles to be considered when computing the curvature along a given direction. The pre- and post-test questionnaires included the same set of questions.
2. Exercise score: the score achieved by the participants when computing the 3D curvature along the three directions. The score was assigned by considering a number of aspects that affect the correct computation of the 3D curvature: the proper selection of the areas (1 point) and of the two deficit angles (2 points), as well as the interpretation of the curvature and the consequent definition of the correct sign (1 point). The maximum score that could be obtained was 4 for each direction.
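The exercise rubric described above can be sketched as a small scoring function. The function and parameter names are hypothetical, and awarding one point per correctly selected deficit angle is an assumption: the text only states that the two angles are worth 2 points in total.

```python
def exercise_score(areas_correct, correct_angles, sign_correct):
    """Score one direction of the exercise on the 0-4 scale.

    areas_correct:  True if the areas were properly selected (1 point).
    correct_angles: number of properly selected deficit angles, 0 to 2
                    (assumed to be worth 1 point each).
    sign_correct:   True if the sign of the curvature is correct (1 point).
    """
    if correct_angles not in (0, 1, 2):
        raise ValueError("correct_angles must be 0, 1, or 2")
    return int(bool(areas_correct)) + correct_angles + int(bool(sign_correct))


# Hypothetical outcome for the three directions of one participant.
scores = {
    "radial": exercise_score(True, 2, True),
    "latitudinal": exercise_score(True, 1, True),
    "longitudinal": exercise_score(False, 2, False),
}
print(scores)
```

Scoring each direction separately mirrors the per-direction maximum of 4 points.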
QoLE
The QoLE was assessed by means of structured interviews based on questionnaires that were conducted before and after the use of the learning tools. The questionnaires were split into several sections containing statements to be evaluated on a 1-to-5 Likert scale (from strongly disagree to strongly agree). Each section of the questionnaire was based on literature works and the components investigated were defined according to the study proposed by Makransky and Petersen (2019). In particular, the pre-test questionnaire evaluated the participants’ intrinsic motivation (Makransky et al., 2016) and self-efficacy (Makransky et al., 2016). The post-test questionnaire, in turn, investigated aspects regarding representational fidelity (Lee et al., 2010), immediacy of control (Lee et al., 2010), perceived usefulness (Davis, 1989), perceived ease of use (Davis, 1989), presence (Sutcliffe et al., 2005), perceived enjoyment (Tokel & İsler, 2015), control and active learning (Lee et al., 2010), cognitive benefits (Lee et al., 2010), and reflective thinking (Lee et al., 2010). The post-test questionnaire also included the statements of the pre-test questionnaire to evaluate intrinsic motivation and self-efficacy, thus enabling a pre-post analysis. The questionnaire is available for download (see footnote 2).
The time needed by the participants to retrieve the data for computing the curvature was additionally measured (for each direction). This metric considers the time used by the participants (i) to assemble the correct configuration of the sectors using the learning tool, (ii) to retrieve the data (deficit angles and areas), and (iii) to write down the data. The time for calculating curvature values was not considered because it is not affected by the learning tool but rather is strictly dependent on the participant’s computation skill.
Results
Experimental results regarding the described metrics were analyzed using MS Excel and the Real-Statistics add-on.
As for the analysis of learning performance, comparisons between the quiz scores (considering the pre- and post-test results to evaluate the learning gain) were performed using the Scheirer-Ray-Hare Test to detect significant differences associated with the use of the different learning tools between the pre- and post-test conditions. When such differences were found, Tukey’s post-hoc Test was used to highlight the learning tool(s) that performed significantly better than the others. Comparisons over the exercise scores were performed using the Kruskal-Wallis Test, followed by the Pairwise Mann-Whitney Test to detect significant differences between the three groups (with a significance threshold of p ≤ .05). When no significant differences were found, the TOST test was used to finally assess the equivalence between the learning tools.
Regarding the QoLE evaluations, comparisons were performed using the Kruskal-Wallis Test, possibly followed by the Pairwise Mann-Whitney Test to detect significant differences between the three groups (with a significance threshold of p ≤ .05), where not stated otherwise. As for the quiz scores, the Scheirer-Ray-Hare Test followed by Tukey’s post-hoc Test was leveraged to analyze differences in terms of intrinsic motivation and self-efficacy between the pre- and post-test conditions.
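To make the three-group comparison concrete, the following is a minimal, standard-library sketch of the Kruskal-Wallis H statistic with tie correction. It is not the Real-Statistics implementation used in the study; the function names and the toy data are assumptions. The p-value relies on the usual chi-square approximation, which for three groups (two degrees of freedom) has the closed-form survival function exp(−H/2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Tie-corrected H statistic and approximate p-value for three groups."""
    groups = [list(a), list(b), list(c)]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function, valid for df = 2 only
    return h, p


# Toy values, not data from the study.
h_stat, p_value = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h_stat, p_value, p_value <= 0.05)
```

With small groups and many ties, as with integer scores, the chi-square approximation is coarse, so a dedicated statistical package would normally be preferred for reported results.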
Learning Performance Results
Considering the quiz scores, as anticipated the answers to the pre- and post-test quizzes were rated on a 1-to-10 scale. Results showed that the three groups experienced a significant knowledge gain associated with the learning experience but no significant differences between the three learning tools were found using the Scheirer-Ray-Hare Test (Figure 10).
Figure 10. Quiz scores: comparison of the results obtained before and after the use of the learning tools. The asterisk indicates a significant learning gain associated with the learning experience (i.e., pre-post), regardless of the used learning tool.
The subsequent pairwise TOST analysis on the results of the post-test showed that, for each comparison (PPS vs. DA, PPS vs. VRA, DA vs. VRA), the 90% confidence interval was completely contained in the (−1, 1) interval, thus suggesting that the considered effects were equivalent (Figure 11). The margin was set to −10% and +10% of the maximum score considering the scoring system used to evaluate the quiz (a 1–10 integer scale), as well as to have a reasonable estimation of the equivalence.
Figure 11. Quiz scores: representation of the 90% confidence interval of the three pairwise comparisons of the considered tools, all contained in the (−1, 1) equivalence interval.
Moving to the exercise score, no significant differences between the three groups were found when computing the 3D curvature along the three directions. By performing pairwise comparisons between the three groups using the TOST test and setting the equivalence interval to (−1, 1), it was possible to show that for each direction and each comparison the 90% confidence interval was completely contained in the (−1, 1) interval, thus concluding that the considered conditions were equivalent. The values of the equivalence margin were set to −25% and +25% of the maximum score, considering how the exercise was evaluated (the final mark could only be an integer value between zero and four).
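The equivalence criterion used for both scores can be sketched as follows. At a significance level of .05, the TOST procedure is equivalent to checking that the 90% confidence interval of the difference lies entirely within the equivalence margins. The helper names and the example intervals below are hypothetical; the confidence intervals themselves are taken as inputs rather than computed.

```python
def equivalence_margin(max_score, fraction):
    """Half-width of the equivalence interval as a fraction of the maximum score."""
    return fraction * max_score


def ci_within_margin(ci_low, ci_high, margin):
    """True if the confidence interval lies strictly inside (-margin, +margin)."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return -margin < ci_low and ci_high < margin


quiz_margin = equivalence_margin(10, 0.10)      # 10% of a maximum of 10
exercise_margin = equivalence_margin(4, 0.25)   # 25% of a maximum of 4

# Hypothetical 90% confidence intervals for a difference between two groups.
print(ci_within_margin(-0.6, 0.4, quiz_margin))      # interval inside the margins
print(ci_within_margin(-0.6, 1.2, exercise_margin))  # upper bound exceeds the margin
```

Both settings yield the same (−1, 1) interval, which corresponds to a difference of one point on either scale.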
Overall, the obtained results confirmed that the three tools were all able to provide a satisfactory level of learning performance and could be considered as equivalent from this viewpoint. A similar outcome was already observed in other works, though with other tools; for instance, by Makransky et al. (2019), who compared immersive VR, desktop VR, and a conventional approach based on the use of a handbook, or by Araiza-Alba et al. (2021), who contrasted VR, 2D videos, and posters. However, in these studies, and specifically the former, the authors did not explicitly focus on disentangling the learning effectiveness and the QoLE; consequently, the lack of differences in terms of learning performance was not expected, and was considered instead as the result of possible limitations of the VR medium. Moreover, both studies only managed to show that no significant differences were found between the different media, which does not imply the equivalence of learning performance. For this reason, it is not possible to exclude that the other qualitative outcomes highlighted by the authors were associated with possible confounding factors rather than with the intrinsic characteristics of the considered medium. Based on the above considerations, in this study, it was possible to proceed with the analysis of the QoLE, as no bias was introduced in the experiment as regards the learning performance of the three approaches.
QoLE Results
The aggregated results concerning the specific QoLE dimensions that were investigated in the post-test questionnaire are summarized in Figure 12. Details regarding all the individual statements for each dimension are reported in Table 1. In the following, they are discussed focusing on statistically significant differences.
Figure 12. Aggregated QoLE results for specific dimensions analyzed via the post-test questionnaire.
Table 1. QoLE results: Average scores for individual statements. Bold font is used to highlight the significant p-values (p < .05). When the three-way comparison conducted with the Kruskal–Wallis Test (K-W Column) is significant, the significant pairwise p-values of the follow-up test are provided.
Representational Fidelity
Regarding the impact of “Representational fidelity” on the learning experience, the participants in the VRA group stated they were significantly more motivated to learn and work on the exercise compared to those in the other groups (statement 1.a). Reasons for this finding could be related to the fact that the sectors of the DA are not realistic enough and need to be manipulated indirectly using the mouse, whereas the PPS are more realistic, but are also difficult to manipulate and assemble. Similar outcomes were observed by Lowell and Tagare (2023), who elaborated on the “authenticity” of the learning context and its impact on learners’ self-efficacy and transfer of knowledge. Alternatively, this result could be related to the novelty of the VR approach, as participants could be more motivated using a new, uncommon technology. Moreover, the physical sectors of the PPS were not found to help the understanding of the exercise, whereas the digital sectors of the DA and VRA were deemed more helpful thanks to the possibility to change the transparency and their simple assembly mechanism (statement 1.c). This result was in line with the outcomes collected for the “Perceived Ease of Use” dimension (statements from 4.a to 4.d), described later in this section.
Immediacy of Control
For the “Immediacy of control” dimension, no significant results were obtained. This outcome can be explained by the fact that the three tools offered the same functionalities and, therefore, the same type of control over the task.
Perceived Usefulness
For the “Perceived usefulness” dimension, the VRA was found to be significantly more useful than the PPS in supporting the learning, due to a combination of a realistic interaction and an enhanced interface simplifying the assembly and calculation steps (statement 3.d). The perceived usefulness associated with VR, together with the resulting positive effects, was also confirmed by the work of Song et al. (2021), in which the authors showed that this dimension successfully mediated a positive change in terms of self-efficacy.
Perceived Ease of Use
Moving to the results about the “Perceived ease of use” dimension, all the statements (from 4.a to 4.d) showed that the DA and the VRA were significantly easier to use than the PPS, which were judged as difficult to hold and assemble, in particular during the calculation of the latitudinal and longitudinal components of the curvature. However, it is worth noticing that, in this study, no significant differences were found between VRA and DA, meaning that such perception was associated with the lack of physicality of the technology-based tools. This result is partially in line with the outcomes of similar studies like that of Zhou et al. (2018), who highlighted the improved ease of use brought by VR, although attributing the advantages to the naturalness of interaction.
Presence
As for the “Presence” dimension, the interaction with the learning tool was found to be significantly more natural for the participants in the PPS and VRA groups than for those in the DA group, since the former could manipulate the sectors directly whereas the latter were forced to use the mouse (statement 5.a). As stated before, this result is partially in line with the work of Zhou et al. (2018) but, as observed for statements 4.a to 4.d, the naturalness of the interaction is not totally responsible for the results obtained in terms of perceived ease of use. However, regarding the engagement of the learning experience (statement 5.c), the DA and VRA were rated more positively than the PPS, due to the relative novelty of the former and the general difficulties experienced by the participants in the handling and assembly of the latter. The benefits brought by VR regarding the sense of presence were also reported, for example, by Araiza-Alba et al. (2021).
Perceived Enjoyment
Results for the “Perceived enjoyment” dimension indicated that the VRA was considered as significantly more pleasant (statement 6.b) and fun (statement 6.c) than the PPS, as well as significantly more enjoyable (statement 6.a) than the PPS and the DA, probably due to the novelty and the natural interaction offered by the immersive approach. The increase in enjoyment due to the use of VR compared to more traditional approaches was previously reported in the study conducted by Makransky et al. (2019); however, since that study was not designed to isolate the QoLE dimension and did not prove the equivalence between the considered media, it is difficult to say whether the increase in enjoyment was fully associated with the intrinsic benefit of VR.
Control and Active Learning
Concerning the “Control and active learning” dimension, it was observed that the VRA performed significantly better than the PPS in promoting self-paced learning (statement 7.d), probably because the VR experience was the one that required fewer interactions with the administrator who was supervising the experience. More specifically, this result could be due to the intrinsic isolation that characterizes HMD-based VR experiences, since users are completely immersed in the virtual environment and have limited to no perception of the real world. Moreover, the VRA was also deemed as significantly more engaging (statement 7.e) than the PPS, since the interaction issues probably led to a less appealing experience, overall. Alternatively, this result could be associated with the use of a new technology, that could elevate the level of engagement in non-experienced users. The increase in the level of engagement when comparing VR with traditional learning tools was also demonstrated, for example, by Araiza-Alba et al. (2021).
Cognitive Benefits
Moving to results about the “Cognitive benefits” dimension, the DA and VRA were judged as significantly more helpful than the PPS in memorizing aspects regarding the considered exercise (statement 8.b), probably due to the possibility to visualize information (angles, areas) directly on the sectors. For the same reasons, and thanks to the possibility to easily rotate the sectors and control transparency, the DA and VRA were rated as significantly more helpful than the PPS in analyzing the problems tackled by the exercise like, for example, finding the required angles (statement 8.d). In both cases, no significant differences were found between DA and VRA. Similar results were obtained in previous works, for example, by Liu et al. (2022); however, the authors of the latter work highlighted the benefits of VR over non-immersive solutions, whereas such benefits could not be noticed when the QoLE dimension is isolated.
Reflective Thinking
Regarding the “Reflective thinking” dimension, the VRA was rated as significantly better than the PPS in linking new knowledge to the previous experiences (statement 9.b), probably due to the immersive features of the tool that allowed the participants to overcome the intrinsic limitations of the physical props. Probably because of the same reasons, the VRA was also considered as significantly more helpful than the PPS to let the participants become better learners (statement 9.c). For both the above aspects, no significant differences could be found with respect to the DA group, probably because, while overcoming the limitations of the physical props, the simplified desktop interface was not as natural as the immersive one. High values of reflective thinking related to the use of VR were also reported, for example, by Ye et al. (2022).
Pre-Post Analysis
The remaining dimensions regarding QoLE, that is, “Intrinsic motivation” and “Self-efficacy”, were investigated both in the pre- and post-test questionnaires, and the statistical analysis was conducted by contrasting the pre- and post-test results of the three groups together. Regarding the “Intrinsic motivation” dimension, no significant differences were found pre-post. This result is in line with the outcomes of the work by Klingenberg et al. (2020), where the authors observed an increase in motivation associated with the use of VR only when users experienced it after the alternatives, since this configuration probably makes it easier to notice the advantages of the medium. Moving to the “Self-efficacy” dimension, the results of the Scheirer-Ray-Hare Test showed a statistically significant gain associated with the experience, regardless of the learning tool used. In fact, after the experience, the participants in all the groups were more confident in tackling basic concepts of General Relativity (p = .027) and assignments on the considered topic (p = .009) and judged themselves as able to master the concepts that they were taught (p = .010). This finding is also in line with the study by Song et al. (2021), where a rise in self-efficacy related to the use of VR-based learning tools was observed.
Moreover, it is important to notice that in the post-test analysis the participants of the VRA group were found to be significantly more confident in their ability to tackle complex topics related to General Relativity (statement “I am confident that I understand the most complex concepts related to Relativity.”) than those in the PPS group (VRA M = 3.27 and SD = 0.92, PPS M = 2.75 and SD = 0.60, p = .018). No significant differences were found with respect to the DA group.
Completion Time
Finally, regarding completion time, a significant difference was found considering the radial direction (Kruskal-Wallis Test, p = .006). In particular, the PPS group was faster than both the DA group (PPS M = 105 seconds and SD = 41 seconds, DA M = 153 seconds and SD = 36 seconds, p = .033) and the VRA group (PPS M = 105 seconds and SD = 41 seconds, VRA M = 205 seconds and SD = 93 seconds, p = .036). No significant differences were found between the DA and the VRA groups. Considering that the exercise along the radial direction was the subject of the second video aimed to instruct the participants on the way they could operate with the assigned tool, and that the participants of the PPS group experienced no particular technological barrier (meaning that they were not required to learn how to interact with a new application interface to complete the exercise), this result was expected. Moreover, it is worth noticing that the learning experience with the PPS was simplified for this experiment as, unlike with the other tools, the participants were not required to obtain the angles and the areas by themselves using the learning tool, but could rely on external support.
Conclusion and Future Works
The use of VR is steadily increasing in the field of education. Nevertheless, a complete picture of all its strengths and shortcomings is still not available. In particular, the interrelation between learning effectiveness and QoLE has not been specifically analyzed yet. Indeed, many studies have already investigated the QoLE for VR-based learning tools but struggled to disentangle this factor from learning performance. In fact, these studies generally aimed to get the best from the considered tools: thus, they possibly introduced a bias, which made it problematic to draw definitive conclusions.
This paper contributes to advancing the research in this context by focusing on a practical learning activity and contrasting a tool that leverages immersive VR with a physically based approach and with a desktop application. The final goal is to show that, even though the three approaches reach equivalent learning outcomes (by design), immersive VR can provide a series of subjective advantages which can justify its adoption as a learning tool and can enhance the overall learning process, resulting in a more pleasant experience. These advantages are not linked to additional features that could have been made available thanks to the use of VR technology but are purely associated with the intrinsic benefits of the immersive medium. By decoupling the QoLE dimension, in fact, it is possible to understand the potential (and, sometimes, the limitations) of this technology. On the contrary, if the QoLE dimension is not isolated, as usually happens in the literature, qualitative outcomes could be due to confounding factors and could lead to opposite conclusions.
In particular, starting from learning performance, it was shown that the use of the three tools was associated with a significant learning gain (pre-post), meaning that all of them were effective in supporting the given activity. Moreover, the absence of significant differences between the tools indicated that none of them was clearly better than the others. More specifically, a pairwise analysis showed that the three tools were equivalent, thus confirming the assumption on their objective effectiveness. This assumption was further confirmed by the results of the exercise, which were equivalent for the three groups.
Once it was established that the three tools could be regarded as equivalent in terms of learning performance for the considered activity, it was possible to move to the analysis of the QoLE, to establish whether the immersive VR experience could be associated with subjective advantages deriving from the intrinsic characteristics of the medium.
Considering the exercise completion time, no significant differences or equivalencies were found for the computation of the curvature along the latitudinal and longitudinal directions, probably due to the highly unequal variances of the three groups, whereas it was shown that the PPS group performed significantly better than the other two groups on the radial direction. It is important to notice, however, that the radial direction was the subject of the video lesson administered to instruct the participants on how to operate the learning tool. Considering that the participants of the PPS group did not use any specific technology besides the physical props, this result was somewhat expected. Furthermore, it is necessary to consider that the PPS experience was simplified to perform a fair comparison between the three tools. In a real lesson, it would be necessary to use additional tools to measure the angles while holding the physical props, which would lead to a longer and more complex learning experience, whereas the use of the two applications would be the same as in the performed experiment.
Looking at the results of the questionnaire, it is possible to highlight an overall appreciation for both the applications over the physical props, mainly due to their higher usability combined with their higher engagement, in particular for the immersive one. The physical props were judged as cumbersome to use, and their tangibility, which is their main feature and should have been their main advantage, proved to be an obstacle in completing the tasks required by the considered practical activity. For instance, it was difficult to combine the sectors and keep them assembled, even with the addition of the magnets, in particular, while working on the latitudinal and longitudinal directions. This fact hindered the overall understanding of the exercise, so that the physical props were perceived as a less useful learning tool despite the positive learning outcome. The difficulty in manipulating the sectors was also detrimental for the overall enjoyment and engagement associated with the learning experience.
As for the two applications, the VRA was generally more appreciated by the participants, since it was able to overcome the limitations of the physical sectors (like the DA), while keeping their tangibility and the naturalness of the interactions. This translated into a learning experience with more significant advantages over the physical one. In particular, the participants of the VRA group were more engaged and, consequently, were more interested in the experience, whereas no clear results were found for the DA group in this respect. Furthermore, after the experience, the participants in the VRA group were more confident than the PPS ones in tackling other concepts related to the studied subject, whereas, again, no conclusion could be drawn for the DA group. This behavior was observed for almost all the dimensions investigated through the questionnaire, with the VRA performing clearly better than the PPS, and the results regarding DA being more often inconclusive. It is important to notice that the VR-based tool was clearly preferred to the desktop-based one in terms of fidelity, since it was more similar to the physical one. The VRA was also more enjoyable, since the interface of the DA, although offering the same functionalities, was less intuitive and associated with a less fun experience.
Based on these findings, the VR-based experience emerged as the winner with respect to the other two in terms of QoLE, either by being clearly preferred to the physical one, or by offering a more pleasant experience than the desktop one. Since, as said, the three experiences were characterized by a comparable learning performance, these results were not biased by arbitrary choices made during the implementation phase but were a consequence of the intrinsic characteristics of the three approaches. The adoption of immersive VR over the other two for the implementation of a practical learning activity like the one considered in the experiment is thus justified, as hypothesized at the beginning of the study.
It is worth observing that the analysis reported in this paper may have been characterized by some limitations. For instance, one could argue that the equivalence of learning performance was due to the specific subject matter and the practical experience considered in the study. Therefore, in order to evaluate the external validity of the study, the three learning tools should be compared also under different scenarios. Another aspect that could be worth investigating is the nature of the intended target users, as the participants involved in the study could have played a significant role in the observed outcomes. For this reason, although results could hold for other university students, further experiments should be carried out by considering learners characterized by different levels of education, ages, genders, cultures, etc., investigating also possible differences between groups.
Finally, the software library used to implement the interface of the VR application was characterized by a series of problems (e.g., unstable hand tracking), which hindered the overall immersive experience; although this aspect can be considered as negligible for the reported study since it negatively affected only the VR experience (which proved nevertheless superior to the other learning tools), in the future it would be advisable to modify the application to get a more precise depiction of the intrinsic benefits of the immersive medium.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been carried out in the frame of the VR@POLITO initiative. Research was supported by Programma Operativo Nazionale (PON) “Ricerca e Innovazione” 2014-2020 – DM 1062/2021 funds.
