Abstract
This study presents a mobile app that helps undergraduate students learn data science through their own full-body motions. Using the built-in camera of a mobile device, the app captures the user and feeds the images into an open-source computer-vision algorithm that localizes the key joint points of the human body. Because students participate in the entire data collection process, the resulting motion data are context-rich and personally relevant to them. The app uses the collected motion data to explain various concepts and methods in data science in the context of human movement. It also visualizes the geometric interpretation of data through visual aids such as interactive graphs and figures. In this study, we use principal component analysis, a commonly used dimensionality reduction method, as an example to demonstrate the proposed learning framework. Strategies for incorporating other learning modules are also discussed.
Introduction
Over the past decades, data science has drawn considerable attention as a means of uncovering hidden information in big data that leads to new insights and well-informed decision-making (Wani & Jabin, 2018). To this end, data science draws on analytical methods and algorithms from multiple disciplines, such as statistics and computer science, together with knowledge of particular application domains (Baumer, 2015). Many companies and organizations are trying to exploit data science for business opportunities and/or public good. The growing popularity of data science has created a high demand for data scientists.
Universities have responded to this demand by establishing schools, degree programs, and courses in data science to strengthen students’ competitiveness in the job market (Fayyad & Hamutcu, 2021). Yet, while many undergraduate students recognize the importance of data science, many are reluctant to take the pertinent courses because they find statistics, mathematics, and/or programming difficult (He et al., 2019; White, 2019). Instructors face their own challenges. As course enrollments continue to grow, instructors’ typical strategy for scaling up classes is to fill the courses with decontextualized practice (Donoghue et al., 2021). This deprives students of the contextual knowledge needed to interpret data, leading them to see data merely as meaningless numbers (Davies & Sheldon, 2021).
To enhance learning outcomes in data science education, a number of studies have suggested using mobile devices, such as smartphones and tablets, as supplementary learning tools (Díaz-Sainz et al., 2021; Ong et al., 2021). Because mobile devices are now readily available to students, mobile learning is considered a practical approach to improving students’ learning experience inside and outside the classroom (Aljawarneh, 2020; Norman et al., 2011). More specifically, the visual aids available in mobile learning (e.g., interactive graphs and figures) promote students’ interest and curiosity, and thereby increase their motivation and engagement in learning (Angga et al., 2016; Nuanmeesri, 2021). Recent studies also indicate that students can develop social and teamwork skills by participating in learning activities together with mobile devices (Baecker, 2022; Go et al., 2022). Thus, developing mobile apps for data science learning holds considerable potential for helping undergraduate students, who are avid users of mobile technology, to cultivate their data science abilities.
Yet, several considerations need to be addressed in designing a learning app for effective data science education. First, concepts in data science should be presented in a way that is easy to understand, in order to accommodate students at different levels. In particular, students with little prior knowledge and few skills are more likely to lose motivation if they find the content difficult. To reduce learning demands, the app should incorporate data visualization techniques that provide a geometric interpretation of data. This visual representation helps students understand how data science methods transform data into information (Unwin, 2020). Second, the dataset used for learning should be familiar and relevant to students so that they can build sufficient contextual understanding of the data. A strong awareness of the data's context helps students develop customized analytical approaches by enabling data-driven decision-making (Wolff et al., 2019). Thus, context-rich data could encourage students to explore different analytical methods, which is exactly what data scientists do every day.
In this paper, we propose a mobile app that helps undergraduate students learn concepts and methods in data science with their own motion data. The app is specifically designed to collect full-body motion data via the built-in camera of a mobile device. Users can thus easily create personally relevant and context-rich datasets from their own body motion. The app also allows students to interactively explore the geometric interpretation of data with various visual aids so that they can learn data science effectively.
Method
System Overview
Students go through four stages when using the proposed app: method selection, brief method introduction, data collection through their own body motion, and method learning with the collected data (Figure 1). In the first stage, students decide which data science method they would like to study. Once a method is chosen, the app introduces the basic concepts of the method and pertinent background knowledge. Students are then prompted to collect their full-body motion data with the help of instructional videos. During the data collection process, students can see 2D stick figures superimposed on the camera images and compare these stick figures with the human body on the screen (Figure 2). This allows students to see that motion data are being created. Students can repeat the data collection process as many times as they want until the collected data match what is demonstrated in the instructional videos. After data collection, the app processes the data using the selected data science method. Students can then interactively explore the geometric interpretation of the collected and processed data through a series of graphs to deepen their understanding of the method.

Figure 1. Four major stages of using the proposed learning platform.

Figure 2. Learning app screen for full-body motion data collection (left). Selected 12 key points of the human body (right). 1-left shoulder, 2-left elbow, 3-left wrist, 4-right shoulder, 5-right elbow, 6-right wrist, 7-left hip, 8-left knee, 9-left ankle, 10-right hip, 11-right knee, 12-right ankle.
Full-body Motion Data Collection
In the past few years, the computer-vision community has developed several computationally efficient convolutional neural network (CNN) architectures for deployment on mobile devices with limited hardware resources. For example, MobileNet and its variants (e.g., ShuffleNet; Zhang et al., 2018) use
Recently, researchers at Google presented BlazePose, a lightweight CNN architecture tailored for real-time inference of human pose on mobile devices (Bazarevsky et al., 2020). Inspired by MobileNet, BlazePose splits a standard
Once students start data collection, the app begins reading image frames from the embedded camera in YUV420 format. This format supports real-time display because it requires less transmission bandwidth. The app then converts the image frames into RGB bitmaps and feeds them into BlazePose at a rate of 10 frames per second. BlazePose consists of two sub-networks: a body detector that performs pre-processing and a pose tracker that detects body key points. The detector standardizes the size of the inputs and aligns the pose to facilitate the subsequent key point detection. Specifically, the detector calculates four parameters for the first input frame: the bounding box for a person’s face, the midpoint between the left and right hip joints, the size of the circle circumscribing the person, and the torso incline angle. Based on these parameters, the detector aligns the pose and crops the full body region from the input image. This cropped image is then reshaped to a
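The YUV420-to-RGB conversion step mentioned in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's implementation: it assumes a planar layout (separate Y, U, and V planes, with one U and one V sample shared by each 2x2 block of pixels), an even frame width, and the full-range BT.601 coefficients; the camera pipeline on a given device may use a different plane layout or color matrix. The function names are hypothetical.

```python
def clamp(value):
    """Round to the nearest integer and limit the result to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one pixel using full-range BT.601 coefficients (an assumption)."""
    d = u - 128  # blue-difference chroma, centered at zero
    e = v - 128  # red-difference chroma, centered at zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)


def yuv420_planar_to_rgb(y_plane, u_plane, v_plane, width, height):
    """Convert a planar YUV420 frame to rows of (R, G, B) tuples.

    Every pixel has its own Y sample, while each 2x2 block of pixels
    shares a single U sample and a single V sample.
    """
    chroma_width = width // 2
    rows = []
    for row in range(height):
        line = []
        for col in range(width):
            luma = y_plane[row * width + col]
            chroma_index = (row // 2) * chroma_width + (col // 2)
            line.append(yuv_to_rgb(luma, u_plane[chroma_index], v_plane[chroma_index]))
        rows.append(line)
    return rows


# A 2x2 grayscale test frame: neutral chroma (128) leaves only brightness.
frame = yuv420_planar_to_rgb([0, 255, 128, 64], [128], [128], width=2, height=2)
```

Because the chroma planes hold a quarter as many samples as the luma plane, a YUV420 frame needs 1.5 bytes per pixel instead of the 3 bytes of an RGB frame, which is the bandwidth saving the text refers to.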
Learning Modules of Data Science Methods
In this section, we use principal component analysis (PCA) as an example learning module. PCA is one of the basic dimensionality reduction techniques that are commonly covered in undergraduate data science courses. The learning dataset is generated from a video in which a person walks forward. Note that this walking motion should be captured from a 45-degree view angle for 10 seconds while the person being captured is located at the center of the camera view (Figure 2-left). The collected dataset is then represented by
The first step of PCA is to standardize the dataset X. Each element
In PCA, we are interested in reducing the dimensionality of data. This inevitably comes at the expense of information loss, but the goal is to trade minimal loss for maximal simplicity by retaining most of the data structure (i.e., variance) (Shlens, 2014). From a geometric point of view, this means projecting the data onto a lower-dimensional subspace along which the data are spread most widely. In the app, students will see a demonstration of data projection from 2D to 1D space, in which they will notice that there is a line that maximizes the variance of the projected data. Algebraically, this can be achieved by the dot product of data and the eigenvectors of the covariance matrix
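The 2D-to-1D projection can be illustrated with a short sketch. It uses a small, hypothetical set of 2D points (not data collected by the app), scans candidate lines through the data center to find the one that maximizes the variance of the projected points, and compares that line with the leading eigenvector of the 2x2 covariance matrix, which has a closed form in two dimensions. All function names are illustrative.

```python
import math

# Hypothetical 2D points (illustrative only, not data collected by the app).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]


def center(data):
    """Subtract the mean of each coordinate."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    return [(x - mean_x, y - mean_y) for x, y in data]


def projected_variance(centered, angle):
    """Sample variance of the data projected onto the line at the given angle."""
    ux, uy = math.cos(angle), math.sin(angle)
    projections = [x * ux + y * uy for x, y in centered]
    return sum(p * p for p in projections) / (len(projections) - 1)


def principal_axis(centered):
    """Largest eigenvalue of the 2x2 covariance matrix and the angle of its eigenvector."""
    n = len(centered)
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    half_diff = (sxx - syy) / 2
    eigenvalue = (sxx + syy) / 2 + math.sqrt(half_diff ** 2 + sxy ** 2)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return eigenvalue, angle


centered = center(points)

# Geometric view: try lines in 0.1-degree steps and keep the best one.
angles = [k * math.pi / 1800 for k in range(1800)]
best_angle = max(angles, key=lambda a: projected_variance(centered, a))

# Algebraic view: leading eigenvector of the covariance matrix.
eigenvalue, eigen_angle = principal_axis(centered)

# 1D coordinates of each point along the first principal component.
pc1_scores = [x * math.cos(eigen_angle) + y * math.sin(eigen_angle)
              for x, y in centered]
```

The two views agree: the line found by the scan lies within one scan step of the eigenvector direction, and the variance of the projected data along that line equals the largest eigenvalue.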
Students need a clear understanding of the covariance matrix because it determines the eigenvectors onto which
Students will learn how to derive the PCs from their data
where
From Eq. (3) we can see that
The app explains that the eigenvalues

An example of the interactive scree plot. Students can check the value of

PC1 and PC2 over time for a walking motion.
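The PCA steps described in this section (standardization, covariance matrix, eigenvectors and eigenvalues, scree plot values, and projection onto PC1 and PC2) can be sketched end to end. This is a minimal sketch under stated assumptions, not the app's implementation: the input is synthetic stand-in data with 100 frames (10 seconds at 10 frames per second) and 24 features (assuming x and y coordinates for each of the 12 key points), and the eigenvectors are found by power iteration with deflation, a simple technique chosen here to avoid external libraries.

```python
import math
import random

N_FRAMES = 100    # 10 s at 10 frames per second, as in the text
N_FEATURES = 24   # assumption: 12 key points x (x, y) coordinates


def synthetic_walk(n_frames=N_FRAMES, n_features=N_FEATURES, seed=42):
    """Stand-in for collected motion data: two periodic signals plus noise."""
    rng = random.Random(seed)
    X = []
    for i in range(n_frames):
        s1 = math.sin(2 * math.pi * i / 10)   # dominant gait-like cycle
        s2 = math.sin(4 * math.pi * i / 10)   # weaker harmonic
        X.append([(1 + 0.5 * math.sin(j)) * s1 + 0.4 * math.cos(2 * j) * s2
                  + rng.gauss(0, 0.05) for j in range(n_features)])
    return X


def standardize(X):
    """Scale every column to zero mean and unit sample standard deviation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def covariance(Z):
    """Sample covariance matrix of already-centered data Z (frames x features)."""
    n, d = len(Z), len(Z[0])
    return [[sum(row[i] * row[j] for row in Z) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(C, v):
    return [sum(c * x for c, x in zip(row, v)) for row in C]


def top_eigenpairs(C, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric matrix."""
    d = len(C)
    C = [row[:] for row in C]          # work on a copy; the caller's matrix is kept
    rng = random.Random(0)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):         # power iteration
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(d):             # deflation: remove the component just found
            for j in range(d):
                C[i][j] -= lam * v[i] * v[j]
    return pairs


Z = standardize(synthetic_walk())
C = covariance(Z)
pairs = top_eigenpairs(C, k=2)

# Scree plot values: share of the total variance explained by each PC.
total_variance = sum(C[j][j] for j in range(len(C)))
ratios = [lam / total_variance for lam, _ in pairs]

# PC1 and PC2 over time: one (PC1, PC2) pair per frame.
scores = [[sum(z * v for z, v in zip(row, vec)) for _, vec in pairs] for row in Z]
```

Plotting `scores` against the frame index gives curves of the kind shown for the walking motion, and `ratios` gives the heights of the first two bars of a scree plot. With real key point data, only the call to `synthetic_walk()` would change.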
Discussion
Despite the great potential of mobile learning, there is only limited work on developing mobile learning platforms for data science. In this work, we sought to fill this gap by developing a dedicated mobile learning app for undergraduate data science education. The proposed learning app could engage and motivate students to learn data science with features that students cannot otherwise experience in traditional settings (e.g., interactive visual aids and self-data collection). Therefore, introducing this app in data science courses could complement the existing classroom curriculum. We expect that our app will serve as a supplementary learning tool for undergraduate data science courses with which students can generate personally relevant and context-rich data.
To date, we have developed only one learning module, which covers PCA. To cover a wider range of methods commonly used in data science, new learning modules will be added to the app. Following the PCA module, the next module we could develop is a k-means clustering module. This module will guide students to perform several body postures. Some of these postures are similar to each other (e.g., walking vs. running) while others are not (e.g., walking vs. drinking). Each posture will be captured multiple times. The dimensionality of the posture datasets will first be reduced to two (i.e., PC1 and PC2) by PCA for data visualization. Students will then see that the resulting PCs of two similar postures have similar values and thereby form a cluster in a scatter plot. In addition, students will notice that some less similar postures can be clustered together if the parameter k, the number of clusters, is smaller than the number of performed postures.
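The effect of the parameter k can be sketched with a small example. The points below are hypothetical (PC1, PC2) values for three postures, chosen only for illustration, and the clustering uses the standard k-means assignment and update steps with a simple deterministic farthest-point initialization, which is an assumption made here so that the result is reproducible.

```python
# Hypothetical (PC1, PC2) values for three postures, four captures each.
walking = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2)]
running = [(2.0, 1.6), (2.2, 1.4), (1.9, 1.5), (2.1, 1.7)]
drinking = [(-3.0, -2.1), (-2.8, -1.9), (-3.1, -2.0), (-2.9, -2.2)]
points = walking + running + drinking


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centroids chosen so far (a simple deterministic choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Return one cluster label per point and the final centroids."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                       # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


labels_k2, _ = kmeans(points, k=2)   # fewer clusters than postures
labels_k3, _ = kmeans(points, k=3)   # one cluster per posture
```

With k = 2, the walking and running captures share a cluster while drinking forms its own; with k = 3, each posture gets a separate cluster.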
Real-world data analysis rarely ends with a single method. Rather, it is common practice to analyze data with a series of methods, each of which has its own role and facilitates the next step. For example, in the k-means learning module described above, k-means is applied to PCs. These PCs are derived in the PCA learning module. Because the first two PCs retain the most critical part of the information in the original posture data, applying k-means clustering to them can be computationally efficient while still achieving high performance. We believe that understanding such connections between methods is as important as understanding how each method works. Therefore, we plan to develop interconnected learning modules so that students can learn to reason about how methods relate to one another, experience the entire analysis process, and consolidate what they have learned.
In addition, because full-body motion data are high-dimensional, dimension reduction techniques offer an effective way to turn the data into a 2D dataset that can be visualized. Therefore, we will continue to use dimension reduction techniques as pre-processing methods for data visualization and for teaching other pertinent methods.
The current app has several limitations that need to be addressed. First, because the adopted pose estimation model is specialized in capturing a single person without occlusion, students should avoid scenarios in which multiple people are in view or objects block the person of interest when they collect human motion datasets. Otherwise, the key point detection algorithm may detect the motion of another person or may be less accurate because of the blocked view. Second, given the limited hardware resources of mobile devices, real-time inference of 3D human pose is still difficult to implement. Compared with 3D pose analysis, the current 2D localization results may deviate more from the actual joint locations because of the absence of depth information, which is why the proposed app currently guides students to follow certain capturing conditions. We leave the improvement of the pose estimation model, along with new learning modules, to future research.
