Abstract
The teaching effect of college physical education classes can be improved by combining them with artificial intelligence systems. At present, most college physical education classes rely on manual teaching and manual management, so the teaching effect is poor. To change the traditional teaching mode and improve classroom monitoring, this paper builds a real-time monitoring system for college physical education classrooms based on the open Internet of Things and cloud computing technology. It also proposes several new and improved algorithms, which provide a theoretical and technical basis for automatic identity positioning in large scenes. Moreover, this study obtains field scenes through field image data collection and field data processing, and then combines the regional scenes with field-measured data to verify accuracy and trends and to obtain students' morphological characteristics. In addition, this paper designs practical experiments to verify the system performance. The results show that the intelligent system constructed in this paper is effective to a certain extent and can be applied to physical education.
Keywords
Introduction
The teaching effect of college physical education classes is of great significance to the all-round moral, intellectual, and physical development of college students. Many students have weak physical fitness and low motivation to exercise. Therefore, real-time monitoring and guidance of students in physical education classes can effectively improve their exercise efficiency [1].
In the context of educational intelligence, intelligent classroom video recording systems have begun to attract attention, and research on them has developed rapidly. The intelligent classroom video recording and broadcasting system [2] is an educational technology system that uses multimedia technology and computer network technology to record classroom teaching in real time. It moves the traditional studio into the ordinary classroom, so that classroom teaching can not only be recorded in real time but also broadcast live and on demand. The system requires no manual operation during the entire recording process; the camera only needs to be placed in a fixed position in the classroom. Then, under computer control, image processing methods enable the camera to record the class automatically, and the recording can be broadcast live in real time. Research on the intelligent classroom video recording and broadcasting system is of great significance. It not only solves the problem that traditional manual recording and broadcasting interferes with classroom teaching, but also integrates multimedia teaching equipment such as computer screens and electronic whiteboards. Moreover, it broadens the teaching scope of the classroom, improves the sharing of high-quality teaching resources, and expands the use of educational resources, thus providing a new method for recording classroom teaching [3].
The main objects detected and tracked by the classroom video recording and broadcasting system are the teacher who is lecturing and the students who stand up to speak. Therefore, whether the camera can accurately track and locate the teacher and the standing students in a complex classroom environment determines whether the system can record effectively. When the camera follows the teacher, the teacher walks back and forth during the lecture, and part of the body is occluded by the podium. This requires the system to detect and track the target quickly and accurately while coping with partial occlusion. When tracking students who stand up to speak, the system must accurately detect the standing student among many students before tracking can begin. Moreover, the recording camera should choose an appropriate zoom level according to the position of the standing student.
Related work
After more than ten years of research and development, the performance and functions of intelligent classroom video recording and broadcasting systems have been continuously improved. Many schools, especially primary and secondary schools, have gradually recognized the value of intelligent recording and broadcasting systems [4]. Second-generation recording and broadcasting systems also record computer courseware: when the lecturer uses courseware, the system uses split-screen technology to record the teacher and the courseware simultaneously [5]. In recent years, companies such as Suzhou Keda and Shanghai Excellence have begun to study classroom recording systems based on image recognition technology. However, recording and broadcasting systems based on image recognition [6] are still at the experimental verification stage because of their high technical requirements and many difficulties. With the development of video encoding and decoding, cloud computing, and big data, classroom video recording and broadcasting systems have moved from the initial standard-definition recording to high-definition recording. At the same time, they have changed from separating "recording" and "broadcasting" to combining real-time recording with online live broadcasting. Because demand for educational recording and broadcasting systems abroad is relatively low, there is still little research on them abroad. Foreign classroom recording systems generally use only one camera that films the teacher directly, and students or blackboards are rarely filmed, so they are not discussed further in this paper.
In recent years, with the development of video surveillance, unmanned aerial vehicles, and classroom video recording and broadcasting, research on target detection and tracking in image sequences has received increasing attention from governments and scholars in many countries [7]. IBM cooperated with the University of Maryland in the United States to develop the W4 real-time video surveillance system, which uses geometric analysis to detect and track people and parts of the human body in outdoor environments. The human motion tracking system developed by the Massachusetts Institute of Technology can reconstruct the three-dimensional structure of people in indoor environments and thereby detect and track them [8]. To promote intelligent vehicle technology, the US Department of Defense established a research laboratory for visual navigation to study the detection, tracking, and positioning of moving targets against complex backgrounds [9]. The ESPRIT VIEWS project carried out by the European Union mainly studies the detection and tracking of people and vehicles in public environments [10]. At present, many universities have also set up related laboratories, whose research mainly involves video surveillance in specific scenarios, classroom video recording and broadcasting, and target detection and tracking in some military applications.
In automatic teaching recording systems, the video image processing technology mainly covers two aspects: teacher tracking [11] and recognition of students' classroom actions [12]. Several researchers have studied these two types of video image algorithms [13]. The main task of the teacher tracking module is to obtain the teacher's current position by analyzing the video captured by the panoramic camera when the teacher moves away from the center of the conference camera's field of view. At the same time, it sends rotation, zoom, focus, and other commands to the conference camera so that the tracked teacher always stays within its field of view [14]. The main task of the student action recognition module is to analyze the video captured by the panoramic camera to determine whether a student is currently standing up or sitting down. When a student stands up, the system instructs the conference camera to aim at the standing student; when the student sits down, the system switches the conference camera back to a panoramic view [15]. Compared with early systems based on signal-receiving equipment, education recording and broadcasting systems based on video tracking and identification need far less peripheral equipment, cost less, have low installation requirements, and can be used routinely in everyday teaching, so they are very popular with users.
Recording and broadcasting systems based on video tracking and identification technology will therefore be the ultimate development direction of classroom teaching recording and broadcasting systems [16]. The article [27] addresses the enormous volume of big data and proposes the concept of SmartBuddy, which builds intelligent and smart environments from human behaviors and human factors. The article [28] discusses constructing a directed acyclic graph of video coding algorithms for motion estimation in parallel reconfigurable computing systems; its partitioning algorithm plays a major role in speeding up video processing. The article [29] deals with exploiting IoT and big data analytics using the Hadoop environment in real-time scenarios, through which an IoT-based smart city is implemented. The article [30] focuses on IoT and its significant role in enhancing human activities and efforts, and also addresses collecting various data from different sources connected to the Internet. The literature [31] addresses various issues in vehicular communication by proposing a unified and distributed spectrum sensing model. Applying the shared cognitive paradigm minimizes conflicts and other unknown problems [32, 33].
Camera calibration
The purpose of camera calibration is to obtain the interior orientation elements of the camera, namely the principal distance (f) and the position of the principal point, together with the lens distortion parameters. If the calibration accuracy is insufficient, data post-processing will not achieve the expected results, which will affect the accuracy of later photogrammetry and waste manpower and resources. The camera used in this paper is a non-metric digital camera with large optical distortion. Therefore, the camera must be calibrated to obtain the principal distance (f), the coordinates (x0, y0) of the principal point in the image plane, and five distortion parameters. The distortion parameters comprise radial distortion and decentering distortion. The radial distortion expression is given in formula (1), and the decentering distortion expression in formula (2) [17]:
In the formula, k1, k2, k3——radial distortion parameters;
x, y——image point coordinates;
r——radial distance of the image point from the principal point;
Δx_r, Δy_r——corrections for lens distortion in the x and y directions.
The expression of decentering distortion is:
In the formula, p1, p2——decentering distortion parameters.
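Formulas (1) and (2) are not reproduced in this text. As a hedged illustration, the sketch below implements the widely used Brown model for radial (k1, k2, k3) and decentering (which the text also calls centrifugal; p1, p2) distortion. The exact form and sign convention used by the calibration software are not given in the source, so this is an assumption, and the function name `distortion_correction` is hypothetical.

```python
def distortion_correction(x, y, x0, y0, k1, k2, k3, p1, p2):
    """Return the (dx, dy) lens-distortion correction for image point (x, y).

    Sketch of the Brown model (an assumption): radial terms k1..k3 plus
    decentering terms p1, p2. Works on floats or NumPy arrays.
    """
    xb, yb = x - x0, y - y0          # coordinates reduced to the principal point
    r2 = xb * xb + yb * yb           # squared radial distance r^2
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3
    dx_r, dy_r = xb * radial, yb * radial              # radial part, cf. formula (1)
    dx_d = p1 * (r2 + 2 * xb * xb) + 2 * p2 * xb * yb  # decentering part, cf. formula (2)
    dy_d = p2 * (r2 + 2 * yb * yb) + 2 * p1 * xb * yb
    return dx_r + dx_d, dy_r + dy_d
```

For example, a point 10 units to the right of the principal point with only k1 = 1e-4 gives dx of about 0.1 and dy = 0, because only the radial term contributes along the x axis.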
Based on the above calibration theory, this paper uses the fully automatic camera calibration software EasyCalibrate. The software uses a flat grid displayed on a TV screen as the calibration board, as shown in Fig. 1, and the planar control field is photographed from five angles. At each angle, the camera is held in both vertical and horizontal orientations, and shots are taken from high and low positions following the nine-square grid method. Each group contains 36 photos and there are five groups in total, so that the two-dimensional planar test field produces a three-dimensional effect relative to the camera and the marker points can be measured automatically [18].

TV screen flat grid.
Before shooting, the camera sensitivity, aperture, and other parameters are set, and the shooting mode is set to manual. The tripod is fixed 2-3 m from the TV screen grid and its height is adjusted. The camera is mounted on the tripod to photograph the grid, and 180 photos are taken. The photos from each period are imported into the automatic calibration software, and the resulting camera distortion parameters for the four periods are listed in Table 1.
Camera distortion parameter table
The GPS/IMU-assisted aerial triangulation system consists of an inertial measurement unit (IMU) and a GPS positioning system. The IMU provides the attitude of the inertial navigation system over the survey area, namely the pitch angle (ω), roll angle (φ), and heading angle (κ). Because the axes of the three-axis gyro system in the IMU deviate by small angles from the image space auxiliary coordinate system of the camera, formula (3) is used to obtain the attitude parameters of the aerial camera [19].
In the formula,
φ, ω, κ——attitude value obtained by IMU;
Δφ, Δω, Δκ——the deviation value between the coordinate system of the IMU inertial measurement unit and the auxiliary coordinate system of the image space.
After the three attitude parameters of the photo are obtained, the position of the camera station is calculated with formula (4) to obtain the exterior orientation elements of the photo.
In the formula, X A , Y A , Z A ——the coordinates of GPS phase center A in the geodetic coordinate system;
X S , Y S , Z S ——The coordinates of the camera projection center S in the geodetic coordinate system;
μ, υ, ω——The three coordinate components of the phase center A in the image coordinate system.
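Formulas (3) and (4) are likewise missing from this text. The sketch below assumes (i) the φ-ω-κ rotation convention R = R_φ·R_ω·R_κ that is common in photogrammetry textbooks, (ii) that formula (3) simply adds the offsets Δφ, Δω, Δκ to the IMU angles, which is only a small-angle approximation (a rigorous treatment composes rotation matrices), and (iii) that formula (4) has the usual lever-arm form X_A = X_S + R·(u, v, w)ᵀ, where (u, v, w) are the components the text writes as μ, υ, ω. All function names are hypothetical.

```python
import numpy as np

def rotation_matrix(phi, omega, kappa):
    """R = R_phi @ R_omega @ R_kappa (phi-omega-kappa system, angles in radians)."""
    cp, sp = np.cos(phi), np.sin(phi)
    co, so = np.cos(omega), np.sin(omega)
    ck, sk = np.cos(kappa), np.sin(kappa)
    r_phi = np.array([[cp, 0.0, -sp], [0.0, 1.0, 0.0], [sp, 0.0, cp]])
    r_omega = np.array([[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]])
    r_kappa = np.array([[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]])
    return r_phi @ r_omega @ r_kappa

def camera_attitude(imu_angles, boresight):
    """Small-angle approximation of formula (3): add the offsets to the IMU angles."""
    return tuple(a + d for a, d in zip(imu_angles, boresight))

def projection_center(gps_xyz, lever_arm, R):
    """Assumed form of formula (4) solved for S: X_S = X_A - R @ (u, v, w)."""
    return np.asarray(gps_xyz, dtype=float) - R @ np.asarray(lever_arm, dtype=float)
```

In use, the IMU angles are first corrected with `camera_attitude`, the corrected angles are turned into R with `rotation_matrix`, and R then removes the antenna offset in `projection_center`.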
GPS/IMU-assisted aerial triangulation effectively reduces the number of field image control points, so that block adjustment can be carried out with only a small number of ground control points. The principle of the aerial triangulation solution is as follows [20]:
(1) Interior orientation
Interior orientation of the image data restores the relative position between each photo and the camera from the fiducial-mark coordinates in the acquired photo and the camera calibration parameters, and establishes the relationship between the pixel coordinate system and the image plane coordinate system. At present, interior orientation is mostly automatic: because the image frame is symmetric and may be rotated by any multiple of 90°, a standard frame template is built for the camera, and template matching is used to identify and locate the fiducial marks of each frame quickly and automatically. Based on the camera calibration parameters, the transformation between pixel coordinates and image point coordinates is obtained by an affine transformation, as shown in formula (5):
In the formula, h0, h1, h2, k0, k1, k2 are the interior orientation parameters, which are obtained by adjustment.
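Formula (5) is not reproduced here. Assuming it is the usual six-parameter affine model x = h0 + h1·X + h2·Y, y = k0 + k1·X + k2·Y, with (X, Y) the pixel coordinates of a fiducial mark and (x, y) its calibrated image coordinates, the adjustment can be sketched as an ordinary least-squares fit over three or more fiducial marks. The function names are hypothetical.

```python
import numpy as np

def fit_affine(pixel_xy, image_xy):
    """Least-squares estimate of (h0, h1, h2) and (k0, k1, k2) from >= 3 fiducial marks."""
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    image_xy = np.asarray(image_xy, dtype=float)
    # Design matrix [1, X, Y], shared by the x and y equations
    A = np.column_stack([np.ones(len(pixel_xy)), pixel_xy[:, 0], pixel_xy[:, 1]])
    h, *_ = np.linalg.lstsq(A, image_xy[:, 0], rcond=None)
    k, *_ = np.linalg.lstsq(A, image_xy[:, 1], rcond=None)
    return h, k

def pixel_to_image(h, k, X, Y):
    """Apply the affine transformation to pixel coordinates (X, Y)."""
    return h[0] + h[1] * X + h[2] * Y, k[0] + k[1] * X + k[2] * Y
```

With four fiducial marks the system is overdetermined by two observations per coordinate, so the residuals of the fit also give a quick check on fiducial measurement quality.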
(2) Automatic matching of conjugate image points and relative orientation
Image matching methods mainly include gray-value matching and feature-based (point, line, and surface feature) matching. Gray-value matching measures the correspondence between two adjacent images by the correlation of their pixel gray values. Feature matching uses feature operators to extract feature points from the overlapping areas of adjacent photos and matches each feature point with a local multi-point relaxation algorithm to identify conjugate (same-name) image points. Relative orientation then restores the relative position of adjacent photos and establishes a geometric model proportional to the real surface. The schematic diagram of relative orientation is shown in Fig. 2 [21].
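Gray-value matching is often implemented with the normalized cross-correlation (NCC) coefficient. The brute-force sketch below is an illustration rather than the paper's stated implementation: it slides a template from one image over a search window in the adjacent image and returns the offset with the highest NCC. Practical systems usually restrict the search further, for example with image pyramids; the function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized gray-value patches."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def match_template(search, template):
    """Exhaustive search; return ((row, col), score) of the best NCC position."""
    th, tw = template.shape
    best, best_rc = -2.0, (0, 0)  # NCC lies in [-1, 1], so -2 is a safe start
    for r in range(search.shape[0] - th + 1):
        for c in range(search.shape[1] - tw + 1):
            s = ncc(search[r:r + th, c:c + tw], template)
            if s > best:
                best, best_rc = s, (r, c)
    return best_rc, best
```

A score close to 1 indicates a confident match; a flat (textureless) patch gives a zero denominator, which is treated here as no correlation.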

Schematic diagram of relative orientation.
(3) Semi-automatic control point measurement and block adjustment
With the application of photogrammetric block adjustment, the number of field image control points that must be deployed is greatly reduced, while the accuracy requirements for field-measured control points keep rising. In the photogrammetry software, image control points must be identified and pinpointed manually. Automatic point transfer through multi-image matching then yields the coordinates of the same control point on adjacent photos. The coordinates of the resulting tie points serve as the original observations for block adjustment with a small number of ground control points, completing the aerial triangulation [22].
At present, the commonly used block adjustment methods are strip block adjustment, independent model block adjustment, and bundle block adjustment. Among them, bundle block adjustment achieves the highest accuracy. Its adjustment model is the collinearity equation, its adjustment unit is a single bundle of rays, and its observations are the image point coordinates. The unknowns are the exterior orientation elements of each photo and the ground coordinates of the points to be determined. The error equations of bundle block adjustment are written directly from the observed image point coordinates. Because the method is based on the collinearity of the projection center, the image point, and the corresponding ground point, and the collinearity equation must be linearized, approximate values of the exterior orientation elements of each photo and of the ground coordinates of the points to be determined must be provided. The collinearity equation is shown in formula (6):
In the formula, (x, y, z)——the coordinates of the image point a in the image space coordinate system, z = - f;
R——rotation matrix;
μ, υ, ω——The coordinates of image point a in the auxiliary coordinate system of image space [23].
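Formula (6) is not reproduced here. The sketch below evaluates the collinearity condition in its common form: the vector from the projection center S to a ground point is rotated into the image space coordinate system with Rᵀ and scaled so that its third component equals -f. The caller supplies R, assumed to map image space coordinates to image space auxiliary coordinates; the function name `project` is hypothetical.

```python
import numpy as np

def project(ground_xyz, center_xyz, R, f, x0=0.0, y0=0.0):
    """Image coordinates (x, y) of a ground point from the collinearity equations.

    R maps image space to the image space auxiliary system (assumed convention),
    so R.T maps the ground vector back into image space.
    """
    d = np.asarray(ground_xyz, dtype=float) - np.asarray(center_xyz, dtype=float)
    u, v, w = R.T @ d  # the a1..c3 sums in the numerators and denominator
    if w >= 0:
        raise ValueError("point is not in front of the camera (w must be negative)")
    return x0 - f * u / w, y0 - f * v / w
```

In bundle block adjustment this function supplies the computed image coordinates whose differences from the measured ones form the error equations; the linearization itself is not shown.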
Filtering is usually used to separate ground points from the point cloud. Its theoretical basis is the abrupt change in elevation between adjacent points of the point cloud, or a local discontinuity. Commonly used filtering algorithms for UAV image-matching point clouds are as follows:
(1) Filter algorithm based on mathematical morphology
The mathematical morphology filtering algorithm is based on the dilation and erosion operations of image processing. It is a region-based, top-down filtering algorithm that proceeds from local to global.
Erosion: the set of points a for which the structuring element B, translated by a to Ba, is completely contained in X, as shown in formula (7):
Dilation: translating the structuring element B by a gives Ba. If Ba intersects X, point a is recorded. The set of all points a that satisfy this condition is the dilation of X by B, as shown in formula (8):
In the formula, X——Signal to be processed [24].
This filtering method applies a moving window to point cloud data organized as a regular grid to filter the whole area. It is simple and fast. However, because the points must be interpolated into a regular grid before filtering, much important terrain information is lost, and areas with abrupt terrain changes are not handled well.
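A one-dimensional sketch of the idea, under simplifying assumptions of our own: elevations along a gridded profile are eroded and then dilated with a flat structuring element of width 2·half + 1 (a grayscale opening, the elevation analogue of formulas (7) and (8)), and points that rise more than a threshold dh above the opened surface are labeled non-ground. The window size, threshold, and function names are hypothetical and must be tuned to object size and terrain.

```python
import numpy as np

def erode(z, half):
    """Grayscale erosion: minimum over a window of 2*half + 1 samples."""
    n = len(z)
    return np.array([z[max(0, i - half):min(n, i + half + 1)].min() for i in range(n)])

def dilate(z, half):
    """Grayscale dilation: maximum over a window of 2*half + 1 samples."""
    n = len(z)
    return np.array([z[max(0, i - half):min(n, i + half + 1)].max() for i in range(n)])

def morphological_ground_mask(z, half, dh):
    """True where a profile point lies within dh of the morphologically opened surface."""
    z = np.asarray(z, dtype=float)
    opened = dilate(erode(z, half), half)  # opening removes objects narrower than the window
    return z - opened <= dh
```

The window must be wider than the largest above-ground object, which is exactly why abrupt terrain (steeper than the window can follow) is handled poorly, as noted above.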
(2) Filtering algorithm based on moving window
The moving window filter evolved from surface-based filtering and, like the mathematical morphology filter, relies on a height-difference threshold. Its basic idea is to treat the ground as a continuous, slowly varying surface, fit the terrain surface with weighted parameters, and use the fitted surface to filter the experimental area.
(3) Filtering algorithm based on terrain slope
The greater the elevation difference between two adjacent points in the point cloud, the greater the slope between them, and the higher point is less likely to be a ground point. The algorithm decides whether a target point is a ground point by comparing the height difference between the target point and its adjacent ground points with a designed threshold. Assuming that the terrain slope does not exceed 30%, the filter function is given in formula (9):
In the formula, Δhmax (d)——the given threshold function of distance;
σ——standard deviation;
d——the distance between two points, that is:
The key to this algorithm is choosing the slope threshold. Because slope-based filtering relies on loose conditions, its filtering error is large and the results are not ideal. For different terrain data, different training sets must be selected to determine the optimal kernel function, so its adaptability is poor. Determining the threshold from the distance between two points alone also has clear limitations [25].
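The sketch below follows the slope-based idea in its simplest form: a point is rejected as non-ground if any neighbor within a search radius is lower by more than Δh_max(d). Because formulas (9) and (10) are not reproduced in this text, the threshold Δh_max(d) = s·d + 1.65·√2·σ, with s = 0.3 for the 30% slope, is an assumed form; the search radius and the function name are hypothetical.

```python
import numpy as np

def slope_filter(points, slope=0.3, sigma=0.0, radius=10.0):
    """Return a boolean ground mask for an (N, 3) array of x, y, z points."""
    pts = np.asarray(points, dtype=float)
    xy, z = pts[:, :2], pts[:, 2]
    ground = np.ones(len(pts), dtype=bool)
    for i in range(len(pts)):
        d = np.linalg.norm(xy - xy[i], axis=1)          # horizontal distances
        near = (d > 0) & (d <= radius)
        dh_max = slope * d[near] + 1.65 * np.sqrt(2) * sigma  # assumed threshold form
        if np.any(z[i] - z[near] > dh_max):             # some neighbor is too far below
            ground[i] = False
    return ground
```

Raising `slope` makes the filter more permissive, which illustrates the sensitivity to the threshold choice discussed above.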
According to the first law of geography proposed by the American geographer Waldo Tobler, “Everything is related to everything else, but near things are more related than distant things.” Inverse distance weighting (IDW) is based on this law and assumes that the influence between values decreases as the distance between sampling locations increases. The core of IDW is the weight, that is, how strongly each sampling point influences the interpolation point. The weight of a known point relative to an unknown point is determined by the weight function in formula (11) [26]:
In the formula, λ j ——the weight of point j;
d j ——the distance between point j and the point to be interpolated;
u——weight index.
The formula shows that the weight is a decreasing function of distance: the farther a sampling point is from the point to be interpolated, the smaller its weight, and the closer it is, the larger its weight. When a sampling point lies beyond a certain range from the interpolation point, its weight is small enough to be ignored. The value at any point to be interpolated is the weighted sum of the sampling-point values, as shown in formula (12):
In the formula, z j ——the elevation value of the jth point;
z p ——the elevation value of the point to be interpolated.
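Formulas (11) and (12) are not reproduced in this text; assuming their usual form λ_j = d_j^(−u) / Σ_k d_k^(−u) and z_p = Σ_j λ_j z_j, IDW can be sketched as follows. Returning the sample value when the query coincides with a sample reflects the exact-interpolation property of IDW; the function name is hypothetical.

```python
import numpy as np

def idw(sample_xy, sample_z, query_xy, u=2.0):
    """Inverse distance weighted estimate at query_xy with weight index u."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_z = np.asarray(sample_z, dtype=float)
    d = np.linalg.norm(sample_xy - np.asarray(query_xy, dtype=float), axis=1)
    if np.any(d == 0):                 # exact interpolator: honor a coincident sample
        return float(sample_z[np.argmin(d)])
    w = d ** -u                        # weights decay with distance
    lam = w / w.sum()                  # normalized weights lambda_j, summing to 1
    return float(lam @ sample_z)
```

With samples z = 1 at (0, 0) and z = 3 at (2, 0), the query (0.5, 0) gets weights 0.9 and 0.1 for u = 2, giving 1.2; the strong pull toward the nearest sample is the source of the bumps and pits described next.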
This interpolation method is simple in principle and easy to compute. The interpolated surface passes through every sampling point, and the contribution of each discrete point to the interpolation point is weighted by distance, so IDW is an exact DEM interpolation algorithm. However, IDW is sensitive to individual sampling points, which produces obvious bumps or pits around them. Taking the subsidence data of the last period as an example, the route generation graph obtained with IDW interpolation is shown in Fig. 3.

Route generation diagram.
Shepard's interpolation algorithm, abbreviated as SPD, is one of the earliest inverse distance weighting procedures. Its weighting formula is shown in formula (13):
In the formula, w i ——weight;
d i ——the distance from point i to the interpolation point;
r——the distance is adjusted.
The improved Shepard algorithm has two variations to adjust the weight:
(1) Within the entire data set or within a given radius, the weight w_i is adjusted according to the distance of the farthest point. Assuming that the farthest distance from the interpolation point within a certain range is r, the modified inverse distance weighting formula is given in formulas (14) and (15):
In the formula, δ——smoothing factor: δ = 0 gives an exact interpolation algorithm, and δ ≠ 0 gives an inexact one;
u——weight index.
(2) When the weight w_i is adjusted with a local quadratic polynomial, the elevation values entering the inverse distance weighting function are not the original elevations of the sampling points but values corrected by a fitted quadratic polynomial, as shown in formula (16):
In the formula, z j ——the point to be interpolated;
Q i ——quadratic polynomial function.
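Formulas (13) to (15) are not reproduced here. A common way to realize the first variant is the Franke-Nielson weight w_i = [(r − h_i)₊ / (r·h_i)]^u, in which points farther than r receive zero weight. The sketch below uses that form and, as an assumption, introduces the smoothing factor δ through a smoothed distance h_i = √(d_i² + δ²), so that δ = 0 gives exact interpolation. Neither form is confirmed by the source, the second (quadratic-polynomial) variant is not shown, and the function name is hypothetical.

```python
import numpy as np

def modified_shepard(sample_xy, sample_z, query_xy, r, u=2.0, delta=0.0):
    """Modified Shepard estimate at query_xy using only samples within radius r."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_z = np.asarray(sample_z, dtype=float)
    d = np.linalg.norm(sample_xy - np.asarray(query_xy, dtype=float), axis=1)
    h = np.sqrt(d**2 + delta**2)       # smoothed distance; delta = 0 keeps exactness
    if np.any(h == 0):
        return float(sample_z[np.argmin(h)])
    w = (np.clip(r - h, 0.0, None) / (r * h)) ** u  # zero weight beyond r
    if w.sum() == 0:
        raise ValueError("no sampling points within radius r")
    return float(w @ sample_z / w.sum())
```

Because distant points are cut off entirely rather than merely down-weighted, a far outlier no longer influences the estimate, and a nonzero δ lets the surface pass near, rather than through, the samples.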
Taking the subsidence data of the last period as an example, the connection point results obtained with Shepard interpolation are shown in Fig. 4.

Results of connection points.
Radial basis function (RBF) interpolation is a general term for a family of exact interpolation methods. Its principle is that any surface can be approximated by a linear combination of multiple basis surfaces. The algorithm is similar to geostatistical interpolation; however, apart from requiring that the sampling points not be collinear, RBF interpolation does not require semivariogram analysis or any assumptions about the sampling points.
The RBF algorithm usually consists of two parts, as shown in formula (17):
In the formula, z p ——the height of the point to be interpolated;
λ i ——the weight value of point i;
d i ——the distance from the i-th sampling point to the point to be interpolated;
φ (d i )——radial basis function;
f j (x)——“trend” function, a basis polynomial of degree less than m.
The RBF interpolation algorithm can use different types of radial basis functions φ (d i ), such as the multiquadric (MQF), inverse multiquadric (IMQF), thin plate spline (TPSF), and multilog (MLF) functions, to interpolate discrete data. Although both are exact interpolators, RBF differs from IDW: IDW interpolates entirely from the sampling-point values and cannot produce values higher or lower than the sampling points, whereas the RBF algorithm, with its trend function, can produce values above or below the sampling points.
In this paper, the thin plate spline function (TPSF) is selected as the radial basis function to interpolate the discrete subsidence data. The radial basis function is shown in formula (18), and the subsidence basin is shown in Fig. 5.

Local contour lines.
In the formula, c——smoothness factor;
d i ——the distance from the i-th sampling point to the point to be interpolated.
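Formulas (17) and (18) are not reproduced here. As an assumption, the sketch below uses the thin plate spline form φ(d) = (d² + c²)·ln(d² + c²), with c the smoothness factor listed above, plus a first-degree trend (1, x, y). It solves the standard augmented linear system in which the weights λ_i are constrained to be orthogonal to the trend terms. Function names are hypothetical, and the sampling points must not be collinear.

```python
import numpy as np

def tps_basis(d, c=0.0):
    """phi(d) = (d^2 + c^2) ln(d^2 + c^2), using the limit 0 when d = c = 0."""
    s = np.asarray(d, dtype=float) ** 2 + c**2
    out = np.zeros_like(s)
    nz = s > 0
    out[nz] = s[nz] * np.log(s[nz])
    return out

def rbf_fit(xy, z, c=0.0):
    """Solve [Phi P; P^T 0][lam; a] = [z; 0] for weights lam and trend coefficients a."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)  # pairwise distances
    P = np.column_stack([np.ones(n), xy])                          # trend terms 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_basis(d, c)
    A[:n, n:] = P
    A[n:, :n] = P.T
    sol = np.linalg.solve(A, np.concatenate([z, np.zeros(3)]))
    return xy, sol[:n], sol[n:], c

def rbf_eval(model, q):
    """z_p = sum_i lam_i * phi(d_i) + trend(q)."""
    xy, lam, a, c = model
    q = np.asarray(q, dtype=float)
    d = np.linalg.norm(xy - q, axis=1)
    return float(lam @ tps_basis(d, c) + a[0] + a[1] * q[0] + a[2] * q[1])
```

Because the trend is part of the model, a planar data set is reproduced exactly and the surface can rise above or fall below the sampled values, which is the difference from IDW noted above.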
Taking the subsidence data of the last period as an example, the contours obtained with the RBF algorithm are shown in Fig. 5.
Kriging is based on the theory of regionalized variables, and the variogram is the main tool for studying natural phenomena whose spatial distribution is both random and structured. Kriging interpolation includes simple Kriging, ordinary Kriging, universal Kriging, co-Kriging, and more than 20 other variants. However, all Kriging variants share the basic form shown in formula (19).
In the formula, m——the average value of all sampling points in the whole area;
n——all sampling points participating in the interpolation algorithm in the specified search area centered on point x0;
m (x0)——the average value of sampling points in the search area.
The core of Kriging interpolation for estimating an unknown point is solving for the weights λ_i. The weights must make the estimate unbiased while minimizing the estimation variance. The unbiasedness condition is expressed in formula (20):
The estimated variance expression is (21):
Among them,
The above formula is the covariance function. Then, using the relationship between the covariance function and the variogram, the Lagrange multiplier method is applied to construct the function F. Taking the partial derivatives of F yields a system of n + 1 linear equations:
The known point data are substituted into formula (23) to obtain λ_i.
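Formulas (20) to (23) are not reproduced here. The sketch below solves the ordinary Kriging system in its variogram form, [Γ 1; 1ᵀ 0][λ; μ] = [γ₀; 1], where the last row enforces the unbiasedness condition Σλ_i = 1 and μ is the Lagrange multiplier. The spherical variogram and its sill and range values are hypothetical choices for illustration; in practice the model is fitted to the experimental variogram of the subsidence data.

```python
import numpy as np

def spherical(h, sill=1.0, a=10.0):
    """Spherical variogram with zero nugget (hypothetical sill and range a)."""
    h = np.asarray(h, dtype=float)
    g = sill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, g, sill)

def ordinary_kriging(xy, z, q, variogram=spherical):
    """Return the ordinary Kriging estimate at q and the weights lambda_i."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)   # Gamma: semivariances between sampling points
    A[n, n] = 0.0
    b = np.ones(n + 1)         # last entry 1 enforces sum(lambda) = 1
    b[:n] = variogram(np.linalg.norm(xy - np.asarray(q, dtype=float), axis=1))
    sol = np.linalg.solve(A, b)
    lam = sol[:n]              # sol[n] is the Lagrange multiplier mu
    return float(lam @ z), lam
```

With a zero nugget the estimator is exact: querying at a sampling point returns that point's value, and the weights always sum to one.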
Taking the subsidence data of the last period as an example, the contours obtained with Kriging interpolation are not smooth, as shown in Fig. 6.

Contour lines are not smooth.
The local polynomial method is a commonly used interpolation method, usually with first-, second-, or third-order polynomials as basis functions. Common interpolation functions are shown in formula (24). When interpolating discrete sampling points, the method must find a polynomial function suited to modeling the sampled data. When the same data are interpolated with a higher-order function, the fit accuracy is higher but the surface model fluctuates more. Therefore, a reasonable order must be chosen before the method is used for data processing.
This method is an inexact interpolation algorithm: the surface does not need to pass through all sampling points. By capturing the trend of the discrete data, it fits the trend surface of the sampled data well. It is suitable when the model surface structure is simple and the surface needs local smoothing.
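Formula (24) is not reproduced here. The sketch below is one simple realization under assumptions of our own: fit an unweighted polynomial of order 1, 2, or 3 by least squares to the sampling points within a search radius of the query point, using coordinates centred on that point so that the fitted constant term is the interpolated value. Many implementations also weight points by distance; the radius, order, and function names here are hypothetical.

```python
import numpy as np

def poly_terms(dx, dy, order):
    """Design-matrix columns of a bivariate polynomial up to the given order (1 to 3)."""
    cols = [np.ones_like(dx)]
    if order >= 1:
        cols += [dx, dy]
    if order >= 2:
        cols += [dx**2, dx * dy, dy**2]
    if order >= 3:
        cols += [dx**3, dx**2 * dy, dx * dy**2, dy**3]
    return np.column_stack(cols)

def local_polynomial(xy, z, q, radius, order=1):
    """Least-squares local polynomial estimate at q from samples within radius."""
    xy, z, q = (np.asarray(v, dtype=float) for v in (xy, z, q))
    dxy = xy - q                                   # coordinates relative to the query point
    near = np.linalg.norm(dxy, axis=1) <= radius
    A = poly_terms(dxy[near, 0], dxy[near, 1], order)
    if A.shape[0] < A.shape[1]:
        raise ValueError("not enough sampling points in the search radius for this order")
    coef, *_ = np.linalg.lstsq(A, z[near], rcond=None)
    return float(coef[0])                          # value at q is the constant term
```

Raising `order` lets the local surface follow the data more closely at the cost of larger oscillations, which is the trade-off described above.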
The above analysis enables real-time monitoring of students in physical education classes in a complex sports environment. On this basis, the corresponding model can be constructed and the system performance analyzed.
Based on the above analysis, this paper combines the open Internet of Things and cloud computing to build a real-time monitoring system for physical education classrooms, as shown in Fig. 7.

The real-time effect monitoring system of college physical education classroom based on open IoT and cloud computing.
After that, the system performance is verified and analyzed. This study mainly analyzes the system's performance in capturing motion and its data transmission. First, the real-time monitoring effect in college physical education classes is analyzed. Two classes attend a physical education lesson together, and all students are monitored in real time; a total of 60 students are counted during monitoring. The study mainly counts students' sports action recognition, sports position recognition, and exercise time statistics. The results are shown in Table 2 and Fig. 9.
Statistical table of system operation monitoring efficiency

Real-time monitoring images of student motion capture.

Statistical diagram of system operation monitoring efficiency.
As shown in Fig. 9, the statistical accuracy of all system monitoring results is above 74, indicating high recognition efficiency.
Building on the above analysis, the system performance is analyzed by measuring the single-group data transmission time for the 60 people identified by the system. The results are shown in Table 3 and Fig. 10.
Statistical table of data transmission speed

Statistical diagram of data transmission speed.
As shown in Fig. 10, the data transmission times of the system are all below 85 ms, so transmission is relatively fast and meets practical needs.
Information technology has greatly promoted the modernization of education. In teaching, knowing students' identities is the first step for teachers to understand students' learning status. To address the difficulty of locating students' identities in low-resolution images of sports teaching scenes, this paper proposes a method that combines the open Internet of Things and cloud computing technology to locate students' identities and actions in sports scenes. Moreover, this paper proposes several new and improved algorithms, which provide a theoretical and technical basis for automatic identity positioning in large scenes. In addition, this paper uses field image data collection and field data processing to obtain the scene of the measured area, and combines the area scenes with field-measured data to verify accuracy and trends and to obtain students' morphological characteristics. Finally, this paper builds a system model based on actual needs and verifies its performance. The results show that the proposed method performs well.
