Abstract
BACKGROUND:
Physical education and training are essential ways to improve the physical quality of the nation, and China has incorporated “building a healthy China” and “fitness for all” into its national development strategy, integrating a strong sports nation into the Chinese dream.
OBJECTIVE:
The study of digital recording and automated training in sports is of profound value. In a digital physical education training system, motion capture technology can record the training process, while accurate modeling and calculation can analyze the training effects and give appropriate guidance and feedback. This study develops an improved hierarchical K-means (H-K) algorithm by combining the well-known K-means clustering algorithm with a hierarchical clustering algorithm.
METHODS:
The performance of the original and improved algorithms is compared, and the improved algorithm is then applied to physical education training data to produce clustering results. The analysis is used to reduce the number of parameters in the model and improve the recognition speed.
RESULTS:
The experimental results demonstrate that the relevant models proposed in this study achieve an average accuracy of 91.27% and 92.26%, respectively, which is better than a single network model and can effectively use big data for health event detection.
CONCLUSION:
The empirical results show that the improved model algorithm outperforms the single network model and can detect health events using big data.
Keywords
Introduction
Compared with the growing population taking part in physical education and training, the supply of sports coaching in China is seriously inadequate: in terms of the number of coaches, there is a shortage of coaching talent in the central and western regions; in terms of the quality of coaches, the proportion of high-level coaching talent is low, and the aging trend is serious [1]. Sports training simulation is an experimental technology science that uses computer simulation to reproduce the teaching experience of physical education teachers or coaches, their training means, the organizational schemes of managers, and the training process of physical education trainers, in order to interpret, analyze, predict, organize, and evaluate the physical education training situation [2]. It can be applied mainly to developing sports teaching and training simulators, visualizing the monitoring and calculation of human biological information, organizing and managing large-scale sports teaching and training meetings, and optimizing and innovating action techniques [3]. Research on sports training simulation is still at an early stage, and many theories and methods remain to be introduced and studied.
Physical training is developing rapidly toward higher, more complex, and more precise levels, and modern technology is widely used in physical training [4]. Modern sports need constant scientific and technological intervention to maximize human potential, which requires integrating knowledge from disciplines related to sports science and using systems science (e.g., systems simulation) to study the intrinsic laws of physical education and training [5].
There has been an explosion of information on keeping healthy. If users simply rely on Internet search engines, then on the one hand, their relative lack of expertise means they cannot quickly and efficiently find suitable physical education training and healthy habits; on the other hand, it is difficult to obtain health advice from search engines that is tailored to the user’s personal constitution, lifestyle, and diet. This is why it is so important to have a recommendation system that can recommend different health events according to the user’s physical state and behavior. The progression from health to illness is cumulative, especially within the body [6]. An individual’s health status is not only controlled by genes but is also influenced by lifestyle and living environment [7]. To a large extent, good health can be maintained by changing lifestyle and living environment to reduce the risk of disease. Against this background, this paper explores constructing a user health portrait model based on individual user information, and identifying and monitoring user activities in real time by building a deep learning model that takes as input the raw sensor signals uploaded in real time from mobile phones [8]. Then, according to the recommendation algorithm, the system generates suitable health events for users and guides users to perform them, reducing the risk of chronic diseases in healthy people, promoting group health, and providing low-cost, personalized health recommendation services for sub-healthy people.
Health risks are an inevitable negative factor in the development of society, affecting everyone from the government to the public. They are becoming more and more complex, which further increases their destructiveness, spread, unpredictability, and difficulty of prevention and control. The contradictory nature of health risks in the communication process often increases the media’s uncertainty when processing and communicating information, resulting in a lack of public values, for example in the form of false news. The academic community and the mainstream media must face this challenge.
In today’s increasingly complex media environment, with rapidly iterating media technology, a complex problem confronts the mainstream media and urgently needs solving: how to use the public values presented in reporting before, during, and after a health risk event to defuse the negative impacts of that event, including its impact on social stability. This is also a test for modern society [10]. At the same time, whether a practical combination of factors and systems can be established to maximize the effect of public value communication has become a challenge for the modernization of mainstream media and their ability to deal with health risks [11]. For P.E. trainers to perform at a reasonable level, professionals must regularly provide reasonable counseling and diagnosis [12]. However, the health of P.E. trainers is often uncertain and ambiguous, and most psychological services provide only simple responses to cross-cutting and complex health conditions. As the number of P.E. trainers increases, psychological professionals cannot effectively consider each individual’s physiological condition. The challenge is how to effectively personalize services to the health status of individual P.E. trainers.
The paper’s main contribution is the development and implementation of a health-event-based recommendation algorithm for physical education training. This research aims to improve sports training quality by integrating knowledge of health events, analyzing recommendation algorithms, and applying them to a recommendation system. Specifically, based on K-means theory and classification algorithms, the H-K algorithm is improved to better adapt to sports training quality and to increase data accuracy. The empirical results show that the enhanced model algorithm is more effective than a single network model at detecting health events, as demonstrated by improved data accuracy.
Related works
Studies [13] have noted that, because physical education training and lifestyle habits significantly affect physical health, many researchers at home and abroad have studied them in depth. Scientific and rational physical education and training can reduce the incidence of chronic diseases. Many researchers have already studied recommendation algorithms in the health field with good results. The term “health big data” is not new; as early as the 1990s, many foreign research institutions began researching and constructing health testing big data. For example, in 1993, the United States put forward the “regional medical information structure” concept and started to establish health service institutions in the region. By the beginning of this century, the United States had more than 9 million users and 26.5
A previous study [15] mentions that, in addition to the significant progress made in health data collection in the United States, the Australian National Diabetes Dataset and the ISO Emergency Medical Care Dataset represent modern big data research on health testing and are commonly used in medical research today. Mobile phone manufacturers such as Apple, Google, and Huawei have launched health platforms and health-related mobile phone applications designed to record the user’s heartbeat, body temperature, blood pressure, and physical education training and to provide the user with appropriate protection measures for physical education training. Clustering is a machine learning method that groups data, observations, or feature vectors [16]. At present, clustering technology is developing rapidly. Clustering is an integral part of data mining and is widely used in fields such as machine learning, data mining, literature retrieval, image segmentation, and pattern classification, and even in biology, psychiatry, and archaeology.
Clustering, as one of the steps of exploratory data analysis, has been discussed and studied by many experts in many fields, reflecting its broad appeal and usefulness. However, because of the differences between disciplines, it is difficult to establish common concepts and methods for evaluating clustering. By its very nature, cluster analysis is particularly well suited to exploring the intrinsic connections within data sets and assessing their structure [17]. Because physical education and training are rapidly evolving toward higher, more complex, more precise, and more sophisticated levels, greater use can be made of modern technological tools.
To maximize the potential of human beings, modern sports need continuous scientific and technological intervention, including making full use of relevant knowledge of sports science and using mathematical simulation methods to study the intrinsic laws of physical education [18]. In recent years, hot spots in simulation research have included quality simulation, interactive simulation, image simulation, multimedia simulation, and intelligent simulation. A survey of domestic and foreign health applications shows that, whether launched by mobile phone manufacturers or large Internet companies, they generally contain health-related apps and health data platforms [19, 20, 21]. The mobile phone app side focuses mainly on data collection, while the data platform is a data warehouse for storing the collected user signs and physical education and training data.
However, data collection is limited to sensors integrated into mobile phones or applications within mobile phones, which is not enough in an era when intelligent wearable devices and smart monitoring devices are ‘all over the place.’ Our Health Check Big Data platform receives real-time monitoring data from ‘wearables’ or other sensor devices via the network, and medical clinical data via a unified access protocol provided by Health Check Big Data [22]. This is the strength of this application: it has a robust data platform as its data source, as well as a wealth of clinical data and test reports from medical institutions, which can be used to better describe and present the user’s health status and better ‘quantify the self.’ Sathyaprakash et al. [23] proposed a technique to forecast healthcare risks within the e-healthcare sector, emphasizing privacy and efficiency. Kumar et al. [24] investigated how plants react to environmental conditions by employing an expert system that integrates Artificial Neural Networks (ANN) with a crop production system.
In the health field, some research results have been achieved in daily activities, sports, training, and other health activities. Still, most of the above recommendations are based on user preferences and geographical location. They do not fully use the user’s daily activities, and the evaluation of the user’s overall health status is insufficient, so the recommended health behavior may not be suitable for users. Therefore, it is necessary to introduce user health status, real-time behavior data, and physiological information to help users build a reasonable lifestyle and recommend health events from multiple dimensions.
Hierarchical K-means algorithm optimization
Clustering is one of the main methods of data mining and is mainly used to divide data sets into multiple categories (groups) [25]. A typical clustering task includes raw data preparation, feature extraction, proximity measurement, and summarizing or evaluating the results. Figure 1 shows the typical sequence of these steps, including a feedback path: the clustering results may in turn affect feature extraction and the subsequent similarity calculation.
Main process of clustering algorithm.
The similarity function measures how close two data objects are, with Euclidean distance and cosine similarity being the most commonly used. Euclidean distance is often used for structured data, while cosine similarity is most often used for vector space models of text. In addition, the following definitions are available, as shown in Eq. (1).
Where
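As a point of reference for the two similarity measures named above, the following minimal sketch writes them out in their standard textbook form. The function names are illustrative and are not taken from the paper, and the sketch is not claimed to reproduce Eq. (1) exactly.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (1 means same direction)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Euclidean distance depends on vector magnitude, whereas cosine similarity depends only on direction, which is one reason the latter is preferred for text vectors of differing lengths.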
The k-means model is a simple and effective algorithm applicable to large data samples [26]. It first selects k objects as the initial cluster centers. Each remaining object is then assigned to the nearest center to form new clusters, and the average value of all data in each new cluster is calculated as the new center. This is repeated until the clustering no longer changes or a specified threshold is reached; the specific process is described in Fig. 2.
K-Means algorithm diagram.
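The iteration summarized in Fig. 2 can be sketched as follows. This is a plain implementation of the standard k-means (Lloyd) procedure under stated assumptions, not the authors' code: the random initialization, the `tol` stopping threshold, and the handling of empty clusters are illustrative choices.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Standard k-means: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers stopped moving
            break
    return centers, assign(points, centers)
```

Because the result depends on the initial centers, different seeds can give different clusterings on less well-separated data, which is the weakness that a better initialization is meant to address.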
For any data object
The silhouette coefficient (also translated as the contour coefficient) is a method of interpreting and validating clustering results, and it is defined in Eq. (4).
Or write it as in Eq. (5).
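For readers who want to check a clustering by hand, the sketch below computes the coefficient in its standard form, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster and b(i) is the smallest mean distance to any other cluster. It is assumed, not confirmed, that Eqs. (4) and (5) express this same definition; the function name is illustrative.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette values using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own. Averaging the scores for several candidate cluster counts gives a curve from which a cluster count can be chosen.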
Because users have different preferences for physical training, the number of times each user performs each kind of training also differs. To find the list of physical education training sessions that best describes the user’s preferences, all activities performed by the user can be normalized and expressed as Eq. (6).
Where:
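One simple way to normalize a user's activity record is to convert raw session counts into relative frequencies. The sketch below does this under the assumption that normalization means dividing each count by the user's total; the exact form of Eq. (6) may differ, and the activity names are hypothetical.

```python
def normalize_activity_counts(counts):
    """Convert per-activity session counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # A user with no recorded sessions has no preference signal yet.
        return {activity: 0.0 for activity in counts}
    return {activity: n / total for activity, n in counts.items()}


# Hypothetical record: number of sessions per activity for one user.
preferences = normalize_activity_counts({"running": 6, "swimming": 3, "yoga": 1})
top_activity = max(preferences, key=preferences.get)
```

Normalizing in this way makes users with very different overall activity levels comparable, since only the relative share of each activity is kept.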
Using Bayes’ theorem, the posterior density of the data
Where
Then
Where
Improved H-K clustering algorithm
The process of traditional sports training can be summarized in Fig. 3. At the beginning of the training, the trainee chooses, or the trainer specifies, the final goal of the whole training process. Accordingly, the skill indicators corresponding to the goals and their weight distribution are determined [27]. The trainer then uses a series of standard tests to determine the trainee’s initial skill level. After considering the final goal and the trainee’s current level, the trainer gradually releases the exercises to the trainee.
Traditional sports training process.
This paper further refines the traditional physical training process by regrouping the concepts and connections from the original framework into four modules: domain knowledge, trainees, physical education training evaluation, and controllers. The ideas, mathematical expressions, and detailed requirements of each module are also explained. To aid understanding of the framework, pseudo-code is provided for each significant aspect of system operation. Finally, the physical education training framework before and after quantification is compared in terms of both the sources of system error and the prerequisite assumptions for a valid system; the complete framework is shown in Fig. 4.
General sports auxiliary training framework based on knowledge.
The proposed improvement to the K-Means algorithm aims to obtain better initial data and time complexity and to remedy the defects of the existing K-Means algorithm. Let
To verify these algorithms, the Iris and Abalone data sets, together with data selected by the International Graduate School, are taken from the database. The experiments use an open-source language and environment for statistical analysis and graphics (comparable to the commercial software MATLAB). Its syntax is similar to C, but it offers more support for statistical analysis and data handling than C (especially in array processing with brackets). This article uses this language to process the data sets listed in Table 1.
List of experimental data sets
After sorting and transforming the data, cluster analysis is carried out according to the improved algorithm. Figure 5 shows that the coefficient is largest when the number of groups is 2. This indicates that 2 is the most suitable number of groups, although it is not yet the final clustering result. The hierarchical algorithm is first used to obtain the initial data for the subsequent k-means step, and the calculation then proceeds with the improved algorithm.
Silhouette coefficient line diagram.
Therefore, the experimental data is divided into two groups by hierarchical clustering, and the improved algorithm then applies k-means to the two groups. After this preliminary calculation, the initial cluster centers are obtained, as shown in Table 2.
Initial cluster centers formed after hierarchical clustering
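The two-stage idea described above, in which hierarchical clustering supplies the initial centers and k-means then refines them, can be sketched as follows. This is a minimal illustration under assumptions: centroid linkage is used for the agglomerative step and the function names are hypothetical, whereas the paper's H-K algorithm may use a different linkage and additional optimizations.

```python
import math


def mean(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def hierarchical_centers(points, k):
    """Merge the two closest clusters (by centroid) until k clusters remain."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]


def kmeans_from_centers(points, centers, max_iter=100):
    """k-means refinement starting from the supplied centers."""
    def assign(cs):
        return [min(range(len(cs)), key=lambda j: math.dist(p, cs[j]))
                for p in points]

    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, assign(centers)
```

Starting k-means from hierarchical centers removes the dependence on random initialization. The naive agglomerative step shown here is expensive, roughly cubic in the number of points, so it is only practical on small samples or on a subsample.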
Health event detection algorithm model.
To obtain better results in subsequent model improvements, the initial model’s hyperparameters should first be tuned to find values that improve the model’s accuracy [28]. One of the more important hyperparameters in the CNN model is the number of feature maps; the initial model value is 8. Values such as 8, 16, 32, 64, 128, and 256 are then tried to find relatively suitable parameters for the model.
First, two one-dimensional CNN layers are set up: the first reads the input sequence and projects the result onto feature maps, and the second operates on the feature maps created by the first, further amplifying their salient features. Each convolutional layer uses 64 feature maps and reads the input sequence with a kernel size of 3 time steps. A pooling layer then reduces the feature maps to a quarter of their original size [29]. The feature maps are flattened into a long vector, which is used as the input to the decoding layer. A single-hidden-layer LSTM model with 200 units is then set up, followed by a fully connected layer with 200 nodes that maps the features learned by the LSTM to the sample label space. Building incrementally on the above model, this paper finally constructs a parallel CNN and GRU model for human activity recognition, as shown in Fig. 6.
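To make the layer sizes concrete, the following sketch traces how the sequence length changes through the two convolutional layers and the pooling layer. Several details are assumptions rather than statements from the paper: unpadded ("valid") convolutions with stride 1, a non-overlapping pooling window of 4 (inferred from the reduction to a quarter), and a hypothetical input window of 128 time steps.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of an unpadded 1-D convolution."""
    if length < kernel_size:
        raise ValueError("input is shorter than the kernel")
    return (length - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size):
    """Output length of non-overlapping 1-D pooling."""
    return length // pool_size


def cnn_feature_vector_size(window_length, filters=64, kernel_size=3, pool_size=4):
    """Sequence length after conv-conv-pool and size of the flattened vector."""
    length = conv1d_output_length(window_length, kernel_size)  # first conv layer
    length = conv1d_output_length(length, kernel_size)         # second conv layer
    length = pool1d_output_length(length, pool_size)           # pooling layer
    return length, length * filters


# Assumed window of 128 time steps: 128 -> 126 -> 124 -> 31 steps, 31 * 64 values.
steps, flattened = cnn_feature_vector_size(128)
```

The flattened size determines the input width of the layer that follows, so this arithmetic is a quick check when changing the number of feature maps or the window length.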
The overall architecture of the model in this paper is divided into three parts: the first part extracts features through two convolutional layers; the second part captures the temporal relationships in the sensor signal features through two GRU layers; and the third part expands the signal generated by the GRU through a fully connected layer and feeds all the data from the fully connected layer into the Softmax function to obtain the final classification result of human health.
This paper analyses the requirements for the various services involved in the recommendation system for health events, focusing on quantifying user health status and physical education training as well as the core services in the recommendation system. To quantify physical education and training, an initial list of activities is generated by initializing the user’s health status and filtering the activities in the database according to that status. We collect basic data about the user’s body (age, height, etc.) as well as dynamic data (real-time heart rate, etc.), store the information in the corresponding unstructured database, calculate from the basic data the range of activities that the user can perform healthily and safely, and then filter the user’s suitable activities from that range. Figure 7 shows the process of generating the initial physical education training list.
Initialize the generation process of the physical education training list.
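The filtering step in this process can be illustrated with a small sketch. The rule used to derive a safe range is an assumption made purely for illustration: the common rule of thumb of 220 minus age for maximum heart rate, with a 50% to 85% target band. The paper does not state which rule it uses, and the activity catalogue and heart-rate figures are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    typical_heart_rate: int  # hypothetical average beats per minute


def safe_heart_rate_range(age, low=0.5, high=0.85):
    """Illustrative target band from the 220-minus-age rule of thumb."""
    max_heart_rate = 220 - age
    return low * max_heart_rate, high * max_heart_rate


def initial_activity_list(age, activities):
    """Keep only activities whose typical heart rate falls in the safe band."""
    lower, upper = safe_heart_rate_range(age)
    return [a.name for a in activities if lower <= a.typical_heart_rate <= upper]


# Hypothetical catalogue standing in for the activity database.
catalogue = [Activity("walking", 100), Activity("jogging", 140), Activity("sprinting", 175)]
suggested = initial_activity_list(40, catalogue)
```

In a full system the band would also depend on dynamic data such as real-time heart rate and on clinical information, which this sketch leaves out.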
Once the data records have been fully queried and counted, the results are saved to Redis so that subsequent users can query them quickly without waiting for a long MongoDB database query. Figure 8 shows the overall system architecture.
Overall system architecture.
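The caching behavior described above follows the familiar cache-aside pattern, sketched below. To keep the example self-contained, an in-memory dictionary stands in for Redis and a plain callable stands in for the slow MongoDB query; the class and parameter names are hypothetical and none of this is taken from the system's actual code.

```python
class ResultCache:
    """Cache-aside lookup: serve from the cache, fall back to the slow query."""

    def __init__(self, slow_query, cache=None):
        self._slow_query = slow_query                     # stand-in for a MongoDB query
        self._cache = {} if cache is None else cache      # stand-in for Redis
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]                       # fast path: already computed
        self.misses += 1
        value = self._slow_query(key)                     # slow path: query and count
        self._cache[key] = value                          # save for later readers
        return value
```

A real deployment would normally also set an expiry time on cached entries so that statistics are refreshed as new training records arrive.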
The functional modules of the whole system are described below. Figure 9 shows the functional modules of the recommendation system as a whole, from registration and login to the management of personal health, where the annotation function allows the collection of data sets.
CPU calculation time comparison table
Schematic diagram of recommended system function module.
In summary, this chapter introduces the hierarchical and k-means clustering algorithms and their improvements, and discusses the advantages and disadvantages of the original and improved algorithms. The hierarchical algorithm is then combined with k-means to form the improved H-K algorithm, and the improved algorithm is assessed to facilitate later practical application to the physical education training source data.
Optimization results of the improved algorithm
To compare the algorithm’s performance, we used the improved H-K algorithm to cluster the data downloaded from the UCI Machine Learning Repository website and compared the running efficiency and clustering quality. Table 3 and Fig. 10 show the CPU running time comparison.
As shown in the figure above, the difference in CPU running time between the improved algorithm and the existing k-means becomes more prominent as the data sets grow. As the computation time and data volume rise, the benefit of combining k-means with the hierarchical algorithm gradually becomes apparent. This is because the algorithm for computing the coefficient has been improved: only the value of k is optimized, which reduces the complexity of the algorithm. Accuracy was also selected to evaluate the effectiveness of the algorithm and to verify the clustering effect on the data, as shown in Table 4.
As Table 4 shows, the improved H-K algorithm is more accurate than the traditional K-means algorithm; the improvement is small, and the results of the two are close. The results indicate that, for small-sample data sets, the improved algorithm runs significantly more efficiently, and its accuracy is also improved.
The improved K-Means algorithm was then applied to the data set to cluster 22 aspects, such as mathematics education sources, social support, and physical education trainer burnout, giving the final cluster centers shown in Table 5.
Comparison of accuracy of two algorithms on Iris dataset (%)
Final cluster center
Comparison of CPU operation time.
At the end of the 3-month teaching experiment, control classes were tested on physical education test items using the same test content and according to the same test procedures and test criteria, and the results were compared with the pre-experimental test results, as shown in Table 6.
Comparison of boys’ sports test scores
As shown in Table 7, there was a highly significant difference in the one-minute jump rope test scores of boys and girls in the experimental class before and after the experiment.
Comparison of the results of the one-minute rope skipping test between the experimental class and the control class after the experiment
In the previous section, experiments were conducted on the standardization of the data, and the experiments used accuracy to measure the effectiveness of the model. By using the same model and training and predicting the data before and after standardization separately, the accuracy results were obtained, with an accuracy of 90.14% (
Effect of data standardization on experimental accuracy.
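Standardization of sensor features can be sketched as follows. The paper does not state which scaling it applies, so z-score standardization, which subtracts each feature's mean and divides by its standard deviation, is assumed here as one common choice. The function name is illustrative.

```python
import math


def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
            for col, m in zip(columns, means)]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]
```

When a model is trained and then evaluated on separate data, the means and standard deviations are normally computed on the training portion only and reused for the test portion, so that no information leaks from the test set.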
Experiments on the normalized data set are slightly more effective and give the algorithm a small performance boost. Therefore, to improve the performance of subsequent algorithms, all subsequent algorithms are tested on the normalized data set. A benchmark test using the algorithm containing only GRU and LSTM and the joint algorithm of CNN
Comparison of experimental results
In summary, this chapter illustrates the health event detection results, introduces the neural networks used in them, and then constructs a health event detection model based on neural units such as CNNs and GRUs, which is validated on a public dataset with an accuracy of 91.28%, which is better than the results of a single network. An accuracy of 92.67% was also achieved on the individual dataset, enabling an optimized path for the subsequent identification of health-related events in real-time and the detection of physical education training actions.
With health event detection as the basis, recommendation algorithms as the core, and the recommendation system as the result, this paper develops a recommendation algorithm based on health events, proceeding from the exposition and analysis of health events to the demonstration and study of recommendation algorithms. The recommendation algorithm is then applied to the recommendation system. Based on K-means theory, classification algorithms, and the characteristics of the mathematical education data source, the H-K algorithm has been improved to better adapt to the quality of sports training, and the data accuracy has been improved to 91.28% and 92.27%. The empirical results show that the improved model algorithm is more effective than the single network model and can use big data to detect health events.
Data availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Funding
No specific funding was received to support this research.
Footnotes
Conflict of interest
The authors declared no conflicts of interest regarding this work.
