Evaluation of student physical fitness by integrating FP-growth algorithm

Abstract

In education, it is crucial to comprehensively evaluate the physical fitness of students. The limitations of information processing and analysis efficiency make it difficult for traditional evaluation methods to reveal deep physical correlation patterns. Given this, this study will focus on innovative evaluation methods that combine frequent pattern growth algorithms. An Apriori association rule model based on transaction compression and hash optimization is proposed to address association classification between physical fitness indicators. Moreover, this study optimizes operational efficiency through preprocessing techniques and hash acceleration strategies. It introduces enhancement parameters to accurately identify and establish strong association rules to achieve efficient and accurate evaluation of student physical fitness. The results showed that by comparing the running time of K-means + FP-growth and improved FP-growth under different support levels, the improved FP-growth tended to stabilize after a support level of 0.2%. The optimized model improved the execution efficiency by 82.87%–88.4% compared to Apriori and FP-growth in physical measurement data processing. The effectiveness and reliability of the improved algorithm were verified by measuring strong association rules with the introduction of enhancement degree. This study is expected to better understand the physical fitness status of students, and provide new ideas for educational decision-making and practice, which has profound practical significance for promoting innovation in physical fitness assessment methods.

Keywords

FP-growth algorithm; physical fitness; Apriori algorithm; data mining; assessment

Introduction

In the current field of education, student physical fitness assessment (SPFA) has become a core link in measuring and promoting the physical and mental health (P-MH) development of adolescents. The comprehensive implementation of the “Healthy China 2030” strategy and the deepening of quality education in China have put forward higher requirements for the physical and healthy development of students.^1,2 Students are the future of the country, and evaluating their physical health can better understand the physical condition of contemporary students, laying a solid talent foundation for the development of the country. This is also the motivation for research. In the context of the big data era, traditional physical fitness assessment (PFA) methods are limited by their singularity and shallowness, and cannot deeply explore and accurately identify the complex factors and internal correlation patterns that affect students’ physical health. Especially when faced with student physical fitness (SPF) data that includes multiple dimensions such as strength, endurance, and flexibility, effectively extracting key information and conducting in-depth analysis has become an urgent issue to be addressed. Currently, the cross-integration of education and information technology has brought new opportunities to solve this problem. Data mining technology (DMT), as an important tool in modern data analysis, especially frequent itemset (FIS) mining techniques represented by frequent pattern growth (FP-growth) algorithms, has attracted attention due to its efficient ability to process large-scale data.^3,4 However, in the specific application of SPFA, the algorithm has not been fully developed and optimized. 
It has certain limitations in processing high-dimensional and nonlinear data, such as redundant rule mining and excessive memory usage.⁵ Against this background, this paper explores the integration of the FP-growth algorithm into the SPFA system. By improving the algorithm and optimizing the data processing procedure, an evaluation model is constructed that can reveal potential patterns in students’ physical health and meet the needs of refined management. The study addresses the shortcomings of traditional evaluation methods and provides a theoretical basis and technical support for improving the scientific rigor, accuracy, and practicality of SPFA. Its innovation lies in introducing the degree of improvement (lift) to measure strong association rules (SAR), which verifies the effectiveness of the improved algorithm in association rule mining (ARM).

This article is divided into five sections. The “Introduction” section introduces the research background, problems, and solutions of SPFA. The “Related works” section reviews the research results on SPFA and summarizes the difficulties and shortcomings of existing methods. The “Building an SPFA model integrating FP-growth algorithm” section introduces the optimization design of the SPFA model that integrates the FP-growth algorithm. The “Performance verification of SPFA model integrating FP-growth algorithm” section designs performance verification experiments to validate the effectiveness of the proposed FP-growth model in SPFA analysis. The “Conclusion” section summarizes the research methods, analyzes the experimental results, and points out the shortcomings and prospects of the methods.

Related works

The theory and practice of SPFA have always been a focus of attention in education, sports science, and health. Many studies have emphasized that physical fitness is an essential indicator for measuring the P-MH status of adolescents. Traditional evaluation methods are often limited to the analysis of a single dimension or static data, making it difficult to fully reveal the complex factors and dynamic changes that affect students’ physical health. Gao et al. investigated the impact of leadership development on leadership effectiveness from four dimensions: physical, socio-emotional, spiritual, and psychological, to explore the potential benefits of leadership development in higher education organizations. This cultural change had brought positive effects, including an increase in enrollment, student retention rate, student placement rate, as well as an improvement in teacher research and academic project rankings.⁶ Rahawi et al. launched the EIM Campus Program, aimed at improving healthy behavior on university campuses by promoting and guiding sports activities. The stress level significantly decreased after intervention, and there was also a statistically significant improvement in sleep quality. Although the overall physical activity time has improved, it has not reached a statistically significant level.⁷ Amer et al. conducted a cross-sectional survey of 617 Indian university students aged between 18 and 30, aiming to investigate the association and interaction between physical activity and sleep quality on the mental health of university students. The level of physical activity was significantly negatively correlated with anxiety and depression scores, while poor sleep quality is significantly positively correlated.⁸ Hirotaka et al. 
evaluated 138 healthy elementary school students using physical fitness tests and academic achievement tests, aiming to explore the relationship between physical fitness and academic performance of students in grades three to six of elementary school. There was a statistically significant but weak correlation between physical assessment scores and grades, suggesting that this relationship may change over time.⁹

The FP-growth algorithm, as an effective tool for data mining, is specifically designed to find FISs in large-scale datasets. The frequent pattern tree (FP-tree) facilitates the mining of FISs, circumvents the intricacies of multiple scans of the dataset, and thus enhances efficiency. Yong et al. used the FP-growth to mine strong correlation factors that affect electricity demand, and applied the Generalized Regression Neural Network (GRNN) algorithm to achieve electricity demand prediction, aiming to lift the performance of electricity knowledge text mining. By setting the mini-support and smoothing factor of GRNN reasonably, good performance in power text mining could be achieved.¹⁰ Wang et al. designed a split-based parallel FP-growth aimed at addressing the challenges faced by student course data analysis in universities in terms of storage capacity and computing power. The association rules between course grades obtained from this provided useful suggestions for the learning methods of students and the teaching methods of teachers.¹¹ Zhang et al. proposed a parallel mining algorithm aimed at solving the problem of FP-tree being stored in independent memory. In this way, the algorithm did not need to generate FP-tree for the transaction database, thus solving the problem of independent memory storage.¹² Li et al. proposed an improved FP-growth algorithm that adds a new tail attribute to the existing frequent item header table, thereby accelerating the construction process of FP-tree. The target was to investigate the collection, integration, and analysis process of tire quality data by analyzing abnormal quality issues during tire manufacturing. This improved FP-growth could effectively improve the efficiency of correlation analysis for tire quality anomaly data.¹³

Scholars have proposed various improvement methods for Natural Language Processing (NLP) tasks such as automatic text summarization and deep learning analysis to enhance processing efficiency and result quality. Kirmani et al. proposed an automatic mixed text summarization technique to address the challenge of quickly extracting core content from massive amounts of text information. This method combined the advantages of extraction and abstract summarization, retaining key sentences in the text while generating simplified expressions through natural language. The results showed that this method outperformed single type summarization in multiple evaluation metrics and could more efficiently convey the main points of the original text.¹⁴ The current automatic text summarization lacked abstraction ability. Mohd et al. proposed a hybrid summary generation method that combines NLP features, sorting algorithms, and WordNet. This method first sorted and selected key content from text sentences based on multiple features. Subsequently, a word replacement strategy was introduced to convert some vocabulary into simplified expressions, thereby enhancing the abstraction of the abstract. The results showed that this method outperformed traditional pure extraction summarization methods in information coverage and language conciseness.¹⁵ Asudani et al. systematically reviewed various mainstream methods and their applications regarding the selection of word embeddings and deep learning models in text analysis. By comparing the performance of different models in tasks such as text classification and sentiment analysis, a strategy for selecting embedding methods and deep structures based on domain requirements was proposed. The results showed that using domain specific word vectors combined with LSTM models could significantly improve the accuracy and efficiency of text analysis tasks.¹⁶

In summary, comprehensive quality education is increasingly receiving attention, and SPFA has become a research focus. However, traditional physical fitness tests suffer from incomplete data reflection and low processing efficiency. The FP-growth algorithm can improve data processing efficiency by constructing FP-tree. Therefore, integrating the FP-growth algorithm into SPFA is expected to comprehensively analyze data, explore underlying patterns, provide scientific support for personalized education, and promote students’ comprehensive growth.

Building an SPFA model integrating FP-growth algorithm

Construction and analysis of improved Apriori-based optimization algorithm

In recent years, socio-economic progress and changes in lifestyle have brought about prominent physical health issues among adolescents, such as obesity, myopia, and declining exercise ability. The Chinese government has raised the attention to the P-MH of young people to an unprecedented level in the “Healthy China 2030” national strategy. Therefore, scientifically and strictly implementing student PFA is a specific response to national policy requirements and a key measure to solve and improve the current physical health problems of young people.^17–19 In this context, the application of DMT is particularly important. This technology utilizes diverse algorithmic tools to explore, discover, and extract potential patterns, associations, and novel insights from massive data on SPF, aiming to provide information with decision support value. The ability to predict future trends and accurately identify the complex interactions and inherent laws between SPF is of significant benefit. Through such data analysis, it is possible to more effectively understand and optimize the physical development status of adolescents. Figure 1 shows the process of data mining.

Figure 1.

Flow chart of data mining for student physical fitness assessment.

In Figure 1, data mining is a systematic process that includes stages such as data collection and preprocessing, feature selection and transformation, model establishment and evaluation.²⁰ First, the multidimensional physical fitness test data are collected from students and the data quality and consistency during the preprocessing stage are ensured. Next, DMT (such as association analysis, cluster analysis, and regression analysis) is used to reveal the relationships between physical fitness indicators and the physical characteristics of different groups from massive data. Based on these insights, a scientific SPFA model is constructed. This model can accurately evaluate the current physical health status of individual students and predict their development trends. This provides personalized teaching plans and health management recommendations for educational decision-makers, effectively improving the overall physical health level of adolescents. Figure 2 shows the process of data preprocessing after collecting multidimensional physical fitness test data from students.

Figure 2.

The data preprocessing process after collecting multi-dimensional physical fitness test data from students.

In Figure 2, data preprocessing covers key operations such as data cleaning, missing value handling, outlier detection and processing, data transformation, feature engineering, data integration, data normalization, and data dimensionality reduction.²¹ ARM occupies a vital position in the construction of the SPFA model. By calculating the support between different physical fitness indicators, potential associations and patterns between various dimensions of physical fitness can be discovered. Support is an important indicator in ARM: it is the proportion of all transactions in the database that contain both $A$ and $B$. The mathematical expression is shown in equation (1)

S (A \Rightarrow B) = P (A \cup B) = \frac{S_c (A \cup B)}{T_c}

(1)

In equation (1), $S$ represents support and $T_c$ represents the total number of transactions. $S_c (A \cup B)$ is the number of transactions that contain both $A$ and $B$. Confidence is used to measure the strength or credibility of association rules. The confidence level is the probability that the consequent of a rule also occurs given that its antecedent occurs.²² That is, it is the proportion of transactions containing $A$ that also contain $B$, as shown in equation (2)

C (A \Rightarrow B) = P (B | A) = \frac{S (A \cup B)}{S (A)} = \frac{S_c (A \cup B)}{S_c (A)}

(2)

In equation (2), $C$ represents the confidence level, and $P (B | A)$ is the conditional probability that a transaction contains $B$ given that it contains $A$. Enhancement (lift) is an indicator in association rule analysis that compares the observed probability that $A$ and $B$ occur together with the probability expected if the two were independent of each other. The calculation method for the degree of improvement (ID) is shown in equation (3)

L (A \Rightarrow B) = \frac{P (B | A)}{P (B)} = \frac{C (A \Rightarrow B)}{P (B)}

(3)
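As a worked check of equations (1)–(3), the following minimal Python sketch computes support, confidence, and the degree of improvement on a toy transaction set. The item labels (such as `bmi_normal`) and the five transactions are hypothetical and are not taken from the study data.

```python
def support(transactions, itemset):
    """Equation (1): share of all transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Equation (2): S(A u B) / S(A)."""
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    return support(transactions, ab) / support(transactions, a)


def lift(transactions, antecedent, consequent):
    """Equation (3): C(A => B) / P(B)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical, discretised physical-test records (one frozenset per student).
transactions = [
    frozenset({"bmi_normal", "lung_good", "run_good"}),
    frozenset({"bmi_normal", "lung_good"}),
    frozenset({"bmi_high", "run_poor"}),
    frozenset({"bmi_normal", "run_good"}),
    frozenset({"bmi_high", "lung_good"}),
]

s = support(transactions, {"bmi_normal", "run_good"})      # 2 of 5 transactions
c = confidence(transactions, {"bmi_normal"}, {"run_good"})  # 2 of the 3 with bmi_normal
l = lift(transactions, {"bmi_normal"}, {"run_good"})        # c divided by P(run_good) = 0.4
```

In this toy example the degree of improvement is above 1, so the two items occur together more often than independence would predict.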

In equation (3), $L$ represents the ID. ID > 1 indicates that $A$ and $B$ co-occur more often than expected by chance, ID = 1 indicates that $A$ and $B$ are independent of each other, and ID < 1 indicates that they co-occur less often than expected by chance. To effectively identify indicators correlated with physical fitness, this study proposes an improved Apriori-based association rule optimization (ARO) model. This model adopts an Apriori algorithm that combines transaction compression and hashing to improve efficiency while preserving mining accuracy.^23,24 The improved model first applies transaction compression by merging identical transactions using a hash table. Each transaction is sorted and converted into a hashable tuple, and the count of duplicate tuples is aggregated, thereby reducing redundant data processing. Next, hash optimization is integrated into FIS generation. Hash tables are used to count the occurrences of items in the compressed transactions. Each item is mapped to the set of IDs of the transactions that contain it, so that support can be computed quickly by adding the counts of the matching compressed transactions. During candidate generation, hash joins accelerate the creation of k-itemsets by merging (k-1)-itemsets, infrequent candidates are pruned using a hash table, and support counts are retrieved quickly through the transaction ID sets. For rule generation, the hash table stores the support values of the FISs, so confidence is calculated by looking up the pre-computed support of the antecedent. Figure 3 shows the process of mining FIS using the Apriori algorithm.

Figure 3.

Apriori algorithm mining frequent item set flow.
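The combination of transaction compression and hash-based counting described above can be sketched roughly as follows. This is an illustrative reading of the description, not the authors' implementation; the function names `compress` and `apriori_compressed` and the toy transactions are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def compress(transactions):
    """Transaction compression: merge identical transactions into (sorted tuple -> count)."""
    return Counter(tuple(sorted(set(t))) for t in transactions)


def apriori_compressed(transactions, min_support):
    """Apriori over compressed transactions with a hash index item -> row-ID set."""
    rows = list(compress(transactions).items())  # [(items, multiplicity), ...]
    total = sum(count for _, count in rows)

    # Hash table: each item maps to the IDs of the compressed rows containing it.
    tidsets = {}
    for rid, (items, _) in enumerate(rows):
        for item in items:
            tidsets.setdefault(item, set()).add(rid)

    def count(itemset):
        # Intersect the row-ID sets, then add up the multiplicities of the matches.
        rids = set.intersection(*(tidsets[i] for i in itemset))
        return sum(rows[r][1] for r in rids)

    frequent = {}
    level = []
    for item in sorted(tidsets):
        s = count((item,)) / total
        if s >= min_support:
            frequent[(item,)] = s
            level.append((item,))

    k = 2
    while level:
        prev = set(level)
        candidates = set()
        # Join step: merge two (k-1)-itemsets that share the same prefix.
        for a in level:
            for b in level:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune step: every (k-1)-subset must itself be frequent.
                    if all(sub in prev for sub in combinations(cand, k - 1)):
                        candidates.add(cand)
        level = []
        for cand in sorted(candidates):
            s = count(cand) / total
            if s >= min_support:
                frequent[cand] = s
                level.append(cand)
        k += 1
    return frequent


# Hypothetical toy data: six transactions, two of which are exact duplicates.
toy = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "a"], ["c"], ["a", "c"]]
compressed = compress(toy)
frequent = apriori_compressed(toy, min_support=0.5)
```

Because duplicates are merged before counting, each distinct transaction is visited once and its multiplicity is added to the support count, which is where the saving comes from when many students share the same discretised profile.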

The Apriori mining process in Figure 3 has two main phases. First, the transaction database is scanned and the frequency of individual items is counted to generate the frequent 1-itemsets. Then, FISs are combined iteratively to generate FISs containing more items until no new FIS can be generated. Figure 4 shows the flow of the association rule efficiency optimization method based on the improved Apriori.

Figure 4.

Optimization model construction process based on improved Apriori association rule efficiency.

In Figure 4, the first step is to extract data from a physical testing center of a certain school, preprocess it, and present it in a form suitable for association rule algorithm processing. Next, the improved Apriori is used to mine the correlation of sports data. This process generates a large number of association rules that describe the relationship between physical fitness indicators. Under the support confidence framework, these rules are filtered to identify association rules with practical significance.

Construction of an evaluation efficiency optimization model based on improved FP-growth

When the Apriori algorithm is used under the support-confidence framework to explore the association rules between SPF and other possible influencing factors, the mining results may be redundant. In view of this, this study further applies quantitative analysis and adopts an optimized FP-growth to explore more deeply the potential relationships between student physical health and multiple background variables. FP-growth builds an FP-tree structure to quickly and effectively identify frequent patterns in large-scale datasets, which helps the research gain a deeper understanding of the complex social factors that affect student physical health and of their interaction mechanisms. Figure 5 shows the process of FP-growth mining FIS.

Figure 5.

The FP-growth mining frequent itemset process.

In Figure 5, the FP-growth algorithm first constructs an FP-tree by scanning the transaction dataset twice: the first scan counts item frequencies, and the second inserts each transaction, with its frequent items sorted by descending frequency, into the tree. Nodes represent items, while edges reflect the co-occurrence of items within transaction records.^25–27 During this process, the algorithm maintains a header table with linked lists of nodes so that all occurrences of each frequent item can be located efficiently. After completing the FP-tree construction, the algorithm generates the conditional pattern base for each frequent item registered in the header table and recursively builds conditional FP-trees, gradually revealing all FISs. The pseudo-code for the FP-tree construction process is shown in Figure 6.

Figure 6.

Pseudo-code for FP-tree construction process.
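As a textual counterpart to the pseudo-code in Figure 6, the sketch below shows one common way to build an FP-tree in Python, using the standard two-scan construction with a header table of node links. It illustrates the textbook procedure rather than the exact pseudo-code of the figure; the class `FPNode`, the helper names, and the toy transactions are assumptions.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and links to parent, children, and peers."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next = None  # node link to the next node that holds the same item


def build_fp_tree(transactions, min_count):
    # Scan 1: count items and keep only those that reach the minimum count.
    freq = Counter(item for t in transactions for item in set(t))
    freq = {item: c for item, c in freq.items() if c >= min_count}

    root = FPNode(None, None)
    header = {}  # item -> first node in its node-link chain
    tails = {}   # item -> last node in the chain, so appending is O(1)

    # Scan 2: insert each transaction with items in descending frequency order.
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].next = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header, freq


def prefix_paths(header, item):
    """Conditional pattern base of `item`: (prefix path, count) for each of its nodes."""
    paths = []
    node = header.get(item)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            paths.append((path[::-1], node.count))
        node = node.next
    return paths


# Hypothetical toy transactions.
toy = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"], ["b", "c"]]
root, header, freq = build_fp_tree(toy, min_count=2)
d_base = prefix_paths(header, "d")
```

The conditional pattern base returned by `prefix_paths` is the input from which the conditional FP-tree of an item would be built in the recursive mining step.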

Figure 6 shows the pseudo-code of the FP-tree construction process in the FP-growth algorithm. However, when dealing with extremely large datasets, the original FP-growth algorithm may significantly increase memory consumption due to the need to fully construct the FP-tree structure. Meanwhile, in highly repetitive FIS scenarios, the process of repeatedly constructing conditional FP-trees for each frequent item may become a computational bottleneck, thereby affecting overall efficiency. Therefore, this study further improves the FP-growth and proposes an FIS mining algorithm built on hash for FP-growth. Figure 7 shows the flowchart for constructing an improved FP-growth algorithm model.

Figure 7.

Improved FP-growth algorithm model construction process.
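The rule-filtering stage of the flow in Figure 7 can be illustrated with a short Python sketch that turns mined frequent itemsets into strong association rules by applying a minimum confidence and a minimum degree of improvement. The function name `strong_rules`, the input format (a dictionary from itemset to support that also contains every subset), and the example values are assumptions for illustration only.

```python
from itertools import combinations


def strong_rules(frequent, min_confidence, min_lift=1.0):
    """Derive rules A => B from frequent itemsets and keep only the strong ones.

    `frequent` maps frozenset -> support and must also contain every subset
    of each itemset, which holds for the output of Apriori or FP-growth.
    """
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                b = itemset - a
                conf = sup / frequent[a]    # confidence of A => B
                lift = conf / frequent[b]   # degree of improvement of A => B
                if conf >= min_confidence and lift > min_lift:
                    rules.append((a, b, sup, conf, lift))
    return rules


# Hypothetical supports for two items and their pair.
frequent = {
    frozenset({"x"}): 0.6,
    frozenset({"y"}): 0.5,
    frozenset({"x", "y"}): 0.45,
}
rules = strong_rules(frequent, min_confidence=0.8)
```

With these values only the rule from `y` to `x` survives: the reverse rule has a confidence of 0.75, which falls below the chosen threshold.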

In Figure 7, the minimum support and confidence are first determined to ensure the discovery of meaningful association rules. The data are converted into a format that FP-growth can process, and the algorithm is then used to efficiently mine frequent patterns.^28,29 On this basis, rules that do not meet the minimum confidence level are eliminated, resulting in SAR.³⁰ The results are analyzed and the parameters are adjusted. If the rules are too few or redundant, mining is repeated with the adjusted parameters. Finally, the association rules are evaluated to ensure that they provide information with practical application significance. After the core association rule algorithm has been optimized, attention mechanisms and object detection techniques are introduced at the model architecture level to further improve the feature extraction efficiency of physical fitness data and to construct a more complete evaluation system. This study makes improvements on SENet and further utilizes the ECANet module. ECANet differs from SENet in that it removes the fully connected layers and uses 1D convolution to improve feature fusion efficiency. The size of the convolution kernel is determined by an adaptive function to ensure cross-channel interaction.^31–33 The convolutional kernel size adaptive function is shown in equation (4)

k = \left| \frac{\log_{2} (C)}{r} + \frac{b}{r} \right|

(4)

In equation (4), $k$ represents the size of the convolution kernel, $C$ represents the number of channels, and $b$ and $r$ are constants. Figure 8 shows the ECA mechanism.

Figure 8.

Structure of ECA’s mechanism.
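Equation (4) and the ECA structure in Figure 8 can be illustrated with a dependency-free Python sketch. Two points are assumptions not stated in this paper: the defaults `r = 2` and `b = 1` and the rounding of $k$ to an odd number follow the convention of the original ECA-Net design, and the 1D convolution kernel, which is learned in a real network, is passed in here as a fixed list.

```python
import math


def eca_kernel_size(channels, r=2, b=1):
    """Equation (4); the result is moved to the next odd integer (assumed convention)."""
    k = int(abs(math.log2(channels) / r + b / r))
    return k if k % 2 == 1 else k + 1


def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list, H x W) to its mean value."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]


def eca_weights(channel_means, kernel):
    """1D convolution across channels (zero padding, same length), then sigmoid."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for i in range(len(channel_means)):
        z = sum(w * padded[i + j] for j, w in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def eca(feature_maps, kernel):
    """Reweight every channel by its attention weight."""
    weights = eca_weights(global_average_pool(feature_maps), kernel)
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feature_maps, weights)]


# Hypothetical example: kernel sizes for two channel counts.
k64 = eca_kernel_size(64)
k256 = eca_kernel_size(256)
```

The three functions mirror the three parts of the mechanism: global average pooling, 1D convolution over the pooled channel vector, and sigmoid normalisation followed by channel reweighting.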

In Figure 8, the structure diagram of the ECA mechanism mainly includes three parts. First, the input feature map is pooled through global average pooling to obtain the global feature vector. Next, 1D convolution is used to perform feature interaction on the global feature vectors, and finally normalized attention weights are obtained through activation functions. These weights are used to weight feature maps, enhance useful information, and suppress unimportant information. The target classification network receives the feature map regions output by the region of interest pooling network and classifies and locates the targets within them. The process expression of its convolution operation is shown in equation (5)

\alpha_{n}^{l} = f \left( \sum_{m \in M_{n}} (\alpha_{m}^{l - 1} \times k_{m, n}^{l}) + b_{n}^{l} \right)

(5)

In equation (5), $f (x)$ represents the activation function. $\alpha_{m}^{l - 1}$ is the $m$-th feature map of the previous layer, $k_{m, n}^{l}$ is the convolution kernel of this layer, $M_{n}$ is the set of input feature maps connected to the $n$-th output, $\alpha_{n}^{l}$ is the $n$-th feature map of this layer, and $b_{n}^{l}$ is its bias value. To accurately adjust the position of the region proposal box, this study uses a bounding box correction algorithm, which shifts and rescales the proposal box so that it better aligns with the target. In an image, the target window is usually represented by a 4D vector $(x, y, w, h)$, which contains the coordinates of the upper left corner together with the width and height. The process of correcting bounding boxes consists of translation and scaling transformations. The correction formula can be represented by equation (6)

\begin{cases} G_{x}^{'} = A_{w} \cdot d_{x} (A) + A_{x} \\ G_{y}^{'} = A_{h} \cdot d_{y} (A) + A_{y} \\ G_{w}^{'} = A_{w} \cdot \exp (d_{w} (A)) \\ G_{h}^{'} = A_{h} \cdot \exp (d_{h} (A)) \end{cases}

(6)

The four variables $d_{x} (A)$, $d_{y} (A)$, $d_{w} (A)$, and $d_{h} (A)$ in equation (6) are the unknowns required to complete the bounding box regression, where $A$ denotes the proposal (anchor) box and $G^{'}$ the corrected box. Linear regression is a common statistical method used to analyze the linear relationship between independent variables and a continuous target variable. Object detection can use linear regression models to correct the position of candidate boxes, which improves localization accuracy. This correction adjusts the position of the current candidate box based on its properties so that it better matches the actual target position. The specific formula is shown in equation (7)

\begin{cases} t_{x} = (G_{x} - A_{x}) / A_{w} \\ t_{y} = (G_{y} - A_{y}) / A_{h} \\ t_{w} = \log (\frac{G_{w}}{A_{w}}) \\ t_{h} = \log (\frac{G_{h}}{A_{h}}) \end{cases}

(7)

In equation (7), $[t_{x}, t_{y}]$ is the translation amount and $[t_{w}, t_{h}]$ is the scaling factor. By defining a set of learning parameters $W$, the output $Y$ of the linear regression on the input feature vector $X$, $Y = W X$, can be made close to the true value. The input is the feature map extracted by the convolutional neural network, denoted $\phi$. The linear regression produces four transformations of the input, $d_{x} (A)$, $d_{y} (A)$, $d_{w} (A)$, and $d_{h} (A)$, namely the translations along $x$ and $y$ and the scalings of the width and height. By minimizing the objective function, the learning parameters are adjusted so that the output of the linear regression is as close to the true value as possible. The prediction is defined as shown in equation (8)

d_{*} (A) = W_{*}^{T} \cdot \phi (A)

(8)

In equation (8), $W_{*}$ is the parameter to be learned, $*$ stands for one of the target window components $x, y, w, h$, $\phi (A)$ is the feature vector of the input anchor, and $d_{*} (A)$ is the predicted transformation. The loss function evaluates the deviation between the model's predictions and the actual results. During the model training process, the loss function serves as the core tool for evaluating model performance and accuracy. By gradually adjusting model parameters to minimize the loss function, the gap between predicted results and actual values is reduced. The specific loss function formula is shown in equation (9)

\mathrm{loss} = \sum_{i = 1}^{N} {(t_{*}^{i} - W_{*}^{T} \cdot \phi (A^{i}))}^{2}

(9)

In equation (9), $N$ represents the total number of anchors, $t_{*}^{i}$ is the offset from the $i$-th candidate box (anchor) $A^{i}$ to the true position (ground truth), and $W_{*}^{T} \cdot \phi (A^{i})$ is the predicted offset.
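The bounding box correction can be illustrated with a small Python sketch of the usual translation-and-scaling parameterisation: regression targets are computed from an anchor and a ground-truth box, predicted offsets are applied to an anchor, and the squared-error loss is summed over anchors. Boxes are assumed to be `(x, y, w, h)` tuples; the function names and the sample boxes are hypothetical, and the learned regressor $W_{*}^{T} \cdot \phi (A)$ is replaced by given offset values.

```python
import math


def encode(anchor, gt):
    """Regression targets (t_x, t_y, t_w, t_h) in the sense of equation (7)."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return (
        (gx - ax) / aw,      # translation along x, normalised by anchor width
        (gy - ay) / ah,      # translation along y, normalised by anchor height
        math.log(gw / aw),   # log-scale change of the width
        math.log(gh / ah),   # log-scale change of the height
    )


def decode(anchor, d):
    """Apply offsets (d_x, d_y, d_w, d_h) to an anchor: translation, then scaling."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = d
    return (aw * dx + ax, ah * dy + ay, aw * math.exp(dw), ah * math.exp(dh))


def squared_loss(targets, predictions):
    """Sum of squared differences between targets and predicted offsets, as in equation (9)."""
    return sum(
        (t - p) ** 2
        for t_vec, p_vec in zip(targets, predictions)
        for t, p in zip(t_vec, p_vec)
    )


# Hypothetical anchor and ground-truth boxes.
anchor = (10.0, 20.0, 40.0, 80.0)
ground_truth = (14.0, 28.0, 80.0, 40.0)
target = encode(anchor, ground_truth)
recovered = decode(anchor, target)
```

Decoding the exact targets reproduces the ground-truth box, so a regressor whose predictions drive the loss to zero would align every anchor with its target.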

Performance verification of SPFA model integrating FP-growth algorithm

This section tested and analyzed the improved FP-growth’s performance, mainly analyzing the performance of different methods under different support levels. This study conducted performance validation experiments on the research method in terms of execution efficiency, data preprocessing results, SAR results, and other aspects.

Performance verification of ARO model based on improved Apriori

This study conducted experiments on an Intel (R) Core (TM) i5-5200U 2.20 GHz processor, with 4 GB of memory, Windows 10 operating system, PyCharm runtime environment, and Python 3.8. This study uses physical examination data from 5000 students (academic year 2023–2024) from three universities in a certain province, covering freshmen to seniors. The data collection complies with the national student physical health standards. Among them, the demographic characteristics are shown in Table 1.

Table 1.

Demographic characteristics of the students.

Statistical dimension	Classification	Number of people	Proportion (%)
Gender	Male	2848	56.96
Gender	Female	2152	43.04
Age (years)	17–19	1842	36.84
	20–22	2566	51.32
	23–24	592	11.84
Professional category	Science and engineering	2134	42.68
	Literature and history	1625	32.50
	Eurhythmics	1241	24.82

The descriptive statistical results of health indicators are shown in Table 2.

Table 2.

Descriptive statistical results of health indicators.

Index	Sex	Mean value	Standard deviation	Minimum	Maximum
Height (cm)	Male	175.6	5.2	158	192
Height (cm)	Female	163.8	4.8	145	178
Weight (kg)	Male	65.8	7.3	48	92
Weight (kg)	Female	52.3	6.1	38	75
BMI	Male	21.3	2.1	16.5	28.7
BMI	Female	19.8	1.8	15.2	27.3
Lung capacity (ml)	Male	3850	620	2200	5100
Lung capacity (ml)	Female	2680	450	1500	3800

To ensure the model’s generalization ability, multiple strategies are adopted to control the risk of overfitting. In terms of dataset partitioning, 5000 student physical test data are divided into training and testing sets using an 8:2 ratio, and a 10-fold cross-validation is used to comprehensively evaluate the model. During each training session, the model is trained using 9-fold data, and the remaining 1-fold is used for testing. In terms of parameter adjustment, the grid search method is used to optimize key parameters such as support and confidence. By setting reasonable support and confidence ranges, the optimal combination of evaluation indicators is selected to reduce the risk of overfitting caused by manual parameter tuning. At the same time, a minimum lift threshold is introduced when training to prune the association rules, forcing the filtering of rules with dependencies close to random, further reducing overfitting. The data utilized in the experiment are sourced from the physical fitness center of a specific educational institution. These data include basic information such as student identification numbers, gender, and height, as well as more detailed information regarding physical fitness, including results from 50-meter sprints, standing long jumps, and 800-meter long runs. The experiment selects the physical examination results of 5000 undergraduate students in their first academic year, and extracts data related to the physical examination items using the student registration number as the main key. To improve the reliability of the results, the study preprocesses the selected 5000 data. The first step of preprocessing is to clean the missing data, and the second step is to reduce irrelevant data. The third step is to convert the data. Table 3 shows the physical examination scores of some students.

Table 3.

Some students’ physical test results.

Student number	Sex	Weight (kg)	Height (cm)	Vital capacity (ml)	Standing long jump (cm)	Sitting forward bend	Pull-up	Sit-ups	800 m	1000 m
A	Male	67	180	4225	173	20	7	/	/	4.25
B	Female	55	165	2548	163	23	/	50	3.45	/
C	Male	66	178	3548	189	24	8	/	/	4.32
D	Female	54	160	3000	170	21	/	50	3.25	/
…	…	…	…	…	…	…	…	…	…	…
E	Male	77	173	4110	2.3	17	7	/	/	4.21
F	Male	72	175	3850	1.89	18	7	/	/	3.41

In Table 3, the student registration number serves as the primary key, and the attributes include gender, height, weight, lung capacity, 50 m, standing long jump, sitting forward bending, 800 m (female), 1000 m (male), pull-up (male), and sit ups (female). Because the testing items differ by gender, the physical measurement data were extracted separately for each gender. This study analyzed the Apriori algorithm, transaction compression-based Apriori algorithm (TC-A), hash-based Apriori algorithm (H-A), and Apriori combining transaction compression and hash (TC + H-A) under five support thresholds (STs) of 0.01, 0.1, 0.2, 0.3, and 0.4. The efficiency comparison chart under different support levels is shown in Figure 9.

Figure 9.

Efficiency comparison diagram under different support degrees.
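The idea behind combining transaction compression with hashing can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `apriori_tc_hash`, the bucket count, and the toy transactions are hypothetical. Pair hashing in the first pass gives an upper bound on each pair count, so pairs that fall into a light bucket can be discarded before counting. Transactions that can no longer contain a frequent itemset are removed after each pass.

```python
from itertools import combinations


def apriori_tc_hash(transactions, min_support, n_buckets=101):
    """Return {itemset: support} using hash pruning and transaction compression."""
    n = len(transactions)
    db = [frozenset(t) for t in transactions]

    # Pass 1: count single items and hash every item pair into a bucket.
    item_counts = {}
    buckets = [0] * n_buckets
    for t in db:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % n_buckets] += 1

    frequent = {frozenset([i]): c / n for i, c in item_counts.items()
                if c / n >= min_support}
    result = dict(frequent)

    # Compression: strip infrequent items from every transaction.
    keep = frozenset().union(*frequent)
    db = [t & keep for t in db]

    k = 2
    while frequent:
        # Compression: a transaction shorter than k cannot hold a k-itemset.
        db = [t for t in db if len(t) >= k]
        # Join step with the usual Apriori subset check.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                    frozenset(s) in frequent
                    for s in combinations(union, k - 1)):
                candidates.add(union)
        if k == 2:
            # Hash pruning: a bucket count is an upper bound on the pair count.
            candidates = {
                c for c in candidates
                if buckets[hash(tuple(sorted(c))) % n_buckets] / n >= min_support}
        counts = dict.fromkeys(candidates, 0)
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: v / n for c, v in counts.items() if v / n >= min_support}
        result.update(frequent)
        # Compression: without a frequent k-itemset, no frequent (k+1)-itemset.
        db = [t for t in db if any(c <= t for c in frequent)]
        k += 1
    return result


# Hypothetical graded transactions in the style of the rule items in this study.
toy = [
    {"50m:C", "800m:C", "ldty:C"},
    {"50m:C", "ldty:C"},
    {"800m:C", "ldty:C"},
    {"50m:C", "800m:C", "ldty:C"},
    {"fhl:B"},
]
freq = apriori_tc_hash(toy, min_support=0.4)
```

Because the bucket counts only ever overestimate the true pair counts, the hash filter cannot remove a frequent pair; it only reduces the number of candidates that have to be counted.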

According to Figure 9(a), as the ST increased, the running times of the Apriori algorithm and TC-A gradually converged. Although the gap between the two narrowed, TC-A remained slightly more efficient. In Figure 9(b), the improved TC + H-A algorithm performed better, especially in association rule analysis. When the threshold was 0.01, 0.1, or 0.2, the excessive number of association rules might lead to redundancy. When the threshold was 0.4, there were fewer rules, but important information might be missed. Considering both the number of rules and the usefulness of the information, 0.3 was chosen as the optimal ST. With an ST of 0.3, ten different confidence thresholds (CTs) (0.1–0.95) were used to validate the Apriori algorithm, TC-A, H-A, and the improved TC + H-A algorithm, respectively. The running time results are shown in Figure 10.

Figure 10.

Efficiency comparison diagram under different confidence levels.

In Figure 10(a), as the CT increased, the running times of Apriori and TC-A gradually converged, with TC-A being slightly more efficient. In Figure 10(b), H-A performed better than Apriori and TC-A, and TC + H-A was significantly faster than the other three algorithms. Too many redundant rules might be generated when the CT was 0.1–0.8, while too few rules were generated when the CT was 0.95. When the CT was 0.9, the generated rules covered the correlations among the various indicators and had a significant impact on the key indicators of physical fitness. Therefore, this study chose 0.9 as the optimal CT. This study further utilized these four algorithms for ARM. The execution times are compared in Figure 11.

Figure 11.

Comparison of execution efficiency of four algorithms (unit: second).

The comparison between Figures 11(a) and (b) showed that the proposed TC + H-A algorithm had better execution efficiency. Compared to the classic Apriori, execution efficiency on the physical measurement data improved by 86% for girls and 93% for boys. Compared to TC-A, it improved by 80.57% for girls and 89.32% for boys. Compared to H-A, it improved by 48.1% for girls and 55.69% for boys. This indicated that the improvement was larger on the male data than on the female data, and that it became more significant as the dataset size increased.
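The percentages above are not accompanied by an explicit formula. One common reading, assumed here, is the relative reduction in running time with respect to the baseline algorithm. The helper name `efficiency_gain` and the timings in the example are hypothetical and are not values read from Figure 11.

```python
def efficiency_gain(baseline_seconds, improved_seconds):
    """Percentage reduction in running time relative to the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline running time must be positive")
    return 100.0 * (baseline_seconds - improved_seconds) / baseline_seconds


# Hypothetical timings: a baseline of 20 s reduced to 5 s.
gain = efficiency_gain(20.0, 5.0)
print(f"{gain:.2f}% less running time than the baseline")
```

Under this reading, a gain of 86% would mean that the improved algorithm needs about one seventh of the baseline running time.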

Performance verification of SPFA model based on improved FP-growth algorithm

To further verify the performance of the evaluation efficiency optimization model based on the improved FP-growth, this study took the physical fitness test data of undergraduate students from a university as the experimental object. Thirteen types of attribute data related to physical test scores and physical fitness were extracted from the database, including student ID, gender, height, weight, lung capacity, etc. To ensure the reliability of the results, the experiment also carried out preprocessing operations such as data cleaning, data reduction, and data conversion on the original data. The minimum confidence was set to 0.3. Under different minimum support levels (0.1%, 1%, 2%, 3%, and 4%), the mining times of Apriori, FP-growth, k-means-improved FP-growth (k-means + FP-growth), and the proposed improved model were compared. The results are shown in Figure 12.

Figure 12.

Performance comparison of different support levels.
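For context, FP-growth avoids candidate generation by compressing the transactions into a prefix tree (FP-tree) whose shared prefixes store counts. The sketch below shows only the tree construction step of standard FP-growth, not the recursive mining and not the improved variant evaluated here. The class and function names and the toy transactions are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, counts of the frequent items)."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c / n >= min_support}
    root = FPNode(None)
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


def count_nodes(node):
    """Number of item nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())


toy = [
    {"50m:C", "800m:C", "ldty:C"},
    {"50m:C", "ldty:C"},
    {"800m:C", "ldty:C"},
    {"50m:C", "800m:C", "ldty:C"},
    {"fhl:B"},
]
root, frequent_items = build_fp_tree(toy, min_support=0.4)
print(count_nodes(root), "nodes for", sum(len(t) for t in toy), "item occurrences")
```

A higher minimum support removes more items before the tree is built, which gives a smaller tree and is consistent with the shorter running times at larger support levels.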

In Figure 12(a), between support levels of 0.1% and 1%, the running times of the Apriori and FP-growth methods decreased rapidly. The Apriori method had a running time of around 36 when the support was 0.1%. In Figure 12(b), the running times of k-means + FP-growth and improved FP-growth decreased as the support level rose from 0.1% to 2%. After a support level of 0.2%, the improved FP-growth method showed a running time of 2.4. Overall, the improved FP-growth method had a shorter running time and exhibited superior performance. The running times of the four models at a support of 3% under 10 different CTs ranging from 0.1 to 0.9 are compared in Figure 13.

Figure 13.

Performance comparison of different confidence levels.

In Figure 13(a), when the CT was 0.1, the running times of the Apriori and FP-growth methods were around 21 and 8.5, respectively. The running times of both methods decreased overall, and the Apriori method took longer. In Figure 13(b), when the CT was 0.1, the running times of the k-means + FP-growth and improved FP-growth methods were around 7 and 3.5, respectively. The running times of both decreased overall, and the improved FP-growth method took less time. Overall, the improved FP-growth method had a shorter running time and exhibited superior performance. Because the physical fitness tests differed between men and women, ARM was applied to the two types of data separately and the running times were compared, as displayed in Figure 14.

Figure 14.

The running time of different ARM models in male and female physical fitness data testing.

The comparison between Figures 14(a) and (b) indicated that the proposed optimization model was more efficient than Apriori and FP-growth. On the female physical testing data, the execution efficiency improved by 82.87%, and on the male data, it improved by 88.4%. Compared to FP-growth, the execution efficiency improved by 61.85% on the female data and by 66.72% on the male data. In Figure 14(b), the proposed optimization model improved efficiency by 43.27% in processing female physical measurement data and by 50.11% in processing male physical measurement data. This study further validated the ARO model grounded on the optimized Apriori. The CT was set to 0.9 and the ST to 0.3. Table 4 shows the rules mined using the FP-growth model.

Table 4.

Strong association rules (girls).

Serial number	Support degree	Rule	Lift degree	Confidence degree
A	0.57	50m:C => ldty:C	1.027	0.910
B	0.46	800m:C, 50m:C => ldty:C	1.022	0.910
C	0.30	800m:C, fhl:C=> ldty:C	1.020	0.910
D	0.50	800m:C=> ldty:C	1.016	0.900
E	0.37	ywqz:C, 800m:C=> ldty:C	1.020	0.910
F	0.35	fhl:C, 50m:C => ldty:C	1.030	0.920
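The three measures reported in Table 4 can be related through the standard definitions: lift is the rule confidence divided by the support of the consequent, and a value above 1 indicates a positive association. The sketch below is an illustration under that assumption. The `rule_metrics` helper and the toy transactions are hypothetical, while the confidence and lift pairs are copied from Table 4.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_c = sum(1 for t in transactions if c <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must both occur")
    confidence = n_ac / n_a
    return n_ac / n, confidence, confidence / (n_c / n)


# Hypothetical transactions, only to exercise the definitions.
toy = [
    {"50m:C", "800m:C", "ldty:C"},
    {"50m:C", "ldty:C"},
    {"800m:C", "ldty:C"},
    {"50m:C", "800m:C", "ldty:C"},
    {"fhl:B"},
]
support, confidence, lift = rule_metrics(toy, {"50m:C"}, {"ldty:C"})

# (confidence, lift) pairs from Table 4; confidence / lift gives the
# consequent support implied by each row.
TABLE_4 = {"A": (0.910, 1.027), "B": (0.910, 1.022), "C": (0.910, 1.020),
           "D": (0.900, 1.016), "E": (0.910, 1.020), "F": (0.920, 1.030)}
implied = {rule: conf / lft for rule, (conf, lft) in TABLE_4.items()}
print({rule: round(value, 3) for rule, value in implied.items()})
```

All six rows imply a support of roughly 0.89 for the consequent ldty:C, so the rounded values in the table are mutually consistent under this definition.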

In Table 4, this study introduced ID to measure SAR. The lift of all rules in the table was greater than 1, indicating a positive correlation between the antecedent and consequent of each rule, so all of them were valid SAR. Compared with the association rules obtained earlier in the study, the quantity and content were the same, further verifying the effectiveness and reliability of the improved algorithm in the previous chapter. To better validate the performance of the improved FP-growth algorithm, other similar research methods were selected for comparison. The comparison methods included the fuzzy physical fitness analysis method (FPFAM) designed by Lasunon and the mixed hesitation fuzzy language factors (MHFLF) designed by Zheng et al.^34,35 The accuracy and mean square error (MSE) of the different algorithms are compared in Table 5. Two contemporary algorithms, Frequent Pattern Fast Mining and Multi-Hash FIS Mining, were also added for comparison with the algorithm proposed in this study.

Table 5.

Comparison of accuracy and MSE of different algorithms.

Algorithm	Accuracy (%), by experiment number				MSE, by experiment number
	1	2	3	4	1	2	3	4
Apriori	87.51	85.24	86.73	88.34	3.21	3.74	3.65	3.52
FP-growth	89.97	90.12	91.65	89.19	2.98	2.72	3.02	3.14
k-means + FP-growth	91.77	92.34	93.46	92.55	2.71	2.66	2.54	2.82
FPFAM	94.68	93.62	93.37	93.89	2.49	2.57	2.36	2.40
MHFLF	95.25	96.23	95.49	96.18	2.07	1.86	1.73	1.95
Improved FP-growth	98.42	99.11	98.72	99.56	0.98	1.23	0.82	1.12

From Table 5, the maximum accuracy value was 99.56%, which appeared on the improved FP-growth algorithm, followed by MHFLF and FPFAM. The minimum accuracy of Apriori algorithm was 85.24%. The maximum and minimum values of MSE were 3.74 and 0.82, respectively, appearing in the Apriori algorithm and the improved FP-growth algorithm. This indicated that the improved FP-growth algorithm had higher accuracy, smaller errors, and better performance.
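The per-experiment values in Table 5 can also be summarized by their means, which gives the same ordering of the six methods. The short sketch below uses only the numbers printed in the table; the variable names are illustrative.

```python
from statistics import mean

# Algorithm -> (accuracy in % for experiments 1-4, MSE for experiments 1-4),
# copied from Table 5.
TABLE_5 = {
    "Apriori": ([87.51, 85.24, 86.73, 88.34], [3.21, 3.74, 3.65, 3.52]),
    "FP-growth": ([89.97, 90.12, 91.65, 89.19], [2.98, 2.72, 3.02, 3.14]),
    "k-means + FP-growth": ([91.77, 92.34, 93.46, 92.55], [2.71, 2.66, 2.54, 2.82]),
    "FPFAM": ([94.68, 93.62, 93.37, 93.89], [2.49, 2.57, 2.36, 2.40]),
    "MHFLF": ([95.25, 96.23, 95.49, 96.18], [2.07, 1.86, 1.73, 1.95]),
    "Improved FP-growth": ([98.42, 99.11, 98.72, 99.56], [0.98, 1.23, 0.82, 1.12]),
}

summary = {name: (mean(acc), mean(mse)) for name, (acc, mse) in TABLE_5.items()}
ranking = sorted(summary, key=lambda name: summary[name][0], reverse=True)

for name in ranking:
    mean_acc, mean_mse = summary[name]
    print(f"{name}: mean accuracy {mean_acc:.2f}%, mean MSE {mean_mse:.2f}")
```

Averaged over the four experiments, the improved FP-growth algorithm reaches about 98.95% accuracy with an MSE of about 1.04, against an MSE of about 3.53 for Apriori.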

Conclusion

With the widespread development of student sports activities, the importance of PFA has become increasingly prominent. To explore how to integrate the FP-growth algorithm into SPFA, this study proposed an optimized Apriori association rule model, which was improved using transaction compression and hashing techniques. At the same time, an evaluation efficiency optimization model based on the improved FP-growth algorithm was designed, which accelerated the algorithm’s operation and produced more accurate SAR. The results indicated that the TC + H-A algorithm performed best when the CT was 0.9. Compared to the classic Apriori, the execution efficiency on the physical testing data improved by 86% for females and by 93% for males. The optimized model improved the execution efficiency by 82.87%–88.4% compared to Apriori and FP-growth in body measurement data processing. The introduction of ID to measure SAR validated the effectiveness and reliability of the improved algorithm. This study contributes to the improvement of school physical education teaching plans and health management strategies, and is also beneficial for the scientific formulation and implementation of policies for the physical health development of adolescents. It reflects the positive role of the cross-integration of social science and data science and technology in promoting social progress.

However, there are potential biases and scalability challenges in the research. At the data level, the sample mainly consists of undergraduate students from universities. These samples have problems such as a single age group, uneven regional and population distribution, and only covering traditional physical measurement indicators without incorporating behavioral data, which affect the generalization of the model. At the algorithmic level, processing real-time data in the Internet of Things faces issues such as low efficiency in incremental transaction processing and significant differences in cross-age group characteristics. Future work can build an age stratified association mining framework, train models separately on datasets divided by age, and use transfer learning to achieve feature sharing and rule fusion, achieving cross-age group improvement. Meanwhile, an incremental FP-growth algorithm can be developed for real-time fitness applications, maintaining a dynamic FP-tree structure and introducing a thermal decay factor to adapt to the timeliness of data.

Footnotes

ORCID iD

Ting Wang

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Hebbi, Mamatha. Comprehensive dataset building and recognition of isolated handwritten Kannada characters using machine learning models. Artif Intell Appl 2023; 1(3): 179–190.

2. Bhosle, Musande. Evaluation of deep learning CNN model for recognition of Devanagari digit. Artif Intell Appl 2023; 1(2): 114–118.

3. Hasanvand, Nooshyar, Moharamkhani, et al. Machine learning methodology for identifying vehicles using image processing. Artif Intell Appl 2023; 1(3): 170–178.

4. Preethi, Mamatha. Region-based convolutional neural network for segmenting text in epigraphical images. Artif Intell Appl 2023; 1(2): 119–127.

5. Abba Haruna, Muhammad, Abubakar. Novel thermal-aware green scheduling in grid environment. Artif Intell Appl 2022; 1(4): 244–251.

6. Gao. Improving the physical quality of students by prescription teaching mode. Rev Bras Med Esporte 2021; 27(3): 331–334.

7. Chen. An exploration of math anxiety of students: perspectives of intelligence and motivation theory. Adv Manag Sci 2023; 12(2): 13–16.

8. Amer and Ghrouz, Noohu, Dilshad Manzar, et al. Physical activity and sleep quality in relation to mental health among college students. Sleep Breath 2019; 23(2): 627–634.

9. Hirotaka GIMA, Koji SEKI. A longitudinal study of the association between physical fitness and academic achievement in school-age children. Rigaku Ryoho 2019; 34(1): 119–124.

10. Yong, Zhan Long, Jun Di. Power knowledge text mining based on FP-growth algorithm and GRNN. CSN 2021; 48(8): 86–90.

11. Wang, Jiao. Research on association rules of course grades based on parallel FP-growth algorithm. J Comput Methods Sci 2020; 20(3): 759–769.

12. Zhang, et al. An improved FP-growth algorithm based on projection database mining in big data. JIHMSP 2019; 10(1): 81–90.

13. Ding. Data analysis of tyre quality based on improved FP-growth algorithm. Comput Model Eng Sci 2019; 30(2): 244–251.

14. Kirmani, Hakak, Mohd. Hybrid text summarization: a survey. In: International conference on soft computing: theories and applications (SoCTA 2017). Berlin: Springer, 2019, pp. 63–73.

15. Mohd, Shah, Bhat, et al. Sumdoc: a unified approach for automatic text summarization. In: Proceedings of the 5th international conference on soft computing for problem solving (SocProS 2015). Singapore: Springer, 2016, Vol. 1, pp. 333–343.

16. Asudani, Nagwani, Singh. Impact of word embedding models on text analytics in deep learning environment: a review. Artif Intell Rev 2023; 56(9): 10345–10425.

17. Peng, Lihong, Haidou, et al. Investigation the influence of miniaturized RFID tag sensor on coupling effect. Sensor Rev 2021; 41(4): 425–435.

18. Srinivasan, Ali, UHH. Energy harvesting wireless sensor for achieving self-powered structural health monitoring system. Circ World 2020; 46(4): 307–315.

19. Zheng. The connotation and realization of student-centeredness at Cardiff university. J High Educ Res 2022; 3: 102–107.

20. Kolokoltsev, Iermakov, Tretyakova, et al. Physical activity as a factor to improve the quality of student life. Obraz Nauka 2020; 25(5): 150–168.

21. Bienstock, Shankar, Kim, et al. Physical activity measures in lumbar laminectomy patients: a prospective comparison of fitness tracker measures versus patient-reported outcome measures. Spine J 2020; 20(9): 169–170.

22. Bai, et al. Anti-collision algorithm based on slotted random regressive-style binary search tree in RFID technology. IET Commun 2022; 16(10): 1200–1208.

23. Xuan. Efficient anti-collision algorithm for RFID EPC generation-2 protocol based on continuous detection. Int J Wireless Inf Network 2020; 27(1): 133–143.

24. Luo, Jing, Chen, et al. A new underdetermined NMF based anti-collision algorithm for RFID system. ISA Trans 2021; 123(9): 472–481.

25. Zhang, Yue, Shi, et al. A fuzzy PID algorithm-based attitude control method of suspension-type small rail vehicles. J Vib Eng Technol 2022; 10(1): 111–139.

26. Gagie, Navarro, et al. Tree path majority data structures. Theor Comput Sci 2020; 833(4): 107–119.

27. Sheng, Liu, et al. A group-based binary splitting algorithm for UHF RFID anti-collision systems. IEEE T Commun 2020; 68(2): 998–1012.

28. Samizade, Shen, et al. Passive browser identification with multi-scale convolutional neural networks. Neurocomputing 2020; 378(8): 238–247.

29. Chowdhury, BDB, Masoud, Son, et al. A dynamic data driven indoor localisation framework based on ultra high frequency passive RFID system. Int J Sens Netw 2020; 34(3): 172–187.

30. Pappan, Anandhakumar. RMF based target position estimation and collision probability forecasting technique in freeway junctions. Int J Heavy Veh Syst 2020; 27(1): 145–163.

31. Hailemariam, Lai, Jayadi, et al. A knowledge-based query tree with shortcutting and couple-resolution for RFID tag identification. Comput Commun 2020; 160(3): 779–789.

32. Munfarijah, Lucia. Implementation of FP-growth algorithm in determining food package recommendation in Sunan Giri ribs meatball restaurant. IJCA 2020; 176(24): 15–20.

33. Barrett, Gaskins, Haug. Higher education under fire: implementing and assessing a culture change for sustainment. JOCM 2019; 32(1): 164–180.

34. Lasunon. Fuzzy analytical method (FAM) for physical fitness assessment. FJMS 2021; 131(1): 55–68.

35. Zheng. A novel weight-derived method and its application in graduate students’ physical health assessment. Int J Intell Syst 2021; 36(1): 200–236.