Abstract
To address dangerous behaviors among elderly people living at home, this study investigates moving target detection, real-time target tracking, and behavioral pose recognition and classification, using behavioral poses in videos as samples. To tackle the challenges of detecting moving targets, a detection method based on the Gaussian mixture model (GMM) and the four-frame difference method is proposed. A tracking technique based on the Kalman filter (KF) is used to track the behavioral changes of the elderly in real time. A seven-layer convolutional neural network (CNN) is constructed to address inaccurate behavioral pose recognition. Experimental analyses show that the improved GMM detection method produces complete target contours with significantly improved accuracy. The KF tracking technique tracks the target trajectory stably and in real time, with the lowest average tracking error (0.19) among the compared methods, and it outperforms the mean shift, particle filter, and Cam Shift algorithms in all aspects. The CNN pose recognition model achieves a classification accuracy of 95.87% with a pose classification time of 27 seconds. Applied in practice, the method can accurately identify whether an elderly person's behavior is abnormal and help safeguard their daily health.
Introduction
With the advancement of medical care, the average age of the population has increased significantly. The proportion of elderly people has risen steadily, with relevant data showing that the number of elderly people in China now exceeds 300 million [1]. As their physical condition declines, the elderly move more slowly and their memory gradually deteriorates. They are highly vulnerable to injury in the event of a fall, often with very serious consequences. Most elderly people live alone, and if they are accidentally injured they often cannot be treated in time, which causes great harm to their health [2]. Given these problems, real-time detection and tracking of abnormal behavior in the elderly, health monitoring, and the issuing of warnings are of great significance for the timely rescue of elderly people in danger [3]. Health detection and recognition methods can determine the health status and risk factors of the elderly in real time. When an elderly person engages in unhealthy behavior, such methods can accurately identify their current status and promptly alert their family or a hospital, enabling early detection of and intervention in potential health problems. This helps the elderly maintain good physical health and quality of life. Because current target detection techniques have certain shortcomings, the study combines the Gaussian Mixture Model (GMM) with the four-frame difference method for moving target detection, and introduces a mainstream target tracking technique to track the behavior of the elderly in real time. Building on the detection and tracking of moving targets, early work recognized abnormal human poses from manually extracted features. As technology progressed, training machine learning algorithms for classification significantly improved the accuracy of human behavior recognition [4].
The Convolutional Neural Network (CNN) can be applied to pose detection and recognition in a variety of scenes and stands out among elderly pose recognition models; the study therefore constructs a CNN-based model for elderly pose classification and recognition. The proposed moving target detection, tracking, and pose recognition techniques are expected to be effectively applicable to elderly health detection, ultimately enhancing the safety of the elderly. The research aims to detect, track, and classify in real time the abnormal behaviors that affect the health of the elderly, so that they can receive timely assistance in case of danger. This work has practical significance: it helps improve the quality of life and safety of elderly people and provides decision-making and intervention support for social workers and medical personnel. The paper is divided into four parts. The first part reviews relevant research results. The second part presents the design of a moving object detection algorithm that integrates the Gaussian mixture model and the four-frame difference method, a real-time tracking algorithm for detected objects based on the Kalman filter, and an improved CNN model for elderly pose classification. The third part verifies the effectiveness and feasibility of the proposed methods. The paper concludes with a summary of the research.
Related work
With the growing elderly population, attention has shifted towards the health concerns of the elderly. Iazzi et al. studied a detection system for fall postures in elderly people, using a support vector machine to classify abnormal behaviors. Detection results on a public dataset show that the system has a low error rate and can effectively detect fall postures [5]. Liaqat et al. used sensor equipment to monitor the daily activities of the elderly and applied a variety of classification algorithms to classify their behaviors. The results show that the classification accuracy exceeds 90%, surpassing that of the comparison algorithm [6]. To improve the sleep quality of the elderly, Tang et al. proposed an electronic mattress that monitors sleep posture and sleep quality, using a CNN to analyze various sleep postures; experimental results demonstrate the effectiveness of this approach, with a classification accuracy of 90% [7]. Tay CZ et al. proposed a novel unlabeled gait estimation and tracking algorithm to reduce the risk of serious injuries during exercise. The algorithm automatically captures human joints for posture evaluation and analyzes human motion. The proposed system is implemented on the Intel Up Squared board and achieves 9 frames per second, with a gait recognition accuracy of 95% [8]. Huang S et al. designed a method for detecting individual fare evasion in subway stations based on skeleton sequences and time series, to alleviate the adverse effects of such behavior. The experimental results showed that the proposed method can effectively identify human body states and detect individual fare evasion behaviors, including jumping and squatting [9].
Divya et al. used a posture prediction system to identify abnormal activities of the elderly, achieving both real-time target detection and abnormal activity classification. Performance tests showed that the system can handle different types of poses in different environments and is robust to environmental changes [10]. Mandischer N et al. proposed a multi-pulse detection system based on laser and radar sensors to address the limited field of view of target objects in complex practical scenarios. The proposed tracking pipeline was trained and extensively validated on a new dataset, and the results confirmed that the radar tracker achieved state-of-the-art performance, with the laser and fusion trackers outperforming recent methods [11]. Berlin et al. used convolutional filters to detect and rank poses by calculating image scores. Comparative experiments show that the detection method is practical and highly feasible [12]. Su et al. used radar technology to detect human motion characteristics, analyzing features of the signal waves reflected by the target. The results showed that the technique can accurately detect postural changes during falls and issue fall warnings [13]. Nour et al. used a posture estimation system to monitor the daily life of elderly people living alone and to support health care interventions. The results indicate that the system has the potential to monitor the safety of elderly individuals effectively and can be applied in the healthcare field [14].
This brief review of domestic and international research shows that health monitoring for the elderly has attracted the attention of many scholars, and that posture classification and recognition is an indispensable part of it. Among the many classification algorithms, the CNN is widely used. The study therefore applies a CNN to identify abnormal postures in the elderly, aiming to contribute to elderly health detection by detecting dangerous behaviors in a timely manner.
Research on the detection and identification of abnormal human posture in health testing of the elderly
Moving object detection algorithm incorporating GMM and four-frame difference method
As the physical condition of the elderly declines, falls and similar events often cause physical injury, so it is crucial to detect abnormal behavior and issue a distress signal in time. Before abnormal poses can be recognized, the moving target must first be detected [15, 16, 17]. Commonly used object detection methods include the optical flow method, the background difference method, the inter-frame difference method, and the Gaussian mixture model. Compared to the other methods, GMM has greater advantages, so the study optimizes it and applies it to moving target detection [18, 19, 20]. GMM is a combination of Gaussian models with different parameters, selected according to set thresholds. First, background modeling is performed, and a pixel point of a video image is set to be represented by
In Eq. (1),
In Eq. (2),
In Eq. (3)
For the sake of distinguishing the background and foreground of the graph, the ratio of the pixel points
For video, each frame needs a background distinction. The GMM analyses whether the pixel values satisfy the first
In Eq. (6),
In Eq. (7),
The process of detecting a moving target using the GMM and the four-frame difference method is shown in Fig. 1. The initial image sequence is obtained and denoised. Then, a background model is obtained using the GMM, and the detection area is obtained using the four-frame difference method.
Flow Chart of GMM combined with four frame difference method for target detection.
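The detection flow described above can be illustrated with a minimal NumPy sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the four-frame difference variant (thresholding the differences between frames 1 and 3 and between frames 2 and 4, then intersecting them), a single running Gaussian per pixel standing in for the full mixture model, the union used to fuse the two masks, and all threshold values. Denoising and morphological clean-up are omitted.

```python
import numpy as np

def four_frame_difference(f1, f2, f3, f4, thresh=25):
    """Motion mask from four consecutive grayscale frames.

    Assumed variant: threshold |f3 - f1| and |f4 - f2|, then intersect.
    """
    a, b, c, d = (f.astype(np.int16) for f in (f1, f2, f3, f4))
    d13 = np.abs(c - a) > thresh
    d24 = np.abs(d - b) > thresh
    return d13 & d24

class RunningGaussianBackground:
    """Simplified stand-in for a GMM: one running Gaussian per pixel."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=100.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, init_var, dtype=np.float64)
        self.alpha = alpha  # learning rate of the background model
        self.k = k          # match threshold in standard deviations

    def apply(self, frame):
        diff = frame.astype(np.float64) - self.mean
        # A pixel is foreground if it lies more than k sigma from the mean.
        foreground = diff ** 2 > (self.k ** 2) * self.var
        bg = ~foreground
        # Only pixels that matched the background update the model.
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        self.var = np.maximum(self.var, 1.0)  # keep the variance from collapsing
        return foreground

def detect(frames, background):
    """Fuse the background-model mask with the four-frame difference mask."""
    motion = four_frame_difference(*frames[-4:])
    foreground = background.apply(frames[-1])
    return foreground | motion  # assumed fusion rule: union of both masks
```

In this sketch the frame-difference mask alone marks only part of a uniformly colored moving object, which is consistent with the voids that the text attributes to frame differencing; the background-model mask fills in the rest of the silhouette.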
Among the numerous methods for target tracking, Kalman Filtering (KF) can track and detect multiple targets in real time with a relatively simple procedure, and it is used here to track the behavior of the elderly in real time [23, 24, 25]. The state equation and the observation equation are shown in Eq. (9).
In Eq. (9),
Principle of KF real-time target tracking.
When the KF predicts the target's position, the image changes only slightly between consecutive video frames, so the object's movement in the video can be treated as uniform linear motion. A bounding rectangle is used to mark the motion range of the tracked target, and the motion formula for the detected target is presented in Eq. (10).
In Eq. (10),
In Eq. (11),
In Eq. (12),
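A minimal sketch of a constant-velocity Kalman filter may help make the predict and update cycle concrete. It assumes a state vector of the bounding-rectangle center and its velocity, `[x, y, vx, vy]`, with only the center position observed; the class name, the noise levels `q` and `r`, and the initial covariance are illustrative assumptions and do not reproduce the paper's Eqs. (9) to (12).

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter for uniform linear motion of a rectangle center."""

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])   # state: position and velocity
        self.P = np.eye(4) * 10.0                # initial state covariance
        self.F = np.array([[1, 0, dt, 0],        # state transition matrix
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],         # only the position is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q                   # process noise covariance
        self.R = np.eye(2) * r                   # measurement noise covariance

    def predict(self):
        """Propagate the state one frame ahead; returns the predicted center."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a measured center z = (x, y)."""
        z = np.asarray(z, dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In use, `predict()` would be called once per frame to position the search rectangle, and `update()` would be called with the center of the region returned by the detector. The rectangle's width and height could be appended to the state in the same way if they also need smoothing.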
After the behavior of elderly people in surveillance videos has been detected and tracked, it must be classified to identify whether abnormal behaviors such as falling or difficulty in getting up and lying down occur [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]. The extraction of human behavioral features is a crucial step in describing an individual's posture and movements. Convolutional neural networks are the most common method of feature extraction: forward propagation computes the error between the actual and expected outputs, and the error values are back-propagated to correct the weight parameters layer by layer until the error is minimized [28, 29]. A neural network includes three basic structures: the input layer, the hidden layer, and the output layer. Let the input signal data be
In Eq. (13),
In Eq. (14),
In Eq. (15), lr represents the learning efficiency,
Structure diagram of CNN.
In the convolutional layer, a convolutional kernel of a fixed pixel size scans the image information in order. At each position, the kernel weights are multiplied element-wise with the image pixels they cover and summed, and the result is output as the pixel value of the convolved image. The convolutional layer perceives the image information locally and collates it to obtain the overall information. This approach reduces the number of network parameters and lowers the computational burden of the algorithm [30]. At the same time, the neurons within one feature plane share the same connection weights, so for a given input image each kernel detects only one kind of feature, which further reduces the computational complexity. To ensure that the overall information of the image is captured, a convolutional layer usually contains multiple convolutional kernels, each of which extracts a different feature and computes a different feature plane. The pooling layer takes the average or maximum of the neighboring values within each pooling window of a feature plane for subsequent operations. The pooling operation merges data and reduces image complexity while retaining the crucial information. It thus decreases image dimensionality and simplifies subsequent computation.
Operation process of the convolution layer and pooling layer.
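The convolution and pooling operations can be sketched in a few lines of NumPy. This is an illustrative, unoptimized version for a single-channel image and a single kernel; the function names and the stride-1, no-padding, non-overlapping-window settings are assumptions made for the example, not settings confirmed by the paper.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    As is usual in CNNs, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the window element-wise by the kernel and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping max or average pooling over size x size windows."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))
```

For example, a 6 × 6 image convolved with a 3 × 3 kernel yields a 4 × 4 feature plane, which 2 × 2 pooling reduces to 2 × 2. The same kernel weights are reused at every position, which is the weight sharing that keeps the parameter count low.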
After performing convolution and pooling operations, the fully connected layer performs a comprehensive analysis of the extracted relevant features to ensure the minimum amount of information loss and transmits the final result to the output layer.
Performance analysis and application of improved motion target detection algorithm
To analyze the performance of the improved moving object detection method, which combines the four-frame difference method with GMM, the study takes the same frame from the same video for target detection and compares the results of the four-frame difference method, GMM, and the improved algorithm. Figure 5 shows the target detection results of these algorithms in an indoor setting.
Target detection results of several detection algorithms in indoor environment.
As can be seen from the human fall image in Fig. 5, the indoor environment has few shadows, allowing the detection algorithms to capture a fairly complete human silhouette. Figure 5(b) shows that the GMM has shortcomings in distinguishing background from foreground: it identifies part of the background as the target and has the worst recognition accuracy and completeness among the algorithms. Figure 5(c) shows the detection results of the four-frame difference method. Compared with Fig. 5(b), the detection accuracy is notably higher, the anti-interference ability is stronger, and the human contours are more complete, but there are some voids in the detection results. The detection results of the improved GMM are shown in Fig. 5(d), where the detected target is complete and the interference noise and voids are significantly reduced, which further enhances the detection accuracy. From a quantitative perspective, the recognition accuracy of GMM is 88.63%; its advantages are relatively complete results and a certain anti-interference ability, while its disadvantage is low computational efficiency. The target detection accuracy of the four-frame difference method is 92.17%; it has low computational complexity and is suitable for moving targets in constantly changing environments, but its disadvantages include incomplete external contours and an unstable target range. The detection accuracy of the four-frame difference method combined with GMM is 98.97%; it is highly robust to interference noise, reduces the occurrence of voids, and yields a very complete target contour. Its disadvantage is that detection accuracy is affected when multiple targets overlap.
To analyze the performance of the three detection methods when multiple objects are present under light interference, detection and identification were also carried out outdoors. The results are shown in Fig. 6.
Detection results of three outdoor detection algorithms.
As shown in Fig. 6, under light interference shadow regions appear in the target detection results of the GMM and the four-frame difference method and degrade them. In the case of multiple targets, all of the algorithms are able to detect the targets. Figure 6(b) shows the worst detection results: the foreground, background, and shadow regions are not distinguished from one another, although the human silhouette is relatively intact. The result of the four-frame difference method in Fig. 6(c) is more complete, but shaded parts are sometimes detected incorrectly and there is more noise, which affects the accuracy of the results. The detection results in Fig. 6(d) are complete, with minimal voids or shadows, and are significantly better than those of the comparison algorithms. High-resolution images can provide more accurate motion data through motion markers, thereby improving the accuracy of motion capture. Comparing the indoor and outdoor results, all of the algorithms capture human contours better indoors, because indoor lighting conditions are relatively stable and the background complexity is low. Outdoors, the lighting conditions are more complex and variable, and the scene contains many additional elements such as trees, buildings, and pedestrians, so all of the algorithms perform worse. Based on the above results, it can be concluded that the algorithm combining GMM and the four-frame difference method effectively avoids the shortcomings of GMM and improves the completeness and accuracy of the detection results.
To analyze the performance of KF-based moving object tracking, object tracking was tested with different algorithms in various scenarios. To verify the performance of the proposed method more rigorously, the study selected current mainstream tracking algorithms for comparative experiments: the Mean Shift (MS) algorithm, the Particle Filter (PF) algorithm, and the Cam Shift (CS) algorithm. The real-time tracking errors of the different algorithms are shown in Fig. 7.
Tracking accuracy results of four algorithms.
From Fig. 7, the tracking error of the PF algorithm fluctuates between 0.4 and 0.75, that of the CS algorithm between 0.4 and 0.7, that of the MS algorithm between 0.2 and 0.5, and that of the KF algorithm between 0.15 and 0.25. The average tracking errors of the PF, CS, MS, and KF algorithms are 0.61, 0.55, 0.36, and 0.19, respectively. Both the fluctuation range and the average error show that the PF algorithm has the largest tracking error, while the KF algorithm has the smallest. KF also has the best tracking stability and achieves better real-time tracking of moving targets in different scenarios. The algorithms differ significantly in tracking error, but the fluctuations of each error curve are relatively smooth. This indicates that differences in the tracked objects and actions affect the tracking performance of an algorithm to some extent, but not enough to cause significant changes in tracking error. To further explore the performance of the proposed method, the study evaluated tracking accuracy and time consumption. The tracking accuracy results of the different algorithms are shown in Fig. 8.
Comparison of tracking accuracy results of different algorithms.
From Fig. 8, it can be observed that all four algorithms tend to stabilize as the number of iterations increases. The tracking accuracy of the KF-based method stabilizes at 98.96%, while the tracking accuracies of the MS, CS, and PF algorithms are 89.32%, 88.16%, and 76.59%, respectively. This indicates that the KF-based tracking of detected moving targets proposed in the study is superior to the other algorithms. In addition to accuracy, the study also evaluates the algorithms by time consumption. Figure 9 illustrates the specific results.
Comparison of tracking time consumption results of different algorithms.
From Fig. 9, it can be seen that the KF-based tracking method has a low time consumption and a very short response time, averaging 0.01 seconds. The time consumption curve of the PF algorithm fluctuates greatly and its response time is relatively long, at 0.79 seconds. The time consumption curve of the MS algorithm fluctuates only slightly, but its response time is relatively long, at 0.68 seconds, and needs further optimization. The average time consumption of the CS algorithm is 0.63 seconds. These results confirm that the KF algorithm has better tracking performance and a shorter time consumption while maintaining high accuracy. To verify the effectiveness of KF-based tracking in real time, a video of a person moving around indoors was selected and three random frames were taken to show the tracking results, which are shown in Fig. 10.
Target tracking results of the KF algorithm.
Figure 10 shows the tracking results of the KF for a moving target. The three images are the 18th, 26th, and 34th frames of the captured video. In all three frames the algorithm tracks the target accurately in real time as it moves, and the target stays within the rectangular frame without tracking errors. This demonstrates the algorithm's ability to achieve stable and precise tracking of moving targets.
To recognize and detect abnormal behavior of the elderly, self-made images of people's behavior were used as the dataset, and the CNN model was trained and tested on it. The detection model consists of two convolutional layers, two pooling layers, one fully connected layer, and the input and output layers, seven layers in total. The specific network structure parameters are shown in Table 1.
Parameter setting of CNN inspection model
In the detection model, the convolutional kernels are all 5 × 5 in size. At this size the image features can be fully captured while the model complexity stays small, so the image information is extracted accurately and efficiently. Alternating convolutional and pooling layers ensures that sample information is transferred without translation or distortion.
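As a worked illustration of how two 5 × 5 convolutions alternating with two pooling layers shrink the feature planes, the following sketch traces the side length of a square input through the network. The 32 × 32 input size, stride 1, zero padding, and 2 × 2 non-overlapping pooling are hypothetical values chosen for the example; the actual settings are those listed in Table 1.

```python
def conv_out(n, k=5, stride=1, pad=0):
    """Side length after a k x k convolution."""
    return (n + 2 * pad - k) // stride + 1

def pool_out(n, size=2):
    """Side length after non-overlapping size x size pooling."""
    return n // size

def trace_sizes(n, conv_pool_stages=2):
    """Side lengths after each layer of alternating conv and pool stages."""
    sizes = [n]
    for _ in range(conv_pool_stages):
        n = conv_out(n)
        sizes.append(n)
        n = pool_out(n)
        sizes.append(n)
    return sizes

print(trace_sizes(32))  # [32, 28, 14, 10, 5]
```

Under these assumptions each 5 × 5 convolution trims four pixels from each side length and each pooling layer halves it, so the fully connected layer would receive 5 × 5 feature planes.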
The CNN model was trained on a self-made dataset to classify abnormal pose recognition in the elderly, and its model training results are shown in Fig. 11.
From Fig. 11, the recognition error rate of the CNN on the different datasets decreases as the number of training iterations increases. On the training set, the recognition error rate fell to 8.57% after about 120 training iterations. On the test set, it fell to 12.49% after about 90 training iterations and then remained essentially unchanged. On the validation set, the error rate stops decreasing and begins to rise after 370 iterations, indicating that the model's performance has reached its optimal level and training can be stopped.
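The stopping criterion described here, halting once the validation error stops improving, is commonly implemented as early stopping with a patience window. The sketch below is one such implementation under stated assumptions: the function name and the `patience` parameter are hypothetical and are not taken from the paper.

```python
def early_stopping_epoch(val_errors, patience=10):
    """Return the index of the best epoch, stopping the scan once the
    validation error has not improved for `patience` consecutive epochs."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch
```

In a training loop, the model weights saved at the returned epoch would be kept, since later epochs only fit the training set more closely while the validation error rises.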
To verify the classification accuracy of the studied CNN-based pose detection model for abnormal behavior, Support Vector Machines (SVM) and Recurrent Neural Network (RNN) were selected for comparison experiments. The pose classification results of several algorithms are shown in Table 2.
Pose classification of several algorithms
CNN abnormal posture recognition results.
From the elderly pose classification and recognition results in Table 2, the CNN model has the highest classification accuracy, 95.87%, and the RNN has the lowest, 86.29%. The model running time shows that the CNN completes elderly pose recognition efficiently, and the errors in the classification results likewise indicate the superiority of the CNN model. The running time of the CNN is 27 seconds; in practice, abnormal behavior and posture of the elderly need to be detected within 1 minute to allow a timely response to fall events, so this running time meets practical requirements. Using a CNN model to recognize the poses of the elderly can therefore produce accurate results and provide a reliable basis for addressing elderly health and safety issues.
With the increasing aging of the population, the quality of life and safety of the elderly have attracted wide attention. The bodily functions of the elderly are weak, and they are prone to disease and accidental injury. Timely detection and prediction of abnormal behavior can provide better protection for their safety. The research therefore first used the improved GMM model and KF tracking technology to detect and track the daily behaviors of the elderly in real time. Then, by extracting the posture features of the elderly, the CNN model was used to identify whether their behavior and posture were abnormal. First, the performance of the improved moving target detection algorithm, which combines the four-frame difference method with GMM, was verified by comparing the target detection results of the four-frame difference method, the Gaussian mixture model, and the improved algorithm. The experimental results show that under indoor conditions the recognition accuracy of GMM is 88.63%, and it has certain defects in distinguishing background from foreground: it recognizes part of the background as the target, and its recognition accuracy and completeness are the worst among the algorithms. The target detection accuracy of the four-frame difference method is 92.17%, a significant improvement; its anti-interference ability is strong and the human body contour is relatively complete, but there are some voids in the detection results. The detection accuracy of the four-frame difference method combined with GMM is 98.97%; the detected target contour is complete, the interference noise and voids are significantly reduced, and the detection accuracy is further improved.
Under outdoor conditions with light interference, however, shadow regions appear in the target detection results of GMM and the four-frame difference method, and the recognition accuracy decreases. In the case of multiple targets, all of the algorithms can detect the targets. Overall, the detection results of the proposed combination of the four-frame difference method and GMM are complete, with essentially no shadows and few holes, and are significantly better than those of the other algorithms. In summary, the proposed method achieves good detection results in both indoor environments and complex outdoor environments, effectively avoiding the shortcomings of GMM and improving the completeness and accuracy of the detection results.
Second, the tracking performance of KF-based moving target tracking was analyzed in different scenarios, with the MS, PF, and CS algorithms used for comparative experiments. The results show that the tracking error of the PF algorithm fluctuates between 0.4 and 0.75, that of the CS algorithm between 0.4 and 0.7, that of the MS algorithm between 0.2 and 0.5, and that of the KF algorithm between 0.15 and 0.25. The average tracking errors of the PF, CS, MS, and KF algorithms are 0.61, 0.55, 0.36, and 0.19, respectively. In terms of tracking accuracy and time consumption, all four algorithms tend to stabilize as the number of iterations increases; the tracking accuracy of the KF-based method stabilizes at 98.96%, while the tracking accuracies of the MS, CS, and PF algorithms are 89.32%, 88.16%, and 76.59%, respectively. The KF-based tracking method has a low time consumption and a very short response time, with an average of 0.01 s. The time consumption curve of the PF algorithm fluctuates greatly and its response time is long, at 0.79 s. The time consumption curve of the MS algorithm fluctuates little, and its response time is 0.68 s. The average time consumption of the CS algorithm is 0.63 s. In summary, the KF method used in this study performs very well in real-time tracking, ensuring both tracking accuracy and running stability.
Finally, to test the CNN-based classification of abnormal postures of the elderly, the study conducted experiments on self-made datasets. The results show that the recognition error rate of the CNN model decreases as the number of training iterations increases. On the training set, the recognition error rate decreases to 8.57% after about 120 training iterations. On the test set, it decreases to 12.49% after about 90 training iterations and then remains essentially unchanged. The error rate on the validation set increases after 370 iterations, indicating that the performance of the model has reached its best and training can be stopped. In the comparison experiment with SVM and RNN, the classification accuracy of the CNN model is the highest (95.87%), and that of the RNN is the lowest (86.29%). Based on these results, it can be concluded that CNN-based posture recognition of the elderly obtains accurate results efficiently and meets the practical goal of detecting abnormal behaviors of the elderly within 1 minute.
Conclusion
As the ageing population continues to grow, health care for the elderly is becoming increasingly important. To reduce risky behavior among the elderly, this study uses an enhanced GMM model and KF tracking technology to detect and track the daily behavior of the elderly in real time, then extracts posture features and uses a CNN model to identify abnormal behavioral postures. The experimental results show that fusing the GMM with the four-frame difference method improves the completeness and accuracy of the detection results, with essentially no shadowing. The average error of the KF tracking technique is 0.19, with minimal fluctuation, and the tracking results for moving targets show excellent real-time tracking accuracy and stability. The training results for pose recognition showed that the CNN model maintained a consistent recognition error rate of approximately 8.57% on the training set. Compared with other classification models, the CNN achieved the highest classification accuracy, 95.87%, enabling efficient and accurate pose recognition. In summary, the proposed method effectively improves the accuracy of moving object detection in different background environments and achieves accurate real-time tracking of detected moving objects. It is suitable for practical elderly health detection applications and has important application value. However, the research still has limitations. First, when there are multiple detection targets, the tracking technique may suffer from tracking delays and may lose targets that are occluded by other people. Second, there is still room to improve the recognition rate of the proposed method.
Future research can therefore focus on the overlap and occlusion of multiple moving targets, and can further improve recognition accuracy by using more samples and more advanced techniques.
