Using posture recognition algorithms based on machine learning to identify senior health

Abstract

Elderly people living at home are at risk from dangerous behaviors such as falls. Using behavioral poses in videos as samples, this study investigates motion target detection, real-time target tracking, and behavioral pose recognition and classification. To address the difficulties of motion target detection, a target detection method that combines a Gaussian mixture model (GMM) with the four-frame difference method is proposed. A tracking technique based on the Kalman filter (KF) is used to track the behavioral changes of the elderly in real time. To address inaccurate behavioral pose recognition, a seven-layer convolutional neural network (CNN) is constructed. Experimental analyses show that the improved GMM detection method yields complete target contours with significantly higher accuracy. The KF tracking technique tracks object trajectories stably and in real time, with the smallest average tracking error (0.19) among the compared algorithms. The CNN pose recognition model achieves a classification accuracy of 95.87%, with a pose classification time of 27 seconds. The proposed methods outperform the mean shift, particle filter, and Cam Shift algorithms on all reported metrics. In practical use, the approach can accurately identify whether an elderly person's behavior is abnormal and thus help safeguard their daily health.

Keywords

Abnormal poses; target tracking; convolutional neural networks; Gaussian mixture models; Kalman filtering

1. Introduction

With advances in medical care, the average age of China's population has increased significantly and the proportion of elderly people continues to grow; relevant data show that China currently has more than 300 million elderly people [1]. As their physical condition declines, elderly people move more slowly and their memory deteriorates, which makes them highly vulnerable to injury from falls, often with very serious consequences. Most elderly people live alone, and if they are injured they often cannot be treated in time, which seriously harms their health [2]. Real-time detection and tracking of abnormal behavior, health monitoring, and timely warnings are therefore of great significance for rescuing elderly people in danger [3]. Health detection and recognition methods can assess an elderly person's health status and risk factors in real time. When an elderly person engages in unhealthy or dangerous behavior, such methods can identify their current state and promptly notify their family or a hospital, enabling early detection of and intervention in potential health problems and helping them maintain good physical health and quality of life. Because current target detection techniques have certain shortcomings, this study combines the Gaussian mixture model (GMM) with the four-frame difference method for motion target detection, and introduces a mainstream motion target tracking technique to track the behavior of the elderly in real time. Building on the detection and tracking of moving targets, early methods recognized abnormal human poses from manually extracted features; as technology progressed, training classifiers with machine learning algorithms significantly improved the accuracy of human behavior recognition [4].
Convolutional neural networks (CNNs) can be applied to a wide variety of scenes and to pose detection and recognition, and they stand out among pose recognition models for the elderly; the study therefore constructs a CNN model for classifying and recognizing the poses of the elderly. The motion target detection, tracking, and pose recognition techniques investigated here are expected to be applicable to health monitoring of the elderly and ultimately to improve their safety. The aim of the research is to detect, track, and classify in real time the abnormal behaviors that threaten the health of the elderly, so that they receive timely assistance in case of danger and their safety is better protected. This has practical value: it can help improve the quality of life and safety of elderly people and provide decision-making and intervention support for social workers and medical personnel. The rest of the paper is organized as follows. Section 2 reviews related research. Section 3 presents a moving object detection algorithm that combines the Gaussian mixture model with the four-frame difference method, a Kalman-filter-based real-time tracking algorithm for the detected objects, and an improved CNN model for classifying the poses of the elderly. Section 4 verifies the effectiveness and feasibility of the proposed methods. The paper concludes with a summary of the research.

2. Related work

As the elderly population grows, increasing attention is being paid to its health. Iazzi et al. developed a system for detecting the fall postures of elderly people that classifies abnormal behaviors with a support vector machine; results on a public dataset show that the system has a low error rate and detects falls effectively [5]. Liaqat et al. monitored the daily activities of the elderly with sensor devices and classified their behaviors with several classification algorithms; the classification accuracy exceeded 90%, surpassing that of the comparison algorithms [6]. To improve the sleep quality of the elderly, Tang et al. proposed an electronic mattress that monitors sleep posture and sleep quality, using a CNN to analyze various sleep postures; experiments demonstrated the effectiveness of the approach, with a classification accuracy of 90% [7]. To reduce the risk of serious injuries during exercise, Tay et al. proposed a novel unlabeled gait estimation and tracking algorithm that automatically captures human joints for posture evaluation and analyzes human motion. Implemented on the Intel Up Squared board, the system runs at 9 frames per second with a gait recognition accuracy of 95% [8]. To mitigate the adverse effects of fare evasion in subway stations, Huang et al. designed a method for detecting fare evasion by individual passengers based on skeleton sequences and time series. Experiments showed that the method effectively identifies human body states and detects fare-evasion behaviors, including jumping and squatting [9].

Divya et al. used a posture prediction system to identify abnormal activities of the elderly, achieving real-time target detection and abnormal activity classification. Performance tests showed that the system handles different types of poses in different environments and is robust to environmental changes [10]. To address the limited field of view on target objects in complex real-world scenarios, Mandischer et al. proposed a multi-pulse detection system based on laser and radar sensors. The proposed tracking pipeline was trained and extensively validated on a new dataset; the results confirmed that the radar tracker achieved state-of-the-art performance and that the laser and fusion trackers outperformed recent methods [11]. Berlin et al. used convolutional filters to detect poses and ranked them by computing image scores; comparative experiments showed that the method is practical and highly feasible [12]. Su et al. used radar to detect human motion characteristics, analyzing the signals returned by the target; the results showed that the technique can accurately detect postural changes during falls and provide fall warnings [13]. Nour et al. used a posture estimation system to monitor the daily lives of elderly people living alone and to support health care interventions; the results indicate that the system can effectively monitor the safety of elderly individuals and can be applied in health care [14].

This brief review of domestic and international research shows that health monitoring for the elderly has attracted the attention of many researchers, and that posture classification and recognition is an indispensable part of it. Among the many classification algorithms, CNNs are widely used. The study therefore applies a CNN to identify abnormal postures of the elderly, aiming to contribute to elderly health monitoring by detecting dangerous behaviors in time.

3. Research on the detection and identification of abnormal human posture in health testing of the elderly

3.1 Moving object detection algorithm incorporating GMM and four-frame difference method

As the physical condition of the elderly declines, falls and similar events often cause physical injury, so it is crucial to detect abnormal behavior promptly and send a distress signal in time. Before abnormal poses can be recognized, the moving target must first be detected [15, 16, 17]. Commonly used detection methods include optical flow, background subtraction, inter-frame differencing, and the Gaussian mixture model (GMM). Compared with the other methods, GMM has clear advantages, so the study optimizes it and applies it to motion target detection [18, 19, 20]. A GMM combines several Gaussian models with different parameters and uses set thresholds to separate background from foreground. Background modeling is performed first: each pixel of the video image is modeled by $L$ Gaussian distributions, and the color value of a pixel at time $k$ is denoted $X_{k}$. The history of that pixel is then $\left\{X_{1},\ldots,X_{k}\right\}=\left\{H\left(x,y,t\right):1\leq t\leq k\right\}$, where $H\left(x,y,t\right)$ is the video image information at position $(x,y)$ and time $t$. The probability density of $X_{k}$ is given by Eq. (1).

$\displaystyle P\left(X_{k}\right)=\sum_{j=1}^{L}\omega_{j,k}\,\varepsilon\left(X_{k},U_{j,k},\Sigma_{j,k}\right)$ (1)

In Eq. (1), $\omega_{j,k}$ is the weight of the $j$th Gaussian distribution at time $k$ and satisfies $\sum_{j=1}^{L}\omega_{j,k}=1$; $\varepsilon\left(X_{k},U_{j,k},\Sigma_{j,k}\right)$ is the probability density of the $j$th Gaussian distribution at time $k$, given by Eq. (2).

$\displaystyle\varepsilon\left(X_{k},U_{j,k},\Sigma_{j,k}\right)=\frac{1}{\left(2\pi\right)^{d/2}\left|\Sigma_{j,k}\right|^{1/2}}\exp\left[-\frac{1}{2}\left(X_{k}-U_{j,k}\right)^{T}\Sigma_{j,k}^{-1}\left(X_{k}-U_{j,k}\right)\right]$ (2)

In Eq. (2), $\Sigma_{j,k}$ is the covariance matrix of the $j$th Gaussian distribution at time $k$, $U_{j,k}$ is its mean, $d$ is the dimension of $X_{k}$ (the number of color channels), and $T$ denotes the transpose. In practice, the color channels are assumed to be independent and to have the same variance, so the covariance matrix reduces to Eq. (3).
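As a minimal sketch (not the authors' code), Eqs. (1) and (2) can be evaluated for a single gray-level channel, where each component of Eq. (2) reduces to a univariate normal density. The function names and the example component parameters are hypothetical.

```python
import math


def gaussian_pdf_1d(x: float, mean: float, var: float) -> float:
    """Univariate form of Eq. (2): density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x: float, weights, means, variances) -> float:
    """Eq. (1): weighted sum of the L component densities for one pixel value."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf_1d(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical pixel modeled by L = 3 components (weights, means, variances).
p = mixture_density(120.0, [0.6, 0.3, 0.1], [118.0, 60.0, 200.0], [25.0, 100.0, 400.0])
print(f"P(X_k) = {p:.6f}")
```

A value close to a high-weight, low-variance component yields a large density, which is what the background test later relies on.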

$\displaystyle\Sigma_{j,k}=\sigma_{j,k}^{2}\mathbf{I}$ (3)

In Eq. (3), $\sigma_{j,k}^{2}$ is the variance of the $j$th Gaussian distribution at time $k$ and $\mathbf{I}$ is the identity matrix. To initialize the model, the mean $u_{0}$ and variance $\sigma_{0}^{2}$ of the pixel values $x_{k}$ over the first $n$ frames are computed as in Eq. (4).

$\displaystyle\left\{\begin{array}{l}u_{0}=\frac{1}{n}\sum_{k=0}^{n-1}x_{k}\\ \sigma_{0}^{2}=\frac{1}{n}\sum_{k=0}^{n-1}\left(x_{k}-u_{0}\right)^{2}\end{array}\right.$ (4)
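The initialization in Eq. (4) amounts to the per-pixel mean and population (1/n) variance over the first frames. In the sketch below, the `min_var` floor is an assumption added here, not part of the text, so that a perfectly static pixel does not get a zero variance, which would make the density in Eq. (2) undefined.

```python
def init_pixel_model(values, min_var=1e-6):
    """Eq. (4): mean u_0 and variance sigma_0^2 of one pixel over the first n frames.

    The 1/n (population) variance matches Eq. (4); min_var is an assumed floor.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one frame")
    u0 = sum(values) / n
    var0 = sum((x - u0) ** 2 for x in values) / n
    return u0, max(var0, min_var)


u0, var0 = init_pixel_model([100.0, 102.0, 98.0, 100.0])
print(u0, var0)  # 100.0 2.0
```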

To distinguish the background from the foreground, the Gaussian components of each pixel are sorted in descending order of the ratio $\omega_{j,k}/\sigma_{j,k}$. A larger ratio means a larger weight and a smaller variance: the component has been matched often and the pixel values it explains vary little, so it is more likely to describe the background, whereas components with small ratios are more likely to describe the foreground. Disturbances such as noise may prevent a background pixel from being judged correctly, but because such disturbances do not persist, the components they produce keep small weights. A weight threshold $\delta$ can therefore be applied when judging the model, which also reduces the amount of computation to some extent [21, 22]. According to this threshold, the first $A$ of the sorted Gaussian distributions are selected to represent the image background, and the remaining distributions represent the foreground. The selection rule is given in Eq. (5).

$\displaystyle A=\min\left\{b\in\left\{1,\ldots,L\right\}:\sum_{j=1}^{b}\omega_{j,k}>\delta\right\},\quad 0.5<\delta<1$ (5)
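The sorting and selection described above can be sketched for one pixel as follows. The default `delta = 0.7` is an arbitrary value inside the range allowed by Eq. (5), chosen only for illustration; the function name is hypothetical.

```python
def select_background(weights, sigmas, delta=0.7):
    """Return indices of the first A components (sorted by omega/sigma, descending)
    whose cumulative weight first exceeds delta, following Eq. (5)."""
    if not 0.5 < delta < 1:
        raise ValueError("Eq. (5) requires 0.5 < delta < 1")
    order = sorted(range(len(weights)),
                   key=lambda j: weights[j] / sigmas[j], reverse=True)
    background, cumulative = [], 0.0
    for j in order:
        background.append(j)
        cumulative += weights[j]
        if cumulative > delta:      # smallest A satisfying Eq. (5)
            break
    return background


# Ratios are 0.005, 0.12, 0.03, so the order is 1, 2, 0.
print(select_background([0.1, 0.6, 0.3], [20.0, 5.0, 10.0]))  # [1, 2]
```

Because the weights sum to 1 and $\delta<1$, the loop always stops with at least one background component.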

Background separation is performed for every video frame. For each new pixel value, the GMM checks whether it matches one of the first $A$ Gaussian distributions selected by Eq. (5); if so, the pixel is assigned to the background, and the matched distribution is updated with the new pixel value according to Eq. (6).

$\displaystyle\left\{\begin{array}{l}\omega_{j,k}=\left(1-\lambda\right)\omega_{j,k-1}+\lambda\\ U_{j,k}=\left(1-\nu\right)U_{j,k-1}+\nu X_{k}\\ \sigma_{j,k}^{2}=\left(1-\nu\right)\sigma_{j,k-1}^{2}+\nu\left(X_{k}-U_{j,k-1}\right)^{T}\left(X_{k}-U_{j,k-1}\right)\\ \nu=\lambda/\omega_{j,k}\end{array}\right.$ (6)
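For a single gray-level channel, Eq. (6) can be applied to one pixel's components as sketched below, with $\lambda$ the learning rate and $\nu=\lambda/\omega_{j,k}$ the update rate. The text states the update only for the matched component; decaying the weights of unmatched components by $(1-\lambda)$ is an assumption borrowed from the standard GMM background model. With that convention the weights still sum to 1 after every update, so no renormalization is needed.

```python
def update_pixel(weights, means, variances, x, matched, lam=0.05):
    """Apply Eq. (6) to component `matched`; unmatched weights only decay (assumed).

    weights, means, variances: per-component lists for one pixel (1-D case).
    x: new pixel value X_k; lam: learning rate lambda in [0, 1].
    """
    new_w, new_m, new_v = [], [], []
    for j, (w, m, v) in enumerate(zip(weights, means, variances)):
        if j == matched:
            w = (1 - lam) * w + lam                 # omega_{j,k}
            nu = lam / w                            # nu = lambda / omega_{j,k}
            v = (1 - nu) * v + nu * (x - m) ** 2    # uses the previous mean U_{j,k-1}
            m = (1 - nu) * m + nu * x               # U_{j,k}
        else:
            w = (1 - lam) * w                       # unmatched: weight decays
        new_w.append(w)
        new_m.append(m)
        new_v.append(v)
    return new_w, new_m, new_v


w, m, v = update_pixel([0.5, 0.5], [100.0, 50.0], [4.0, 9.0], x=104.0, matched=0, lam=0.1)
print(w, m, v)
```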

In Eq. (6), $\lambda\in\left[0,1\right]$ is the learning rate of the GMM and $\nu$ is the update rate of the background model. When the GMM alone is used to detect moving targets, building the background model takes a long time, and under illumination the shadows cast by a target can be misjudged as moving targets, which lowers the accuracy and reliability of detection. To solve these problems, the study combines GMM-based motion target detection with the four-frame difference method. The four-frame difference method takes four consecutive frames from the video, $f_{m-2}\left(a,b\right),f_{m-1}\left(a,b\right),f_{m}\left(a,b\right),f_{m+1}\left(a,b\right)$. The first and second frames are differenced to obtain the difference image $G_{1}\left(a,b\right)$, and the third and fourth frames are differenced in the same way to obtain $G_{2}\left(a,b\right)$. Because the difference images contain noise, they are binarized with Eq. (7) to suppress the interference.

$\displaystyle f\left(a,b\right)=\begin{cases}1, & G\left(a,b\right)>I\\ 0, & \text{otherwise}\end{cases}$ (7)

In Eq. (7), $G\left(a,b\right)$ is a difference image ($G_{1}$ or $G_{2}$) and $I$ is the noise threshold. The two binarized difference images are dilated to obtain $g_{1}\left(a,b\right)$ and $g_{2}\left(a,b\right)$, and a logical AND of the dilated images gives $g\left(a,b\right)$. Next, the edge features of the $M$th frame are extracted and binarized to obtain $r\left(a,b\right)$, which is ANDed with $g_{1}\left(a,b\right)$ and $g_{2}\left(a,b\right)$ to obtain $s_{1}\left(a,b\right)$ and $s_{2}\left(a,b\right)$, respectively. A logical OR of $s_{1}\left(a,b\right)$ and $s_{2}\left(a,b\right)$ gives $s\left(a,b\right)$, and a logical OR of $g\left(a,b\right)$ and $s\left(a,b\right)$ gives the detected moving target $O\left(a,b\right)$. The whole process is summarized in Eq. (8).

$\displaystyle\left\{\begin{array}{l}g\left(a,b\right)=g_{1}\left(a,b\right)\land g_{2}\left(a,b\right)\\ s_{1}\left(a,b\right)=r\left(a,b\right)\land g_{1}\left(a,b\right)\\ s_{2}\left(a,b\right)=r\left(a,b\right)\land g_{2}\left(a,b\right)\\ s\left(a,b\right)=s_{1}\left(a,b\right)\lor s_{2}\left(a,b\right)\\ O\left(a,b\right)=g\left(a,b\right)\lor s\left(a,b\right)\end{array}\right.$ (8)
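The binarization, dilation, and logical operations of Eqs. (7) and (8) can be sketched with plain nested lists. Assumptions not fixed by the text: frames are gray-level images of equal size, differences are absolute values, dilation uses a 3 × 3 square structuring element, and the binary edge map $r(a,b)$ is supplied by the caller (any edge detector could produce it). All function names are hypothetical.

```python
def absdiff(f1, f2):
    """Absolute difference of two equally sized gray-level frames."""
    return [[abs(p - q) for p, q in zip(r1, r2)] for r1, r2 in zip(f1, f2)]


def binarize(diff, thr):
    """Eq. (7): 1 where the difference exceeds the noise threshold I, else 0."""
    return [[1 if val > thr else 0 for val in row] for row in diff]


def dilate(mask):
    """Binary dilation with an assumed 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[u][v]
                      for u in range(max(0, i - 1), min(h, i + 2))
                      for v in range(max(0, j - 1), min(w, j + 2))) else 0
             for j in range(w)] for i in range(h)]


def land(m1, m2):
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def lor(m1, m2):
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def four_frame_detect(frames, edge_mask, thr):
    """Eq. (8) for frames (f_{m-2}, f_{m-1}, f_m, f_{m+1}) and a binary edge map r."""
    f0, f1, f2, f3 = frames
    g1 = dilate(binarize(absdiff(f1, f0), thr))   # first and second frames
    g2 = dilate(binarize(absdiff(f3, f2), thr))   # third and fourth frames
    g = land(g1, g2)
    s = lor(land(edge_mask, g1), land(edge_mask, g2))
    return lor(g, s)                              # O(a, b)


def frame_with_dot(col, size=5):
    """Hypothetical test frame: one bright pixel in row 2 at column `col`."""
    return [[255 if (i == 2 and j == col) else 0 for j in range(size)] for i in range(size)]


frames = [frame_with_dot(c) for c in (1, 2, 2, 3)]
no_edges = [[0] * 5 for _ in range(5)]
detected = four_frame_detect(frames, no_edges, thr=30)
for row in detected:
    print(row)
```

Here the dot moves right by one pixel per frame pair, and the AND of the two dilated masks keeps only the 3 × 3 region around the current position.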

The process of detecting a moving target using the GMM and the four-frame difference method is shown in Fig. 1. The initial image sequence is obtained and denoised. Then, a background model is obtained using the GMM, and the detection area is obtained using the four-frame difference method.

Figure 1.

Flow Chart of GMM combined with four frame difference method for target detection.

3.2 Kalman filter-oriented algorithm for real-time tracking of detection targets

Among the many target tracking methods, Kalman filtering (KF) can track and detect multiple targets in real time with a relatively simple computation, so it is used here to track the behavior of the elderly in real time [23, 24, 25]. The state and observation equations are given in Eq. (9).

$\displaystyle\left\{{\begin{array}[]{l}X_{i}=a_{i,i-1}X_{i-1}+cu_{i}+W_{i}\\ Z_{i}=h_{i}X_{i}+V_{i}\\ \end{array}}\right.$ (9)

In Eq. (9), $X_{i}$ and $X_{i-1}$ are the system states at times $i$ and $i-1$, $a_{i,i-1}$ is the state transition matrix from $i-1$ to $i$, $c$ is the control input matrix, $u_{i}$ is the control input at $i$, $W_{i}$ is the system noise, $Z_{i}$ is the observation at $i$, $h_{i}$ is the observation matrix, and $V_{i}$ is the observation noise. Given the state and observation values up to time $i$, the filter predicts the state at the next time step and then corrects the prediction using the error between the predicted and the actual observation. When tracking a moving target, the KF extracts feature values from the detected target region, builds a prediction model, and predicts the target's position in the next frame. The image at the predicted position is then matched against the target features. If the match succeeds, the tracking result is displayed and the target model is updated for tracking in the next frame. The whole process is shown in Fig. 2.
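To make the predict-correct cycle concrete, the sketch below runs Eq. (9) in the scalar case. The noise variances `Q` (of $W_i$) and `R` (of $V_i$) and all default values are assumptions for illustration; the text does not specify them.

```python
def kf_step(x, P, z, a=1.0, c=0.0, u=0.0, h=1.0, Q=1e-3, R=0.1):
    """One scalar Kalman filter cycle for Eq. (9).

    x, P: previous state estimate and its variance; z: new observation Z_i.
    Q, R: assumed variances of the system noise W_i and observation noise V_i.
    """
    # Predict with the state equation X_i = a X_{i-1} + c u_i + W_i
    x_pred = a * x + c * u
    P_pred = a * P * a + Q
    # Correct with the observation equation Z_i = h X_i + V_i
    K = P_pred * h / (h * P_pred * h + R)   # Kalman gain
    x_new = x_pred + K * (z - h * x_pred)   # weight the prediction error by K
    P_new = (1 - K * h) * P_pred
    return x_new, P_new


# Hypothetical example: a stationary quantity observed as 5.0 at every step.
x, P = 0.0, 1.0
for z in [5.0] * 20:
    x, P = kf_step(x, P, z)
print(f"estimate = {x:.3f}, variance = {P:.4f}")
```

The gain starts near 1, when the initial guess is uncertain, and shrinks as the estimate becomes more reliable.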

Figure 2.

Principle of KF real-time target tracking.

When the KF predicts the target's position, the change between adjacent video frames is small, so the object's motion in the video can be treated as uniform (constant-velocity) linear motion. The tracked target is marked by its bounding rectangle, and its motion model is given in Eq. (10).

$\displaystyle\left\{{\begin{array}[]{l}S_{i}=S_{i-1}+\Delta t\cdot v_{i-1}\\ v_{i}=v_{i-1}\\ \end{array}}\right.$ (10)

In Eq. (10), $\Delta t$ is the time interval between successive frames, $v_{i}$ is the velocity of the target, $S_{i-1}$ is the position of the moving target at time $i-1$, and $S_{i}\left(x\left(i\right),y\left(i\right)\right)$ is the center of the target at time $i$. The center, width, and height of the bounding rectangle are taken as observations, so the observation vector is $Z_{i}=\left[x_{z}\left(i\right),y_{z}\left(i\right),w_{z}\left(i\right),h_{z}\left(i\right)\right]$. Assuming that all noise is Gaussian and does not alter the state of the system, the state and observation equations of the detected target are given in Eq. (11).

$\displaystyle\left\{\begin{array}{l}\begin{bmatrix}x\left(i\right)\\ y\left(i\right)\end{bmatrix}=\begin{bmatrix}1&0&\Delta t&0\\ 0&1&0&\Delta t\end{bmatrix}\begin{bmatrix}x\left(i\right)\\ y\left(i\right)\\ w\left(i\right)\\ h\left(i\right)\end{bmatrix}+W\left(i-1\right)\\[1ex] \begin{bmatrix}x_{z}\left(i\right)\\ y_{z}\left(i\right)\\ w_{z}\left(i\right)\\ h_{z}\left(i\right)\end{bmatrix}=\begin{bmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{bmatrix}\begin{bmatrix}x\left(i\right)\\ y\left(i\right)\\ w\left(i\right)\\ h\left(i\right)\end{bmatrix}+V\left(i-1\right)\end{array}\right.$ (11)
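The sketch below is a standard constant-velocity Kalman filter built on the motion model of Eq. (10), with state $[x, y, v_x, v_y]$. It observes only the box center and leaves the width and height terms of Eq. (11) out for brevity, so it illustrates the predict-correct cycle rather than reproducing Eq. (11) exactly. The class name, the noise levels `q` and `r`, the initial uncertainty `p0`, and $\Delta t = 1$ frame are assumptions for illustration.

```python
def matmul(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def madd(A, B, sign=1.0):
    """Element-wise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def eye(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


class CenterTracker:
    """Constant-velocity Kalman filter for a box center; state [x, y, vx, vy]."""

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0, p0=1e3):
        self.x = [[x0], [y0], [0.0], [0.0]]       # initial velocity unknown
        self.P = eye(4, p0)                       # large initial uncertainty
        self.F = [[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]]
        self.H = [[1, 0, 0, 0], [0, 1, 0, 0]]     # only the center is observed
        self.Q = eye(4, q)
        self.R = eye(2, r)

    def predict(self):
        """Time update following Eq. (10): S_i = S_{i-1} + dt * v_{i-1}."""
        self.x = matmul(self.F, self.x)
        self.P = madd(matmul(matmul(self.F, self.P), transpose(self.F)), self.Q)
        return self.x[0][0], self.x[1][0]

    def correct(self, zx, zy):
        """Measurement update with an observed box center (zx, zy)."""
        innovation = madd([[zx], [zy]], matmul(self.H, self.x), -1.0)
        S = madd(matmul(matmul(self.H, self.P), transpose(self.H)), self.R)
        K = matmul(matmul(self.P, transpose(self.H)), inv2(S))
        self.x = madd(self.x, matmul(K, innovation))
        self.P = matmul(madd(eye(4), matmul(K, self.H), -1.0), self.P)


# Hypothetical detections of a center moving 2 px/frame right and 1 px/frame down.
tracker = CenterTracker(0.0, 0.0)
for t in range(1, 31):
    tracker.predict()
    tracker.correct(2.0 * t, 1.0 * t)
next_x, next_y = tracker.predict()   # predicted center for frame 31
print(f"velocity ~ ({tracker.x[2][0]:.2f}, {tracker.x[3][0]:.2f}), "
      f"next center ~ ({next_x:.1f}, {next_y:.1f})")
```

The predicted center is where the matching test of Eq. (12) would look for the target in the next frame.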

In Eq. (11), $w\left(i\right)$ and $h\left(i\right)$ denote the width and height of the bounding rectangle. By predicting and correcting the center of the detected target and the size of its bounding rectangle, the tracker follows the moving target in real time. Once the feature values of the moving target are obtained, targets in successive frames are matched by comparing the area and center of their bounding rectangles, as shown in Eq. (12).

$\displaystyle\left\{\begin{array}{l}\Delta S=\left|S_{i}-S_{j}\right|\\ S_{i,j}=\dfrac{\Delta S}{S_{j}}<\varsigma_{S}\\ D_{i,j}=\sqrt{\left(x_{i}-x_{j}\right)^{2}+\left(y_{i}-y_{j}\right)^{2}}<\varsigma_{D}\end{array}\right.$ (12)

In Eq. (12), $S_{i}$ and $S_{j}$ are the areas of the two bounding rectangles being compared, $\Delta S$ is their area difference, $S_{i,j}$ is the relative area difference, $D_{i,j}$ is the distance between the centers $\left(x_{i},y_{i}\right)$ and $\left(x_{j},y_{j}\right)$ of the two rectangles, and $\varsigma_{S}$ and $\varsigma_{D}$ are the area and distance thresholds. When both the area and the distance measures fall below their thresholds, the two targets are considered matched, i.e., the targets tracked in the two images are the same. After the tracking result is displayed, the existing target features are replaced to form a new target template, and the moving target is tracked in the next frame. Repeating this process continuously achieves real-time target tracking in surveillance video.
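The matching test of Eq. (12) can be sketched as follows. Boxes are given as (center x, center y, width, height); the threshold values `area_thr` and `dist_thr` are hypothetical, since the text does not report $\varsigma_S$ and $\varsigma_D$.

```python
import math


def boxes_match(box_i, box_j, area_thr=0.3, dist_thr=20.0):
    """Eq. (12): match two boxes (cx, cy, w, h) by relative area change and center distance."""
    xi, yi, wi, hi = box_i
    xj, yj, wj, hj = box_j
    s_i, s_j = wi * hi, wj * hj
    rel_area = abs(s_i - s_j) / s_j          # S_{i,j} = Delta S / S_j
    dist = math.hypot(xi - xj, yi - yj)      # D_{i,j}
    return rel_area < area_thr and dist < dist_thr


previous = (100.0, 80.0, 40.0, 90.0)
print(boxes_match((104.0, 83.0, 42.0, 88.0), previous))   # True: treated as the same target
print(boxes_match((200.0, 80.0, 40.0, 90.0), previous))   # False: center moved 100 px
```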

3.3 CNN-based classification and detection of abnormal poses of the elderly

After the behavior of elderly people in surveillance videos has been detected and tracked, it must be classified to determine whether abnormal behaviors such as falling or difficulty getting up or lying down have occurred [6–27]. Extracting human behavioral features is a crucial step in describing a person's posture and movements. Convolutional neural networks are the most common feature extraction method: forward propagation computes the network output and its error relative to the expected output, and the error is back-propagated to correct the weights layer by layer until the error is minimized [28, 29]. A neural network consists of three basic parts: input, hidden, and output layers. Let the input signal be $x=\left(x_{1},x_{2},\ldots,x_{p}\right)^{T}$, and let the input, hidden, and output layers contain $p$, $p_{1}$, and $q$ neurons, respectively. The output signal of the hidden layer is $x^{\prime}\in\mathbb{R}^{p_{1}}$ and that of the output layer is $y\in\mathbb{R}^{q}$. The outputs of the $n$th hidden neuron and the $k$th output neuron are then given by Eq. (13).

$\displaystyle\left\{\begin{array}{l}x_{n}^{\prime}=f\left(\sum_{i=1}^{p}w_{in}x_{i}-\theta_{n}\right),\quad n=1,2,\ldots,p_{1}\\ y_{k}=f\left(\sum_{n=1}^{p_{1}}w_{nk}^{\prime}x_{n}^{\prime}-\theta_{k}^{\prime}\right),\quad k=1,2,\ldots,q\end{array}\right.$ (13)

In Eq. (13), $w_{in}$ is the weight between input neuron $i$ and hidden neuron $n$, $w_{nk}^{\prime}$ is the weight between hidden neuron $n$ and output neuron $k$, $x_{i}$ is the output of input neuron $i$, $\theta_{n}$ is the threshold of hidden neuron $n$, $\theta_{k}^{\prime}$ is the threshold of output neuron $k$, and $f\left(x\right)=\frac{1}{1+e^{-x}}$ is the sigmoid function. In the error back-propagation step, the error of the $k$th sample is given by Eq. (14).
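A direct transcription of Eq. (13) for one input vector is sketched below. The nested-list weight layout (`W[i][n]` from input to hidden, `W2[n][k]` from hidden to output) and the example numbers are assumptions chosen for readability.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def forward(x, W, theta, W2, theta2):
    """Eq. (13). x: p inputs; W[i][n]: input i -> hidden n; theta[n]: hidden thresholds;
    W2[n][k]: hidden n -> output k; theta2[k]: output thresholds."""
    hidden = [sigmoid(sum(W[i][n] * x[i] for i in range(len(x))) - theta[n])
              for n in range(len(theta))]
    output = [sigmoid(sum(W2[n][k] * hidden[n] for n in range(len(hidden))) - theta2[k])
              for k in range(len(theta2))]
    return hidden, output


# Hypothetical network with p = 2 inputs, p1 = 2 hidden neurons, q = 1 output.
hidden, output = forward([1.0, 0.5],
                         W=[[0.2, -0.4], [0.7, 0.1]], theta=[0.1, 0.0],
                         W2=[[0.5], [-0.3]], theta2=[0.05])
print(hidden, output)
```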

$\displaystyle E_{k}=\frac{1}{2}\sum_{n=1}^{q}\left(y_{nk}-T_{nk}\right)^{2}$ (14)

In Eq. (14), $T_{nk}$ and $y_{nk}$ are the desired and actual outputs of the $n$th output neuron for the $k$th sample. The connection weights of the neurons in each layer are corrected using Eq. (15).

$\displaystyle w\left(k+1\right)=w\left(k\right)-lr\cdot g\left(k\right)$ (15)
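To show how the gradient step of Eq. (15) drives the error of Eq. (14) down, the sketch below trains a single sigmoid output neuron on one sample. The inputs, target, learning rate, and step count are hypothetical. Differentiating Eq. (14) through the sigmoid gives, for this neuron, a gradient with respect to each weight equal to $(y-T)\,y(1-y)$ times the corresponding input.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train_output_neuron(h, target, w, theta, lr=0.5, steps=100):
    """Gradient descent w(k+1) = w(k) - lr * g(k) (Eq. 15) on E = 0.5 * (y - T)^2 (Eq. 14).

    h: inputs from the hidden layer; w: weights; theta: threshold (net input = w.h - theta).
    Returns the final weights, the threshold, and the error before each update.
    """
    errors = []
    for _ in range(steps):
        y = sigmoid(sum(wn * hn for wn, hn in zip(w, h)) - theta)
        errors.append(0.5 * (y - target) ** 2)
        delta = (y - target) * y * (1.0 - y)          # dE / d(net input)
        w = [wn - lr * delta * hn for wn, hn in zip(w, h)]
        theta = theta + lr * delta                    # d(net) / d(theta) = -1
    return w, theta, errors


w, theta, errors = train_output_neuron(h=[0.5, 0.8], target=1.0, w=[0.0, 0.0], theta=0.0)
print(f"E before: {errors[0]:.4f}, E after {len(errors)} steps: {errors[-1]:.4f}")
```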

In Eq. (15), $lr$ is the learning rate and $g\left(k\right)=\left.\frac{\partial E\left(w\right)}{\partial w}\right|_{w=w\left(k\right)}$ is the local gradient vector, where $E$ is the error function. The CNN model consists of input, convolutional, pooling, fully connected, and output layers. Within a convolutional layer, the same kernel weights are shared across all image positions, which reduces the number of network parameters and speeds up learning. Each layer contains multiple trainable feature maps, and convolution kernels are applied to the input feature maps to extract features. The structure of the network is shown in Fig. 3, where C1 and C3 are convolutional layers, S2 and S4 are pooling layers, and F5 is a fully connected layer.

Figure 3.

Structure diagram of CNN.

In the convolutional layer, a kernel of fixed pixel size slides over the image; at each position, the kernel weights are multiplied element-wise with the pixels they cover and the products are summed, giving one pixel of the convolved output. The convolutional layer thus perceives the image locally and combines these local responses into global information, which reduces the number of network parameters and the computational burden of the algorithm [30]. In addition, all neurons in one feature map share the same connection weights, so each feature map detects one kind of feature across the input image, which further reduces computational complexity. To capture the overall information of the image, a convolutional layer usually contains multiple kernels, each extracting a different kind of feature and producing its own feature map. The pooling layer then takes the average or maximum over small neighborhoods of each feature map for subsequent processing. Pooling merges data and reduces image complexity while retaining the crucial information, which lowers the dimensionality of the image and simplifies later computation. The operation of the convolutional and pooling layers is illustrated in Fig. 4.
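The sliding-window computation and pooling described above can be sketched on a tiny example. This sketch assumes stride 1 with no padding for convolution and non-overlapping 2 × 2 max pooling; as in most CNN libraries, the kernel is not flipped (strictly, this is cross-correlation). The edge-detecting kernel is a hypothetical choice, not the network's learned weights.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
             for j in range(ow)] for i in range(oh)]


def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling; incomplete trailing windows are dropped."""
    return [[max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


image = [[0, 0, 1, 1, 1] for _ in range(5)]   # vertical edge between columns 1 and 2
kernel = [[-1, 0, 1] for _ in range(3)]      # responds to dark-to-bright transitions
fmap = conv2d_valid(image, kernel)
print(fmap)            # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
print(max_pool(fmap))  # [[3]]
```

The 5 × 5 input shrinks to a 3 × 3 feature map and then to a single pooled value, which is the dimensionality reduction the paragraph describes.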

Figure 4.

Operation process of the convolution layer and pooling layer.

After convolution and pooling, the fully connected layer combines all of the extracted features so as to minimize information loss, and passes the result to the output layer.

4. Performance analysis and application of human abnormal posture detection and recognition methods in elderly health detection

4.1 Performance analysis and application of improved motion target detection algorithm

To evaluate the improved motion target detection method, which combines the four-frame difference method with the GMM, the same frame of the same video was processed with the four-frame difference method, the GMM, and the improved algorithm, and the results were compared. Figure 5 shows the detection results of the algorithms in an indoor setting.

Figure 5.

Target detection results of several detection algorithms in indoor environment.

The human fall images in Fig. 5 show that the indoor environment contains few shadows, so most detection algorithms capture a fairly complete human silhouette. Figure 5(b) shows that the GMM does not separate background and foreground well and labels part of the background as target; its accuracy and completeness are the worst of the algorithms compared. Figure 5(c) shows the results of the four-frame difference method. Compared with Fig. 5(b), the detection accuracy is notably higher, the robustness to interference is stronger, and the human contour is more complete, but the result contains some holes. The results of the improved GMM are shown in Fig. 5(d): the detected target is complete, and interference noise and holes are greatly reduced, further improving detection accuracy. Quantitatively, the GMM reaches a detection accuracy of 88.63%; its results are relatively complete and it has some robustness to interference, but its computational efficiency is low. The four-frame difference method reaches 92.17%; it has low computational complexity and suits moving targets in constantly changing environments, but its external contours are incomplete and the detected target region is unstable. The four-frame difference method combined with the GMM reaches 98.97%; it is highly robust to noise, produces fewer holes, and yields very complete target contours, although its accuracy degrades when multiple targets overlap.

To analyze the performance of the three detection methods with multiple targets under light interference, detection was also carried out outdoors. The results are shown in Fig. 6.

Figure 6.

Detection results of three outdoor detection algorithms.

As shown in Fig. 6, under light interference the shadowed parts of the GMM and four-frame difference detection results are reduced, and all of the algorithms detect the multiple targets. Figure 6(b) shows the worst detection results: the foreground, background, and shadow regions are not correctly separated, although the human silhouette is relatively intact. The four-frame difference result in Fig. 6(c) is more complete, but some shadowed regions are falsely detected and there is more noise, which reduces accuracy. The result in Fig. 6(d) is complete, with few holes or shadows, and is clearly better than those of the comparison algorithms. High-resolution images can provide more accurate motion data through motion markers, thereby improving the accuracy of motion capture. Comparing indoor and outdoor results, all algorithms capture human contours better indoors, because indoor lighting is relatively stable and the background is simple. Outdoors, lighting is more complex and variable, and the scene contains many elements such as trees, buildings, and pedestrians, so all algorithms capture contours less well. Overall, combining the GMM with the four-frame difference method effectively overcomes the shortcomings of the GMM and improves the completeness and accuracy of the detection results.

4.2 Analysis of the effectiveness of KF-based real-time tracking of detection targets

To analyze the performance of KF-based moving object tracking, object tracking was carried out with different algorithms in various scenarios. To verify the proposed method more rigorously, current mainstream tracking algorithms, namely the Mean Shift (MS), Particle Filter (PF) and Cam Shift (CS) algorithms, were selected for comparative experiments. The real-time tracking errors of the different algorithms are shown in Fig. 7.

Figure 7.

Tracking accuracy results of four algorithms.

From Fig. 7, the tracking error of the PF algorithm fluctuates between 0.4 and 0.75, that of the CS algorithm between 0.4 and 0.7, that of the MS algorithm between 0.2 and 0.5, and that of the KF algorithm between 0.15 and 0.25. The average tracking errors of the PF, CS, MS and KF algorithms are 0.61, 0.55, 0.36 and 0.19, respectively. Both the fluctuation ranges and the average errors show that the PF algorithm has the largest tracking error and the KF algorithm the smallest. KF also has the best tracking stability and achieves better real-time tracking of moving targets in different scenarios. Although the tracking errors differ markedly between algorithms, each algorithm's error curve fluctuates relatively smoothly. This indicates that differences in the tracked objects and actions affect tracking performance to some extent, but not enough to change the tracking error substantially. To further explore the performance of the proposed method, tracking accuracy and time consumption were also evaluated. The tracking accuracy results of the different algorithms are shown in Fig. 8.
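The tracking error plotted in Fig. 7 is not formally defined in this section. As an assumption, the sketch below uses the centre-location error normalized by the ground-truth box diagonal and reports its mean together with its minimum and maximum, which correspond to the average error and fluctuation range discussed above. The `(x, y, width, height)` box format and the helper names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left corner


def center(box: Box) -> Tuple[float, float]:
    """Centre point of an axis-aligned box."""
    x, y, w, h = box
    return x + w / 2, y + h / 2


def normalized_center_error(pred: Box, truth: Box) -> float:
    """Distance between box centres divided by the ground-truth diagonal (assumed metric)."""
    (px, py), (tx, ty) = center(pred), center(truth)
    return math.hypot(px - tx, py - ty) / math.hypot(truth[2], truth[3])


def summarize(preds: List[Box], truths: List[Box]) -> Tuple[float, float, float]:
    """Return (mean, min, max) of per-frame errors: the mean is the 'average tracking
    error', while min and max give the fluctuation range of the error curve."""
    errors = [normalized_center_error(p, t) for p, t in zip(preds, truths)]
    return mean(errors), min(errors), max(errors)
```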

Figure 8.

Comparison of tracking accuracy results of different algorithms.

From Fig. 8, the accuracy of all four algorithms stabilizes as the number of iterations increases. The tracking accuracy of the KF-based method stabilizes at 98.96%, while those of the MS, CS and PF algorithms are 89.32%, 88.16% and 76.59%, respectively. This indicates that the proposed KF-based tracking of detected moving targets outperforms the other algorithms. Because tracking methods are usually compared only through accuracy curves, the study also evaluated the algorithms by time consumption. Figure 9 shows the results.
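Tracking accuracy in Fig. 8 is likewise not formally defined here. One common convention, assumed in this sketch, counts a frame as correctly tracked when the intersection-over-union (IoU) of the predicted and ground-truth boxes reaches a threshold (0.5 below) and reports the fraction of such frames. The box format and function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left corner


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def tracking_accuracy(preds: List[Box], truths: List[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose predicted box reaches `threshold` IoU with the ground truth."""
    hits = sum(1 for p, t in zip(preds, truths) if iou(p, t) >= threshold)
    return hits / len(truths)
```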

Figure 9.

Comparison of tracking time consumption results of different algorithms.

From Fig. 9, the KF-based tracking method has a reasonable time consumption and a very short response time, averaging 0.01 s. The time-consumption curve of the PF algorithm fluctuates strongly and its response time is relatively long, at 0.79 s. The curve of the MS algorithm fluctuates only slightly, but its response time is also relatively long, at 0.68 s, and needs further optimization. The average time consumption of the CS algorithm is 0.63 s. These results confirm that the KF algorithm tracks better and faster while maintaining high accuracy. To verify the real-time effectiveness of KF-based tracking, a video of a person moving around indoors was selected and three frames were taken at random to show the tracking results, which are shown in Fig. 10.
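Average response times such as those compared in Fig. 9 can be measured by timing each per-frame tracking step. The sketch below is a hypothetical harness, not the study's benchmarking code: `step` stands for any tracker update function, and `time.perf_counter` is used because it is a monotonic high-resolution clock intended for measuring short intervals.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def mean_response_time(step: Callable[[object], object], frames: Iterable[object]) -> float:
    """Average wall-clock seconds that `step` spends per frame (frames must be non-empty)."""
    durations: List[float] = []
    for frame in frames:
        start = time.perf_counter()
        step(frame)                       # e.g. one predict/update cycle of a tracker
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

Timing each frame separately, rather than only the whole video, also exposes the fluctuation of the time-consumption curve that distinguishes the PF and MS algorithms above.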

Figure 10.

Target tracking results of the KF algorithm.

Figure 10 shows the KF tracking results for a moving target in the 18th, 26th and 34th frames of the captured video. In all three frames the rectangular tracking box stays on the moving target without visible tracking error, which demonstrates that the algorithm achieves stable and precise real-time tracking of moving targets.
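The tracking shown in Fig. 10 can be illustrated with a minimal constant-velocity Kalman filter applied to the centre of the bounding box. This is a sketch under stated assumptions, not the paper's formulation: each image axis is filtered independently with state (position, velocity), the transition matrix is F = [[1, dt], [0, 1]], only position is measured (H = [1, 0]), and the noise variances `q` and `r`, the initial covariance, and the class names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis; state = (position, velocity)."""
    p: float = 0.0     # position estimate (pixels)
    v: float = 0.0     # velocity estimate (pixels per frame)
    q: float = 0.01    # process noise variance (assumed value)
    r: float = 1.0     # measurement noise variance (assumed value)
    cov: List[List[float]] = field(
        default_factory=lambda: [[100.0, 0.0], [0.0, 100.0]])  # large initial uncertainty

    def predict(self, dt: float = 1.0) -> float:
        """x <- F x and P <- F P F^T + q I, with F = [[1, dt], [0, 1]]."""
        a, b, d = self.cov[0][0], self.cov[0][1], self.cov[1][1]
        self.p += dt * self.v
        self.cov = [[a + 2 * dt * b + dt * dt * d + self.q, b + dt * d],
                    [b + dt * d, d + self.q]]
        return self.p

    def update(self, z: float) -> float:
        """Correct the prediction with a measured position z."""
        a, b, d = self.cov[0][0], self.cov[0][1], self.cov[1][1]
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, b / s          # Kalman gain for position and velocity
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P
        self.cov = [[(1 - k0) * a, (1 - k0) * b],
                    [b - k1 * a, d - k1 * b]]
        return self.p


class CentroidTracker:
    """Tracks a bounding-box centre with one independent filter per axis."""

    def __init__(self, x0: float, y0: float) -> None:
        self.kx = AxisKalman(p=x0)
        self.ky = AxisKalman(p=y0)

    def step(self, measured: Tuple[float, float]) -> Tuple[float, float]:
        """Predict one frame ahead, then correct with the detector's measured centre."""
        self.kx.predict()
        self.ky.predict()
        return self.kx.update(measured[0]), self.ky.update(measured[1])
```

In use, the detector's box centre for each frame is passed to `step`, and the returned estimate positions the rectangular tracking box; during brief detection dropouts, `predict` alone can carry the box forward.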

4.3 Research on CNN-based classification and detection of abnormal postures of the elderly

To recognize and detect abnormal behavior of the elderly, a self-made dataset of images of people's behavior was used to train and test the CNN model. The detection model consists of an input layer, two convolutional layers, two pooling layers, one fully connected layer and an output layer. The network structure parameters are shown in Table 1.

Table 1
Parameter settings of the CNN detection model

Layer No.	Layer type	Number of training	Size
1	Input layer	/	32×32
2	C1 convolutional layer	36	5×5
3	S2 pooling layer	12	2×2
4	C3 convolutional layer	116	5×5
5	S4 pooling layer	32	2×2
6	F5 fully connected layer	1064	1×1
7	Output layer	/	1×1

In the detection model, all convolution kernels are 5×5. This size captures image features fully while keeping the model complexity low, so image information is extracted accurately and efficiently. Alternating convolutional and pooling layers preserves sample information as it passes through the network while making the extracted features robust to small translations and distortions.
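Table 1 gives the window sizes but not the strides or padding. Assuming unpadded stride-1 convolutions and non-overlapping 2×2 pooling with stride 2 (the LeNet-5 convention, an assumption here), the side length of the feature maps can be traced with the standard formula out = floor((in + 2·padding - kernel) / stride) + 1. The function and constant names are hypothetical.

```python
from typing import List, Tuple

# (layer name, window size, stride) for the convolution/pooling stack in Table 1;
# the strides are assumed, since the table does not list them
LAYERS = [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2)]


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling window sliding over `size` pixels."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size: int = 32) -> List[Tuple[str, int]]:
    """Side length of the feature maps after each layer, starting from the input."""
    shapes = [("input", input_size)]
    size = input_size
    for name, window, stride in LAYERS:
        size = conv_out(size, window, stride)
        shapes.append((name, size))
    return shapes
```

Under these assumptions the 32×32 input shrinks to 28, 14, 10 and finally 5 pixels per side before F5; with `padding=2`, C1 would instead keep the 32×32 size.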

The CNN model was trained on the self-made dataset to recognize abnormal poses of the elderly; the training results are shown in Fig. 11.

From Fig. 11, the recognition error rate of the CNN initially decreases on all data sets as training proceeds. On the training set, the error rate drops to 8.57% after about 120 training iterations. On the test set, it stops changing after about 90 iterations, at 12.49%. On the validation set, the error rate stops decreasing and begins to rise after about 370 iterations, indicating that the model has reached its best performance and training can be stopped.
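The stopping rule described for Fig. 11, ending training once the validation error starts to rise, is commonly implemented as early stopping with a patience window. The sketch below is illustrative; the `patience` value and the function name are assumptions, not settings reported by the study.

```python
from typing import Sequence


def early_stopping_epoch(val_errors: Sequence[float], patience: int = 5) -> int:
    """Index of the epoch with the lowest validation error, scanning forward and stopping
    once `patience` consecutive epochs fail to improve on the best error seen so far."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break                     # validation error has stopped improving
    return best_epoch
```

A larger `patience` tolerates temporary plateaus in the validation curve at the cost of extra training epochs.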

To verify the classification accuracy of the proposed CNN-based pose detection model for abnormal behavior, a Support Vector Machine (SVM) and a Recurrent Neural Network (RNN) were selected for comparison experiments. The pose classification results of the three algorithms are shown in Table 2.

Table 2

Pose classification of several algorithms

Algorithm	Classification accuracy	Running time	RMSE
CNN	95.87%	27 s	0.13
RNN	86.29%	41 s	0.29
SVM	90.75%	33 s	0.17

Figure 11.

CNN abnormal posture recognition results.

From the pose classification results in Table 2, the CNN model has the highest classification accuracy (95.87%) and the RNN the lowest (86.29%). The CNN also has the shortest running time (27 s) and the lowest RMSE (0.13), further indicating its superiority. In practice, abnormal behavior and postures of the elderly need to be detected within 1 minute to allow timely handling of falls, so a running time of 27 s meets practical requirements. Therefore, using the CNN model to recognize the poses of the elderly yields accurate results and provides a reliable basis for addressing elderly health and safety issues.
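The accuracy and RMSE columns of Table 2 can be computed from a classifier's outputs as sketched below. The RMSE definition here, taken between predicted class probabilities and one-hot labels, is an assumption, since the text does not state how RMSE was computed for classification; the function names are hypothetical.

```python
import math
from typing import Sequence


def accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Share of samples whose highest-probability class equals the true label."""
    correct = sum(1 for p, y in zip(probs, labels)
                  if max(range(len(p)), key=p.__getitem__) == y)
    return correct / len(labels)


def rmse_one_hot(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Root-mean-square error between predicted probabilities and one-hot targets (assumed)."""
    squared, count = 0.0, 0
    for p, y in zip(probs, labels):
        for k, pk in enumerate(p):
            target = 1.0 if k == y else 0.0
            squared += (pk - target) ** 2
            count += 1
    return math.sqrt(squared / count)
```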

5. Discussion

With the aging of the population, the quality of life and safety of the elderly have attracted wide attention. The elderly have weaker physical function and are prone to disease and accidental injury, so timely detection and prediction of their abnormal behavior can better protect their safety. The research therefore first used the improved GMM and KF tracking technology to detect and track the daily behaviors of the elderly in real time, and then extracted their posture features and used a CNN model to identify whether their behavior and posture were abnormal. First, the performance of the improved moving target detection algorithm combining the four-frame difference method and GMM was verified by comparing the detection results of the four-frame difference method, the GMM and the improved algorithm. Under indoor conditions, the recognition accuracy of the GMM is 88.63%. It has difficulty distinguishing background from foreground and recognizes part of the background as the target, so its recognition accuracy and completeness are the worst of the algorithms compared. The detection accuracy of the four-frame difference method is 92.17%, a significant improvement; it has strong anti-interference ability and yields a relatively complete human body profile, but its detection results contain some holes. The detection accuracy of the four-frame difference method combined with GMM is 98.97%: the detected target contour is complete, interference noise and holes are markedly reduced, and the detection accuracy is further improved.
Under outdoor conditions with light interference, however, the detection results of the GMM and the four-frame difference method contain shadowed regions, and their recognition accuracy decreases. With multiple targets, all algorithms detect accurately. Overall, the detection results of the proposed method combining the four-frame difference method with GMM are complete, with essentially no self-shadowing and only a few holes, and are significantly better than those of the other algorithms. In summary, the proposed method achieves good detection results in both indoor and complex outdoor environments, effectively avoiding the shortcomings of GMM and improving the completeness and accuracy of the detection results.

Second, the tracking performance of the KF-based method was analyzed in different scenarios, with the MS, PF and CS algorithms used for comparison. The tracking error of the PF algorithm fluctuates between 0.4 and 0.75, that of the CS algorithm between 0.4 and 0.7, that of the MS algorithm between 0.2 and 0.5, and that of the KF algorithm between 0.15 and 0.25. The average tracking errors of the four algorithms are 0.61, 0.55, 0.36 and 0.19, respectively. In terms of tracking accuracy, all four algorithms stabilize as the number of iterations increases: the KF-based method stabilizes at 98.96%, and the MS, CS and PF algorithms at 89.32%, 88.16% and 76.59%, respectively. In terms of time consumption, the KF-based method has a very short response time, averaging 0.01 s. The time-consumption curve of the PF algorithm fluctuates strongly and its response time is long, at 0.79 s; the curve of the MS algorithm fluctuates little, with a response time of 0.68 s; and the average time consumption of the CS algorithm is 0.63 s. In summary, the proposed KF method performs excellently in real-time tracking, ensuring both tracking accuracy and running stability.

Finally, to test the CNN-based classification of abnormal postures of the elderly, experiments were conducted on a self-made dataset. The recognition error rate of the CNN model decreases as training proceeds: on the training set it falls to 8.57% after about 120 training iterations, and on the test set it stops changing after about 90 iterations, at 12.49%. The error rate on the validation set rises after about 370 iterations, indicating that the model has reached its best performance and training can be stopped. In the comparison with SVM and RNN, the CNN model has the highest classification accuracy (95.87%) and the RNN the lowest (86.29%). These results show that CNN-based posture recognition of the elderly obtains accurate results efficiently and meets the practical goal of detecting abnormal behavior within 1 minute.

6. Conclusion

As the aging population continues to grow, health care for the elderly is becoming increasingly important. To reduce the risks arising from dangerous behavior of the elderly, this study uses an enhanced GMM and KF tracking technology to detect and track their daily behavior in real time, then extracts their posture features and uses a CNN model to identify abnormal behavioral postures. The experiments show that fusing the GMM with the four-frame difference method improves the completeness and accuracy of the detection results, with essentially no shadowing. The KF tracking technique has an average error of 0.19 with minimal fluctuation, achieving accurate and stable real-time tracking of moving targets. In training for pose recognition of the elderly, the recognition error rate of the CNN model on the training set settled at approximately 8.57%, and compared with the other classification models, its classification accuracy was as high as 95.87%, enabling efficient and accurate pose recognition. In summary, the proposed method effectively improves the accuracy of moving object detection against different backgrounds and achieves accurate real-time tracking of the detected objects, making it suitable for practical elderly health monitoring applications, where it has significant practical value. The research still has limitations. First, with multiple detection targets, the tracking technique may suffer tracking delays and may lose track of people who become occluded. Second, there is still room to improve the recognition rate of the proposed method.
Future research could therefore focus on handling overlapping and occluded multiple moving targets, and on further improving recognition accuracy with more samples and more advanced techniques.

References

1. Bhattacharjee, Biswas. Smart walking assistant (SWA) for elderly care using an intelligent realtime hybrid model. Evolving Systems. 2022; 13(2): 265-279.

2. Banerjee, Sharma, Soni, Kapil Dev Bansal, Mahajan, Khajanchi, Warnberg, Martin Gerdin Roy. Are home environment injuries more fatal in children and the elderly? Injury. 2022; 53(6): 1987-1993.

3. Pramod. Assistive technology for elderly people: State of the art review and future research agenda. Science & Technology Libraries. 2023; 42(1): 85-118.

4. Kuadey, Mahama, Ankora, Bensah, Maale, Agbesi, Adjei. Predicting students’ continuance use of learning management system at a technical university using machine learning algorithms. Interactive Technology and Smart Education. 2023; 20(2): 209-227.

5. Iazzi, Rziza, Oulad Haj Thami. Fall detection system-based posture-recognition for indoor environments. Journal of Imaging. 2021; 7(3): 42.

6. Liaqat, Dashtipour, Shah, Rizwan, Alotaibi, Althobaiti, Ramzan. Novel ensemble algorithm for multiple activity recognition in elderly people exploiting ubiquitous sensing devices. IEEE Sensors Journal. 2021; 21(16): 18214-18221.

7. Tang, Kumar, Nadeem, Maaz. CNN-based smart sleep posture recognition system. IoT. 2021; 2(1): 119-139.

8. Tay, Lim, Phang JTS. Markerless gait estimation and tracking for postural assessment. Multimedia Tools and Applications. 2022; 81(9): 12777-12794.

9. Huang, Liu, Chen, Song, Zhang, Yang, Zhang. A detection method of individual fare evasion behaviours on metros based on skeleton sequence and time series. Information Sciences. 2022; 589: 62-79.

10. Divya, Peter. Smart healthcare system-a brain-like computing approach for analyzing the performance of detectron2 and PoseNet models for anomalous action detection in aged people with movement impairments. Complex & Intelligent Systems. 2022; 8(4): 3021-3040.

11. Mandischer, Hou, Corves. Multiposture leg tracking for temporarily vision restricted environments based on fusion of laser and radar sensor data. Journal of Field Robotics. 2023; 40(6): 1620-1638.

12. Berlin, John. Vision based human fall detection with Siamese convolutional neural networks. Journal of Ambient Intelligence and Humanized Computing. 2022; 13(12): 5751-5762.

13. Horng, Tang. Hybrid continuous-wave and self-injection-locking monopulse radar for posture and fall detection. IEEE Transactions on Microwave Theory and Techniques. 2022; 70(3): 1686-1695.

14. Nour, Gardoni, Renaud, Gauthier. Real-time detection and motivation of eating activity in elderly people with dementia using pose estimation with TensorFlow and OpenCV. Adv Soc Sci Res J. 2021; 8(3): 28-34.

15. Sakakibara, Tateno, Aiba, Sawai, Ogata, Terada, Katsuragawa. Elderly people with gait disorder in Lewy body diseases, white matter diseases, and their combination: A neuroimaging-assisted analysis. Neurology and Clinical Neuroscience. 2022; 10(1): 25-29.

16. Campobasso, Di Cosola, Testa, Lacarbonara, Dioguardi, Lo Muzio, Cazzolla. The influence of abnormal head posture on facial asymmetry. Journal of Biological Regulators & Homeostatic Agents. 2022; 36(2): 325-335.

17. Sakakibara, Tateno, Aiba, Sawai, Ogata, Terada, Katsuragawa. Elderly people with gait disorder in Lewy body diseases, white matter diseases, and their combination: A neuroimaging-assisted analysis. Neurology and Clinical Neuroscience. 2022; 10(1): 25-29.

18. Nguyen TXB, Rosser, Perera, Moss, Teague, Chahl. Characteristics of optical flow from aerial thermal imaging, “thermal flow”. Journal of Field Robotics. 2022; 39(5): 580-599.

19. Hoogerheide, Dura, Maranville, Majkrzak. Low-background neutron reflectometry from solid/liquid interfaces. Journal of Applied Crystallography. 2022; 55(1): 58-66.

20. Yakoh. Re-shooting Resistant Blind Watermarking framework based on feature separation with gaussian mixture model. IEEJ Transactions on Electrical and Electronic Engineering. 2022; 17(4): 556-565.

21. van Wonderen, Peters, Grey, Rajbhandary, de Jonge, Andrzejewski, Vlaar Jr. Standardized reporting of pulmonary transfusion complications: development of a model reporting form and flowchart. Transfusion. 2023; 63(6): 1161-1171.

22. Feng, Zhang, Peng. Mean-risk model for uncertain portfolio selection with background risk and realistic constraint. Journal of Industrial & Management Optimization. 2023; 19(7).

23. Chen, Lee. EMA-type trading strategies maximize utility under partial information. In: Peter Carr Gedenkschrift: Research Advances in Mathematical Finance. 2024; 511-536.

24. Khodabandehloo, Riboni, Alimohammadi. HealthXAI: Collaborative and explainable AI for supporting early diagnosis of cognitive decline. Future Generation Computer Systems. 2021; 116: 168-189.

25. Yang, Wang, Qiao, Fernandez. Fuzzy adaptive singular value decomposition cubature Kalman filtering algorithm for lithium-ion battery state-of-charge estimation. International Journal of Circuit Theory and Applications. 2022; 50(2): 614-632.

26. Duan. Multi-scale residual aggregation feature network based on multi-time division for motion behavior recognition. International Journal of Computers and Applications. 2023; 45(6): 452-459.

27. Kong, Wang, Sun. Design of care decision support system based on home-based behavior of elderly: a design science study. Sage Open. 2022; 12(1): 21582440221086606.

28. Song, Fan. Behavior recognition of the elderly in indoor environment based on feature fusion of wi-fi perception and videos. Journal of Beijing Institute of Technology. 2023; 32(2): 142-155.

29. Jiang, Tian, Han, Huang, Luo. Rapid nondestructive detecting of sorghum varieties based on hyperspectral imaging and convolutional neural network. Journal of the Science of Food and Agriculture. 2023; 103(8): 3970-3983.

30. Aljohani, Fayoumi, Hassan. A novel focal-loss and class-weight-aware convolutional neural network for the classification of in-text citation. Journal of Information Science. 2023; 49(1): 79-92.
