Abstract
Introduction
Video-based human activity recognition (HAR) is an area of immense interest for researchers because of its applications in smart homes, elderly healthcare, and life care. 1 Increasing age brings radical changes in the daily functioning, health, and social activities of elderly people. According to a United Nations report, 2 the proportion of older people is expected to reach 22% of the world's population by 2050. Elderly people spend most of their time living independently. The aim of this study is to propose a reliable HAR system that recognizes potentially injurious abnormal activities and provides a protected living environment for elderly people at home. It is socially and economically more feasible to care for elderly people at home than in healthcare centers. Automatic activity recognition systems provide efficient, low-cost health monitoring 24 h/day compared with monitoring by humans.
The abnormal activities in this study were selected after consultation with doctors and a review of the literature. They include falling, 3 chest pain, 4 and fainting. 5 An abnormal activity is defined as a state in which an elderly person requires urgent medical assistance.
Several studies of abnormal HAR have focused on fall recognition because of the higher risk of falling in elderly people and its severe physical and psychological consequences. 6–9 R-transform, which is invariant to common geometrical transformations, has been used as a shape descriptor to recognize complex shapes, with the chamfer distance transform used to project binary shapes into the Radon space. The chamfer distance transform provides a good approximation of shapes at different levels of Euclidean distance, with high tolerance to scale and rotation misalignments. 10 In another study, two accelerometer sensors were used to capture four types of hand motion (hand open, hand closed, flexion, and extension), and generalized discriminant analysis (GDA) was used as a nonlinear technique to recognize the hand motion patterns. 11 A discriminative hidden-state approach has been proposed for human gesture recognition by labeling the complete sequence: temporal sequences of human arm and head gestures are trained and recognized by hidden conditional random fields (HCRF). 12
In a previous study on HAR, we recognized six abnormal activities with a recognition rate of 86.5% by using R-transform and principal component analysis from a single view angle (90°) only. 13 Adding activities from other view angles (−90°, 45°, −45°) severely decreased the recognition rate because principal component analysis failed to extract symmetric, scale, and translation-invariant features.
In this study, six abnormal activities (falling backward, falling forward, falling rightward, falling leftward, chest pain, and fainting) and four normal activities (walking, rushing, sitting down, and standing up) from different view angles (90°, −90°, 45°, −45°) are recognized by using a single camera. The prominent features from silhouette sequences are transformed into directional coefficients by R-transform, and the GDA algorithm is used for feature extraction and dimension reduction. The proposed system provides a higher recognition rate than the previous study 13 for the increased number of activities from different view angles.
Subject and Methods
Dataset Generation
The video activities datasets are generated for abnormal and normal activities from different view angles. Twenty actors (14 males, 6 females; age, 42.5±7.25 years; range, 25–60 years) performed the activities in a studio apartment. The activities are captured with a frame size of 320×240 at 25 frames/s. Figure 1 depicts the activity performed from different view angles.

Figure 1. The different view angles for human activity dataset generation.
System Model
The video activities are preprocessed to reduce the complexity of data and obtain background-subtracted binary silhouettes for each activity sequence. The silhouettes are centered and resized to 50×50 pixels before applying R-transform for feature extraction. GDA on R-transform features is then used to increase the discrimination between different classes of activities. The extracted features for each activity are transformed into symbol sequences by the Linde–Buzo–Gray (LBG) clustering algorithm. HCRF is used for activity recognition. The proposed system recognizes activities and generates an alarm message for the emergency service/doctor in the case of abnormal HAR. Figure 2 illustrates the overall architecture of the proposed abnormal HAR system model.

Figure 2. The proposed abnormal human activity recognition system model. GDA, generalized discriminant analysis; HCRF, hidden conditional random fields; LBG, Linde–Buzo–Gray; ROI, region of interest.
Problem identification
Two major issues that negatively affect the overall recognition rate are addressed in the implementation of our abnormal HAR system. First, the changing distance of a moving person from the camera results in scale and translation variations. Second, the same activity performed from different view angles increases the ambiguity among different activities (for example, falling backward/fainting and walking/rushing). The R-transform and GDA methods are proposed to resolve these problems. R-transform provides symmetric, scale, and translation-invariant features. GDA works as a nonlinear technique to remove ambiguities and further improve class separation for highly similar activities.
Feature Extraction and Activity Recognition Algorithms
This section reviews the segmentation, feature extraction, dimension reduction, and activity recognition methods.
Image acquisition and segmentation
The Gaussian mixture model is used to extract the binary silhouette of a moving person from video activities based on adaptive background subtraction. The background is updated continuously to capture recent changes in the background due to intensity variations, repetitive motions, and cluttered environments. 14 The intensity $x_t$ of a particular pixel at time $t$ is modeled by a mixture of $K$ Gaussian probability density functions as

$$P(x_t) = \sum_{i=1}^{K} w_{i,t}\,\eta\left(x_t;\ \mu_{i,t},\ \sigma_{i,t}^{2}\right)$$

where $w_{i,t}$ is the weight, $\mu_{i,t}$ is the mean, and $\sigma_{i,t}^{2}$ is the variance of the $i$th Gaussian component $\eta$ at time $t$.

Figure 3. Block diagram of the adaptive background subtraction model.
The extracted silhouettes are resized to 50×50 pixels for increased efficiency. The activity of a moving person is represented by extracting the rectangular region of interest based on the foreground pixels of each frame. The shape vectors are normalized and represented in a row vector of 2,500 dimensions. Figure 4 shows the preprocessing steps to extract binary silhouettes from sample frames of a forward fall sequence.
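The cropping and resizing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's MATLAB implementation: the function name `silhouette_to_vector` is hypothetical, nearest-neighbour resampling is assumed because the resampling method is not named, and the region of interest is simply stretched to a square.

```python
import numpy as np

def silhouette_to_vector(mask, size=50):
    """Crop a binary mask to its foreground bounding box, resize it to
    size x size, and flatten it into a 1 x size*size row vector."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    # Rectangular region of interest around the foreground pixels.
    roi = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Nearest-neighbour resampling (an assumption, see the lead-in).
    rows = (np.arange(size) * roi.shape[0]) // size
    cols = (np.arange(size) * roi.shape[1]) // size
    resized = roi[np.ix_(rows, cols)]
    return (resized > 0).astype(np.uint8).reshape(1, size * size)
```

For a 320×240 frame, the call `silhouette_to_vector(mask)` returns an array of shape `(1, 2500)`, matching the 2,500-dimensional row vector described in the text.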

Figure 4. Preprocessing steps to extract binary silhouettes.
R-transform
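R-transform is used as a shape descriptor to capture directional features from silhouette sequences based on the Radon transform. The Radon transform computes the projection of an image at specified angles from the spatial domain $(x, y)$ to the Radon domain $(\rho, \theta)$. 10 Let $(x, y)$ represent the coordinates of points of the binary function $F$; then the Radon transform of a silhouette $F(x, y)$ is given as

$$R(\rho, \theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(x, y)\,\delta(x\cos\theta + y\sin\theta - \rho)\,dx\,dy$$

where $\delta$ is the Dirac delta function, $\theta$ is the projection angle, and $\rho$ represents the perpendicular distance from the origin to the projection line, which is defined as

$$\rho = x\cos\theta + y\sin\theta$$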
The normalized R-transform is symmetric, scale, and translation invariant. It transforms two-dimensional Radon projections into one-dimensional feature vectors of 180 dimensions. 10 Figure 5 depicts the representation of normalized R-transform for a falling backward activity sequence. The peaks in Figure 5b–d represent the maximum-valued Radon coefficients. The dimensions are also reduced from 1×2,500 dimensions for a silhouette of 50×50 pixels to 1×180 by R-transform.
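A small numerical sketch may help make the 180-dimensional feature concrete. It assumes the definition commonly used for the R-transform, in which the squared Radon projection is summed over $\rho$ for each angle and the result is divided by its maximum over $\theta$; whether the study uses exactly this normalization is an assumption. The discrete Radon projection here is a simple nearest-bin approximation, and the function name `r_transform` is hypothetical.

```python
import numpy as np

def r_transform(silhouette, n_angles=180):
    """Normalized R-transform of a binary silhouette (illustrative sketch)."""
    ys, xs = np.nonzero(silhouette)
    if xs.size == 0:
        raise ValueError("silhouette has no foreground pixels")
    # Centre on the centroid so the projections do not depend on position.
    x = xs - xs.mean()
    y = ys - ys.mean()
    # Enough integer rho bins to cover any projection of the image.
    half = int(np.ceil(np.hypot(*silhouette.shape)))
    features = np.empty(n_angles)
    for k in range(n_angles):
        theta = np.pi * k / n_angles
        # rho = x cos(theta) + y sin(theta) for every foreground pixel.
        rho = x * np.cos(theta) + y * np.sin(theta)
        bins = np.rint(rho).astype(int) + half
        # Counting pixels per rho bin approximates the Radon projection.
        projection = np.bincount(bins, minlength=2 * half + 1)
        features[k] = np.sum(projection.astype(float) ** 2)
    # Dividing by the maximum gives values in (0, 1].
    return features / features.max()
```

Applied to a 50×50 silhouette, the function returns 180 values, one per degree of projection angle. Because the coordinates are centred on the centroid, a silhouette shifted within the frame yields the same feature vector in this sketch.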

Figure 5. R-transform feature representation for a falling backward activity sequence.
GDA
GDA is used to reduce dimensionality and to increase the separation among different classes of activities: a kernel approach maximizes the between-class variation and minimizes the within-class variation for better activity recognition. 15
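The between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ are computed in the kernel-induced feature space as

$$S_B = \sum_{i=1}^{C} n_i\,(m_i - m)(m_i - m)^{T}, \qquad S_W = \sum_{i=1}^{C}\sum_{x \in X_i} \bigl(\phi(x) - m_i\bigr)\bigl(\phi(x) - m_i\bigr)^{T}$$

where $C$ is the number of activity classes, $X_i$ is the set of $n_i$ training vectors in class $i$, $\phi$ is the nonlinear mapping induced by the kernel, $m_i$ is the mean of class $i$ in the feature space, and $m$ is the mean of all training vectors in the feature space.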
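The idea of a kernel-based discriminant projection can be illustrated with a short sketch. It uses a kernel Fisher discriminant formulation, which is closely related to GDA but is not necessarily the solver used in this study; the RBF kernel, the values of `gamma` and `reg`, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def fit_kernel_discriminant(x, labels, n_components, gamma=1.0, reg=1e-3):
    """Return expansion coefficients of shape (n_samples, n_components)."""
    k = rbf_kernel(x, x, gamma)
    n = x.shape[0]
    mean_all = k.mean(axis=1)
    between = np.zeros((n, n))
    within = np.zeros((n, n))
    for c in np.unique(labels):
        k_c = k[:, labels == c]              # kernel columns of class c
        n_c = k_c.shape[1]
        diff = (k_c.mean(axis=1) - mean_all)[:, None]
        between += n_c * (diff @ diff.T)     # between-class scatter
        centering = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        within += k_c @ centering @ k_c.T    # within-class scatter
    # Regularize the within-class matrix so the linear system is solvable.
    ratio = np.linalg.solve(within + reg * np.eye(n), between)
    evals, evecs = np.linalg.eig(ratio)
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

def project(x_new, x_train, alphas, gamma=1.0):
    """Project new vectors onto the learned discriminant directions."""
    return rbf_kernel(x_new, x_train, gamma) @ alphas
```

A discriminant projection of this kind has at most $C-1$ useful directions for $C$ classes, so ten activity classes give at most nine dimensions, which is consistent with the 1×9 feature vectors reported later in the text.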
LBG algorithm
The LBG algorithm is used to generate discrete symbol sequences from GDA-transformed features before using HCRF for the training and recognition of activities. The codebook of feature vectors is generated by the LBG clustering algorithm, an iterative algorithm that starts with a codebook of size 1 and repeatedly splits the codewords until the desired codebook size is reached. 16 The optimal codebook size of 64 is selected after experimenting with codebooks of size 4, 8, 16, 32, and 64. Feature vectors for each activity sequence are transformed into the corresponding sequence of symbols by the LBG algorithm.
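The splitting procedure can be sketched in a few lines. This is a simplified illustration, assuming an additive perturbation `eps` for splitting and a fixed number of refinement passes `n_iter` instead of a distortion-based stopping rule; the function names are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Index of the nearest codeword for every feature vector."""
    sq_dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

def lbg_codebook(features, size, eps=1e-2, n_iter=20):
    """Grow a codebook by repeated splitting; size must be a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    # Start from a single codeword: the mean of all feature vectors.
    codebook = features.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(n_iter):
            symbols = quantize(features, codebook)
            for k in range(codebook.shape[0]):
                members = features[symbols == k]
                if len(members):  # keep the old codeword if a cell is empty
                    codebook[k] = members.mean(axis=0)
    return codebook
```

With a codebook of size 64, `quantize` maps each feature vector of a sequence to an integer symbol between 0 and 63, which is the discrete form passed on to the sequence classifier.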
HCRF algorithm
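The HCRF algorithm is selected for HAR because of its usefulness in recognizing sequential data patterns. 12 The conditional probabilistic HCRF model is defined as

$$P(y \mid x; \theta) = \sum_{h} P(y, h \mid x; \theta) = \frac{\sum_{h} \exp\bigl(\Psi(y, h, x; \theta)\bigr)}{\sum_{y'}\sum_{h} \exp\bigl(\Psi(y', h, x; \theta)\bigr)}$$

where $y$ is the activity label, $x = \{x_1, \ldots, x_m\}$ is the observation sequence, $h = \{h_1, \ldots, h_m\}$ is the corresponding sequence of hidden states, and $\Psi$ is the potential function parameterized by $\theta$.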

Figure 6. A simplified hidden conditional random fields model for a sequence of length m.
The training data are used to estimate the parameters by maximizing the log-likelihood

$$L(\theta) = \sum_{i=1}^{n} \log P(y_i \mid x_i; \theta)$$

where n is the number of training sequences and L(θ) is the log-likelihood of GDA features. A separate HCRF model is trained for each activity. The sequence to be tested is compared with each HCRF, and the one with the highest likelihood is selected as the recognized activity.
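The decision rule in the last two sentences, together with the alarm condition of the system model, can be sketched as below. The scoring functions are hypothetical stand-ins for trained HCRF models (no HCRF library is assumed), and the names `recognize` and `ABNORMAL` are illustrative only.

```python
# The six abnormal activities listed in the Introduction.
ABNORMAL = {"falling backward", "falling forward", "falling rightward",
            "falling leftward", "chest pain", "fainting"}

def recognize(symbols, models):
    """Pick the activity whose model gives the highest log-likelihood.

    `models` maps an activity name to a scoring function that returns a
    log-likelihood for a symbol sequence; these functions stand in for
    trained HCRFs and are hypothetical here.
    Returns (activity, raise_alarm).
    """
    scores = {name: score(symbols) for name, score in models.items()}
    activity = max(scores, key=scores.get)
    return activity, activity in ABNORMAL
```

For example, with two placeholder models that return −12.0 for "walking" and −3.5 for "fainting", `recognize` selects "fainting" and flags it for an alarm.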
The experiments are performed with MATLAB version R2009b on an Intel machine with a Core2 Duo 3 GHz processor, 2 GB RAM, and Windows XP. The activities dataset from 20 individuals is divided into training and testing datasets by utilizing 120 sequences from 10 people for training and 120 sequences from the other 10 people for testing. All the individuals repeated the activities three times from each view angle. The video sequence for each activity is transformed to the silhouette sequence and represented by the 15 key silhouettes. Table 1 presents the description of activities for dataset generation.
Table 1. Description of Abnormal and Normal Activities for Dataset Generation
R-transform is applied to the silhouette sequences to extract symmetric, scale, and translation-invariant features from different view angles (90°, −90°, 45°, −45°). The 1×2,500 dimensional silhouette is reduced to 1×180 by R-transform. GDA on R-transformed features achieved a maximum recognition rate for 1×9 dimensional feature vectors. The 6-state HCRF model is selected for the training and recognition of activities after experimenting with different numbers of states (from 3 to 10).
The classification performance is evaluated by the confusion matrices for different view angles based on two major groups (abnormal activities and normal activities), as shown in the binary confusion matrix in Table 2. In this research, the focus is to recognize abnormal activities; therefore, true positive (TP) represents a correctly recognized abnormal activity, and true negative (TN) represents a correctly recognized normal activity. False negative (FN) represents an abnormal activity wrongly recognized as normal, and false positive (FP) represents a normal activity wrongly recognized as abnormal. Sensitivity is defined as the proportion of abnormal activities that are correctly recognized by the classifier: Sensitivity=TP/(TP+FN). Specificity is defined as the proportion of normal activities that are correctly recognized by the classifier: Specificity=TN/(TN+FP). Recall is identical to sensitivity. Precision is defined as the proportion of activities recognized as abnormal that are truly abnormal: Precision=TP/(TP+FP). The F1-measure is defined as the harmonic mean of precision and recall: F1-measure=2·(precision·recall)/(precision+recall). The false alarm rate (FAR) is defined as FAR=FP/(FP+TN).
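The measures defined above translate directly into code. The sketch below assumes the four counts are taken from a binary confusion matrix in which "positive" means abnormal; the function name `binary_metrics` is hypothetical, and the counts in the usage note are made-up numbers, not results from this study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Performance measures from binary confusion-matrix counts,
    where 'positive' means an abnormal activity."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    far = fp / (fp + tn)                  # false alarm rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "far": far,
    }
```

With illustrative counts TP=90, TN=80, FP=20, and FN=10, the sensitivity is 0.9, the specificity is 0.8, and the FAR is 0.2. Since FAR=FP/(FP+TN) and Specificity=TN/(TN+FP), the two always sum to 1.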
Table 2. Binary Confusion Matrix
Results
Activity Recognition from Different View Angles
To utilize the benefits of both the R-transform and GDA methods, GDA is applied to the R-transform features from each individual view angle. The recognition results are shown in Table 3. Some view angles (−45°, 45°) achieved higher recognition rates than others (90°, −90°).
Table 3. Recognition Rate from Different View Angles
Data are percentages.
Activity Recognition from Mixed View Angles
The sequences from different view angles are mixed to analyze our system because, in the real world, the view angle of the testing sequences will not be known to the system. The recognition results from mixed view angles for our system, together with a comparison against the R-transform and GDA methods, are shown in Table 4. Our method provides a higher recognition rate for the activities than the R-transform and GDA methods.
Table 4. Average Recognition Rate for R-Transform, Generalized Discriminant Analysis, and Our Method
Data are percentages.
GDA, generalized discriminant analysis.
Table 5 shows the performance evaluation for all the methods. Higher sensitivity, specificity, precision, and F1-measure and a lower FAR are observed for our system compared with the R-transform and GDA methods.
Table 5. Performance Measures for R-Transform, Generalized Discriminant Analysis, and Our Method
Data are percentages.
FAR, false alarm rate; GDA, generalized discriminant analysis.
Discussion
In this research, a system is presented for the healthcare of elderly people at home. Other systems for abnormal HAR include TigerPlace, a project implemented around the concept of aging in place for elderly people living in apartments, in which daily life activities are monitored and analyzed to improve the quality of life of elderly people. 17 In another study, R-transform was used to recognize abnormal activities (rushing in, carrying a bag out of the office, and abruptly bending down) in an office environment; an average recognition rate of 90% was achieved using the hidden Markov model from a single view direction. 18
Our proposed system uses a novel combination of R-transform and GDA methods for feature extraction and dimension reduction and HCRF for activity recognition. Average recognition rates of 94.2% for six abnormal activities and 92.7% for four normal activities are achieved. The recognition rate for the highly similar posture sequences of falling backward/fainting and walking/rushing is also improved. Our system performs well compared with the previous study, 13 even with a larger number of complex activities from different view angles. These results support the feasibility of the proposed system.
Some postures from different view angles may not be prominent in binary silhouettes. For instance, when the hands are in front of the body or very close to it, the binary silhouette treats them as part of the body, which causes confusion with postures from other activities. In a future study, we will use a stereovision camera to generate a three-dimensional depth video activities dataset, in which the depth information from each pixel of the silhouette is used to generate a three-dimensional depth silhouette map. This should further improve the discrimination between different classes of activities and increase the overall recognition rate. A dataset with more complex activities over longer periods of time will also be considered for activity recognition.
Conclusions
This study presented a system for elderly healthcare at home by monitoring the daily life activities of elderly people. An alert with the patient information and type of abnormal activity is generated and transmitted to the emergency service for urgent help in the case of abnormal activity recognition. The high recognition rate and low FAR for different activities show the potential of the proposed system for real lifecare applications.
Footnotes
Disclosure Statement
No competing financial interests exist.
