Abstract
Introduction
The number of citizens over 65 years of age is increasing and will soon account for a much larger share of the worldwide population. For example, 15.07% of the 23 million population of Shanghai is more than 60 years old. 1 Falling is a common and dangerous accident for elderly individuals and an important factor affecting their quality of life. It is estimated that more than one in three people over 65 years of age living at home fall at least once each year. The risk of falling also rises with increasing age. Falling also leads to decreased mobility, fear of falling, and even death. 2–4 Furthermore, research among Medicare (the largest healthcare insurer in the United States) beneficiaries shows that the total annual medical care costs for elderly adults who fall once a year are 29% higher than those for elderly adults who do not fall; similarly, the costs for elderly adults who fall more than two times a year are 79% higher than for those who report no fall. 5 At the same time, the limited funds for public healthcare service and an aging population are driving factors for reducing institutionalized healthcare based on entity service and moving to home healthcare based on wireless sensor networks.
Accelerometers with low-cost and low-power features make wearable and reliable devices for fall detection a reality. Multiple sensors with accelerometers placed at various body sites are used for real-time human movement detection. 6–8 Many systems 2,9–12 use triaxial accelerometers to detect a fall according to the acceleration of body motion and posture angle. To achieve better accuracy, recent systems 13–15 detect a fall by combining an accelerometer with a barometric pressure sensor, image processor, gyroscope, etc. The Information Technology for Assisted Living at Home (ITALH) is a project using new technologies to help older citizens live more comfortably. 16,17 The ITALH project includes two items: the IVY project is concerned with detecting falls at home or in office environments, and the SensorNet project is concerned with developing an integrated, safe, and wireless sensor to monitor the user. However, these previous systems have several restrictions: (1) the methods are device-centric, not user-centric; (2) the devices are expensive and complicated; and (3) the information received by the doctor is insufficient to make an accurate diagnosis in a timely fashion. In most of the systems, the final decisions are based on the data collected from the sensors, and the user cannot express his or her ideas on his or her own initiative and must passively accept the decision. In addition, some of the previous systems use acoustic or vibration sensors, image processing software, etc. Most of them are high cost and not universally accessible, and ordinary users cannot control them at will. A few systems send a short message service (SMS) alert as a simple alarm. However, a text message is not enough to describe a patient's symptoms, and a caregiver cannot provide accurate treatment to rescue the user based on this simple text message. Therefore, we propose a three-step fall detection scheme. 
First, we use multimodality resources (i.e., movement, audio, images, and video) for precise fall detection. Second, our device informs caregivers that a fall has happened by sending an alert e-mail once a fall is detected. Third, we provide a video clip of the fall scene by uploading the video to network storage for further investigation.
In this article, we propose the Home Healthcare Sentinel System (HONEY) for home-based fall detection and response by combining multiple sensors and a home-area network, which will be a common setting for most houses and apartments. In addition, HONEY is an innovative multimodality detection scheme that combines a triaxial accelerometer, speech recognition, and on-demand video. We implement a prototype that incorporates the core functions of HONEY and perform a comprehensive evaluation of the system. Our results show that (1) HONEY can distinguish fall and non-fall actions with high accuracy, (2) the speech recognition of HONEY can work well on a reasonable level in either a quiet or a noisy environment, (3) we compare HONEY with the Advanced Magnitude Algorithm (AMA) presented by Brown, 12 which is a wearable sensor-based method, and the accuracy rate of HONEY to detect a fall is higher than that of AMA, and (4) the average response time of HONEY is 46.2 s, which is a short enough period for emergency first aid.
Background
As the proportion of elderly people grows, most elderly people live on their own in a home-dwelling environment. In China, the population over 60 years of age was about 167 million in 2010 (13.2% of the total). By 2050, the ratio will be over 30%. The number of people over 80 years old will be more than 90 million by 2050. 18 Such trends are also found in Australia and other developed countries. 19,20 The high cost of healthcare has become a significant issue that every nation has to address. 21 An emerging method is the use of wireless sensors to detect problems as early as possible and to prevent incidents. Among these incidents, falling is the leading cause of nonfatal or fatal injuries for the elderly group. Fifty-five percent of falling injuries happen inside the home, and an additional 23% occur near the home. 22 A fall often causes many physiological and psychological problems, such as restricted activity, fear of falling, fear of living alone, severe injury, and even death. As a result, many individuals are forced to leave the comfort and privacy of their home to live in a nursing home. It is therefore important to develop a home-based fall detection system that protects elderly people by using an accurate, automatic, nonintrusive, and socially acceptable method.
Recent developments in wireless communication technologies and wireless sensor networks provide a foundation for remote physiological monitoring. Most of these systems only give a judgment about whether the subject has fallen. A few of them send an SMS alert. However, this information is not enough for caregivers to give an accurate diagnosis. To resolve these problems, we propose HONEY, which is a home-based healthcare system. Our system not only provides instantaneous and continuous fall detection, but also provides more information for caregivers. The time between the occurrence of a fall event and the sending of an alarm e-mail to caregivers is also short enough for first aid.
HONEY System Architecture
Figure 1a provides the high-level overview of HONEY by showing the key components. HONEY consists of the FallSensor, the Homeserver, and the Cloud-based Personal Information Storage (CloudPIS). The FallSensor includes an accelerometer, microphones, and cameras. The accelerometer is fixed on the user's waist, and the microphones and cameras are distributed in every room of a house in pairs. The Homeserver is the control center, which determines whether the user has fallen or not by using the data collected from the FallSensor. The CloudPIS is an information-sharing platform for patients and caregivers and serves as a permanent storage for a patient, including all the information about the patient's medical history and real-time data, such as video clips. The caregivers can obtain the video information of a user's fall scene through the CloudPIS in a secure way. Figure 1b shows the three-step fall detection at the functional level. First, an accelerometer monitors a user's acceleration and posture; if triggering conditions are met, the user's audio message can be used for speech recognition function to confirm or cancel the alarm. When a fall has been detected, an alarm e-mail is sent to caregivers. Second, because the alarm e-mail contains images that may involve the user's privacy, we have thus taken measures to protect his or her privacy information during the mail delivery. Similarly, during the data transmission between the Homeserver and the CloudPIS, the video clip data also need strict protection to avoid privacy disclosure. Third, when caregivers receive an alarm e-mail, they can review the fall scene video through the CloudPIS and can make an accurate diagnosis based on the symptoms presented in the video clip and the medical history stored in the CloudPIS.

A high-level overview of the Home Healthcare Sentinel System.
Fall Detection
A sensor is fixed on the waist of a subject, and it monitors the acceleration and the tilt angle (TA) of the subject in real time by processing the data collected from the accelerometer. If the FallSensor detects a high-magnitude acceleration, it triggers the Homeserver for further detection, which includes speech recognition and kinescope recording using a pair of microphones and a preset camera. Later, the FallSensor may report the posture information, depending on the result of speech recognition. The Homeserver combines the results of speech recognition with the TA sent by the FallSensor to make a final decision on whether the subject has fallen. An alarm e-mail is sent to a doctor and the subject's relatives if HONEY gives a positive decision. The next step is uploading the video recorded by HONEY to the user's CloudPIS. The caregivers can review the video and take appropriate first aid measures in light of the user's symptoms shown in the video.
When HONEY is working properly, the triaxial accelerometer on the FallSensor samples the three-axis acceleration at a frequency of 45 Hz. The total acceleration of body movement is then calculated at every sampling point. If the total acceleration exceeds the preset threshold, the FallSensor detects a possible fall and sends “early warning information” to the Homeserver. The FallSensor keeps monitoring the TA of the body movement using the sampling values in the following interval (about 20 s). The Homeserver receives the early warning information and locates the position where the subject has fallen. The existing localization methods based on radiofrequency or electromagnetism 23 are good enough and can be leveraged in our localization algorithm. HONEY leverages either the radiofrequency- or electromagnetic-based developed indoor localization algorithm, which uses a magnetic sensor, an accelerometer, and a moving vehicle with servo motors to determine the location of the fall event. (Details are beyond the scope of this article.) The pair of microphones and the camera closest to the fall position are turned on, and further detection begins immediately. The camera does not stop recording the fall scene until HONEY finishes the current detection. This action continues for 12 s. During the recording, the camera also captures an image every 2 s, so there are six images in total. The video file and images are temporarily stored in local storage. At the same time, the Homeserver begins speech recognition with the microphone. The Homeserver identifies whether the subject says the specified key words. Depending on the patient's condition after the fall, HONEY's working states are divided into the following two categories: 1. If the patient is conscious, he or she can speak the specified key words clearly. The patient can say a positive key word such as “Help me,” which means he or she cannot take care of him- or herself and needs first aid right now, or a negative key word such as “I am fine,” which means he or she is not injured and can recover without help. If a positive key word is detected, the Homeserver sends “abort information” to the FallSensor. This information stops the FallSensor from watching the TA changes and drives it into the next round of detection. All those actions should be completed in 20 s. Otherwise, HONEY follows the other process. In most cases, the speech recognition finishes earlier than the kinescope recording. Consequently, when the video recording is completed, the Homeserver sends an alarm e-mail to the target mailboxes. 
The six images captured earlier are attached to the alarm e-mail. After the e-mail is sent out, the video file is uploaded to the CloudPIS. A doctor can assess the patient's status based on the images and take first aid measures immediately. If the image information is insufficient to decide the patient's state, the doctor can review the video file for more details. In contrast, a negative key word stops the current detection. The “abort information” statement is sent, and the FallSensor steps into the next round of detection. The kinescope recording and image capturing finish normally. Then HONEY enters the standby state. Afterward, an investigation of this uninjured fall, based on the video file and images, could help to improve the living environment. 2. If the subject is seriously injured and unconscious, he or she cannot say anything. The camera still performs kinescope recording of the scene and captures images as usual, but the speech recognition function yields no result. Therefore, the Homeserver will not send the “abort information” statement. The FallSensor keeps monitoring the TA of body movement after “early warning information” is sent, and it is now the FallSensor's turn to continue the detection. When the 20-s interval ends, the FallSensor sends “alert information” to the Homeserver according to the current TA of the patient. This information includes the judgment made by the FallSensor based on the TA. The analysis result of “alert information” provides the foundation for the Homeserver to make the final decision. If it is a fall, HONEY sends an alarm e-mail, which is the same as the one previously described. Otherwise, HONEY terminates the current detection and enters the standby state.
In both categories, the FallSensor or speech recognition may draw a wrong conclusion. In this situation, the images and video clip can serve as a remedy. A doctor can correct false-positive cases by examining the images and the video, which also reduces unnecessary waste of medical resources.
Privacy Protection
Because HONEY is used in a home environment and involves human beings, privacy protection and data security are important issues in practice. We assume the patients are willing to share their information with caregivers. In HONEY, the cameras are located in several different rooms of a house, such as the bedroom, living room, and dining room. When HONEY detects falls, the system has to capture some images and record a video clip. Those images and clips may include personal privacy information, and a privacy disclosure may occur during the transmission. In a more serious case, revealing the privacy information may affect a user's personal safety. Thus, HONEY provides privacy protection and data security through the following three approaches: 1. We have chosen Bluetooth® (Bluetooth SIG, Kirkland, WA) as the communication protocol between the FallSensor and the Homeserver. The effective signal range of Bluetooth is currently about 10 m. This range covers most of a house but does not extend much beyond it. It is hard for an intruder outside of the house to receive the Bluetooth signal, which makes it difficult to intercept sensitive information. An 8-bit decimal personal identification number (PIN) code is also used for establishing the connection in the safe mode. The connection mode is point-to-point, not broadcast. A short message length is necessary to reduce the risk of data loss, which could also extend the sensor's lifetime. 2. We have chosen the Secure Socket Layer (SSL) 24 technology as the privacy protection tool on the Internet. Large numbers of applications prove that SSL is safe and reliable in an open Internet environment. When HONEY sends the alarm e-mail or uploads the video clip, the sensitive information is transferred over an open Internet environment, where an attacker may capture the packets that contain the privacy information. SSL is built on a reliable transport protocol, such as Transmission Control Protocol, and provides data encapsulation, compression, encryption, and other basic functions to keep the data and communication secure. 3. When the video is uploaded, a time-limited link address is sent to the doctor. This link address is the one the doctor uses to access a user's CloudPIS. If there is a need to review the video, the doctor can get the video through the address. The link address forbids downloading, deleting, and modifying. Once the time limit is exceeded, the address becomes invalid. The doctor also has to promise not to reveal the images in the alarm e-mail.
With the above measures, privacy information is protected against the disclosure risks described here. Because privacy protection is beyond the main focus of this work, the protections may still have weaknesses. In the future, we can adopt more robust measures to make HONEY safer.
CloudPIS
Nowadays, multiple terminals owned by an individual make the synchronization of files or other information difficult and troublesome. Thus, cloud-based personal storage, such as Google Docs, Dropbox, etc., has come into being and increasingly become part of human life. In addition to providing basic storage services, these cloud-based storages also provide a nice, easy-to-use sharing function among friends without copying and pasting. The data stored in those cloud-based storages are encrypted, and the data transmission is also protected by various methods, such as authentication, authorization, etc. The CloudPIS will leverage existing privacy protection techniques for CloudPIS's personal information protection and control. (Details are beyond the scope of this article.) In practice, researchers have designed such a common storage layer called Wukong. 25 This system provides a user-friendly and highly available facilitative data access method for mobile devices in cloud settings and supports heterogeneous storage services.
Privacy protection on the CloudPIS is an important component. Many countries have privacy laws or acts that impose mandatory requirements on privacy, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States. 26
In addition, the corporations of those commercial storage products, such as
In HONEY, the fall scene video clip is uploaded to the CloudPIS, and we expect that caregivers can obtain this video clip automatically and immediately, without complex and difficult file sharing between the client and the server. Cloud-based personal storage provides a good foundation for us to build our CloudPIS. By leveraging these services for storing and sharing personal information, such as medical history and other medical information, caregivers have access to such information using a PC or personal digital assistant. At the same time, the aims of informing caregivers of a fall that has happened and providing them more video information in the CloudPIS are achieved. In HONEY, the CloudPIS is designed to use this type of service by providing a general storage interface for different cloud storage services, and the user can use it without consideration of the different features between various services. The Homeserver can upload the video to the CloudPIS, and the user can manage his or her own medical history and other medical information with the cloud platform. Doctors can also use the video and the user's medical history stored in the CloudPIS to give some detailed suggestions or treatments for the user to improve living quality or rescue him or her in case of an emergency.
To this point, we have described the three-step detection scheme of HONEY and all the aforementioned methods that make HONEY a reliable, safe, and responsive system. HONEY provides reliable fall detection by using a triaxial accelerometer, speech recognition, and on-demand video. HONEY also sets the user's concerns at rest by using various privacy protection measures. The response time is also short enough to avoid the serious consequence caused by not having any medical assistance in time.
Implementation
We have implemented a prototype of HONEY. The program running on the FallSensor is written in Java for the Android environment, and the one running on the Homeserver is written in C/C++ and C# for the Windows environment. The prototype implements the core functions of HONEY. The devices and standards used in our prototype are described in Table 1.
Description of Devices and Standards
FPS, frames per second; SSL, secure socket layer.
FallSensor
A triaxial accelerometer is integrated in the FallSensor, which processes the three axes' sample values at a frequency of 45 Hz and sends “early warning information” if the trigger conditions are met. In our prototype, an HTC (New Taipei City, Taiwan) G3 Hero smartphone acts as the FallSensor. The G3 smartphone has a triaxial accelerometer with a sampling frequency of up to 70 Hz. In addition, the Bluetooth module and the high-performance processor on the G3 smartphone satisfy the FallSensor's requirements very well.
It is known and has been verified that a sensor based on a triaxial accelerometer can distinguish body movements more precisely when it is fixed on the patient's waist. 27 In HONEY, the G3 smartphone is worn on the waist of the body (Fig. 2). The triaxial accelerometer outputs three acceleration values on the x-, y-, and z-axes at every sampling point, and the unit of measurement is m/s². When the body is stationary, the total acceleration of the body is g (the gravity of Earth, 9.8 m/s²), vertically down. When the body is moving, the acceleration changes along with the movement intensity. Our algorithm running on the FallSensor is based on the assumption that a fall is always associated with a high-magnitude impact. An estimation of the degree of body movement intensity can be obtained from the signal magnitude vector (SVM), which is defined by the following relation:

\[ \mathrm{SVM} = \sqrt{x_i^2 + y_i^2 + z_i^2} \]

where x_i is the i-th sample value of the x-axis signal (similarly for y_i and z_i). Therefore, comparing the SVM with a preset SVM threshold (SVM_th) allows detecting the associated fall.

The FallSensor is fixed on the waist. In the upright position, the orientation of the x-axis, y-axis, and z-axis in relation to the sensor is also illustrated.
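The trigger condition described above can be sketched in a few lines of Python. This is only an illustration, not the prototype's Java code: the function names `svm` and `exceeds_threshold` are hypothetical, and the default threshold is the 1.9 g value reported later in the evaluation, converted to m/s² with g = 9.8 m/s².

```python
import math

G = 9.8          # gravity of Earth in m/s^2, as stated in the text
SVM_TH_G = 1.9   # SVM_th in units of g, the value selected in the evaluation


def svm(x, y, z):
    """Signal magnitude vector of one triaxial sample, in m/s^2."""
    return math.sqrt(x * x + y * y + z * z)


def exceeds_threshold(x, y, z, svm_th_g=SVM_TH_G):
    """Return True when the sample's SVM is above the preset threshold."""
    return svm(x, y, z) > svm_th_g * G
```

A stationary, upright sample such as `(0.0, 0.0, 9.8)` stays below the threshold, whereas a sample with roughly 12 m/s² on every axis exceeds it.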
Similarly, when the body falls, the spatial relationship between the body and the ground also changes significantly. To determine the spatial posture of the body, we define the TA as the angle between the positive z-axis and the SVM by the following relation:

\[ \mathrm{TA} = \arccos\left(\frac{z}{\mathrm{SVM}}\right) \]
where z is the sample value of the z-axis signal. TA refers to the relative tilt of the body in space. We also need to distinguish by angle between the upright postures of sitting and standing and the various lying postures. The work of Karantonis et al. 8 provides the range of TA corresponding to the different body postures: if the patient's TA is from 0 to 20°, it is classified as standing, whereas values from 20° to 60° indicate a sitting posture; if TA is between 60° and 120°, it is regarded as a lying posture. In most cases, a fall starts from a standing posture and ends directly with lying on the floor. However, no fall would be predicted if the user fell in such a way that he or she did not end up parallel with the ground. This matters because, during a fall, a user will often try to grasp a wall, chair, or other object and end up slumping next to the object, such as sitting on a chair, rather than lying on the floor. Therefore, a sitting posture following a high-magnitude SVM is also regarded as a fall by our application. Moreover, if a patient is uninjured, a short recovery interval (around 20 s) should be provided for the patient to recover. The simplified algorithm running on the FallSensor is developed from the AMA and is illustrated in Figure 3. We add the Bluetooth communication process, which provides a simple information exchange between the FallSensor and the Homeserver. The FallSensor just needs to send the triggering information (“early warning information”) and the posture information (“alert information”), so we also remove the second posture determination process following the first delay interval of AMA. The delay interval of HONEY is 20 s, whereas that of AMA is 12 s.
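The posture ranges quoted from Karantonis et al. can be illustrated with a short Python sketch. It assumes that the TA is computed as the arccosine of z divided by the SVM, which is one reading of "the angle between the positive z-axis and the SVM"; the function names and the handling of the exact boundary values (20° and 60°) are likewise assumptions made for illustration.

```python
import math


def tilt_angle_deg(x, y, z):
    """Angle in degrees between the positive z-axis and the acceleration vector.

    Assumption: TA = arccos(z / SVM), following the verbal definition in the text.
    """
    magnitude = math.sqrt(x * x + y * y + z * z)
    if magnitude == 0.0:
        raise ValueError("zero acceleration vector has no direction")
    ratio = max(-1.0, min(1.0, z / magnitude))  # guard against rounding error
    return math.degrees(math.acos(ratio))


def classify_posture(ta_deg):
    """Map a tilt angle to a posture using the ranges quoted in the text."""
    if 0.0 <= ta_deg < 20.0:
        return "standing"
    if 20.0 <= ta_deg < 60.0:
        return "sitting"
    if 60.0 <= ta_deg <= 120.0:
        return "lying"
    return "unknown"  # angles above 120 degrees are not covered by the text
```

With gravity entirely along the z-axis the TA is 0° (standing), and with gravity entirely along the x-axis it is 90° (lying).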

The algorithm running on the FallSensor. SVM, signal magnitude vector; TA, tilt angle.
When the current SVM exceeds the SVM_th, “early warning information” is sent out. In the following time interval, the FallSensor will wait for “abort information.” If no “abort information” statement is received, the FallSensor predicts the body's posture according to the current TA at the end of the recovery interval. Then, the positive or negative “alert information” will be sent to the Homeserver. Finally, the FallSensor comes back to the initial state.
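One round of the FallSensor procedure just described might be sketched as follows. This is a simplified Python model under stated assumptions, not the prototype's Android code: the 20-s wait is collapsed into a boolean `abort_received`, the names `Msg` and `fallsensor_round` are hypothetical, and a TA of 20° or more (sitting or lying) at the end of the interval is treated as a positive result, in line with the rule that a sitting posture after a high SVM also counts as a fall.

```python
from enum import Enum


class Msg(Enum):
    EARLY_WARNING = "early warning information"
    ALERT_POSITIVE = "positive alert information"
    ALERT_NEGATIVE = "negative alert information"


def fallsensor_round(svm_samples, svm_th, abort_received, ta_end_deg):
    """Return the messages the FallSensor sends during one detection round.

    svm_samples    -- SVM values observed in this round (same unit as svm_th)
    abort_received -- True if "abort information" arrived within the interval
    ta_end_deg     -- tilt angle at the end of the recovery interval
    """
    if not any(s > svm_th for s in svm_samples):
        return []  # no trigger: stay in the initial state
    sent = [Msg.EARLY_WARNING]
    if abort_received:
        return sent  # the Homeserver has taken over the decision
    is_fall = ta_end_deg >= 20.0  # assumption: sitting or lying counts as a fall
    sent.append(Msg.ALERT_POSITIVE if is_fall else Msg.ALERT_NEGATIVE)
    return sent
```

After each call the sensor is back in its initial state, which mirrors the final step of the description.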
Homeserver
The Homeserver is the control center of HONEY. It connects the FallSensor and the external network. The speech recognition, kinescope recording, image capturing, e-mail sending, and video uploading are under the control of the Homeserver. Speech recognition and image capturing are important means to detect a fall. Therefore, the camera should have a high resolution and a wide view. In this way, the images and video are clear for later review. The microphone for speech recognition should also have a high sampling rate, thus reducing the negative effect on accuracy caused by audio quality.
In our prototype, a laptop running the Windows 7 operating system acts as the cornerstone of the Homeserver. The camera and the microphone are peripherals connected to the laptop. A Bluetooth module and Internet access are also available on the laptop. All those devices work together as the Homeserver. Figure 4 provides the algorithm running on the Homeserver. In an idle period, the Homeserver listens for whether a message comes in. If “early warning information” is received, the Homeserver starts image capturing and speech recognition in the following interval (around 20 s). If a key word is detected, the Homeserver sends “abort information” to the FallSensor and then determines whether the alarm e-mail should be sent or not, depending on the key word detected. If no key word is detected, the Homeserver waits for the FallSensor's “alert information.” If positive “alert information” is received, the Homeserver sends an alarm e-mail. Otherwise, the Homeserver goes back to the initial state.

The algorithm running on the Homeserver.
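The Homeserver's decision after it receives “early warning information” can be summarized in a small Python sketch. The key-word sets are the examples listed in this article; the function name `homeserver_decide`, the dictionary it returns, and the lowercase normalization are hypothetical choices for illustration, and the real prototype is written in C/C++ and C#.

```python
POSITIVE_KEYWORDS = {"i fall", "help me", "call 120", "call 911"}
NEGATIVE_KEYWORDS = {"i'm fine", "cancel alarm"}


def homeserver_decide(keyword, alert_is_positive):
    """Decide what the Homeserver does after an early warning.

    keyword           -- phrase returned by speech recognition, or None
    alert_is_positive -- result carried by the FallSensor's "alert information"
                         (only consulted when no key word was detected)
    """
    phrase = keyword.strip().lower() if keyword else None
    if phrase in POSITIVE_KEYWORDS:
        return {"send_abort": True, "send_alarm_email": True}
    if phrase in NEGATIVE_KEYWORDS:
        return {"send_abort": True, "send_alarm_email": False}
    # No key word: fall back on the posture judgment from the FallSensor.
    return {"send_abort": False, "send_alarm_email": bool(alert_is_positive)}
```

For example, `homeserver_decide("Help me", None)` requests both the abort message and the alarm e-mail, whereas `homeserver_decide(None, False)` returns the system to its initial state without an alarm.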
Speech recognition
In some situations, a patient falls without injury or with only a light injury, and he or she can ask for help or cancel the alert by speaking some words. In HONEY, a user is asked to speak certain key words if he or she wants help or wants to cancel the alert. Those key words are prespecified. The user can also modify them to make them easy to remember. However, it is better if they do not conflict with words used in daily conversation.
Today, several companies provide free speech recognition engines. The newest operating system from Microsoft® (Redmond, WA), Windows 7, also integrates an engine for free. It is easy to access by using Microsoft's Speech API 28 tools. This engine can identify multiple languages, including English and Chinese. Training is also available to improve the engine's performance. We do not train the engine because HONEY should work well for anyone rather than an individual patient. The user could train the engine for a particular object. We also define two types of key words: positive type and negative type, and they have opposite meanings. Positive-type key words, including “I fall,” “Help me,” and “Call 120” (in China) or “Call 911” (in the United States), are used for affirming the alert. Negative-type key words, including “I'm fine” and “Cancel alarm,” are used for canceling the alert. In addition, the accuracy of speech recognition relies on the microphone's location and the quality of voice sample. In order to reduce the distance between the microphone and the patient, we place the microphone in the middle of the room. We also adjust the microphone's sampling frequency and quality to the highest level.
Kinescope recording, image capturing, and e-mail sending
Visual information conveys the situation directly and in detail. In the design of HONEY, the Homeserver uploads the video clip to the user's CloudPIS, and a doctor can assess the patient's symptoms by reviewing the fall video. Targeted aid measures can then be taken immediately. Because the video transmission is time-consuming, we run this task in the background. The video transmission begins once the alert e-mail has been sent.
In our prototype, we use a camera to capture images of the fall scene and send those images to a doctor through an e-mail. The camera is 2 megapixels with a highest resolution of 640×480 pixels. The horizontal view angle is 60°, and the vertical view angle is 44°. The camera is installed on the wall, 2 m high from the ground, where most areas of the experimental chamber can be included in the video. The mat used for the experiment is located in front of the camera on the ground. The images are saved in JPG format. The image size is 640×480 pixels, and those images are compressed before they are sent.
When a fall is detected, HONEY sends an alarm e-mail, which includes six images, to corresponding caregivers' mailboxes, such as those of a doctor or relatives. A mailbox account is needed for sending the alert e-mail. The user has to specify the target mailboxes in advance. In our prototype, we use two Gmail accounts as the source and the destination. The alarm e-mail contains fixed text and the images captured beforehand.
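As an illustration of how such an alarm e-mail could be assembled, the following Python sketch builds a message with fixed text and six JPEG attachments using the standard `email` package. The function name, subject line, body text, and file names are hypothetical and not taken from the prototype; the sketch only constructs the message and does not send it.

```python
from email.message import EmailMessage


def build_alarm_email(sender, recipients, jpeg_images):
    """Build an alarm e-mail with fixed text and the captured images attached.

    jpeg_images -- iterable of bytes objects, one per captured JPEG image
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "HONEY fall alarm"  # hypothetical subject line
    msg.set_content("A fall has been detected. Images of the scene are attached.")
    for index, data in enumerate(jpeg_images, start=1):
        msg.add_attachment(
            data,
            maintype="image",
            subtype="jpeg",
            filename=f"fall_{index}.jpg",
        )
    return msg
```

The resulting message could then be handed to a client that connects over SSL, for example `smtplib.SMTP_SSL` followed by `send_message`, with the account details supplied by the user.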
Video transmission
As designed, HONEY will transmit a 12-s fall video to the CloudPIS once a fall is detected. In order to simulate the real application environment, we transmit the fall video to a commercial cloud-based storage in China,
Privacy Protection
The disclosure of privacy usually happens when sensitive information is transported on the Internet. There are three data transmission processes in HONEY: the Bluetooth communication between the FallSensor and the Homeserver, the video transmission, and uploading the images. We will discuss the privacy protection concretely in the following sections.
Bluetooth communication
Privacy protection on Bluetooth communication includes two parts: one is the configuration of Bluetooth properties, and the other is the definition of the format of special messages. The following approaches are adopted to configure Bluetooth: First, we make the Bluetooth module work in a safe mode. The FallSensor should be set to undiscoverable for other Bluetooth devices, so that irrelevant devices cannot initiate a connection with the FallSensor. Second, an 8-bit decimal PIN code is used for establishing a safe channel. The PIN code is preset in HONEY. After the channel is established, Bluetooth provides service-level security, supporting authentication, encryption, and authorization. Third, we use the media access control address to establish the channel directly and make this channel work in point-to-point mode. Thus, the Homeserver can find the FallSensor without device searching even if the FallSensor is undiscoverable. Each specific message is two bytes long: the first byte is an identity (ID) that represents the current detection, and the second byte is the message that represents the specific information transferred between the FallSensor and the Homeserver. Figure 5 shows the format of the four messages. The ID is generated randomly within 0 to 255, and the FallSensor and the Homeserver keep it until the current detection ends. This means that all messages in the same detection carry the same ID. If a received ID is different from the one that the FallSensor and the Homeserver have saved, we assume that HONEY has failed or is under attack, and HONEY should be restarted.

Four types of message format. Bit0 is the highest bit of the message, and bit15 is the lowest.
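The two-byte message and the ID check can be sketched as follows. This is a minimal illustration, not the HONEY implementation: the four message code values and all function names are hypothetical, since the actual codes are given only in Figure 5.

```python
import secrets

# Hypothetical message codes: the text defines four message types, but their
# actual byte values are not stated, so these are placeholders.
MSG_CODES = {"TRIGGER": 0x01, "CONFIRM": 0x02, "CANCEL": 0x03, "END": 0x04}


def new_session_id() -> int:
    """Draw a random ID in the range 0..255 for one detection."""
    return secrets.randbelow(256)


def pack_message(session_id: int, code: int) -> bytes:
    """Build the two-byte message: ID byte first (high byte), then the message byte."""
    # bytes() raises ValueError if either value is outside 0..255.
    return bytes([session_id, code])


def unpack_message(frame: bytes, expected_id: int) -> int:
    """Return the message byte, or raise if the frame length or ID is unexpected."""
    if len(frame) != 2:
        raise ValueError("message must be exactly two bytes")
    session_id, code = frame[0], frame[1]
    if session_id != expected_id:
        # The text treats an ID mismatch as a system failure or an attack.
        raise ValueError("ID mismatch")
    return code
```

Both ends would hold `expected_id` only for the duration of one detection and discard it afterward, so a replayed message from an earlier detection fails the ID check except by chance (1 in 256).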
SSL communication
When HONEY determines that an alarm e-mail should be sent, six images are sent as attachments to the e-mail. Those images are uploaded to the mail server before delivery. In addition, the video is transmitted after a fall occurs. Thus, a privacy disclosure may happen during the image upload and video transmission periods. Therefore, we establish a secure connection channel between HONEY and the corresponding servers using SSL (version 3.0). The data transferred through SSL are encrypted, and both end points are authenticated. Most common mail servers support SSL, and it is easy to use. In HONEY, we choose Gmail as the mail service provider because Gmail provides only SSL connections and is widely used. The video storage server provided by
Evaluation
In this section, we design four experiments to evaluate HONEY's performance: determination of the SVM threshold, speech recognition accuracy, video transmission, and overall HONEY performance. We invited 10 volunteers, including 2 women and 8 men, to do the following tests.
Determination of SVM Threshold
In HONEY, fall detection is triggered when the value of SVM exceeds SVM_th. Thus, a suitable SVM_th should distinguish daily activities from falls. In order to select a reasonable value, we invited three volunteers to do this test. The average age, weight, and height were 24 years old, 65 kg, and 168 cm, respectively. The G3 smartphone was fixed on the waist of the volunteers. In the test, we asked volunteers to wear the G3 smartphone and do the following tasks described in Table 2: walking, running, ascending stairs, descending stairs, sitting down, standing up, squatting, rising, and falling. All those activities were done as natural movements without any restriction. While a subject performed those actions, the G3 smartphone recorded SVM in real time at 45 Hz. After all tests, we chose 500 sample values of each action except falling for further processing. Because the duration of a fall is short, we chose only 200 sample values for it. Then we calculated the maximum, the minimum, and the upper and lower limits of the 95% confidence interval for each action. Figure 6 provides the results for all actions and the SVM_th (1.9 g) we selected.

The 95% confidence interval of the signal magnitude vector (SVM). The four leftmost values for each movement represent the maximal value, the upper limit of the confidence interval, the lower limit of the confidence interval, and the minimal value, in that order. The selected SVM_th is marked (dashed line).
Description of Tasks
From Figure 6 we can see that the SVM_th we selected distinguishes most daily activities from falls, with the exception of running, ascending stairs, and descending stairs. The maximum values of walking, sitting down, standing up, squatting, and rising are lower than the SVM_th, so those actions will not trigger the detection. All data of a fall and some data of running, ascending stairs, and descending stairs are above the SVM_th. Therefore, if an older person falls down, the SVM exceeding SVM_th will trigger the detection of HONEY. Running, ascending stairs, and descending stairs may also trigger detection and be mistaken for a fall at this stage; such cases are corrected in the subsequent detection steps.
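The trigger condition can be illustrated with a short sketch. It assumes that SVM is the Euclidean norm of the three acceleration components expressed in g, which is the usual definition of the signal magnitude vector; the function names and sample values are illustrative only.

```python
import math

SVM_TH_G = 1.9  # threshold selected in the text, in units of g


def svm(ax: float, ay: float, az: float) -> float:
    """Signal magnitude vector, assumed here to be the Euclidean norm (in g)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def first_trigger_index(samples, threshold=SVM_TH_G):
    """Return the index of the first sample whose SVM exceeds the threshold, or None."""
    for i, (ax, ay, az) in enumerate(samples):
        if svm(ax, ay, az) > threshold:
            return i
    return None


# Illustrative samples (not measured data): standing still, a small movement,
# then a strong impact.
samples = [(0.0, 0.0, 1.0), (0.1, 0.2, 1.0), (1.5, 1.0, 1.2)]
print(first_trigger_index(samples))
```

Exceeding the threshold only starts the detection; as the text notes, activities such as running can also cross it, so the decision itself is left to the later steps.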
In this test, the acceleration data are acquired from the activities performed by those young volunteers. Although those data certainly differ from the data from real elderly people's activities, in the following tests, all the volunteers we invited are of similar physical status, including age, height, and other aspects. Therefore, the SVM_th selected here is appropriate in the following test.
Speech Recognition Accuracy
Speech recognition accuracy is an important factor affecting HONEY's performance. In practice, we cannot ensure that the fall environment is quiet, and other sound sources, such as a TV or a radio, may exist in the room. Therefore, two sets of experiments were done to evaluate the speech recognition accuracy. One is the “quiet environment” test, where the background volume is 25 dB; this is a typical quiet home environment. The other is the “noisy environment” test, where the background volume is 65 dB, which is also the normal speech volume. By playing chamber music in the room, we kept the background volume of the test environment at 65 dB. In addition, the sampling quality of the microphone affects the result of speech recognition. We preset the sample frequency at 96,000 Hz and the bit depth at 24. Then we did the test at different distances from the microphone. At each distance, we asked three volunteers to say a 50-word paragraph, which contained the five key words mentioned above (“I fall,” “Help me,” “Call 120” (in China), “I'm fine,” and “Cancel alarm”). The average age, weight, and height of those volunteers were 24 years old, 65 kg, and 168 cm, respectively. The normal speech volumes of those volunteers were 62 dB, 64 dB, and 68 dB. The key words were repeated by the participants in any order, and each volunteer spoke at his or her normal speech volume at distances of 1, 2, 3, 4, and 5 m between the microphone and the speaker. The microphone was installed on the floor at the center of the room.
The average accuracy at each distance is illustrated in Figure 7 for both environments. As the distance increases, the accuracy decreases. It is also obvious that the accuracy in a quiet environment is higher than that in a noisy environment. In an ordinary family, the area of most rooms is smaller than 6×6 m². If we install the microphone in the middle of a room, the longest distance from the microphone in the room is within 4.3 m except the corner areas. Under these conditions, the accuracy rate in a quiet environment is above 90%, versus 77% for the noisy environment. The low accuracy in a noisy environment is mainly caused by the following three reasons. First, the microphone we used is sensitive in all directions, which is needed in our system, although this is not helpful for recognition accuracy. A professional microphone with high unidirectional sensitivity may be the best choice for speech recognition, but it is not suitable for our application. Second, the noise volume also has an influence on the accuracy. In our test, the noise volume is roughly equal to the volunteers' voice volume. When the speaker is far away from the microphone, the speaker's voice volume detected by the microphone is lower than the speaker's normal speech volume, so the noise volume will be much higher than the speaker's voice volume. Of course, the noise volume may be higher than we assumed in our tests, which would lead to even lower accuracy. Third, the speech recognition engine is not a professional one, which may also affect the accuracy. However, this engine, provided by the Windows 7 operating system, is easy to use and access in the home. Overall, the accuracy level can satisfy the practical application of HONEY in a home environment. Therefore, in the overall HONEY performance test, we did all experiments within a range of 3 m.

The average accuracy of speech recognition at different distances in a quiet and a noisy environment.
This test also had limitations. When an elderly person falls, his or her voice may be weak or speech unclear, which affects the performance of speech recognition. However, this may not affect the performance of HONEY, as the accelerometer on the FallSensor will continue to detect the fall if the speech recognition fails. Another limitation is the noise volume. In a real scenario, the noise volume changes according to the elderly person's living environment. Here we assume that a TV's or a radio's volume is equal to the normal speech volume, so we set the noise volume at 65 dB. Sometimes, an elderly person may need a larger volume to enjoy a TV or radio show because of poor hearing. Here we assume the hearing of elderly people is good enough to hear the TV's or radio's sound, even if with a hearing aid. Ideally, the closer the microphone is located to the user, the higher the recognition accuracy is. If the distance between the user and the microphone is less than 1 m, the recognition accuracy approaches 100% according to our experimental result shown in Figure 7. Consequently, we plan to integrate the microphone into the FallSensor, which would make HONEY more practical.
Video Transmission
This function has been implemented in our prototype. In our system, the video has a constant time length and frames per second, so the video file size is mainly dependent on the resolution. Thus a separate test evaluated the performance of video transmission function. Four kinds of resolution and two types of network access are considered in this test. As we all know, the bandwidth, as well as the data length, affects the data transmission time for both upload and download. In China, the bandwidth of connection to ADSL is normally 2 megabits per second in a home environment. The download speed is 256 kilobits per second (Kbps), and the upper limit of the upload speed is 56 Kbps. Another network access is through the 3G cell phone network. Smartphone usage is widespread, and one can connect to the network with a 3G cell phone. Hence the video transmission performance on the 3G network is also evaluated. A computer uses a 3G network adapter to gain access to the 3G cell phone network. The actual network bandwidth of 3G is dynamically changing based on the 3G signal quality in different areas.
Without any other network connections operating, we uploaded a 25.91-megabyte file to network storage and repeated this operation five times under the two different network conditions. The same applied to the download test. We found that, with ADSL access, the average upload speed was 44 Kbps and the average download speed was 256 Kbps. In contrast, the average upload speed of the 3G network was 94 Kbps, and the download speed was 59 Kbps.
Next, we recorded the 12-s video at different resolutions, with the frame rate fixed at 30 frames per second. We did this under proper illumination, so that the video was clear enough to distinguish objects. We estimated the upload and download times based on the actual network speed. Table 3 provides the resolution, the related file size, and the estimated times under different network conditions. The estimated upload time and estimated download time are calculated by the following equations:
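A straightforward reading, assuming each estimate is the file size divided by the measured average speed in that direction, is given below. The symbols $S$ (video file size), $v_{\mathrm{up}}$, and $v_{\mathrm{down}}$ (average upload and download speeds, in the same data unit per second as $S$) are introduced here for illustration and are not the original notation. Treating the total time as the sum of the two is likewise an assumption.

```latex
T_{\mathrm{upload}} = \frac{S}{v_{\mathrm{up}}}, \qquad
T_{\mathrm{download}} = \frac{S}{v_{\mathrm{down}}}, \qquad
T_{\mathrm{total}} = T_{\mathrm{upload}} + T_{\mathrm{download}}
```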
Estimated Upload and Download Times for Different Network Types
ADSL, asymmetric digital subscriber line; MB, megabytes.
According to the results shown in Table 3, under either network condition the video file size and the estimated time decrease as the resolution decreases. The total time varies from 5 to 45 min. When the video is in the lowest resolution, the total time is about 335 s (less than 6 min). It is important to keep the total time within 10 min (“Emergency Platinum Ten Minutes”), so that emergency aid is timely. However, poor-quality video may affect the doctor's diagnosis. If we choose the video with a resolution of 320×240 or 352×288, the total time is less than 20 min. This exceeds 10 min, but compared with 60 min (“Golden Hour”), it is still appropriate for emergency service, and the video quality is more useful to the doctor. With the highest-resolution video, the total transmission time is over 45 min. Even though the video quality is quite good, the transmission time is too long for emergency aid. In terms of time consumption, there is no significant difference between ADSL and the 3G network. Because HONEY mainly works in the home environment, the effect of network type on the system is not as apparent as in a public environment, where the 3G network is easier to find and access. As part of HONEY's design, an alert e-mail is sent to the caregivers once HONEY detects a fall, before the video is transmitted. Because the time required for sending the e-mail is much shorter than for video transmission, we run the transmission task on the Homeserver in the background, while the alert e-mail is sent immediately.
Overall HONEY Performance
The HONEY prototype was tested in a controlled lab environment. Figure 8 shows the real experimental scene. Based on the speech recognition accuracy results, all tests were performed within 3 m. We also define the “response time” as the time between detecting the SVM magnitude and sending the alarm e-mail. The delivery time of an alarm e-mail depends on network conditions, which we do not control, so the response time does not include this period. There are two groups of experiments based on different detection methods. One group uses a wearable sensor-based method for comparison; the other uses the HONEY system. We applied AMA on the G3 smartphone as the wearable sensor-based method. In each group, we asked 10 volunteers to do the test. The average age, weight, and height were 24 years old, 73 kg, and 173 cm, respectively. Each volunteer did 12 actions, including 7 daily activities and 5 falls. The 7 daily activities included walk-to-sit, run-to-lie, squat-to-sit, jump-to-sit, stand-to-sit, stair ascending, and stair descending. The 5 falls included 3 simple falls and 2 complex falls, all defined in Table 2. In all tests, the G3 smartphone was fixed on the waist of the subject during the entire activity. The tests for HONEY differed from those for AMA in that the volunteers spoke the key words for speech recognition during the fall test, and the test environment included both a quiet environment and a noisy environment.

The real experimental scene. The volunteer is lying on the mat. The camera is capturing images, and the image and some text information are displayed on the user interface (laptop screen).
The other experiment conditions are the same. When the subject does those actions and a fall is detected by HONEY, the response time will be recorded.
Table 4 shows the results. The total accuracy of AMA is 76%. The false-positive rate is 22%, and the false-negative rate is 26%. The reason for the low accuracy is the similarity between daily activities and falls. Of the seven non-fall actions, run-to-lie and jump-to-sit are most similar to falling. The SVM values of running and jumping exceed the SVM_th more easily than those of other movements, which triggers the detection. The ending postures of those two activities are also similar to the ending postures of simple falls and complex falls. Therefore, those two activities are easily detected as falls by AMA in our test. Of the 15 false-positive cases, 13 are caused by run-to-lie and jump-to-sit; the 2 remaining cases are squat-to-sit, which may be because the subject squats down and rises up rapidly. In the fall tests, AMA gives the wrong result 13 times. Five of them occur in simple falls, whereas the remaining cases occur in complex falls. Most of those false-negative cases are caused by the ending posture. In some tests, the subject lay not on his or her back but sideways, and the impact moved the G3 smartphone considerably, which led to a wrong posture prediction and more wrong judgments.
Advanced Magnitude Algorithm and Home Healthcare Sentinel System Statistics
AMA, Advanced Magnitude Algorithm; HONEY, Home Healthcare Sentinel System.
When we used HONEY to do the same tests, the performance was better than for AMA. The total accuracy of HONEY was 94%, 18 percentage points higher than that of AMA. The false-positive rate was 3%, and the false-negative rate was 10%. In non-fall tests, HONEY misclassified 2 cases, both caused by run-to-lie. In the non-fall test, HONEY may start detection without informing the user. The user therefore does not cancel the detection, and a false positive occurs. In fall tests, 5 cases were missed by HONEY. One of them was caused by not meeting the trigger condition. The remaining cases were caused by speech recognition failure and system fault: speech recognition caused three mistakes because positive key words were not recognized, and a failure to send the alarm e-mail also resulted in a false detection. However, the overall performance is much higher than that of AMA.
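The HONEY figures can be reproduced from the case counts with the short sketch below. It assumes one trial per volunteer per action, that is, 10 × 7 = 70 non-fall trials and 10 × 5 = 50 fall trials; this trial count is inferred from the test description rather than stated explicitly, and the function name is illustrative.

```python
def rates(n_nonfall: int, n_fall: int, false_pos: int, false_neg: int) -> dict:
    """Compute false-positive rate, false-negative rate, and overall accuracy."""
    total = n_nonfall + n_fall
    return {
        "false_positive_rate": false_pos / n_nonfall,
        "false_negative_rate": false_neg / n_fall,
        "accuracy": (total - false_pos - false_neg) / total,
    }


# Assumed trial counts: 10 volunteers x 7 daily activities and x 5 falls.
honey = rates(n_nonfall=70, n_fall=50, false_pos=2, false_neg=5)
for name, value in honey.items():
    print(f"{name}: {value:.1%}")
# false_positive_rate: 2.9%
# false_negative_rate: 10.0%
# accuracy: 94.2%
```

Rounded to whole percentages, these values match the 3%, 10%, and 94% reported for HONEY.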
The response time of HONEY is fast and reasonable. Previous systems, such as iFall, 30 have implemented a smartphone-based fall detector that sends an SMS to a prespecified cell phone number. However, the response time or detection time that this system needs was not evaluated. If we assume that sending and receiving the SMS takes negligible time, the response time of those systems is determined by the detection algorithm. The AMA method does not send any alarm message, so its response time is determined by the fixed timing of the algorithm, which sets the response time of AMA between 12 and 24 s. In all tests for HONEY, 47 e-mails, including 45 true-alarm and 2 false-alarm e-mails, were sent by HONEY. The average response time was 46.2 s (less than 1 min). If we add the video transmission time to the response time, according to the results of the video transmission evaluation, the total time is less than 7 min and 21 min under low- and high-resolution conditions, respectively. As a result, the patient can get first aid from caregivers quickly.
In this test, all data were acquired by simulating fall events with 10 young volunteers. It is dangerous and difficult to obtain real fall data from real fall events with older subjects, so there must be some differences between the simulated fall data we tested and real fall data. First, the test environment differs from real fall conditions, which may have no protective measures to ensure the safety of the person who falls. Second, the data also differ because of the difference in physical status between young volunteers and elderly people. Third, the instinctive reaction of young volunteers when a fall occurs is more dexterous than that of elderly people. The lack of real fall data is therefore a limitation of our experiment.
HONEY provides a robust fall detection method for a home-based environment. It reduces false-positive and false-negative findings significantly in terms of accuracy and makes the alert process fast enough for first aid. The consideration of privacy protection leads to a safe environment for applying our system.
Real-Life Trial
Next, we applied HONEY in a realistic environment to verify the system's performance and applicability. Eleven elderly people living in Guangdong Province, China, including four men and seven women, participated in our experiment voluntarily. The average age was 58 years old. All of them were living independently with their caregivers. For safety reasons, the elderly participants could not actually fall in the experiment, and the children of these volunteers were also unwilling to allow us to do the fall test. So we only required the elderly to do activities of daily life. In the test, the volunteers wearing our device could move freely, and our system monitored their actions continually. At the end of the experiments, each participant took a short survey about their physical status and their suggestions or comments on our system.
During the test, the elderly did some activities of daily life, mainly including walking, sitting down and standing up, ascending stairs and descending stairs, picking up things from the ground, lying in bed, etc. The results show that detection processes were triggered a total of 46 times, and 4 false-positive alarms were generated. One of these false-positive alarms was produced by the TA judgment, and the rest were caused by speech recognition failure. The false-positive rate is 9%, higher than the 3% that we measured in the lab environment experiment. As for speech recognition, our system conducted recognition processes 14 times, with 10 successes and 4 failures, for an accuracy of 71%. The average response time is 49.7 s, which is almost the same as what we obtained in the lab trial.
After the test, we analyzed the data, and the following reasons may explain why the results are not as good as those we obtained in the lab trial. First, the false-positive alarm caused by the TA check was triggered by the action of lying in bed. Because the acceleration of lying in bed as performed by one volunteer was larger than the SVM_th, our system began the detection. The lying-in-bed movement is very similar to falling on the ground, so the TA value was larger than 60°, which indicates a fall to the ground, and a false positive occurred. A few other actions, such as ascending stairs and descending stairs, also triggered fall detection, although these actions did not bring about a false positive because the TA values were less than the threshold of fall determination. Second, with respect to speech recognition, recognition processes were conducted only 14 times during the whole test. Although we trained the volunteers to use this function, the elderly rarely remembered to use it. In addition, accented dialect has a significant influence on recognition accuracy. As a result, the recognition error rate was 29%, higher than what we obtained in the speech recognition experiment. We think the main reason is that in our lab trial, all the young volunteers could pronounce the key words in nearly standard Mandarin, whereas some of the elderly volunteers could not speak standard Mandarin fluently, and our system performs well with standard Mandarin. Those two reasons caused the high error rate in the real-life trial. Third, the response time is approximately the same as the lab result. The small difference is caused by the image and e-mail processes.
Obviously, the biggest limitation of this test is that no real falls were detected. A fall is very dangerous to the elderly, and no elderly person would fall intentionally, so we cannot verify the performance of our system for real fall detection. This is a practical problem; similar systems have rarely been applied in real-life usage. Another limitation of our device is the battery lifespan. At present, our prototype device works continuously for only 4 h, which makes 24/7 monitoring impossible. However, we have already started to create a miniaturized device, which consumes much less energy and has a smaller volume. In the future, we will use this new device to implement real all-day monitoring, and then the fall detection performance can be verified.
Related Work
Fall detection is an important application for elder healthcare. Falls can be associated with intrinsic or extrinsic factors. The fear of falling can lead to the deterioration of an individual's mental health, physical health, autonomy, and the general degradation of his or her living quality. 31
Currently, there are mainly three approaches for fall detection in the existing systems:
1. Acoustic or vibration recognition. Alwan et al. 32 achieved fall detection by using vibration recognition. It is based on the facts that human movements can cause measurable vibrations on the floor and that a fall can be detected by monitoring the vibration patterns in the floor. The method of Alwan et al. 32 can also distinguish a human fall from objects falling. Popescu et al. 33 provided an acoustic fall detector equipped with two microphones. A linear array of acoustic sensors was used in the detector. When a sound was detected, the features were extracted from this sound for pattern recognition. If those features matched the fall's features, an alarm was generated.
2. Video-based detector. By installing a camera at a fixed location, one can track a user and learn movement patterns. The systems 34,35 detect a fall event based on image processing that is designed to identify unusual inactivity. The Smart Inactivity Monitor using Array-Based Detectors (SIMBAD) 36 uses an infrared camera to capture blurry images of the user. A fall is confirmed by analyzing the images later. Nevertheless, the user experience is not good because of the bad feeling of being watched. Williams et al. 14 used a distributed camera network to detect falling. To avoid clumsy hardware and an extra wearable device, they proposed a network of cameras. In this solution, the energy consumption of each camera is low, and several cameras work together in a single room to complete the detection. Because of the multiple cameras they used, this application also provides two-dimensional world coordinates of a fall.
3. Wearable sensor-based method. Several fall detection methods require the user to wear an external sensor. Usually these sensors are a triaxial accelerometer, a gyroscope, or other devices. These systems 2,9 develop different fall detection algorithms based on a triaxial accelerometer. A smartphone and wireless network were used by Sposaro and Tyson 30 and Zhang et al. 37 Brown 12 gave a series of comparisons of fall detection algorithms. A clear evolution of those algorithms was provided according to the different scenarios. Other applications 3,38 used for classifying movements of the human body could also detect a fall among various body movements. The ITALH project 39 implemented several prototypes of home-based healthcare systems. Their findings are similar to ours, but the methods are different. ITALH provided full-time video surveillance that may be annoying to a patient. In HONEY, the video is short, and it is accessible only when the doctor needs it. Moreover, speech recognition is an important way to detect a fall in HONEY, which ITALH does not support. Another important fact is that HONEY is an on-line data processing system, whereas ITALH is off-line: the data must be collected first, and only then can an analysis be carried out.
HONEY has significant improvements and advantages compared with the aforementioned systems. First, our system is easier to deploy and operate than acoustic and vibration recognition. That kind of method needs many acoustic or vibration sensors, which must be installed in a specified arrangement, whereas the microphones and cameras in our system can be installed more freely. Moreover, in our future work, the microphone will be integrated in the FallSensor, which makes installation easier. Second, our system is more robust than other video-based fall detectors. HONEY and all video-based methods use a camera as a significant tool to detect falls. When the illumination is weak, the camera may not function properly. However, the FallSensor of HONEY, as well as speech recognition, can work properly in a weakly illuminated environment. So our system is more robust in some specific situations. Third, compared with the wearable sensor-based method, HONEY adds speech recognition and video support for the user and caregiver, which makes HONEY more accurate and functional. Another important advantage is ease of use, which is a key factor affecting the user's choice.
The wireless protocol between the FallSensor and the Homeserver has multiple options, such as Bluetooth, Zigbee, wireless identification and sensing platform (WISP), 40 etc. But, we decided to choose Bluetooth because of its wide deployment. The effective range of Zigbee is around 100 m, much larger than a house's range, which raises the chance of disclosing personal information. WISP could be a good choice because it does not need power, but it is not widely available and needs a dedicated server. In addition, the effective range of WISP is up to 10 feet with harvested radiofrequency power, which is not enough for covering the house.
In our application, we take some measures to protect the user's privacy. In the opinion of Malasri and Wang, 41 implantable medical devices will play a major role in pervasive healthcare, enabling applications ranging from patient identification to remote administration of drug treatments. However, the relay and physical attacks in internal identifications, denial of service in intermediate drivers, and the wireless reprogramming ability of the International Code of Diseases have not been well addressed in the literature. Some countermeasures, such as data encryption, data access control, etc., have been proposed. The data security is beyond the scope on which we focus. In the future, we can use more protection measures in our system with more professional technology.
Remaining Issues and Challenges
Through the experimental study, we conclude that the following three remaining issues need further investigation. First, speech recognition is an important part of detecting a fall. However, the results of the lab trial and the real-life application show that recognition accuracy for dialectal speech is not as good as for standard Mandarin. Although Lu et al. 42 have given a hybrid model of the hidden Markov model and the BP neural network to recognize dialectal small-vocabulary speech, the corpus (a set of key words for training the model) and the training processes could be serious problems. Thus, it is hard to use in a real-time recognition process such as fall detection. So a recognition system that is compatible with a variety of dialects is needed in a real setting. Another problem is where the microphone is located. According to our research results, the closer the microphone is located to the user, the higher the recognition accuracy is. So we intend to integrate the microphone into the FallSensor, although the result could be higher complexity and power consumption.
Second, power management is another issue on which we are focusing. The power consumption directly influences how long the device can monitor continuously without a battery recharge. The results of a questionnaire following the real-life trial showed that a lifespan of at least half a year with no battery recharge is acceptable. At the same time, miniaturization reduces the battery capacity, which aggravates the power supply problem. Based on our research, the following considerations affect the power consumption. First, developing an algorithm that uses fewer acceleration data will reduce the energy consumption of the accelerometer and processor because the sample frequency could be decreased. Second, if we leave the data to the Homeserver to handle, the wireless transmission will consume most of the energy of the entire device. Third, if we integrate the microphone into the FallSensor to enhance speech recognition accuracy, extra energy consumption will be added. So how to balance those issues is also tricky work.
Third, an individual SVM threshold for each user could make the system more adaptable. In our system, the threshold is fixed before real application, and this threshold cannot satisfy everyone's requirements because physical condition and activity patterns differ among users. An adaptive and autonomous learning mechanism may resolve this problem. Adding algorithms such as the BP neural network and the genetic algorithm for adaptive threshold determination could accommodate the different cases of all users. However, the algorithm running time and the computing resource requirement may be a big problem for the processor capability and also require extra power consumption. So a simple adaptive and autonomous learning algorithm will be the next target.
Conclusions and Future Work
In this article, we present HONEY for fall detection in a home-based environment. We used a triaxial accelerometer to trigger detection. In order to improve performance and accuracy, we deployed speech recognition and images to reduce the false-positive and false-negative rates. Privacy protection was also considered in HONEY. Finally, we performed a comprehensive evaluation of HONEY. Our evaluation showed that HONEY provides reliable, safe, and responsive fall detection in a home environment.
As the next step, we intend to extend HONEY by making use of body area networks 43 and online diagnosis. Body area networks are made feasible by novel advances in lightweight, small-size, and intelligent monitoring sensors, such as those for blood glucose, blood pressure, electrocardiogram, electroencephalogram, electromyography, etc. The patient's identification can be stored in the radiofrequency identification tag, and the healthcare provider can retrieve not only the medical history but also the physiological information in real time. Thus, medical staff can make a more accurate and timely diagnosis using the information, which can also be used by data mining to make a diagnosis of potential disease.
Acknowledgments
This work is in part supported by Tongji University, Introduction of Innovative R&D Team Program of Guangdong Province (award number 201001D0104726115) and the Wayne State University Wireless Health Initiative fund. We thank the anonymous reviewers for their valuable comments and Zhiying Lin for discussions during the early stages of the project. We are also grateful to the volunteers, especially the 11 senior volunteers, in our experiments.
Disclosure Statement
No competing financial interests exist.
