Abstract
Exploiting the dynamic spatial and temporal features of location information for robot modeling is of great importance in many real applications, and it has gained increasing attention in the era of the Internet of Things (IoT). However, successful modeling and accurate localization of robots in indoor environments remain challenging, because environmental factors such as signal noise, obstacles and sparse fingerprints are complex and unpredictable. Existing studies usually employ data-driven and learning-based models to capture spatial and temporal features for robot location estimation, to model robot dynamics and to make robot decisions. However, their modeling and localization performance is not satisfactory. To address these challenges, this paper proposes a novel deep learning framework, the multi-faceted deep learning based dynamics modeling and robot localization learning (DMLoc) method. Specifically, a localization attention module is designed to capture features from the original fingerprints and the optimized fingerprints. Then, a multi-faceted localization module is proposed, which integrates an extraction model and an optimized model built on long short-term memory (LSTM) and gated recurrent unit (GRU) networks. Moreover, a multi-feature fusion layer is designed to fuse the extracted features and generate the localization results. Extensive simulation results show the effectiveness of the proposed DMLoc.
Introduction
Capturing robot dynamics and making decisions are the cornerstones of robot systems for task-specific applications. In recent years, wireless indoor location modeling for robots has attracted significant research interest from both academia and industry. Existing studies usually employ data-driven models to capture spatial and temporal features for location estimation. Deep learning based dynamics modeling and robot localization learning are classical location prediction and decision problems, which aim to predict the future locations of a robot from historical observations. Such prediction has been widely used in many real-world applications, such as location-based services, recommendation services, robot localization and decision-based services.
This paper aims to locate a robot and model its dynamics using WiFi signals from a carried smartphone. WiFi-based localization is typically categorized into two types, namely fingerprinting-based [1] and learning-based [2] methods. Fingerprinting methods [1] collect WiFi signal measurements as fingerprints for each spatial position. Localization is thus enabled by characterizing a position with a signal pattern (e.g., a vector of received signal strength indicator (RSSI) values from different WiFi access points) [3]. Learning-based methods [2] adopt deep learning techniques to capture the correlations between consecutive locations and predict the future location of users. Several existing studies address these localization issues. However, because of complex environmental factors, including unpredictable noise, signal fluctuation and the presence of obstacles, accurate indoor localization remains challenging.
In the past decade, there have been many attempts to address these problems. In particular, data-driven approaches have been extensively exploited in indoor localization [5, 15]. However, these approaches do not consider the correlations of location information in the temporal and spatial dimensions. Because RSSI measurements are collected at fixed points deployed in continuous space, measurements from neighboring locations and time steps are correlated with each other. Therefore, extracting the spatial-temporal correlations of the sensed data helps to improve localization accuracy. Recent advances in deep learning have produced promising results in modeling the correlations of location information and capturing temporal and spatial features [6, 20]. Existing deep learning approaches usually adopt a convolutional neural network (CNN) to model the spatial dependency, and a recurrent neural network (RNN) or one of its variants, long short-term memory (LSTM) and the gated recurrent unit (GRU), to extract temporal features [7]. However, these approaches may not adequately model the spatial features, because the convolution typically relies on Euclidean distance to capture spatial correlation.
Although deep learning based localization techniques have achieved excellent performance, existing localization methods cannot effectively solve the following three problems. First, in indoor environments, fingerprints are influenced by factors such as indoor obstacles, signal uncertainty and equipment precision, which make the fingerprint values uncertain. Second, fingerprints are too sparse for effective location calculation. The limits of the monitoring environment and the hardware cost make the number of fingerprints insufficient for accurate localization. Most previous work focuses on the case where fingerprints are sufficient, and few works consider this problem. Third, the sensed data for localization show obvious correlation in both the spatial and temporal dimensions. Exploring nonlinear spatial-temporal data to discover its inherent patterns and make accurate location predictions is essential to improving localization accuracy.
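To address the above three problems of indoor localization, we propose in this paper a multi-faceted deep learning based dynamics modeling and robot localization learning (DMLoc) method.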
Our contributions are summarized as follows. First, we design a novel data-driven deep framework for localization applications, which incorporates deep learning and data management techniques. The proposed framework efficiently addresses the problems of uncertain and sparse fingerprints in localization. Second, we propose a multi-faceted framework that takes both original and optimized fingerprints as input to improve fingerprint quality and localization accuracy. Third, we integrate cosine similarity and an attention mechanism to optimize the uncertain fingerprints and aggregate information for localization. Fourth, we conduct experiments to validate the performance of our proposed model. The experimental results show that our model achieves excellent localization performance.
The rest of the paper is organized as follows: Section 2 gives the related work; the proposed model is introduced in Section 3, and experimental evaluations are presented in Section 4; Section 5 concludes the paper.
Related work
In recent years, the localization task has attracted much attention for its crucial role in location based services and applications.
Barsocchi et al. proposed a Principal Component Analysis (PCA) and KNN based localization method to extract ideal samples and reduce the impact of environmental factors on localization accuracy [11]. Li et al. designed an indoor localization method which collected data from a large-scale wireless network environment and utilized PCA to reduce the original features and the computational cost. Salamah et al. [12] proposed a local feature based deep long short-term memory (LF-DLSTM) framework, which efficiently reduced the influence of noise in the sensed data. Dong et al. [13] proposed to measure the RSSI between the moving target and the fixed sensor nodes, and chose the RSSI measurements with higher weights. Zafari et al. [14] proposed a particle filter based localization approach to reduce the impact of environmental noise on localization accuracy. Hsieh et al. [16] proposed a deep learning based indoor localization framework and trained it on fingerprint datasets to predict the location of the target of interest. Javadi et al. [17] designed a support vector machine (SVM) based localization framework to solve the localization task. Jondhale et al. [18] adopted a Kalman filter to deal with the uncertainty in fingerprints and improved real-time tracking performance.
In recent years, machine learning and deep learning techniques have promoted the development of localization applications [4, 8, 10, 19, 23, 25, 30, 31, 32]. Li et al. [21] designed a deep residual network to model important features from a fingerprint database and improve fingerprinting localization. Ren et al. [9] proposed a quantization based localization approach, which employed quantization techniques to optimize localization performance. Clancy et al. [22] proposed a robust neural network based localization method, which used function coverage information to handle uncertainty in the localization procedure. Jang et al. [24, 26] adopted convolutional neural networks (CNN) to perform accurate localization. The CNN based method can automatically learn location patterns and reduce computation cost. Luo et al. [27] employed recurrent neural networks to implement Wi-Fi fingerprinting localization, adopting an encoder-decoder and stacked encoders to obtain accurate feature representations for localization. Li et al. [28] proposed a deep neural network model for localization. Lemic et al. [29] designed an auto-encoder structure to process noisy received signal strength, and adopted random forest regression, multi-layer perceptron classification and multi-layer perceptron regression to achieve accurate localization.
Methodology
The overall architecture of the proposed model is shown in Fig. 1. It consists of two main components: the attention module, which generates optimized fingerprints, and the aggregation module, which uses cosine similarity and LSTM to capture features for localization and locate the target. In the rest of this section, we describe the design of the framework in detail. Figure 2 describes the procedure of the model. The original fingerprint is first fed into the localization attention module to improve its quality and obtain optimized fingerprints, which are then fed into the multi-faceted feature extraction module to capture multiple correlations. Finally, a fully connected neural network is applied to obtain the final output.

The architecture of proposed DMLoc. It consists of four components, namely localization attention, multi-faceted localization, fully-connected layer and output layer.

The flow chart of the model.
The attention mechanism is effective for processing signal data. To further improve the fingerprint quality, we propose the localization attention module to optimize the fingerprints. In real indoor localization applications, the various irregular obstacles within the localization environment make the fingerprint values collected from RPs irregular, which hinders accurate localization.
In real applications, the number of access points (APs) is limited, which makes the training fingerprints sparse, so we further improve the original fingerprint quality and obtain optimized fingerprints using the attention mechanism. The principle of the localization attention mechanism is to calculate the matching degree between the current input fingerprint vectors and the output vectors: the higher the matching degree, the greater the attention score. Based on this principle, we optimize the original fingerprint against the standard fingerprint database. The attention mechanism is carried out in two stages, namely similarity comparison and computation of the attention score. We describe the procedure of each stage in detail.
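The two stages can be illustrated with a minimal sketch. It assumes that each collected fingerprint is compared with its own standard fingerprint by cosine similarity, and that the similarities are turned into attention scores with a softmax. The softmax normalization, the function names and the sample RSSI values are assumptions made for illustration, not details taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Stage 1: cosine similarity between two equal-length RSSI vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

def attention_scores(similarities):
    """Stage 2: softmax over similarities (assumed normalization)."""
    peak = max(similarities)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical data: two collected fingerprints and their standard fingerprints,
# each holding RSSI values (dBm) from three access points.
collected = [[-40.0, -60.0, -75.0], [-40.0, -90.0, -45.0]]
standard = [[-40.0, -60.0, -75.0], [-55.0, -62.0, -80.0]]

sims = [cosine_similarity(c, s) for c, s in zip(collected, standard)]
scores = attention_scores(sims)
```

In this sketch a fingerprint that matches its standard counterpart more closely receives a larger score, which mirrors the principle stated above. How the scores are then used to build the optimized fingerprint is left open here.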

The architecture of the localization attention module. It consists of two parts, namely cosine similarity and attention score calculation.
In the localization attention mechanism, we represent the fingerprint database as $X = \{X_1, X_2, X_3, \cdots, X_n\} \in \mathbb{R}^{m \times n}$, where $n$ is the number of reference points (RPs) and $m$ is the number of access points. Each fingerprint vector is represented as
We first calculate the similarity degree of each collected fingerprint $X_i$ and the corresponding standard fingerprint
For each fingerprint vector $X_i \in X$, we calculate its cosine similarity with the standard fingerprint
The optimized fingerprint set
To learn more location features and thus further improve the localization performance, we use both the original fingerprint and the optimized fingerprint as the input of the multi-faceted localization module, as shown in Fig. 4. We design a separate method to extract features from each of the two inputs. Considering the large amount of input fingerprint data, we adopt a residual layer in the network to avoid the problems of gradient explosion and network degradation.

The architecture of the multi-faceted localization module. It consists of three components: the extraction module, the optimized extraction module, and the fusion layer.
To capture the correlations in the optimized fingerprints, we design a long short-term memory (LSTM) based optimized extraction module, which processes the data at a fine granularity and extracts its features. The LSTM is composed of forget gates, input gates and output gates. The three gates jointly control the propagation of information so that the internal state can capture the key information at a certain time. In real applications, the RSSI values of fingerprints are mainly correlated with the distance between APs and RPs: a shorter distance brings better signal quality and less interference from noise. Inspired by this observation, we aim to use the fingerprints from near the target to obtain more accurate localization results. The designed LSTM unit takes these observations into consideration. It retains the more accurate signal information, while signal measurements from long distances are relatively weak and should be forgotten. At a certain time step $t$, let $f_t$, $i_t$ and $o_t$ represent the forget gate, input gate and output gate, respectively, the hidden state
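For reference, the standard LSTM update can be written as below. This is the textbook form, not necessarily the exact unit used in DMLoc, which may modify the gates to account for distance. The input $x_t$, cell state $c_t$, candidate state $\tilde{c}_t$, weights $W_\ast$, biases $b_\ast$, sigmoid $\sigma$ and element-wise product $\odot$ are notation assumed here.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

A forget gate value $f_t$ close to zero discards the previous cell state, which is the mechanism by which weak measurements from distant points can be dropped.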
Let $F_O$ denote the optimized extraction function. The output of the optimized extraction module is represented as follows,
For the original fingerprint, we employ a gated recurrent unit (GRU) network to extract features. Compared with the LSTM, the GRU contains only update and reset gates, which makes it a simple and flexible model. Because the original fingerprint is usually noisy, processing it in a simple and efficient way reduces the computation cost of the model. At a certain time step $t$, let $z_t$ and $r_t$ represent the update gate and reset gate, respectively. The hidden state $h_t$ of the extraction module is formulated as,
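For reference, one common form of the standard GRU update is given below. The paper's exact formulation may differ. The input $x_t$, candidate state $\tilde{h}_t$, weights $W_\ast$, biases $b_\ast$, sigmoid $\sigma$ and element-wise product $\odot$ are notation assumed here, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z [h_{t-1}, x_t] + b_z\right) \\
r_t &= \sigma\left(W_r [h_{t-1}, x_t] + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

With two gates and no separate cell state, this unit has fewer parameters than an LSTM of the same hidden size.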
Let $F_E$ denote the extraction function. The output of the extraction module is represented as follows,
Having extracted the features of the original fingerprints and the optimized fingerprints via the extraction module and the optimized extraction module, we obtain the outputs $H_E$ and $H_O$, respectively. Next, we design two fully connected layers to further learn the features. Let $F_{FC}$ denote the fully connected function. The output of the module is represented as follows,
To address gradient explosion in the network and the degradation caused by increasing model depth, we further design a skip (jump) connection layer. In particular, the outputs of the extraction module and the optimized extraction module are connected to the output of the fusion module through the skip connection. After fusion, we obtain the location prediction result. Let $F_L$ denote the location prediction function. The output of the module is represented as follows,
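The fusion and skip connection steps can be sketched in plain Python as follows. The sketch assumes that the two branch outputs have equal length, that they are fused by concatenation, that the skip connection is an element-wise sum, and that the layers use ReLU. These choices, the function names and the random weights are illustrative assumptions, not the configuration reported for DMLoc.

```python
import random

def dense(x, weights, bias):
    """Fully connected layer with ReLU: y = max(0, W x + b)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def random_layer(n_out, n_in, rng):
    """Small random weights and zero bias, standing in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def fuse_and_locate(h_e, h_o, fc1, fc2, head):
    """Fuse both branch outputs, add the skip connection, predict (x, y)."""
    # Two fully connected layers applied to the concatenated features.
    fused = dense(dense(h_e + h_o, *fc1), *fc2)
    # Skip connection: add both branch outputs to the fused features.
    skipped = [f + e + o for f, e, o in zip(fused, h_e, h_o)]
    weights, bias = head
    # Linear output layer so the predicted coordinates are unconstrained.
    return [sum(w * v for w, v in zip(row, skipped)) + b
            for row, b in zip(weights, bias)]

rng = random.Random(0)
d = 4                                # assumed feature size of each branch
h_e = [0.2, -0.1, 0.4, 0.3]          # stand-in for the extraction module output
h_o = [0.1, 0.5, -0.2, 0.0]          # stand-in for the optimized module output
position = fuse_and_locate(
    h_e, h_o,
    random_layer(d, 2 * d, rng),     # first fully connected layer
    random_layer(d, d, rng),         # second fully connected layer
    random_layer(2, d, rng),         # output layer producing (x, y)
)
```

Because the branch outputs are added back after the fully connected layers, their information reaches the output layer even when the fused features are small.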
In experiments, we adopt the widely used root mean square error function as the loss function for model training, as follows.
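A minimal sketch of a root mean square error loss for 2-D positions is given below, i.e. $\sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \hat{p}_i - p_i \rVert^2}$ with predicted positions $\hat{p}_i$ and true positions $p_i$. Treating each location as an $(x, y)$ pair and measuring the error as a Euclidean distance are assumptions made here, as are the function name and the sample values.

```python
import math

def rmse_loss(predicted, actual):
    """Root mean square of the Euclidean errors between 2-D positions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty position lists of equal length")
    squared = [(px - ax) ** 2 + (py - ay) ** 2
               for (px, py), (ax, ay) in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: one exact prediction and one that is 5 units off.
example_loss = rmse_loss([(1.0, 1.0), (4.0, 6.0)], [(1.0, 1.0), (1.0, 2.0)])
```

Squaring the errors before averaging means that a few large localization errors weigh more than many small ones.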
We conduct extensive experiments to validate the performance of the proposed DMLoc method. We compare the localization performance of the proposed method with that of the baselines. Moreover, we conduct ablation studies to explore the effect of each component on the performance of the model.
Experimental setting
In the experiments, we deploy 1000 reference points. The RSSI measurements from all APs at each RP are sampled to form one fingerprint vector. The monitoring area is a 50 m × 50 m area. To simulate measurement noise, we add random noise that follows a Gaussian distribution and lies within [-1.5 dBm, 1.5 dBm]. We adopt the Adam optimizer and set the number of epochs to 1000. DMLoc is implemented in Python with the TensorFlow framework. We compare our proposed DMLoc model with alternative methods, namely LSTM and GRU.
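A simulation setup of this kind could be sketched as follows. The area size, the number of reference points and the noise bound come from the text. Everything else is an assumption made for illustration: the log-distance path-loss model, the transmit power, the path-loss exponent, the four corner access points, the uniform random placement of reference points, and the reading of the noise as a Gaussian sample clipped to the stated range.

```python
import math
import random

AREA_M = 50.0          # side of the monitoring area (from the text)
NUM_RPS = 1000         # number of reference points (from the text)
NOISE_LIMIT = 1.5      # noise bound (from the text)
NOISE_SIGMA = 0.5      # assumed standard deviation, not given in the text
TX_POWER_DBM = -30.0   # assumed RSSI at the 1 m reference distance
PATH_LOSS_EXP = 3.0    # assumed indoor path-loss exponent
AP_POSITIONS = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (50.0, 50.0)]  # assumed layout

def bounded_gaussian_noise(rng):
    """Gaussian sample clipped to [-NOISE_LIMIT, NOISE_LIMIT]."""
    return max(-NOISE_LIMIT, min(NOISE_LIMIT, rng.gauss(0.0, NOISE_SIGMA)))

def simulate_rssi(rp, ap, rng):
    """Log-distance path-loss RSSI at a reference point, plus bounded noise."""
    distance = max(1.0, math.dist(rp, ap))  # clamp to the reference distance
    path_loss = 10.0 * PATH_LOSS_EXP * math.log10(distance)
    return TX_POWER_DBM - path_loss + bounded_gaussian_noise(rng)

def build_fingerprints(seed=0):
    """Return a list of (position, fingerprint vector) pairs."""
    rng = random.Random(seed)
    rps = [(rng.uniform(0.0, AREA_M), rng.uniform(0.0, AREA_M))
           for _ in range(NUM_RPS)]
    return [(rp, [simulate_rssi(rp, ap, rng) for ap in AP_POSITIONS])
            for rp in rps]

fingerprints = build_fingerprints()
```

Each entry pairs a reference point position with one fingerprint vector holding one RSSI value per access point, matching the description of how fingerprints are formed.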
Experimental results
In this section, we compare the localization performance of the proposed DMLoc model with the baseline methods, LSTM and GRU. Furthermore, we explore the influence of system parameters, such as the number of training samples, the selected reference points and the moving velocity, on the localization performance. Table 1 lists the localization errors of the three methods. From the experimental results, we draw the following conclusions. First, deep learning based localization methods such as LSTM and GRU achieve promising localization results. Second, in all cases, DMLoc outperforms the other deep learning based methods, which indicates that our multi-faceted mechanism better captures the features within fingerprints and improves localization accuracy. Third, DMLoc achieves almost the best results among the previous state-of-the-art models, which indicates that combining the attention mechanism with LSTM and GRU based feature extraction better explores the complex correlations of fingerprints.
Localization error of different methods
In this section, we study the influence of the number of training samples on the localization accuracy. Figure 5 shows the localization results for different numbers of training samples. From the figure, we can see that as the number of training samples increases, the localization error also increases. Although the localization accuracy of our model also decreases, our model still obtains the best performance. Moreover, our localization model achieves stable performance across the different cases.

Localization error of different methods.
This section studies the impact of reference point deployment on the localization performance. We construct two typical experimental sites. Site 1 is a meeting room, with a meeting table and some chairs in the center and an air conditioner in one corner. Site 2 is a hall of the experimental building, with no obstacles in the center and a lift in one corner. Four different fingerprint maps have been created for each site. Figures 6 and 7 plot the impact of reference point deployment on the localization performance. When the reference points are located in the corner of the room, the localization accuracy of the proposed model is the worst. The reason is that the air conditioner and the lift in the corner affect the RSS signal. In contrast, the localization results in the center are much better, because the signal strengths are strong in this area and most access points can be detected and measured.

Impact of reference points deployment.

Localization error distribution under different reference points deployment.
In this section, we further study the impact of the user's moving velocity on the localization accuracy. In the experiments, the user is simulated to move at varying velocities. As shown in Fig. 8, the localization error increases slightly as the velocity increases, while the overall trend is stable. The reason is that a greater velocity makes the signal differences from the access points at different reference points more robust, which contributes to improving the localization precision.

Impact of walking velocity.
In this section, we explore the localization results under different walking trajectories. From Fig. 9, we observe that the proposed localization method follows the real trace well even in areas where the direction changes frequently. For the linear trace, the localization accuracy is much better, and the maximal localization error is less than 0.5 units. For the curved trajectory, our localization model can still capture the continuous changes of direction and achieve stable localization results.

Examples of localization results. The blue line is the ground truth, the orange line is the estimated trace.
In this section, we conduct an ablation study to verify the effect of each component on the final localization performance of the model. We term the variants of DMLoc as follows. DMLoc/E: the extraction module is removed from DMLoc. The input of the module is only X. DMLoc/O: the optimized extraction module is removed from DMLoc. The input of the module is only
Figures 10 and 11 show the localization results of the variants and DMLoc. Among all modules, the optimized extraction module (removed in DMLoc/O) has the greatest influence on the localization performance. Without the optimized extraction module, the localization error increases from 0.22 to 0.48, 0.5 to 0.75, 0.71 to 1.2 and 2.81 to 3.7 when the number of training samples is 250, 500, 750 and 1000, respectively. We can conclude that the optimized extraction module efficiently addresses the problem of noisy and sparse fingerprints, captures corrected location features from the optimized fingerprints and obtains excellent localization results.

Localization error of different variants and DMLoc.

Localization error distribution of different variants and DMLoc.
From Fig. 10, we can also see that the extraction module (removed in DMLoc/E) has the second greatest influence on localization prediction. Without the extraction module, the localization error increases from 0.22 to 0.32, 0.5 to 0.6, 0.71 to 0.8 and 2.81 to 3.2 when the number of training samples is 250, 500, 750 and 1000, respectively. This observation shows that the extraction module is essential for the success of the proposed deep learning network, as it also captures useful features from the original fingerprints.
Moreover, from the ablation study, we can also conclude that the multi-faceted localization mechanism, which takes both original and optimized fingerprints as input and exploits features from the two aspects, can efficiently solve the problem of noisy fingerprints and further improve the localization performance.
In this paper, we proposed DMLoc, a data-driven and multi-faceted deep learning framework, to improve localization accuracy. In particular, we used two modules, namely the extraction module and the optimized extraction module, to jointly extract features from the original fingerprints and the optimized fingerprints. Moreover, we employed cosine similarity to weight the importance of fingerprints from different access points. Furthermore, the outputs of the multiple extraction modules were fused to generate the final localization results. Finally, we conducted extensive simulations, and the experimental results validate the superiority of our proposed model in terms of localization accuracy. In the future, we aim to apply the proposed model to large-scale location-based data mining applications.
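The relative size of the two ablation effects can be checked with a short calculation over the error values quoted in this section. The script only restates those reported numbers and derives percentage increases from them. The variable and function names are illustrative.

```python
# Localization errors quoted in the ablation study, by training sample count.
training_samples = [250, 500, 750, 1000]
errors = {
    "DMLoc":   [0.22, 0.50, 0.71, 2.81],
    "DMLoc/O": [0.48, 0.75, 1.20, 3.70],  # optimized extraction module removed
    "DMLoc/E": [0.32, 0.60, 0.80, 3.20],  # extraction module removed
}

def relative_increase(full, ablated):
    """Percentage increase in error of an ablated variant over the full model."""
    return [100.0 * (a - f) / f for f, a in zip(full, ablated)]

increase_without_o = relative_increase(errors["DMLoc"], errors["DMLoc/O"])
increase_without_e = relative_increase(errors["DMLoc"], errors["DMLoc/E"])

for n, o, e in zip(training_samples, increase_without_o, increase_without_e):
    print(f"{n:>5} samples: DMLoc/O +{o:.1f}%, DMLoc/E +{e:.1f}%")
```

At every sample count the increase for DMLoc/O exceeds that for DMLoc/E, which is consistent with the ranking of the two modules given in the text.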
