A linear state space model is of two equations, one of state space and the other of measurement. Estimation methods for the parameters of the model have been developed from the historic Kalman filter method. The Bayes estimation has also been used under a variety of conditions on the parameter space. We explored the availability of the Bayes method for parameter estimation with no constraints on the parameter space and found that the estimation for the state space is acceptable as long as the priors are not vague on both the state and the parameter space. We also investigated the model where the measurement matrix is contaminated with noise and found that the estimates for the state space were more accurate than those by the methods in literature. We made remarks on extended applications of the Bayes method for the linear state space model where a variety of constraints are imposed on the parameter space.
Consider a linear state space model consisting of two equations as
where is a state vector, is a measurement vector, and are and coefficient matrices, respectively, and and are noises with and where stands for ‘-variate Normal distribution’. The state changes with time and the measurement is made on the state at every time point. The first equation is called a state equation and the second a measurement equation. and are called system and measurement matrices, respectively.
This model is also called a dynamic linear model which was introduced by Kalman [12] and Kalman and Bucy [13]. The model was introduced as a model for aerospace-related research and its application has been extended to a wide range of research fields for analyzing data from economics, medicine, earth sciences, and signal processing among others [4, 5, 6, 8, 11].
The Kalman filtering method is one of the most prevalent methods for the state space model and related research is abundant in literature. In essence, the model is to be analyzed in a two-step recursive process. We first make a one-time ahead prediction of the state and then the state prediction is updated based on the current data. The process we mentioned can be regarded as the Bayesian method and this viewpoint has been reflected in literature by many researchers. [16, 19] showed that the Kalman filter is derived within the Bayesian framework. Chen [20] derived the Kalman filter within a maximum a posterior (MAP) method.
One of the key assumptions in Kalman filtering is that all parameters are known except the state. However, in many real applications, the measurement matrix is not deterministic for the linear state space model [2, 7, 15, 17, 18]. Some stochastic assumptions were made on the measurement matrix in this case as described in the next section.
A main concern in the estimation of the measurement matrix and the state is that the two unknowns form one term in the measurement equation and that this may make the state unidentifiable. We found in this work that it is estimated by a Bayes method with good quality as long as at least one of the priors on the state and the measurement matrix is informative. For the case that the measurement matrix is contaminated with measurement noise, the Bayes method is shown to produce at least as good results as the existing methods in the context of estimation accuracy.
This paper is of 6 sections. Some previous works that are closely related to our work are briefly described in Section 2. In Section 3, we derived implicitly the formula of the posterior distribution of the state and the measurement matrix with Gaussian priors assigned on them individually. Numerical experiments were carried out in Section 4 by applying Gibbs sampling for parameter estimation and we checked on the sensitivity of the estimates on the priors. We also discussed the non-informative priors for the state. In Section 5, we considered the case where the measurement matrix is observable with error and compared our results with those in [18]. The experiments strongly suggest that the Bayesian approach is also available for the case where the measurement matrix is observable. The paper concludes in Section 6 with some discussions and remarks.
Related works
Yue and Shi [7] assumed that the measurement matrix is unknown but is limited to a special form. For instance, only the last row of the matrix is unknown and the entries for the remaining part are fixed as either 0 or 1 in accordance with a specified model.
Bayesian approaches have been applied for estimating the measurement matrix in literature. For instance, Wills et al. [1] developed a fully parametrized linear state space model by the Gibbs sampling algorithm. However, this was under the limitation that the parameters are time-invariant. We extended their work to the model where the measurement matrix is variant over time.
We also considered a linear state space model where the measurement matrix is contaminated with measurement noise as considered in Ra et al. [18]. This contamination is possible under a certain situation of data collection, which is not rare in practice and will be described in detail in Section 5.
Bayesian approach with measurement matrix unknown
Under the assumption that all the parameters are known, the Kalman filter method is used to estimate the state. However, if some parameters are unknown, the Kalman filter method per se is not available. We consider the case where the measurement matrix is unknown. For this model, we apply a Bayesian method to estimate both and . Model (1) can be expressed as
For the initial state , we assign a multivariate normal prior,
As for the states , the posterior prediction of the mean and covariance, and , of given the data are used as the hyper-parameters of the normal prior of as
Note that, the posterior prediction of the mean and covariance of are computed as
Here, and are respectively the posterior mean and covariance of the state at time based on data . This Bayesian procedure is the same in spirit as for the ordinary Kalman filter [20].
Before assigning the prior distributions on , we transform into a vectorized form by concatenating the columns of . That is,
where is the -th column of . Then the measurement Eq. (1) can be expressed as
where is an identity matrix of rank and denotes the Kronecker product.
We assign the multivariate normal prior on as
where and are the prior mean and covariance matrix at time , respectively.
As for the posterior distribution for and at time ,
Since the posterior is not obtained in an explicit form, we used the Markov chain Monte Carlo (MCMC) method, especially, the Gibbs sampling algorithm [3] to compute the posterior distribution.
To obtain samples by the Gibbs sampling algorithm, we need to derive the full conditional distribution for and from the posterior distribution. The full conditional distribution of is derived from , which is the same as the posterior obtained by the maximum a posterior(MAP) method used for the ordinary Kalman filter [20]. We derive the full conditional distribution for from Eq. (2) as
where
Similarly, the full conditional distribution for is derived in the form of a multivariate normal distribution as
In the Gibbs sampling, we sample as where is from the full conditional distribution of , . The sample of is obtained as where is from the full conditional distribution of , . As for the initial values, and , they are obtained as
[t!] Gibbs sampling algorithm for Bayes estimation of and
Burn in the first samples and calculate posterior estimates
,
end
Before we estimate and , we need to check the convergence of the samples. An initial part of the sample may not be appropriate for the desired posterior distribution. Therefore, we remove an initial part of the sample which is referred to as the ‘burn in’ part of the sample. A simple way to determine the burn-in part is to draw a trace plot of the sample and burn in an initial part of the sample before the stationarity of the sample is observed. Several burn-in diagnostic methods were developed including that by Geweke, Gelman-Rubin and Heidelberger-Welch [9]. After burning in the first samples, the posterior mean and covariances of and are computed. This procedure is summarized in Algorithm 3.
Numerical experiments: Effect of priors
In this section, we will investigate the effect of the priors upon the estimates of the state and the measurement coefficient by numerical experiments and suggest a guideline for priors for a safe estimation process.
We set values for the parameters of model Eq. (1) and the initial value of the state as in Table 1 to generate data for our experiment. We denote by the true initial value of the state and let where the first two coordinates are the and coordinates of the target on a plane at time , and the target moves with the velocity from the initial position until .
The parameter values set for data generation in model Eq. (1)
Parameter values
,
,
Initial state value
Data size
Number of data set
Prior distributions on the state and the measurement matrix. In the table, , ‘diag)’ denotes the size diagonal matrix with components and ‘diag)’ denotes the size diagonal matrix with components
Case
1
2
3
Informative
Non-informative
Informative
Informative
Informative
Non-informative
diag
diag
diag
In addition to the specification on the parameters and the state, we impose priors on the state and the measurement matrix with different levels of uncertainty as in Table 2. We consider three cases of the uncertainty for the state and , both informative and only one part informative. The priors are a reflection of the domain knowledge, represented by more informative distributions as we get more domain knowledge.
We will look into the case that both are non-informative separately. In the experiment, we take as constant over time. In estimating and , we used an iterative process applying an MCMC method with a burn-in of 1000 iterations.
For performance evaluation, we used the mean distance error (MDE) defined for as
where is the th component of based on the th data set. This MDE is the mean of the estimation error of . As for , its estimation is evaluated by the MDE given by
where is the entry of the matrix based on the th data set. Note that is the average of the Frobenius norms of .
In the evaluation, the result by the ordinary Kalman filter is included as a reference since is assumed known in the Kalman filter method. The MDE’s are compared among the three cases of uncertainty levels of the priors, and the comparison is summarized in Fig. 1 and 2 regarding and respectively.
MDE() for 3 cases of priors. The three cases of prior assignment on and are: informative priors for both, non-informative and informative priors respectively, and informative and non-informative priors respectively. The MDE by the ordinary Kalman filter is given for reference.
In Fig. 1, it is strongly indicated that a high uncertainty (case 2 in Table 2) about the initial state is reflected in unstable estimates of at an early stage of the estimation process which ends up with an estimate as good as those in the other cases of uncertainty. This is as expected with non-informative priors. The estimation error decreases as data accumulates enough so that the initial uncertainty due to a vague prior may be dominated by the information in data. The instability, if any, at an early stage takes place in various forms in size and time depending on the information amounts in the prior and the data. The shape of instability in the figure is typical when the prior is of no or little information. It may also have some fluctuations at the beginning with the size decaying down at a rate as data accumulates.
MDE() for 3 cases of priors. See the caption of Fig. 1 for the prior assignment.
Mean distance errors of and with the non-informative priors assigned on and respectively in Table 2. The upper graph is of MDE() and MDE() in the other graph.
It is noteworthy in Fig. 2 that MDE() is large and unstable when the prior of is non-informative (case 3 in Table 2) while the estimate of is stable and as good as those under the other cases of the priors. This phenomenon is possible since many different values of are possible for given values of and . One concern though is that an accurate estimate of with a poor estimate of may not be welcome in the context of statistical modeling.
When both of and are with non-informative priors, we could anticipate a bizarre estimation result as shown in Fig. 3. The scales are quite different between the graph in Fig. 1 and the upper one in Fig. 3. It is about 60 for the latter graph while it is about 10 for the former graph. If we ignore the case (case 2 in Table 2) of the non-informative prior on , the scale difference is much larger. We also see an analogous phenomenon with respect to MDE().
We further investigated the case where the priors on are non-informative with a wide range of uncertainty levels on as listed in Table 3. The prior on is assigned as informative. The estimation results are displayed in Fig. 4. The graph in panel (b) is based on one set of data rather than the average of the results based on 30 data sets in order to show a detailed description of the estimation process. The results for the other data sets were more or less the same as that in the figure.
Prior distributions on . In the table, ‘diag)’ denotes the size diagonal matrix with components
diag
diag
diag
diag
The results of simulation under non-informative priors on in Table 3. (a) The MDE of under each priors. (b) The estimates of for a data set. The solid line in (b) indicates the true state of and the circled points indicate the estimates at the beginning. The priors on are assigned informative as in Table 2.
We could see in this figure that the estimation was bizarre when is given as the first one (denote it by ) in Table 3. Panel (b) in the figure displays the estimates of the () coordinate of . Note that the true initial target location and velocity is (40,40,1) while they are set as (100,100,5) in the prior. As for the prior , the estimate of starts at around (50,60), then moves smoothly towards the origin until the x coordinate reaches about 25, then the trace changes its direction drastically wandering for a short while (see the sharp up-and-down in panel (a)), then the trace moves slowly towards the origin along the actual trace line of the target. This bizarre movement of the estimate was not shown for the other values of in Table 3 that are of a much higher uncertainty level than . This is an apparent indication that vagueness in prior helps for a good level of estimation accuracy when imposing prior distributions with little or no prior information. Although the initial target location was set far away from the true location in the prior, the vagueness, well incorporated with data, took us near the target location a few iterations after the beginning of the estimation process.
In target localization problems, we may impose priors on subject to the signal reception system and its function reliability. In this context, the priors on could be informative in real world. Furthermore, may be given as a function of data and parameters as considered below. In this case, the prior would be relatively easy to determine. We will address this issue in comparison with a traditional estimation method as proposed in [18].
When is contaminated with measurement noise
In target localization problems, we use signal data that are received from the target and there are several signal reception methods. The data may be collected as angles of signal arrival to sensors from the target, as time difference of signal arrival, or as frequency difference of signal arrival which utilizes the frequency Doppler shift phenomenon, among others[14]. In case we use the method of time difference of signal arrival for data collection, we can form the measurement Eq. (1) as consisting of the target location as an unknown parameter and the observed values of the distances between sensors and the time differences of signal arrival. We can easily imagine this phenomenon if we consider a triangle of the three points of two sensors and one target. An explicit expression of the equation is given in Eq. (11).
Ra et al. [18] applied an optimization approach by using a cost function for target localization when is contaminated with measurement error. This method is called the non-conservative robust Kalman filter (NCRKF). They considered the model in Eq. (1) with replaced with as
with additional conditions that
They derived the formulae for update and prediction of as
() Update:
() Prediction:
Provided the measurement matrix in model Eq. (1) is observed with error, we may take the observed , , as the hyper-parameter as
We will call this prior an empirical prior.
Note that in Eq. (4) can be expressed as a function of as in Eq. (9) below:
Equation (8) is obtained by the matrix inversion lemma[10]. It is noteworthy that Eq. (9) is the same as the state estimate at time in the ordinary Kalman filter(OKF). However, is unknown unlike for the OKF. Therefore, the estimation error by the OKF method is generally smaller than that by the Bayes method with the posterior distribution in Eq. (2) and we have checked this through numerical experiments in Section 4.
The in Eq. (5) can be expressed, under the conditions in Eq. (5), as
where and . Note that and .
The update equations for are different between the NCRKF and the Bayesian methods as shown in Eqs (9) and (10), respectively. The first two terms on the right hand side of Eq. (10) are analogous to those for in Eq. (9), but has another term,
which may be regarded as an error term due to the measurement error and its related function . What’s worse is that this error term is influenced by . In case of target localization, when the target is far off from the sensors, would be a main source of instability of estimation at the initial stage of the estimation process. This will be demonstrated in numerical experiments below.
Numerical experiment for performance comparison
In this subsection, we compare the performance between our Bayesian approach and the estimation approach as proposed in [18] in target localization. Han et al. [15] considered a target localization problem where the signal is received from the target by observing the time difference of signal arrivals on different sensors. Suppose we have 5 sensors with their positions denoted by for 0, 1, 2, 3, 4 where the main sensor is labeled 0. Denote by the distance between the target and sensor and by the distance between sensors 0 and . Let
and denote the target position by on a plane. For notational simplicity, we ignore the time index in these notations.
Then we get the following equation by applying a cosine law to the triangle formed by the positions of the target and the sensors and on a plane:
If we denote the left-hand side of this equation plus a measurement error by and let , then we have a measurement equation as
where
Here, and represent the velocities of the target and sensors, and and their accelerations, respectively. is the observation of and is the zero mean error which satisfies Eq. (5).
With the time interval , the state equation is given by
with
and
A detailed description of the parameters are explained in [15].
For numerical experiments, the parameter values for the state and measurement equations were specified as in Table 4 in [15]. We generated data under the same specification for our experiment for performance comparison between the NCRKF and Bayes methods.
Specification for data generation from the model consisting of Eqs (11) and (12)
Target
Velocity 20(m/s)
Direction
Initial position (5000,0)(m)
Sensors
Initial velocity 270 (m/s)
Direction
Initial position of leader sensor (0,0) (m)
300 m for 1, 2, 3, 4
Noise variance
Range difference
Sensor position ,
Data
Time interval 0.04
Number of time points 450
Number of data sets 30
Mean absolute error of the four components of , the and coordinates of the target position and the target velocity. The results by NCRKF are given in black as 95% confidence intervals and those by the Bayes method in red as 95% credible intervals.
The priors for and are given as
and
where
The values of , , and are taken in accordance with the specification in Table 4.
The initial value of was generated for the NCRKF method from the distribution . Each of the 30 data sets generated was analyzed for the estimation of and the results are summarized in Fig. 5 as 95% confidence intervals for the NCRKF and as 95% credible intervals for the Bayes method. The interval types are different due to the difference in the estimation methods.
We can see in the figure that the Bayes method performs better than the NCRKF. In particular, the estimates by the NCRKF are quite unstable at an early stage of the process while those by the Bayes method look well under control. If we look at the coordinates of , we can see in Table 5 that the mean absolute error by the Bayes method is about 0.84 times as large as that by the NCRKF on average. We also compared the MDE values of between the two methods and the results are in Fig. 6. The Bayes estimates of are closer to the true value than the estimates by the NCRKF where the estimates of are the same as .
Comparison of the mean absolute error of the -coordinate of the target position with the hyper-parameter in Eq. (5.1)
Time()
2
4
6
8
10
12
14
16
18
Bayes method
54.66
48.61
45.49
34.85
29.39
21.79
12.23
7.89
1.83
NCRKF
76.86
76.66
57.51
37.65
35.36
24.77
12.65
7.70
2.21
Bayes/NCRKF
0.71
0.63
0.79
0.93
0.83
0.88
0.97
1.02
0.83
Comparison of the mean absolute error of the -coordinate of the target position for and
Time()
2
4
6
8
10
12
14
16
18
69.44
72.22
62.49
35.13
36.50
21.19
11.86
7.98
2.29
59.50
51.13
55.00
31.29
31.15
23.03
11.24
7.31
1.97
54.66
48.61
45.49
34.85
29.39
21.79
12.23
7.89
1.83
51.66
79.14
84.78
120.41
88.44
48.41
18.48
22.15
5.90
MDE values for . MDE() is in red and MDE() in black.
Mean absolute error of the four components of as in Fig. 5 with the hyper-parameter in panel (a), in panel (b) and in panel (c).
The comparison above has been made with the prior on with the hyper-parameter as in Eq. (5.1). The comparison is affected by the covariance structure . We considered three other structures for as given by
The comparison results are summarized in Fig. 7. If we denote the in Eq. (5.1) by , the performance of estimating was the best with the hyper-parameter among the four values and the worst was with . We can also see in Table 6 that performance is the best with and worst with . This result indicates that the prior on with a high uncertainty may yield estimates of with a lower accuracy than the estimates by a traditional method such as the NCRKF. This result is quite intuitive in the sense that a high uncertainty on the observed would make the observed value less useful. In the same context, a lower uncertainty on would make the estimates of more influenced by . This result is observed in panel (a) where the estimation accuracy is more or less the same between the NCRKF and the Bayes method. So as far as the empirical prior is concerned, it seems desirable that the uncertainty level is low but not too low.
In the Gibbs sampling used for the Bayes estimation of and , we set 1500 and 500 at an early stage of data accumulation and then we switched to 1100 and 100 to reduce the computing time with as good results as those without the switching.
Concluding remarks
In Bayes estimation for the state and the measurement matrix, we found that the state is estimated reasonably well as long as the priors are informative for at least one of the state and the measurement matrix. In reality, it is often the case for target localization problems that we have little or no information about the initial state. In this case, it is imperative that the prior on the measurement matrix is informative. If some of the entries of the measurement matrix are known, we may apply the Bayes method for estimation with highly informative priors imposed on the corresponding entries.
There are situations where the measurement matrix is a function of observed data. In the measurement equation, is a function of . If we express , under a Bayesian framework, as a sum of and a matrix of white Gaussian noises, then we can confine the estimate of within a range of in probability. This confinement and values will produce the full conditional posterior distribution of from which the conditional posterior mean of is obtained. Given this mean and the data , we can find the full conditional posterior distribution of from which the conditional posterior mean of is obtained. This iterative process ultimately converges since this process leads to the maximum of the posterior distribution of and . In this process, we begin with a randomly generated value for and this could yield subsequently a conditional posterior mean of which is far off the truth. But, if we have any information from data concerning as is the case considered above, the conditional posterior mean of could be controlled within a range of the truth. Another point to consider concerning the noise contaminated measurement matrix is that the measurement errors of and may be correlated. This correlation may well be reflected in the full conditional posterior distributions of and respectively. These are merits of the Bayes approach with empirical priors. A precaution is that an empirical prior with a too low uncertainty leads to data sensitive estimates.
We could apply the Bayes method to a larger set of state space models including nonlinear models as
where and are known nonlinear functions and unknown. We may consider non-normal errors for and . In this case, the full conditional posterior distributions of and may be given in complicated expressions. Stochastic computing algorithms such as Metropolis Hastings algorithm would be a useful tool for handling the full conditional posterior distributions.
Footnotes
Acknowledgments
This work was supported by the NRF grant (No. 2016R1D1A1B03936155) of the Republic of Korea.
References
1.
WillsA.SchönT.B.LindstenF. and NinnessB., Estimation of linear systems using a Gibbs sampler, in: IFAC Symposium on System Identification, 2012.
2.
ZhengF., A robust H2 filtering approach and its application to equalizer design for communication systems, IEEE Transactions on Signal Processing53(8) (2005), 2735–2747.
3.
CasellaG. and GeorgeE.I., Explaining the Gibbs sampler, The American Statistician46(3) (1992), 167–174.
4.
HamiltonJ.D., A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica57 (1989), 357–384.
5.
HamiltonJ.D., Time series analysis, Princeton: Princeton University Press, 1994.
6.
DurbinJ. and KoopmanS.J., Time Series Analysis by State Space Methods 2nd ed., Oxford: Oxford University Press, 2012.
7.
YueJ. and ShiY., Multi-innovation self-tuning Kalman filter with unknown parameters systems, in: International Power, Electronics and Materials Engineering Conference, 2015.
8.
GordonK. and SmithA.F.M., Monitoring and modeling biomedical time series, Journal of the American Statistical Association85 (1990), 328–337.
9.
SahlinK., Estimating convergence of Markov Chain Monte Carlo simulations, Master Thesis, Stockholm University, Stockholm, 2011.
ShumwayR.H. and StofferD.S., Dynamic linear models with switching, Journal of the American Statistical Association86 (1991), 763–769.
12.
KalmanR.E., A new approach to linear filtering and prediction problems, Journal of Basic Engineering82 (1960), 35–45.
13.
KalmanR.E. and BucyR., New results in linear filtering and prediction theory, The American Society of Mechanical Engineers83 (1961), 95–107.
14.
KimS.H.ParkJ.H.YoonW.RaW.S. and WhangI.H., A note on sensor arrangement for long-distance target localization, Signal Processing133 (2017), 18–31.
15.
HanS.K.RaW.S.WhangI.H. and ParkJ.B, Linear recursive passive target tracking filter for cooperative sea-skimming anti-ship missiles, IET Radar, Sonar and Navigation8(7) (2014), 805–814.
16.
PeterkaV., Bayesian approach to system identification, in: Trends and Progress in System Identification, Pergamon Press, 1981, pp. 239–304.
17.
RaW.S.WhangI.H. and AhnJ.Y., Robust horizontal line-of-sight rate estimator for sea skimming anti-ship missile with two-axis gimballed seeker, IEE Proceedings, Radar Sonar Navigations152(1) (2005), 9–15.
18.
RaW.S.WhangI.H. and ParkJ.B., Non-conservative robust Kalman filtering using a noise corrupted measurement matrix, IET Control Theory and Applications3(9) (2009), 1226–1236.
19.
HoY.C. and LeeR.C.K., A Bayesian approach to problems in stochastic estimation and control, IEEE Transaction on Automatic Control9 (1964), 333–339.
20.
ChenZ., Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond, Adaptive Systems Laboratory, McMaster University, Hamilton, ON, Canada, 2003.