Statistical Inference for Interval-Valued Spatial Error Models

Abstract

In recent years, the need to analyze interval-valued data has grown, and the modeling and analysis of such data have attracted increasing attention. In this paper, we introduce a new model for interval-valued data with spatial dependence. Following the idea of the least squares method for the single-valued case, we derive a parameter estimator for the interval-valued spatial error model and prove its properties. We then carry out a numerical simulation study. Finally, the proposed model is applied to three real datasets, and the empirical analysis demonstrates its effectiveness.

Keywords

interval-valued random variable; spatial error model; parameter estimator

1. Introduction

It is well known that classical linear regression and time series models are among the most widely used tools of statistical inference, with applications in medicine, education, finance, science, technology, and many other fields. However, these models are mostly designed for single-valued random variables. In the real world, many random phenomena cannot be characterized by a single-valued random variable. Consider, for example, the price of a stock on a given day. Describing it by a single value (e.g., the daily minimum or maximum price) is clearly unreasonable: the fluctuation information generated during trading is lost, and the analysis results provided to decision-makers are therefore one-sided. People are also often more interested in a range than in a single value. For the temperature on a given day, for instance, they usually care less about the temperature at a particular time than about the daily maximum and minimum. In economic forecasting, economists mostly give a prediction range for the economic growth rate. In medical imaging diagnosis, the imaging result is usually a two-dimensional image rather than a single-point value. In investment decision-making, investors are concerned not only with the price of a risky asset at a certain point in time but also with the range over which the price fluctuates during a period. Interval-valued data are therefore more appropriate and valuable in such settings, because they reflect the uncertainty and ambiguity of practical problems more comprehensively. Thus it is necessary to consider interval-valued statistical models and the associated statistical inference problems.

Interval-valued random variables are special set-valued random variables. In the mid-20th century, Aumann and Debreu first used set-valued mappings in the study of economic phenomena. Aumann (1965) defined the integral of set-valued random variables. Hiai and Umegaki (1977) introduced the conditional expectation of set-valued random variables. Lyashenko (1982, 1983) discussed the properties of set-valued random variables in Euclidean space, introduced set-valued Gaussian random variables, and defined the variance of set-valued random variables. Vitale (1985) studied the properties of the $D_{p}$ distance (see the definition in Section 2). Yang and Li (2005) defined the variance and covariance of set-valued random variables under the $D_{p}$ distance and established their properties. Blanco Fernandez et al. (2008) defined the variance of interval-valued random variables under a new distance and studied its properties. Hess (1991), Papageorgiou (1985, 1995), Li et al. (2010), and Li and Ogura (1998, 1999) explored the convergence theory of set-valued random variables under different conditions. Molchanov (2005) and Li et al. (2002) systematically summarized the theory of set-valued random variables. This body of work promoted the development of set-valued random variable theory. With that development, interval-valued data modeling and analysis have received more attention and found applications in other fields. For example, Thierry and Prakash (2020) proposed an interval-valued utility theory for decision making, Li et al. (2017) discussed interval-valued risk measure models, and Ida (2003, 2004) studied the portfolio selection problem with interval and fuzzy coefficients.

For interval-valued statistical models, Billard and Diday (2000) established a linear regression model using the midpoints of interval-valued random variables. Billard and Diday (2002) fitted separate linear regression models to the two endpoints. Lima Neto and de Carvalho (2008) established linear regression models using the center and radius of interval-valued random variables, and Lima Neto and de Carvalho (2010) extended that approach by imposing non-negativity constraints on the regression coefficients of the radius. Wang et al. (2012) proposed the complete information method for interval-valued linear regression. Souza et al. (2017) introduced a parametrization method for linear regression models. Wang et al. (2015) used set-valued theory to study linear regression problems and gave the least squares estimator and its properties. Guan and Li (2024) studied the interval-valued linear regression model under the assumption that the error follows an asymmetric Laplace distribution. Eufrásio et al. studied robust regression for interval-valued variables using exponential-type kernel functions (Neto and de Carvalho, 2018). However, all of the above work concerns linear regression models for interval-valued random variables. In real life, some data have spatial dependencies, such as temperatures in different cities and economic conditions in different regions. There is currently no research on interval-valued data with spatial dependencies. It is therefore natural and necessary to consider interval-valued spatial regression models and spatial error models.

For the single-valued spatial error model, Anselin (1988) gave the maximum likelihood estimation method, and Prucha (2010) proposed the generalized moment estimation method. Yildirim and Kantar (2020) systematically summarized the parameter estimation methods for the spatial error model and proposed a new method based on the likelihood equation. As reviewed above, many scholars have studied classical linear regression and time series models for interval-valued random variables and obtained valuable results. This paper extends the classical spatial error model to the interval-valued case. We provide a parameter estimation method and discuss the properties of the resulting estimator. We also conduct a numerical simulation study to evaluate the performance of the proposed model and apply it to real datasets for validation.

The paper is organized as follows. Section 2 introduces the notation and basic concepts of interval-valued random variables. Section 3 presents the interval-valued spatial error model, derives the least squares estimator of the parameter, and discusses its unbiasedness, its numerical characteristics (expectation and covariance), and its consistency. Section 4 verifies the effectiveness of the method by numerical simulation. Section 5 applies the model to three real datasets.

2. Preliminaries on Interval-Valued Random Variables

2.1. $d_{p}$ Distance and $D_{p}$ Distance

Throughout this paper, we assume that $(Ω, A, μ)$ is a complete probability space. $R^{d}$ is the $d$-dimensional Euclidean space, $‖ \cdot ‖$ and $⟨ \cdot, \cdot ⟩$ are the norm and inner product in $R^{d}$, respectively, and $K_{k c} (R^{d})$ is the family of nonempty compact convex subsets of $R^{d}$. When $d = 1$, $R^{1}$ is abbreviated as $R$, and $K_{k c} (R)$ is the family of nonempty bounded closed intervals in $R$, that is,

K_{k c} (R) = \{A = [\underline{a}, \bar{a}] : - \infty < \underline{a} \leq \bar{a} < \infty, \ \underline{a}, \bar{a} \in R\} .

Here, $\underline{a}$ and $\bar{a}$ are the left and right endpoints of the interval $A$, respectively. In addition, the interval $A$ can be written in center-radius form $A = (c_{A}; r_{A})$, where $c_{A} = (\bar{a} + \underline{a}) / 2$ and $r_{A} = (\bar{a} - \underline{a}) / 2$ are the center and radius of $A$, respectively. For any sets $A$ and $B$, addition and scalar multiplication are defined as

\begin{aligned} A + B & = \{a + b : a \in A, b \in B\}, \\ k A & = \{k a : a \in A\}, \quad \forall k \in R . \end{aligned}

An interval is a special case of a set. For $A = [\underline{a}, \bar{a}] = (c_{1}; r_{1})$ and $B = [\underline{b}, \bar{b}] = (c_{2}; r_{2})$, these operations become

\begin{aligned} A + B & = [\underline{a} + \underline{b}, \bar{a} + \bar{b}] = (c_{1} + c_{2}; r_{1} + r_{2}), \\ k A & = \begin{cases} [k \underline{a}, k \bar{a}], & k \geq 0 \\ [k \bar{a}, k \underline{a}], & k < 0 \end{cases} = (k c_{1}; | k | r_{1}) . \end{aligned}

Note that if the set $A$ does not degenerate to a point, then $A - A = A + (- A) \neq \{0\}$. Hence $K_{k c} (R^{d})$ is not a linear space with respect to addition and scalar multiplication.

For any sets $A$ and $B$ in $K_{k c} (R^{d})$, subtraction is defined as $A - B = \{a - b : a \in A, b \in B\}$. For intervals $A = [\underline{a}, \bar{a}] = (c_{A}; r_{A})$ and $B = [\underline{b}, \bar{b}] = (c_{B}; r_{B})$, this definition gives

A - B = [\underline{a} - \bar{b}, \bar{a} - \underline{b}] = (c_{A} - c_{B}; r_{A} + r_{B}) .
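The interval operations above can be illustrated with a short sketch. The class name `Interval` and its methods are hypothetical helpers introduced only for this illustration; they store an interval in center-radius form and implement addition, scalar multiplication, and subtraction as defined in this subsection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval in center-radius form (c; r), with r >= 0 (illustrative helper)."""
    c: float
    r: float

    @classmethod
    def from_endpoints(cls, lo, hi):
        if lo > hi:
            raise ValueError("lower endpoint exceeds upper endpoint")
        return cls((hi + lo) / 2, (hi - lo) / 2)

    @property
    def lo(self):
        return self.c - self.r

    @property
    def hi(self):
        return self.c + self.r

    def __add__(self, other):
        # (c1; r1) + (c2; r2) = (c1 + c2; r1 + r2)
        return Interval(self.c + other.c, self.r + other.r)

    def __rmul__(self, k):
        # k (c; r) = (k c; |k| r)
        return Interval(k * self.c, abs(k) * self.r)

    def __sub__(self, other):
        # (c1; r1) - (c2; r2) = (c1 - c2; r1 + r2): radii add, they do not cancel
        return Interval(self.c - other.c, self.r + other.r)


A = Interval.from_endpoints(1.0, 3.0)   # (2; 1)
B = Interval.from_endpoints(0.0, 4.0)   # (2; 2)
total = A + B                           # (4; 3), i.e. [1, 7]
scaled = -2.0 * A                       # (-4; 2), i.e. [-6, -2]
self_diff = A - A                       # (0; 2), i.e. [-2, 2], not {0}
```

The last line shows numerically why the family of intervals is not a linear space: subtracting a non-degenerate interval from itself does not return the zero element.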

The support function of a set $A \in K_{k c} (R^{d})$ is defined as

s (x, A) = \sup_{a \in A} ⟨ x, a ⟩, \quad x \in R^{d} .

The $d_{p}$ distance is defined as follows: for any $1 \leq p < \infty$, the $d_{p}$ distance between sets $A$ and $B$ in $K_{k c} (R^{d})$ is

d_{p} (A, B) = [\int_{S^{d - 1}} | s (x, A) - s (x, B) |^{p} d μ (x)]^{\frac{1}{p}} .

Here, $S^{d - 1}$ is the unit sphere of $R^{d}$ and $μ$ is a measure on $S^{d - 1}$; in particular, on $S^{0} = \{- 1, 1\}$ we can take $μ (1) = μ (- 1) = 1$. Further, from Yang and Li (2005), $(K_{k c} (R^{d}), d_{p})$ is a complete separable metric space. In particular, for intervals $A = [\underline{a}, \bar{a}] = (c_{A}; r_{A})$ and $B = [\underline{b}, \bar{b}] = (c_{B}; r_{B})$, the $d_{p}$ distance is

\begin{aligned} d_{p} (A, B) & = (| \underline{b} - \underline{a} |^{p} + | \bar{b} - \bar{a} |^{p})^{\frac{1}{p}} \\ & = [| (c_{B} - c_{A}) - (r_{B} - r_{A}) |^{p} + | (c_{B} - c_{A}) + (r_{B} - r_{A}) |^{p}]^{\frac{1}{p}} . \end{aligned}

In particular, if $p = 2$, then

\begin{aligned} d_{2} (A, B) & = [(\underline{b} - \underline{a})^{2} + (\bar{b} - \bar{a})^{2}]^{\frac{1}{2}} \\ & = [((c_{B} - c_{A}) - (r_{B} - r_{A}))^{2} + ((c_{B} - c_{A}) + (r_{B} - r_{A}))^{2}]^{\frac{1}{2}} \\ & = [2 (c_{B} - c_{A})^{2} + 2 (r_{B} - r_{A})^{2}]^{\frac{1}{2}} . \end{aligned}
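As a quick numerical check of the $d_{2}$ distance between intervals, the sketch below computes it once from the endpoints and once from centers and radii, using the identity $(x - y)^{2} + (x + y)^{2} = 2 x^{2} + 2 y^{2}$. The function names are hypothetical and chosen only for this example.

```python
import math


def d2_endpoints(a, b):
    """d_2 distance from endpoint pairs a = (a_lo, a_hi), b = (b_lo, b_hi)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return math.sqrt((b_lo - a_lo) ** 2 + (b_hi - a_hi) ** 2)


def d2_center_radius(a, b):
    """Same distance computed from centers and radii."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    c_a, r_a = (a_hi + a_lo) / 2, (a_hi - a_lo) / 2
    c_b, r_b = (b_hi + b_lo) / 2, (b_hi - b_lo) / 2
    return math.sqrt(2 * (c_b - c_a) ** 2 + 2 * (r_b - r_a) ** 2)


A = (1.0, 3.0)   # center 2, radius 1
B = (0.0, 6.0)   # center 3, radius 3
dist_endpoints = d2_endpoints(A, B)
dist_center_radius = d2_center_radius(A, B)
```

For these two intervals both routes give $\sqrt{10}$: the endpoint differences are $-1$ and $3$, while the center and radius differences are $1$ and $2$.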

A set-valued mapping $F : Ω \to K_{k c} (R^{d})$ is called a set-valued random variable if, for any closed set $C \in K_{k c} (R^{d})$,

F^{- 1} (C) = \{ω \in Ω : F (ω) \cap C \neq \emptyset\} \in A .

Let $U [Ω, K_{k c} (R^{d})]$ denote the family of set-valued random variables taking values in $K_{k c} (R^{d})$. The $D_{p}$ distance between set-valued random variables $F_{1}$ and $F_{2}$ is

D_{p} (F_{1}, F_{2}) = [E d_{p}^{p} (F_{1}, F_{2})]^{\frac{1}{p}} .

Similarly, the $D_{p}$ distance between interval-valued random variables $F_{1} = [{\underline{f}}_{1}, {\bar{f}}_{1}] = (c_{F_{1}}; r_{F_{1}})$ and $F_{2} = [{\underline{f}}_{2}, {\bar{f}}_{2}] = (c_{F_{2}}; r_{F_{2}})$ is

\begin{aligned} D_{p} (F_{1}, F_{2}) & = [E | {\underline{f}}_{2} - {\underline{f}}_{1} |^{p} + E | {\bar{f}}_{2} - {\bar{f}}_{1} |^{p}]^{\frac{1}{p}} \\ & = [E | (c_{F_{2}} - c_{F_{1}}) - (r_{F_{2}} - r_{F_{1}}) |^{p} + E | (c_{F_{2}} - c_{F_{1}}) + (r_{F_{2}} - r_{F_{1}}) |^{p}]^{\frac{1}{p}} . \end{aligned}

Further, from Yang and Li (2005), $(K_{k c} (R^{d}), D_{p})$ is a complete separable metric space. In particular, if $p = 2$, then

\begin{aligned} D_{2} (F_{1}, F_{2}) & = [E ({\underline{f}}_{2} - {\underline{f}}_{1})^{2} + E ({\bar{f}}_{2} - {\bar{f}}_{1})^{2}]^{\frac{1}{2}} \\ & = [E ((c_{F_{2}} - c_{F_{1}}) - (r_{F_{2}} - r_{F_{1}}))^{2} + E ((c_{F_{2}} - c_{F_{1}}) + (r_{F_{2}} - r_{F_{1}}))^{2}]^{\frac{1}{2}} . \end{aligned}

2.2. Moment of Set-Valued Random Variables

The expectation of a set-valued random variable $F \in U [Ω, K_{k c} (R^{d})]$ was defined by Aumann (1965) as

E [F] = \int_{Ω} F d μ = \{\int_{Ω} f d μ : f \in S_{F}\} .

Here $S_{F}$ is the set of integrable selections of $F$, that is,

S_{F} = \{f \in L^{p} [Ω, R^{d}] : f (ω) \in F (ω) \ \text{a.e.} \ (μ)\} .

Yang and Li (2005) introduced the variance and covariance of set-valued random variables based on the $D_{p}$ distance.

For a set-valued random variable $F \in U [Ω, K_{k c} (R^{d})]$, the variance is defined as follows:

\begin{aligned} Var (F) & = D_{2}^{2} (F, E [F]) \\ & = E [d_{2}^{2} (F, E [F])] \\ & = E [\int_{S^{d - 1}} (s (x, F) - s (x, E [F]))^{2} d μ (x)] . \end{aligned}

For two set-valued random variables $F_{1}, F_{2} \in U [Ω, K_{k c} (R^{d})]$ , the covariance is defined as follows

Cov (F_{1}, F_{2}) = E [\int_{S^{d - 1}} (s (x, F_{1}) - s (x, E [F_{1}])) (s (x, F_{2}) - s (x, E [F_{2}])) d μ (x)] .

If $F = (c_{F}; r_{F})$ is an interval-valued random variable, then

\begin{aligned} Var (F) = & E [\underline{f} - E [\underline{f}]]^{2} + E [\bar{f} - E [\bar{f}]]^{2} \\ = & E [(c_{F} - E [c_{F}]) - (r_{F} - E [r_{F}])]^{2} + E [(c_{F} - E [c_{F}]) + (r_{F} - E [r_{F}])]^{2} . \end{aligned}

The covariance of interval-valued random variables $F_{1}, F_{2} \in U [Ω, K_{k c} (R)]$ is

\begin{aligned} Cov (F_{1}, F_{2}) = & E [({\underline{f}}_{1} - E [{\underline{f}}_{1}]) ({\underline{f}}_{2} - E [{\underline{f}}_{2}])] + E [({\bar{f}}_{1} - E [{\bar{f}}_{1}]) ({\bar{f}}_{2} - E [{\bar{f}}_{2}])] \\ = & E [(c_{F_{1}} - E [c_{F_{1}}] - (r_{F_{1}} - E [r_{F_{1}}])) (c_{F_{2}} - E [c_{F_{2}}] - (r_{F_{2}} - E [r_{F_{2}}]))] \\ & + E [(c_{F_{1}} - E [c_{F_{1}}] + (r_{F_{1}} - E [r_{F_{1}}])) (c_{F_{2}} - E [c_{F_{2}}] + (r_{F_{2}} - E [r_{F_{2}}]))] . \end{aligned}

A direct calculation, in which the cross terms between centers and radii cancel, gives

\begin{aligned} Var (F) & = 2 E [c_{F} - E [c_{F}]]^{2} + 2 E [r_{F} - E [r_{F}]]^{2} \\ & = 2 Var (c_{F}) + 2 Var (r_{F}), \end{aligned}

\begin{aligned} Cov (F_{1}, F_{2}) = & 2 E [(c_{F_{1}} - E [c_{F_{1}}]) (c_{F_{2}} - E [c_{F_{2}}])] + 2 E [(r_{F_{1}} - E [r_{F_{1}}]) (r_{F_{2}} - E [r_{F_{2}}])] \\ = & 2 Cov (c_{F_{1}}, c_{F_{2}}) + 2 Cov (r_{F_{1}}, r_{F_{2}}) . \end{aligned}
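The center-radius decomposition of the variance and covariance can be checked on simulated data. The sketch below is illustrative only: the distributions used to draw centers and radii are arbitrary assumptions, and the helper `cov` is a hypothetical name for the sample (population-style) covariance. Because covariance is bilinear, the endpoint form and the center-radius form agree up to floating-point error for any sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two interval-valued samples in center-radius form (arbitrary illustrative draws).
c1 = rng.normal(5.0, 2.0, size=1000)
r1 = rng.uniform(0.5, 1.5, size=1000)
c2 = 0.5 * c1 + rng.normal(0.0, 1.0, size=1000)
r2 = 2.0 * r1 + rng.uniform(0.0, 0.2, size=1000)


def cov(a, b):
    """Sample covariance with divisor n (moment form)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# Endpoints of the intervals.
lo1, hi1 = c1 - r1, c1 + r1
lo2, hi2 = c2 - r2, c2 + r2

# Endpoint form: Cov(lower, lower) + Cov(upper, upper).
cov_endpoints = cov(lo1, lo2) + cov(hi1, hi2)
var_endpoints = cov(lo1, lo1) + cov(hi1, hi1)

# Center-radius form: 2 Cov(c, c) + 2 Cov(r, r).
cov_center_radius = 2 * cov(c1, c2) + 2 * cov(r1, r2)
var_center_radius = 2 * cov(c1, c1) + 2 * cov(r1, r1)
```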

The variance and covariance of interval-valued random variables will be used in Section 3. For more information about the variance and covariance of set-valued random variables, readers can refer to Yang and Li (2005).

3. Interval-valued Spatial Error Model

In this section, $X = (X_{1}, X_{2}, \dots, X_{n})^{⊤}$ denotes the explanatory variables, where each $X_{i}$ is a $p$-dimensional single-valued vector; $Y = (Y_{1}, Y_{2}, \dots, Y_{n})^{⊤}$ denotes the response, where $Y_{i}$ $(1 \leq i \leq n)$ are interval-valued variables; the unknown parameter is $β^{⊤} = (β_{1}, \dots, β_{p})$, where all $β_{i}$ are interval-valued; $u$ and $ε$ are single-valued error terms; $W$ is the known $n \times n$ spatial weight matrix, which is row-normalized; and $λ$ is the spatial autoregressive coefficient. The model is

Y = X β + u, \quad u = λ W u + ε, \quad | λ | < 1, \tag{3.1}

where the error term $ε \sim N (0, σ^{2} I_{n})$ and $I_{n}$ is the $n \times n$ identity matrix.

Since $u - λ W u = ε$, applying $I_{n} - λ W$ to both sides of the first equation in (3.1) gives

(I_{n} - λ W) Y = (I_{n} - λ W) X β + ε . \tag{3.2}

Writing

\begin{aligned} Y_{λ} & = (I_{n} - λ W) Y, \\ X_{λ} & = (I_{n} - λ W) X, \end{aligned}

model (3.1) can be expressed as

\begin{aligned} Y_{λ} & = X_{λ} β + ε, \\ E (Y_{λ}) & = X_{λ} β . \end{aligned} \tag{3.3}

Definition 3.1

If $Y_{λ} = (Y_{λ 1}, Y_{λ 2}, \dots, Y_{λ n})^{⊤}$ is an $n$-dimensional vector of interval-valued observations, $X_{λ} = (x_{λ i j})_{n \times p}$ is an $n \times p$ single-valued design matrix, $β^{⊤} = (β_{1}, β_{2}, \dots, β_{p})$ is a $p$-dimensional interval-valued parameter vector, and these quantities satisfy relation (3.3), then the model is called the interval-valued spatial error model.

Remark 1

Row normalization of the spatial weight matrix is a common assumption in spatial data analysis. Under this assumption, each row of $W$ sums to $1$, and the parameter space of $λ$ can be restricted to $| λ | < 1$. See Hillier and Martellosio (2018) for more details.

Remark 2

In particular, when $Y$ and $β$ degenerate to single-point values, the model reduces to the classical spatial error model. From this perspective, interval-valued models are an extension of classical models. Compared with single-valued data, interval-valued data involve more complex operations, and the space of intervals is not a linear space. Interval-valued models are therefore more difficult to study.

Remark 3

Here, $u$ and $ε$ are assumed to be single-valued rather than interval-valued, for two reasons. First, this assumption is simple and easy to handle. Second, model (3.1) can be transformed into model (3.3) because, for single values, $u = λ W u + ε$ if and only if $u - λ W u = ε$. This equivalence fails for interval values, because $K_{k c} (R)$ is not a linear space. This is why we assume that $ε$ and $u$ are single-valued.

We now give the rule for multiplying a matrix by a vector of intervals.

Definition 3.2

Let $A_{i} = [\underline{a_{i}}, \bar{a_{i}}] = (c_{i}; r_{i}), i = 1, \dots, p$, be intervals in $K_{k c} (R)$, and let $A = (A_{1}, A_{2}, \dots, A_{p})^{⊤}$ be the corresponding interval-valued vector. The product of any $n \times p$ matrix $(m_{i j})_{n \times p}, i = 1, 2, \dots, n; j = 1, 2, \dots, p$, with $A$ is defined as follows:

\begin{aligned} (m_{i j})_{n \times p} A & = \begin{pmatrix} m_{11} A_{1} + \dots + m_{1 p} A_{p} \\ \vdots \\ m_{n 1} A_{1} + \dots + m_{n p} A_{p} \end{pmatrix} \\ & = \begin{pmatrix} m_{11} (c_{1}; r_{1}) + \dots + m_{1 p} (c_{p}; r_{p}) \\ \vdots \\ m_{n 1} (c_{1}; r_{1}) + \dots + m_{n p} (c_{p}; r_{p}) \end{pmatrix} \\ & = \begin{pmatrix} m_{11} [\underline{a_{1}}, \bar{a_{1}}] + \dots + m_{1 p} [\underline{a_{p}}, \bar{a_{p}}] \\ \vdots \\ m_{n 1} [\underline{a_{1}}, \bar{a_{1}}] + \dots + m_{n p} [\underline{a_{p}}, \bar{a_{p}}] \end{pmatrix} . \end{aligned}
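In center-radius form, Definition 3.2 combined with the rule $k (c; r) = (k c; | k | r)$ means that the centers are multiplied by the matrix itself and the radii by its entrywise absolute value. The following sketch illustrates this; the function name `mat_interval_product` is hypothetical.

```python
import numpy as np


def mat_interval_product(M, c, r):
    """Multiply a real matrix M by an interval vector given as centers c and radii r.

    Returns the centers and radii of the resulting interval vector:
    centers = M c, radii = |M| r (entrywise absolute value of M).
    """
    M = np.asarray(M, dtype=float)
    return M @ np.asarray(c, dtype=float), np.abs(M) @ np.asarray(r, dtype=float)


M = np.array([[1.0, -2.0],
              [0.0, 3.0]])
c = np.array([1.0, 2.0])   # centers of A_1 = [0.5, 1.5] and A_2 = [1, 3]
r = np.array([0.5, 1.0])   # radii of A_1 and A_2

centers, radii = mat_interval_product(M, c, r)
lower, upper = centers - radii, centers + radii
```

For the first row, $1 \cdot [0.5, 1.5] + (- 2) \cdot [1, 3] = [0.5, 1.5] + [- 6, - 2] = [- 5.5, - 0.5]$, which matches center $- 3$ and radius $2.5$.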

For the general single-valued linear model, the idea of the least squares estimation method is to minimize the sum of the squares of the residuals. We shall use the same mathematical idea here.

For the interval-valued spatial error model, the least squares estimator of the interval-valued unknown parameter $β$ minimizes $d_{2}^{2} (Y_{λ}, X_{λ} β)$, where, by the definition of the $d_{2}$ distance,

\begin{aligned} d_{2}^{2} (Y_{λ}, X_{λ} β) = & \sum_{i = 1}^{n} d_{2}^{2} (Y_{λ i}, x_{λ i 1} β_{1} + x_{λ i 2} β_{2} + \dots + x_{λ i p} β_{p}) \\ = & \sum_{i = 1}^{n} [(c_{Y_{λ i}} - x_{λ i 1} c_{β_{1}} - \dots - x_{λ i p} c_{β_{p}}) - (r_{Y_{λ i}} - | x_{λ i 1} | r_{β_{1}} - \dots - | x_{λ i p} | r_{β_{p}})]^{2} \\ & + \sum_{i = 1}^{n} [(c_{Y_{λ i}} - x_{λ i 1} c_{β_{1}} - \dots - x_{λ i p} c_{β_{p}}) + (r_{Y_{λ i}} - | x_{λ i 1} | r_{β_{1}} - \dots - | x_{λ i p} | r_{β_{p}})]^{2} \\ = & 2 \sum_{i = 1}^{n} [(c_{Y_{λ i}} - x_{λ i 1} c_{β_{1}} - \dots - x_{λ i p} c_{β_{p}})^{2} + (r_{Y_{λ i}} - | x_{λ i 1} | r_{β_{1}} - \dots - | x_{λ i p} | r_{β_{p}})^{2}], \end{aligned}

where $c_{m}$ and $r_{m}$ denote the center and radius of an interval $m$, respectively. The above expression is a quadratic function of $c_{β_{j}}$ and $r_{β_{j}}$ and satisfies $d_{2}^{2} (Y_{λ}, X_{λ} β) \geq 0$, so it attains a minimum.

Next, we set the partial derivatives with respect to $c_{β_{j}}$ and $r_{β_{j}}$ equal to zero:

\begin{cases} \frac{\partial d_{2}^{2} (Y_{λ}, X_{λ} β)}{\partial c_{β_{j}}} = 0 \\ \frac{\partial d_{2}^{2} (Y_{λ}, X_{λ} β)}{\partial r_{β_{j}}} = 0 \end{cases} \quad j = 1, 2, \dots, p,

that is,

\begin{cases} \sum_{i = 1}^{n} (c_{Y_{λ i}} - x_{λ i 1} c_{β_{1}} - \dots - x_{λ i p} c_{β_{p}}) (- x_{λ i j}) = 0 \\ \sum_{i = 1}^{n} (r_{Y_{λ i}} - | x_{λ i 1} | r_{β_{1}} - \dots - | x_{λ i p} | r_{β_{p}}) (- | x_{λ i j} |) = 0 \end{cases} \quad j = 1, 2, \dots, p .

The normal equations are

\begin{cases} X_{λ}^{⊤} c_{Y_{λ}} = X_{λ}^{⊤} X_{λ} c_{β} \\ | X_{λ} |^{⊤} r_{Y_{λ}} = | X_{λ} |^{⊤} | X_{λ} | r_{β}, \end{cases}

where $| X_{λ} | = (| x_{λ i j} |)_{n \times p}$ denotes the entrywise absolute value of $X_{λ}$.

Once the normal equations are obtained, the parameter estimator of the interval-valued spatial error model is found by solving them. The following result concerns the rank of $X_{λ}$, where $rk (X)$ denotes the rank of $X$.

Lemma 3.1

If $r k (X) = p$ , then $r k (X_{λ}) = p$ .

Proof.

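Since $W$ is a row-normalized weight matrix and $| λ | < 1$, the matrix $I_{n} - λ W$ is nonsingular, so $rk (I_{n} - λ W) = n$. On the one hand,

\begin{aligned} rk (X_{λ}) & = rk ((I_{n} - λ W) X) \\ & \leq \min (rk (I_{n} - λ W), rk (X)) \\ & \leq rk (X) \\ & = p, \end{aligned}

and on the other hand, by Sylvester's rank inequality,

rk ((I_{n} - λ W) X) \geq rk (I_{n} - λ W) + rk (X) - n = p .

Hence

rk ((I_{n} - λ W) X) = rk (X_{λ}) = rk (X) = p .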

The result is proved.

By Lemma 3.1, and supposing in addition that $rk (| X_{λ} |) = p$, the estimator $\hat{β}$ of the interval-valued spatial error model can be obtained by solving the normal equations, as stated in the following theorem.

Theorem 3.2

Under the condition of Lemma 3.1 and $rk (| X_{λ} |) = p$, the least squares estimator of the interval-valued spatial error model is unique and is given by

\begin{aligned} {\hat{β}}_{L S} (λ) & = ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} c_{Y_{λ}}; (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} r_{Y_{λ}}) \\ & = ((X^{⊤} (I_{n} - λ W)^{⊤} (I_{n} - λ W) X)^{- 1} X^{⊤} (I_{n} - λ W)^{⊤} (I_{n} - λ W) c_{Y}; \\ & \qquad (| (I_{n} - λ W) X |^{⊤} | (I_{n} - λ W) X |)^{- 1} | (I_{n} - λ W) X |^{⊤} (I_{n} - λ W) r_{Y}) . \end{aligned}

Proof.

We write (3.2) as

\begin{cases} (I_{n} - λ W) c_{Y} = (I_{n} - λ W) X c_{β} + ε \\ (I_{n} - λ W) r_{Y} = | (I_{n} - λ W) X | r_{β}, \end{cases} \tag{3.4}

and then obtain the estimators by applying the ordinary least squares method to each equation.

Having obtained the estimator of the unknown parameter $β$, we now discuss its properties, beginning with the unbiasedness of ${\hat{β}}_{L S} (λ)$.

Remark 4

When $λ = 0$ , the form of ${\hat{β}}_{L S} (λ)$ is the same as that given in Theorem 3.1 of Wang et al. (2015).

In summary, Algorithm 1 gives the procedure for obtaining estimates of model (3.1).
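The closed form in Theorem 3.2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: $λ$ is treated as known, the function names `ring_weights` and `ls_estimate` are hypothetical, the ring-shaped weight matrix is an arbitrary example of a row-normalized $W$, and non-negativity of the estimated radii is not enforced.

```python
import numpy as np


def ring_weights(n):
    """Row-normalized weights for units on a ring with two neighbours each (illustrative)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W


def ls_estimate(X, W, lam, c_Y, r_Y):
    """Least squares estimate (centers, radii) of beta for a given lambda."""
    n = X.shape[0]
    T = np.eye(n) - lam * W            # I_n - lambda W
    X_lam = T @ X                      # transformed design matrix
    abs_X_lam = np.abs(X_lam)          # entrywise absolute value |X_lambda|
    # Solve the two normal-equation systems via least squares.
    c_beta = np.linalg.lstsq(X_lam, T @ c_Y, rcond=None)[0]
    r_beta = np.linalg.lstsq(abs_X_lam, T @ r_Y, rcond=None)[0]
    return c_beta, r_beta


# Noise-free check: data built to satisfy (3.4) exactly with epsilon = 0.
n, lam = 6, 0.5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
W = ring_weights(n)
T = np.eye(n) - lam * W
c_beta_true = np.array([1.0, 2.0])
r_beta_true = np.array([0.3, 0.1])
c_Y = X @ c_beta_true
r_Y = np.linalg.solve(T, np.abs(T @ X) @ r_beta_true)

c_hat, r_hat = ls_estimate(X, W, lam, c_Y, r_Y)
```

With noise-free data the sketch recovers the true centers and radii up to rounding, which is a convenient sanity check before adding an error term.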

Theorem 3.3

The least squares estimator ${\hat{β}}_{L S} (λ)$ is an unbiased estimator of $β$.

Proof.

By Theorem 3.2,

\begin{aligned} E ({\hat{β}}_{L S} (λ)) & = ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} E [c_{Y_{λ}}]; (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} E [r_{Y_{λ}}]) \\ & = ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} X_{λ} c_{β}; (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} | X_{λ} | r_{β}) \\ & = (c_{β}; r_{β}) = β . \end{aligned}

The result is proved.

For the interval-valued spatial error model, when $r k (X_{λ}) = r k (| X_{λ} |) = p$ , the covariance of ${\hat{β}}_{L S} (λ)$ can be obtained, as shown in the following result.

Theorem 3.4

If $rk (X_{λ}) = rk (| X_{λ} |) = p$, $E (Y_{λ}) = X_{λ} β$, $Cov (c_{Y_{λ}}) = c_{σ^{2}} I_{n}$, and $Cov (r_{Y_{λ}}) = r_{σ^{2}} I_{n}$, then the covariances of the elements of ${\hat{β}}_{L S} (λ)$ are as follows.

(1) $i \neq j$ ,

\begin{aligned} Cov ({\hat{β}}_{L S}^{(i)} (λ), {\hat{β}}_{L S}^{(j)} (λ)) = & 2 c_{σ^{2}} ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(i)} (X_{λ} (X_{λ}^{⊤} X_{λ})^{- 1})_{(j)} \\ & + 2 r_{σ^{2}} ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(i)} (| X_{λ} | (| X_{λ} |^{⊤} | X_{λ} |)^{- 1})_{(j)}, \end{aligned}

(2) $i = j$ ,

\begin{aligned} Cov ({\hat{β}}_{L S}^{(i)} (λ), {\hat{β}}_{L S}^{(j)} (λ)) = & 2 c_{σ^{2}} (X^{⊤} (I_{n} - λ W)^{⊤} (I_{n} - λ W) X)^{- 1} \\ & + 2 r_{σ^{2}} (| (I_{n} - λ W) X |^{⊤} | (I_{n} - λ W) X |)^{- 1} . \end{aligned}

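Here ${\hat{β}}_{L S}^{(i)} (λ)$ and ${\hat{β}}_{L S}^{(j)} (λ)$ denote the $i$th and $j$th elements of ${\hat{β}}_{L S} (λ)$, respectively, and $A_{(i)}, A_{(j)}$ denote the $i$th and $j$th rows of a matrix $A$, respectively; for the right-hand factors $(X_{λ} (X_{λ}^{⊤} X_{λ})^{- 1})_{(j)}$ and $(| X_{λ} | (| X_{λ} |^{⊤} | X_{λ} |)^{- 1})_{(j)}$, the subscript denotes the $j$th column.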

Proof.

For the $i$th and $j$th elements of ${\hat{β}}_{L S} (λ)$ with $i \neq j$, we have

\begin{aligned} Cov ({\hat{β}}_{L S}^{(i)} (λ), {\hat{β}}_{L S}^{(j)} (λ)) = & Cov \{(((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(i)} c_{Y_{λ}}; ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(i)} r_{Y_{λ}}), \\ & \quad (((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(j)} c_{Y_{λ}}; ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(j)} r_{Y_{λ}})\} \\ = & 2 Cov (((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(i)} c_{Y_{λ}}, ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(j)} c_{Y_{λ}}) \\ & + 2 Cov (((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(i)} r_{Y_{λ}}, ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(j)} r_{Y_{λ}}) \\ = & 2 ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(i)} Cov (c_{Y_{λ}}) ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(j)}^{⊤} \\ & + 2 ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(i)} Cov (r_{Y_{λ}}) ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(j)}^{⊤} \\ = & 2 c_{σ^{2}} ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤})_{(i)} (X_{λ} (X_{λ}^{⊤} X_{λ})^{- 1})_{(j)} \\ & + 2 r_{σ^{2}} ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤})_{(i)} (| X_{λ} | (| X_{λ} |^{⊤} | X_{λ} |)^{- 1})_{(j)} . \end{aligned}

When $i = j$, we have

\begin{aligned} Cov ({\hat{β}}_{L S}^{(i)} (λ), {\hat{β}}_{L S}^{(i)} (λ)) & = 2 c_{σ^{2}} (X_{λ}^{⊤} X_{λ})^{- 1} + 2 r_{σ^{2}} (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} \\ & = 2 c_{σ^{2}} (X^{⊤} (I_{n} - λ W)^{⊤} (I_{n} - λ W) X)^{- 1} \\ & \quad + 2 r_{σ^{2}} (| (I_{n} - λ W) X |^{⊤} | (I_{n} - λ W) X |)^{- 1} . \end{aligned}

The result is proved.

Next, we discuss the estimation of the error $ε$ and of the error variance. We mainly consider the expectation and covariance of the interval-valued error estimator.

Theorem 3.5

The error estimator $\hat{ε}$ is given by $\hat{ε} = (c_{Y_{λ}} - X_{λ} c_{{\hat{β}}_{L S}}; r_{Y_{λ}} - | X_{λ} | r_{{\hat{β}}_{L S}})$, and its expectation and covariance are as follows:

(1) $E (\hat{ε}) = 0,$

(2) $Cov (\hat{ε}) = 2 c_{σ^{2}} (I_{n} - P_{X_{λ}}) + 2 r_{σ^{2}} (I_{n} - P_{| X_{λ} |})$ , where $P_{X_{λ}} = X_{λ} (X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤}, P_{| X_{λ} |} = | X_{λ} | (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤}$ .

Proof.

(1) We have

\begin{aligned} E [c_{Y_{λ}} - X_{λ} c_{{\hat{β}}_{L S}}] & = E [c_{Y_{λ}}] - E [X_{λ} c_{{\hat{β}}_{L S}}] \\ & = X_{λ} c_{β} - X_{λ} c_{β} \\ & = 0. \end{aligned}

Similarly, we can prove that

E [r_{Y_{λ}} - | X_{λ} | r_{{\hat{β}}_{L S}}] = 0.

That means

E (\hat{ε}) = 0.

(2) On the other hand,

\begin{aligned} \hat{ε} & = (c_{Y_{λ}} - X_{λ} (X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} c_{Y_{λ}}; r_{Y_{λ}} - | X_{λ} | (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} r_{Y_{λ}}) \\ & = ((I_{n} - X_{λ} (X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤}) c_{Y_{λ}}; (I_{n} - | X_{λ} | (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤}) r_{Y_{λ}}) \\ & = ((I_{n} - P_{X_{λ}}) c_{Y_{λ}}; (I_{n} - P_{| X_{λ} |}) r_{Y_{λ}}) . \end{aligned}

Then the $i$th element of $\hat{ε}$ is

((I_{n} - P_{X_{λ}})_{(i)} c_{Y_{λ}}; (I_{n} - P_{| X_{λ} |})_{(i)} r_{Y_{λ}}) .

Thus, when $i \neq j$,

\begin{aligned} Cov ({\hat{ε}}_{i}, {\hat{ε}}_{j}) = & Cov \{((I_{n} - P_{X_{λ}})_{(i)} c_{Y_{λ}}; (I_{n} - P_{| X_{λ} |})_{(i)} r_{Y_{λ}}), \\ & \quad ((I_{n} - P_{X_{λ}})_{(j)} c_{Y_{λ}}; (I_{n} - P_{| X_{λ} |})_{(j)} r_{Y_{λ}})\} \\ = & 2 Cov ((I_{n} - P_{X_{λ}})_{(i)} c_{Y_{λ}}, (I_{n} - P_{X_{λ}})_{(j)} c_{Y_{λ}}) \\ & + 2 Cov ((I_{n} - P_{| X_{λ} |})_{(i)} r_{Y_{λ}}, (I_{n} - P_{| X_{λ} |})_{(j)} r_{Y_{λ}}) \\ = & 2 (I_{n} - P_{X_{λ}})_{(i)} Cov (c_{Y_{λ}}) (I_{n} - P_{X_{λ}})_{(j)}^{⊤} \\ & + 2 (I_{n} - P_{| X_{λ} |})_{(i)} Cov (r_{Y_{λ}}) (I_{n} - P_{| X_{λ} |})_{(j)}^{⊤}, \end{aligned}

where $A_{(i)}$ and $A_{(j)}$ denote the $i$th and $j$th rows of a matrix $A$, respectively. The case $i = j$ is analogous. In matrix form, since $I_{n} - P_{X_{λ}}$ and $I_{n} - P_{| X_{λ} |}$ are symmetric and idempotent and $Cov (c_{Y_{λ}}) = c_{σ^{2}} I_{n}$, $Cov (r_{Y_{λ}}) = r_{σ^{2}} I_{n}$,

\begin{aligned} Cov (\hat{ε}) & = 2 (I_{n} - P_{X_{λ}}) Cov (c_{Y_{λ}}) (I_{n} - P_{X_{λ}}) + 2 (I_{n} - P_{| X_{λ} |}) Cov (r_{Y_{λ}}) (I_{n} - P_{| X_{λ} |}) \\ & = 2 c_{σ^{2}} (I_{n} - P_{X_{λ}}) + 2 r_{σ^{2}} (I_{n} - P_{| X_{λ} |}) . \end{aligned}

The result is proved.

From Theorem 3.5, we know that $E (\hat{ε}) = 0$, which can be regarded as a single-point value; this supports the assumption that the error term in the model is single-valued.

Next, we consider the estimation of $c_{σ^{2}}$ and $r_{σ^{2}}$, where $Cov (c_{Y_{λ}}) = c_{σ^{2}} I_{n}$ and $Cov (r_{Y_{λ}}) = r_{σ^{2}} I_{n}$. Denote ${\hat{c}}_{ε} = (I_{n} - P_{X_{λ}}) c_{Y_{λ}}$ and ${\hat{r}}_{ε} = (I_{n} - P_{| X_{λ} |}) r_{Y_{λ}}$.

Theorem 3.6

${\hat{c}}_{σ^{2}} = \frac{{\hat{c}}_{ε}^{⊤} {\hat{c}}_{ε}}{n - p}$ and ${\hat{r}}_{σ^{2}} = \frac{{\hat{r}}_{ε}^{⊤} {\hat{r}}_{ε}}{n - p}$ are unbiased estimators of $c_{σ^{2}}$ and $r_{σ^{2}}$, respectively.

Proof.

Since $I_{n} - P_{X_{λ}}$ is a symmetric idempotent matrix, we have

\begin{aligned} {\hat{c}}_{ε}^{⊤} {\hat{c}}_{ε} & = ((I_{n} - P_{X_{λ}}) c_{Y_{λ}})^{⊤} ((I_{n} - P_{X_{λ}}) c_{Y_{λ}}) \\ & = c_{Y_{λ}}^{⊤} (I_{n} - P_{X_{λ}}) c_{Y_{λ}} . \end{aligned}

Since $(I_{n} - P_{X_{λ}}) X_{λ} = 0$ and $tr (I_{n} - P_{X_{λ}}) = n - p$,

\begin{aligned} E [{\hat{c}}_{ε}^{⊤} {\hat{c}}_{ε}] & = E [c_{Y_{λ}}^{⊤} (I_{n} - P_{X_{λ}}) c_{Y_{λ}}] \\ & = (X_{λ} c_{β})^{⊤} (I_{n} - P_{X_{λ}}) (X_{λ} c_{β}) + tr ((I_{n} - P_{X_{λ}}) Cov (c_{Y_{λ}})) \\ & = c_{σ^{2}} tr (I_{n} - P_{X_{λ}}) \\ & = c_{σ^{2}} (n - p) . \end{aligned}

Then the estimator of $c_{σ^{2}}$ is given as

{\hat{c}}_{σ^{2}} = \frac{{\hat{c}}_{ε}^{⊤} {\hat{c}}_{ε}}{n - p},

and

\begin{aligned} E ({\hat{c}}_{σ^{2}}) & = E (\frac{{\hat{c}}_{ε}^{⊤} {\hat{c}}_{ε}}{n - p}) = \frac{1}{n - p} c_{σ^{2}} (n - p) = c_{σ^{2}} . \end{aligned}

Similarly,

\begin{aligned} {\hat{r}}_{ε}^{⊤} {\hat{r}}_{ε} & = ((I_{n} - P_{| X_{λ} |}) r_{Y_{λ}})^{⊤} ((I_{n} - P_{| X_{λ} |}) r_{Y_{λ}}) \\ & = r_{Y_{λ}}^{⊤} (I_{n} - P_{| X_{λ} |}) r_{Y_{λ}} . \end{aligned}

Furthermore,

\begin{aligned} E [{\hat{r}}_{ε}^{⊤} {\hat{r}}_{ε}] & = E [r_{Y_{λ}}^{⊤} (I_{n} - P_{| X_{λ} |}) r_{Y_{λ}}] \\ & = (| X_{λ} | r_{β})^{⊤} (I_{n} - P_{| X_{λ} |}) (| X_{λ} | r_{β}) + tr ((I_{n} - P_{| X_{λ} |}) Cov (r_{Y_{λ}})) \\ & = r_{σ^{2}} tr (I_{n} - P_{| X_{λ} |}) \\ & = r_{σ^{2}} (n - p) . \end{aligned}

The estimator of $r_{σ^{2}}$ is given as

{\hat{r}}_{σ^{2}} = \frac{{\hat{r}}_{ε}^{⊤} {\hat{r}}_{ε}}{n - p},

so that

E ({\hat{r}}_{σ^{2}}) = \frac{1}{n - p} r_{σ^{2}} (n - p) = r_{σ^{2}} .

The result is proved.
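The two variance estimators can be computed directly from least squares residuals, since $(I_{n} - P) y$ is the residual of regressing $y$ on the corresponding design matrix. The sketch below is illustrative: the function name `sigma2_hat` is hypothetical, and the inputs are assumed to be the already transformed quantities $X_{λ}$, $c_{Y_{λ}}$, and $r_{Y_{λ}}$.

```python
import numpy as np


def sigma2_hat(X_lam, c_Ylam, r_Ylam):
    """Residual-based estimates of the center and radius variances."""
    n, p = X_lam.shape
    abs_X_lam = np.abs(X_lam)
    # Residuals (I - P) y obtained via least squares fits.
    c_res = c_Ylam - X_lam @ np.linalg.lstsq(X_lam, c_Ylam, rcond=None)[0]
    r_res = r_Ylam - abs_X_lam @ np.linalg.lstsq(abs_X_lam, r_Ylam, rcond=None)[0]
    # Divide the residual sums of squares by n - p.
    return c_res @ c_res / (n - p), r_res @ r_res / (n - p)


# Small worked example with n = 4 and p = 2.
X_lam = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
c_Ylam = np.array([0.0, 1.0, 2.0, 4.0])
r_Ylam = np.abs(X_lam) @ np.array([0.5, 0.2])   # radii lying exactly on the fitted plane

c_s2, r_s2 = sigma2_hat(X_lam, c_Ylam, r_Ylam)
```

In this example the fitted center line has intercept $- 0.2$ and slope $1.3$, the residuals are $(0.2, - 0.1, - 0.4, 0.3)$, and the center variance estimate is $0.30 / 2 = 0.15$; the radius residuals vanish because the radii were generated without noise.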

In the following, we discuss the independence of ${\hat{β}}_{L S} = ({\hat{c}}_{β}; {\hat{r}}_{β})$ and $\hat{σ^{2}} = ({\hat{c}}_{σ^{2}}; {\hat{r}}_{σ^{2}})$ .

Theorem 3.7

${\hat{c}}_{σ^{2}}$ and ${\hat{c}}_{β}$ are independent of each other, ${\hat{r}}_{σ^{2}}$ and ${\hat{r}}_{β}$ are independent of each other.

Proof.

Since

\begin{aligned} \hat{σ^{2}} & = ({\hat{c}}_{σ^{2}}; {\hat{r}}_{σ^{2}}) \\ = (\frac{c_{Y_{λ}}^{⊤} (I_{n} - P_{X_{λ}}) c_{Y_{λ}}}{n - p}; \frac{r_{Y_{λ}}^{⊤} (I_{n} - P_{| X_{λ} |}) r_{Y_{λ}}}{n - p}) . \end{aligned}

\begin{aligned} {\hat{β}}_{L S} (λ) & = ({\hat{c}}_{β}; {\hat{r}}_{β}) \\ = ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} c_{Y_{λ}}; (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} r_{Y_{λ}}) . \end{aligned}

It can be seen that ${\hat{c}}_{σ^{2}}$ is the quadratic form of $c_{Y_{λ}}$ , ${\hat{c}}_{β}$ is the linear form of $c_{Y_{λ}}$ , and $c_{Y_{λ}} \sim N (0, c_{σ^{2}} I_{n}) .$

According to the independence theorem of quadratic form and linear form of normal variables, it is necessary to prove that they are independent of each other, that is, the product of linear part, variance part and quadratic part of normal variables is 0.

\begin{aligned} (X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} c_{σ^{2}} I_{n} (I_{n} - P_{X_{λ}}) & = c_{σ^{2}} I_{n} ((X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} - (X_{λ}^{⊤} X_{λ})^{- 1} X_{λ}^{⊤} P_{X_{λ}}) \\ = 0. \end{aligned}

Similarly, ${\hat{r}}_{σ^{2}}$ is the quadratic form of $r_{Y_{λ}}$ , ${\hat{r}}_{β}$ is the linear form of $r_{Y_{λ}}$ , and $r_{Y_{λ}} \sim N (0, r_{σ^{2}} I_{n}) .$

\begin{aligned} (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} r_{σ^{2}} I_{n} (I_{n} - P_{| X_{λ} |}) & = r_{σ^{2}} ((| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} - (| X_{λ} |^{⊤} | X_{λ} |)^{- 1} | X_{λ} |^{⊤} P_{| X_{λ} |}) \\ & = 0. \end{aligned}

So ${\hat{c}}_{σ^{2}}$ and ${\hat{c}}_{β}$ are independent of each other, and ${\hat{r}}_{σ^{2}}$ and ${\hat{r}}_{β}$ are independent of each other.
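The key algebraic step in this proof can be checked numerically. The following is a minimal sketch, assuming a hypothetical randomly generated full-rank design matrix `X` (it stands in for $X_{λ}$ or $| X_{λ} |$ and is not data from the paper); it only verifies that the matrix of the linear form times the matrix of the quadratic form vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))            # hypothetical full-rank design matrix

A = np.linalg.inv(X.T @ X) @ X.T       # matrix of the linear form: beta_hat = A @ y
P = X @ A                              # projection matrix P_X = X (X'X)^{-1} X'
M = np.eye(n) - P                      # matrix of the quadratic form: y' M y

product = A @ M                        # should be the zero matrix up to rounding
max_abs = float(np.abs(product).max())
print(max_abs)
```

The scalar variance factor ($c_{σ^{2}}$ or $r_{σ^{2}}$) only rescales this product, so it does not affect whether the product is zero.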

Theorem 3.8

In the sense of the $D_{2}$ distance, a sufficient condition for ${\hat{β}}_{L S} (λ)$ to be a strongly consistent estimator of $β$ is

\lim_{n \to \infty} (S_{n}^{- 1} + | S_{n} |^{- 1}) = 0

where,

S_{n} = X_{λ}^{⊤} X_{λ}

| S_{n} | = | X_{λ} |^{⊤} | X_{λ} | .

Proof.

According to Theorem 3.3, ${\hat{β}}_{L S} (λ)$ is an unbiased estimate of $β$ , namely,

E ({\hat{β}}_{L S} (λ)) = β,

and

\begin{aligned} V a r ({\hat{β}}_{L S} (λ)) & = 2 c_{σ^{2}} (X^{⊤} (I_{n} - λ W)^{⊤} (I_{n} - λ W) X)^{- 1} \\ & \quad + 2 r_{σ^{2}} (| X |^{⊤} | I_{n} - λ W |^{⊤} | I_{n} - λ W | | X |)^{- 1} . \end{aligned}

From the condition $\lim_{n \to \infty} (S_{n}^{- 1} + | S_{n} |^{- 1}) = 0,$

\begin{aligned} \lim_{n \to \infty} V a r ({\hat{β}}_{L S} (λ)) & = \lim_{n \to \infty} (2 c_{σ^{2}} (X^{⊤} (I_{n} - λ W)^{⊤} (I_{n} - λ W) X)^{- 1} \\ & \quad + 2 r_{σ^{2}} (| X |^{⊤} | I_{n} - λ W |^{⊤} | I_{n} - λ W | | X |)^{- 1}) \\ & = \lim_{n \to \infty} (2 c_{σ^{2}} S_{n}^{- 1} + 2 r_{σ^{2}} | S_{n} |^{- 1}) \\ & = 0 \end{aligned}

and

E ({\hat{β}}_{L S} (λ)) = β,

\begin{aligned} \lim_{n \to \infty} V a r ({\hat{β}}_{L S} (λ)) & = \lim_{n \to \infty} D_{2}^{2} ({\hat{β}}_{L S} (λ), E ({\hat{β}}_{L S} (λ))) \\ & = 0 \quad a.e. \end{aligned}

Therefore, in the sense of the $D_{2}$ metric, ${\hat{β}}_{L S} (λ)$ is a strongly consistent estimator of $β$ .

4. Numerical Simulation

In this part, the parameter estimation process of interval-valued spatial error model is further explained by numerical simulation. Based on the $d_{2}$ distance of the interval value, the mean square error of the parameter estimation obtained by the model is calculated and taken as one of the criteria for evaluating the goodness of the estimation.

Based on equation (3.4), we obtain observations $\{(c_{y_{i}}; r_{y_{i}})\}_{i = 1}^{n}$ of $Y = (c_{Y}; r_{Y})$ , where

\begin{aligned} (\begin{matrix} c_{y_{1}} \\ c_{y_{2}} \\ ⋮ \\ c_{y_{n}} \end{matrix}) & = (\begin{matrix} 1 & x_{1} \\ 1 & x_{2} \\ ⋮ \\ 1 & x_{n} \end{matrix}) (\begin{matrix} c_{β_{1}} \\ c_{β_{2}} \end{matrix}) + (I_{n} - λ W)^{- 1} ε, (\begin{matrix} r_{y_{1}} \\ r_{y_{2}} \\ ⋮ \\ r_{y_{n}} \end{matrix}) = (I_{n} - λ W)^{- 1} | (I_{n} - λ W) (\begin{matrix} 1 & x_{1} \\ 1 & x_{2} \\ ⋮ \\ 1 & x_{n} \end{matrix}) | (\begin{matrix} r_{β_{1}} \\ r_{β_{2}} \end{matrix}) . \end{aligned}

Using first-order adjacency and assuming that the $n$ samples are arranged in a line, the row-normalized spatial weight matrix $W$ can be written as

\begin{aligned} W & = (\begin{matrix} 0 & 1 & 0 & \dots & 0 & 0 \\ 0.5 & 0 & 0.5 & \dots & 0 & 0 \\ 0 & 0.5 & 0 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & \dots & 0 & 0.5 \\ 0 & 0 & 0 & \dots & 1 & 0 \end{matrix}) . \end{aligned}
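As an illustration, the matrix above can be generated for any $n$ with a few lines of NumPy. This is a minimal sketch; the function name `line_adjacency_weights` is hypothetical and not taken from the paper.

```python
import numpy as np

def line_adjacency_weights(n: int) -> np.ndarray:
    """Row-normalized first-order adjacency matrix for n units on a line."""
    if n < 2:
        raise ValueError("need at least two units")
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0          # each unit is adjacent to its right neighbour
    W[idx + 1, idx] = 1.0          # and to its left neighbour
    return W / W.sum(axis=1, keepdims=True)   # every row sums to 1

W = line_adjacency_weights(5)
print(W)
```

The two end units have a single neighbour and therefore weight 1, while interior units split their weight equally (0.5 and 0.5) between the two neighbours, matching the pattern of the matrix displayed above.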

In the simulation, we generate data as follows: $x_{i} \sim N (2, 2^{2})$ ; $λ$ is set to 0.1 or 0.4; the true values of the interval-valued parameters are $β_{1} = [1, 2] = (1.5; 0.5)$ and $β_{2} = [1.5, 2.5] = (2; 0.5)$ ; and $ε_{i} \sim N (0, 0.5)$ . The number of repetitions is 500.
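One replication of this data-generating design could be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name `simulate_interval_sem` is hypothetical, and $N (0, 0.5)$ is read here as a normal distribution with variance 0.5, which is an assumption (the second argument could also denote a standard deviation).

```python
import numpy as np

def simulate_interval_sem(n, lam, c_beta, r_beta, noise_var=0.5, seed=0):
    """Draw one sample (X, W, c_y, r_y) from the centre and radius equations."""
    rng = np.random.default_rng(seed)

    # Row-normalized first-order adjacency matrix for units on a line.
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    W /= W.sum(axis=1, keepdims=True)

    x = rng.normal(loc=2.0, scale=2.0, size=n)            # x_i ~ N(2, 2^2)
    X = np.column_stack([np.ones(n), x])
    eps = rng.normal(scale=np.sqrt(noise_var), size=n)    # assumed variance 0.5

    A = np.eye(n) - lam * W
    c_y = X @ c_beta + np.linalg.solve(A, eps)            # X c_beta + (I - lam W)^{-1} eps
    r_y = np.linalg.solve(A, np.abs(A @ X) @ r_beta)      # (I - lam W)^{-1} |(I - lam W) X| r_beta
    return X, W, c_y, r_y

X, W, c_y, r_y = simulate_interval_sem(
    n=100, lam=0.4, c_beta=np.array([1.5, 2.0]), r_beta=np.array([0.5, 0.5])
)
lower, upper = c_y - r_y, c_y + r_y   # interval endpoints [lower, upper]
```

Solving the linear systems with `np.linalg.solve` avoids forming $(I_{n} - λ W)^{- 1}$ explicitly; repeating the call with 500 different seeds would mimic the repetition scheme described above.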

Next, the mean square error (MSE) of the parameter estimation is calculated as one of the criteria to measure the goodness of the estimation. The calculation method is based on interval value $d_{2}$ distance:

MSE = \frac{1}{500} \sum_{i = 1}^{500} d_{2}^{2} (β, {\hat{β}}^{(i)}),

where ${\hat{β}}^{(i)}$ denotes the estimate of $β$ in the $i$ -th repetition. We also calculate the averages of the estimates over the 500 repetitions. The simulation results are summarized in Tables 1 and 2. We also compare our proposed model with a non-spatial model ( $λ = 0$ in (3.1)).
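The MSE criterion can be sketched in code as below. The exact form of $d_{2}$ is given in Section 2.1; this sketch assumes, purely for illustration, that for intervals in (center; radius) form $d_{2}^{2}$ is the sum of the squared center difference and the squared radius difference, and the paper's definition may differ by a constant factor. The function names and the three toy estimates are hypothetical.

```python
import numpy as np

def d2_squared(a, b):
    """Assumed squared d_2 distance between intervals a = (c_a; r_a) and b = (c_b; r_b)."""
    (c_a, r_a), (c_b, r_b) = a, b
    return (c_a - c_b) ** 2 + (r_a - r_b) ** 2

def mse_d2(beta_true, beta_estimates):
    """Average squared d_2 distance between the true interval and its estimates."""
    return float(np.mean([d2_squared(beta_true, b) for b in beta_estimates]))

beta_true = (1.5, 0.5)                               # beta_1 = [1, 2] = (1.5; 0.5)
estimates = [(1.6, 0.5), (1.4, 0.4), (1.5, 0.6)]     # toy values, not simulation output
mse = mse_d2(beta_true, estimates)
print(mse)
```

In the simulation the list of estimates would hold the 500 replicated values of ${\hat{β}}^{(i)}$ instead of the three toy values used here.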

Table 1.
Averages of $\hat{λ}$ and $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2})$ .

$R$		$λ = 0.1$			$λ = 0.4$
Method	index	$n = 100$	200	300	$n = 100$	200	300
spatial model	$\bar{\hat{λ}}$	0.1150	0.1163	0.1157	0.4478	0.4479	0.4463
	${\bar{\hat{β}}}_{1} (\hat{λ})$	[1.0143, 1.9768]	[1.0213, 1.9844]	[1.0135, 1.9788]	[1.1108, 1.8805]	[1.1181, 1.8886]	[1.1043, 1.8858]
	${\bar{\hat{β}}}_{2} (\hat{λ})$	[1.4998, 2.5042]	[1.4971, 2.5017]	[1.4977, 2.5022]	[1.5026, 2.5010]	[1.5002, 2.4989]	[1.5001, 2.4989]
non-spatial model	${\bar{\hat{β}}}_{1}$	[0.8913, 2.1001]	[0.9002, 2.1055]	[0.8945, 2.0966]	[0.3124, 2.6781]	[0.3282, 2.6798]	[0.3247, 2.6604]
	${\bar{\hat{β}}}_{2}$	[1.5210, 2.4829]	[1.5183, 2.4806]	[1.5187, 2.4818]	[1.6015, 2.4027]	[1.5988, 2.3997]	[1.5987, 2.4028]

Table 2.

Sample Mean Square Error.

$R$		$λ = 0.1$			$λ = 0.4$
Method	index	$n = 100$	200	300	n=100	200	300
spatial model	$\hat{λ}$	0.0019	0.0011	0.0007	0.0030	0.0026	0.0023
	${\hat{β}}_{1} (\hat{λ})$	0.0274	0.0147	0.0094	0.0736	0.0516	0.0392
	${\hat{β}}_{2} (\hat{λ})$	0.0027	0.0012	0.0009	0.0024	0.0012	0.0009
non-spatial model	${\hat{β}}_{1}$	0.0456	0.0337	0.0284	0.9929	0.9449	0.9115
	${\hat{β}}_{2}$	0.0034	0.0020	0.0016	0.0244	0.0220	0.0207

It can be seen from Table 1 that, whatever value $n$ takes, the parameter estimates obtained by the spatial model are close to the true values. Table 2 illustrates that, as $n$ increases, the estimates get closer to the true values, since the sample MSEs of ${\hat{β}}_{1}, {\hat{β}}_{2}, \hat{λ}$ become smaller and smaller. Table 2 also shows that the MSEs of the non-spatial model are larger than those of the spatial model, and that the differences grow as $λ$ increases.

Conversely, we generate data from the same design as described above, except that we set $λ$ to 0. Then we fit the data with both the spatial and the non-spatial models. For the spatial model, we take the spatial weight matrix $W$ as described above. Results are shown in Table 3.

Table 3.

Results of $\hat{λ}$ and $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2})$ .

		average			MSE
method	index	$n = 100$	200	300	$n = 100$	200	300
spatial model	$\bar{\hat{λ}}$	$- 0.0110$	$- 0.0077$	$- 0.0017$	0.0039	0.0014	0.0007
	${\bar{\hat{β}}}_{1} (\hat{λ})$	[ 0.9935, 2.0050]	[0.9956, 2.0068]	[0.9968, 1.9986]	0.0248	0.0115	0.0084
	${\bar{\hat{β}}}_{2} (\hat{λ})$	[1.5029, 2.4994]	[1.5001, 2.4973]	[1.5004, 2.4997]	0.0028	0.0013	0.0010
non-spatial model	${\bar{\hat{β}}}_{1}$	[0.9993, 1.9993]	[1.0013, 2.0013]	[0.9977, 1.9977]	0.0203	0.0097	0.0074
	${\bar{\hat{β}}}_{2}$	[1.5011, 2.5011]	[1.4986, 2.4986]	[1.5000, 2.5000]	0.0026	0.0013	0.0010

Table 3 implies that the spatial model still gives good results even when the data are generated from a non-spatial linear model.

5. Empirical Analysis

In this section, we choose three data sets to evaluate our proposed method. For every data set, we compare the linear regression model (LRM) ( $λ = 0$ in (3.1)) and the interval-valued spatial error model (IVSEM), whose estimation methods are introduced in Section 4.

5.1. Temperature Data Set

5.1.1. Data Preparation

In this part, we select data for 31 cities, one from each of the 31 provinces, municipalities and autonomous regions of China (excluding Hong Kong, Macao and Taiwan). Concretely, the response variable is the air temperature, an interval consisting of the lowest and highest air temperatures on July 8, 2021, and the explanatory variable is latitude. The temperature data are sourced from Baidu Weather Forecast, and the latitude data are collected from the website https://cn.bing.com/maps. Table 4 summarizes the data, and Figure 1 displays the mean temperature of the day.

Figure 1.

Mean Temperatures on July 8, 2021.

Table 4.

Temperature Data Set.

City	Minimum Temperature	Maximum Temperature	Latitude
Hefei	24	29	31.82
Beijing	22	33	39.91
Chongqing	25	34	30.05
Fuzhou	27	38	26.07
Lanzhou	20	36	36.06
Guangzhou	27	34	23.13
Nanning	25	33	22.82
Guiyang	21	29	26.65
Haikou	26	33	20.04
Shijiazhuang	24	37	38.04
Haerbin	20	25	45.80
Zhengzhou	26	37	34.75
Wuhan	27	33	30.59
Changsha	25	33	28.23
Nanjing	26	29	32.06
Nanchang	28	35	28.68
Changchun	20	27	43.82
Shenyang	20	27	41.68
Huhehaote	19	31	40.84
Yinchuan	20	35	38.49
Xining	14	29	36.62
Xian	25	36	34.34
Jinan	25	33	36.65
Shanghai	26	32	31.08
Taiyuan	19	32	37.87
Chengdu	23	29	30.57
Tianjin	24	34	39.29
Wulumuqi	25	33	43.51
Lasa	12	23	29.65
Kunming	18	27	24.88
Hangzhou	27	35	30.25

Before modeling and analysis, a spatial autocorrelation test is conducted on the data. The spatial autocorrelation test for model (3.1) is based on a spatial weight matrix, for which we select a distance-based weight matrix $W$ . We next describe how it is calculated.

Step 1 For the $i$ -th city and the $j$ -th city, we obtain their locations represented by latitude and longitude $({la}_{i}, {lo}_{i})$ and $({la}_{j}, {lo}_{j})$ respectively.

Step 2 Convert degrees to radians, i.e., we transform an angle $α$ into $α \times π / 180$ . Denote

\begin{aligned} ϕ_{i} & = l a_{i} \times π / 180, ψ_{i} = l o_{i} \times π / 180. \\ ϕ_{j} & = l a_{j} \times π / 180, ψ_{j} = l o_{j} \times π / 180. \end{aligned}

Step 3 Calculate the distance between the two cities with the Haversine formula, which is widely used for calculating geographic distances. For more information about geospatial distance, readers can refer to Sinnott (1984) and Ningchuan (2016).

\begin{aligned} a & = \sin^{2} (\frac{ϕ_{j} - ϕ_{i}}{2}) + \cos (ϕ_{i}) \cos (ϕ_{j}) \sin^{2} (\frac{ψ_{j} - ψ_{i}}{2}), \\ c & = 2 \arcsin (\min (1, \sqrt{a})), \\ d_{i j} & = R \cdot c, \end{aligned}

where $R$ denotes the mean radius of the earth; here we take $R = 6371$ km.

Step 4 If $d_{i j} = 0$ , we set $w_{i j} = 0$ ; otherwise we set $w_{i j} = 1 / d_{i j}^{2}$ . Finally, we row-normalize $W = (w_{i j}) \in \mathbb{R}^{31 \times 31}$ so that each row sums to 1.
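Steps 1 to 4 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the optional `threshold_km` argument is an added convenience, and the longitudes in the example are approximate values assumed for illustration (Table 4 lists latitudes only).

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # mean radius of the earth, as in Step 3

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    phi1, psi1, phi2, psi2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin((psi2 - psi1) / 2) ** 2)
    c = 2 * np.arcsin(np.minimum(1.0, np.sqrt(a)))
    return EARTH_RADIUS_KM * c

def inverse_square_weights(lat, lon, threshold_km=None):
    """Row-normalized weights w_ij = 1 / d_ij^2, with w_ij = 0 when d_ij = 0."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    D = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    mask = D > 0
    if threshold_km is not None:
        mask &= D <= threshold_km          # optional cut-off: drop distant pairs
    W = np.zeros_like(D)
    W[mask] = 1.0 / D[mask] ** 2
    row_sums = W.sum(axis=1, keepdims=True)
    # Rows without any neighbour are left as zeros instead of dividing by zero.
    np.divide(W, row_sums, out=W, where=row_sums > 0)
    return W

# Beijing, Tianjin, Shanghai: latitudes from Table 4, longitudes assumed.
lat = [39.91, 39.29, 31.08]
lon = [116.40, 117.20, 121.47]
W = inverse_square_weights(lat, lon)
print(np.round(W, 4))
```

Because the weights decay with the squared distance, nearby pairs such as Beijing and Tianjin dominate each other's rows after normalization.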

Spatial autocorrelation is assessed on the dependent variable, which is the interval consisting of the lowest and highest temperatures. Therefore, in the spatial autocorrelation test, general linear models can be used to model the lower and upper endpoints of the interval separately.

One of the main methods for testing spatial autocorrelation is the global or local Moran’s I test. As can be seen from Table 5, for the global Moran’s I test, the p-values are less than the significance level of 0.05. Therefore, the null hypothesis is rejected, and we conclude that the minimum and maximum temperatures of the 31 provinces, cities and autonomous regions in China are spatially autocorrelated.

Table 5.

Global Moran’s I Test Results of the Temperature Data Set.

	Minimum Temperature	Maximum Temperature
p-value	0.0002	0.0077
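For readers who want to reproduce this kind of test, the global Moran's I statistic in its standard form, $I = \frac{n}{\sum_{i, j} w_{i j}} \cdot \frac{z^{⊤} W z}{z^{⊤} z}$ with $z$ the centered variable, can be sketched as below. The paper does not state how the p-values were obtained; the permutation p-value shown here is one common choice and is an assumption, as are the function names and the toy input.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I of a variable under spatial weight matrix W."""
    values = np.asarray(values, dtype=float)
    W = np.asarray(W, dtype=float)
    z = values - values.mean()
    return float(len(z) / W.sum() * (z @ W @ z) / (z @ z))

def permutation_p_value(values, W, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation by random relabelling."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, W)
    hits = sum(morans_i(rng.permutation(values), W) >= observed
               for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)

# Toy example: five units on a line with a monotone variable.
W = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0, 0.0]])
values = [1.0, 2.0, 3.0, 4.0, 5.0]
I_obs = morans_i(values, W)
p_val = permutation_p_value(values, W)
print(I_obs, p_val)
```

A positive value of the statistic indicates that similar values tend to be located next to each other; in the application it would be computed separately for the minimum and the maximum temperatures.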

Figures 2 and 3 are Moran scatter plots. It can be seen that the lowest and highest air temperatures of the 31 regions in China have positive spatial autocorrelation, that is, high values tend to neighbor high values and low values tend to neighbor low values.

Figure 2.

The Correlation Between the Minimum Temperature and the Spatially Lagged Minimum Temperature.

Figure 3.

The Correlation Between the Maximum Temperature and the Spatially Lagged Maximum Temperature.

5.1.2. Parameter Estimation

For the interval-valued spatial error model, the distance-based weight matrix is built in the same way as described in Subsection 5.1.1, except that Step 4 is changed to: if $d_{i j}$ is greater than 400 km or $d_{i j} = 0$ , we set $w_{i j} = 0$ ; otherwise we set $w_{i j} = 1 / d_{i j}^{2}$ . In fact, we considered three weight matrices: a Queen contiguity matrix, whose element is 1 if the provinces or autonomous regions in which the two cities are located share a common border or vertex; the distance-based matrix without a threshold introduced in Subsection 5.1.1; and the distance-based matrix with a threshold introduced above. We found that the third weight matrix, with the 400 km threshold, gives satisfactory results, which implies that the choice of the threshold is important. We also note that the proportions of zero elements in these three matrices are 85.85%, 3.23% and 93.96%, respectively, which implies that the impact of spatial correlation among the 31 cities on the dependent variable is not very large.

We present the interval-valued parameter estimation results in Table 6 and obtain the estimate of $λ$ as 0.6115. We can then calculate the fitted values ${\hat{y}}_{i}$ using Algorithm 1. We evaluate the goodness of fit by the sum of squares of the residuals (SSR) given by (5.1). Table 7 shows the SSR of IVSEM and LRM. It can be seen that IVSEM fits better than LRM (it has a smaller SSR), but it requires determining an appropriate spatial weight matrix.

SSR = \frac{1}{n} \sum_{i = 1}^{n} d_{2}^{2} (y_{i}, {\hat{y}}_{i}) .

(5.1)

Table 6.

Interval Valued Parameter Estimation Results.

	LRM	IVSEM
$\hat{β_{1}}$	(32.2490; 2.6806)	(28.7973; 4.2377)
$\hat{β_{2}}$	(-0.1443; 0.0555)	(-0.0547; 0.0107)

Table 7.

SSR of Two Models for the Temperature Data Set.

	LRM	IVSEM
SSR	30.8145	27.4867

Following Songgui et al. (2004), we perform a regression diagnostic by plotting studentized residuals. First, we calculate the studentized residuals in the following steps.

Step 1 Formulate (3.1) as

\begin{cases} c_{Y} = X c_{β} + u \\ r_{Y} = | X | r_{β} \end{cases}, \quad u = λ W u + ε .

Step 2 Obtain the estimates ${\hat{c}}_{Y}$ and $\hat{λ}$ in algorithm 1.

Step 3 Calculate

\hat{u} = c_{Y} - {\hat{c}}_{Y}, \hat{ε} = (I - \hat{λ} W) \hat{u}, P_{X} = X {(X^{T} X)}^{- 1} X^{T}, {\hat{r}}_{i} = \frac{{\hat{ε}}_{i}}{\sqrt{{\hat{c}}_{σ^{2}} (1 - p_{i i})}}, i = 1, \dots, n,

where ${\hat{c}}_{σ^{2}}$ is defined in Theorem 3.6 and $p_{i i}$ is the $i$ -th diagonal entry of $P_{X}$ .
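Step 3 translates directly into a few array operations. The sketch below is illustrative only: the function name is hypothetical, and in the example the fitted centers, $\hat{λ}$ and ${\hat{c}}_{σ^{2}}$ are placeholder values obtained from an ordinary least squares fit on synthetic data, not from Algorithm 1 or Theorem 3.6.

```python
import numpy as np

def studentized_residuals(c_y, c_fit, X, W, lam_hat, sigma2_hat):
    """r_i = eps_i / sqrt(sigma2_hat * (1 - p_ii)), following Step 3."""
    n = len(c_y)
    u_hat = c_y - c_fit                              # u_hat = c_Y - c_Y_hat
    eps_hat = (np.eye(n) - lam_hat * W) @ u_hat      # eps_hat = (I - lam_hat W) u_hat
    P = X @ np.linalg.solve(X.T @ X, X.T)            # P_X = X (X'X)^{-1} X'
    p_ii = np.diag(P)
    return eps_hat / np.sqrt(sigma2_hat * (1.0 - p_ii))

# Synthetic example with eight units on a line (all inputs are placeholders).
rng = np.random.default_rng(1)
n = 8
X = np.column_stack([np.ones(n), rng.normal(2.0, 2.0, size=n)])
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)

c_y = X @ np.array([1.5, 2.0]) + rng.normal(scale=0.7, size=n)
c_fit = X @ np.linalg.lstsq(X, c_y, rcond=None)[0]   # placeholder fit, not Algorithm 1
sigma2_hat = float(np.sum((c_y - c_fit) ** 2) / (n - X.shape[1]))  # placeholder variance
r = studentized_residuals(c_y, c_fit, X, W, lam_hat=0.3, sigma2_hat=sigma2_hat)
print(np.round(r, 3))
```

Plotting the pairs $(i, {\hat{r}}_{i})$ then gives a residual plot of the kind used in this diagnostic.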

Second, we plot the studentized residuals in Figure 4. The points $(i, {\hat{r}}_{i})$ , $i = 1, \dots, n$ , are roughly located within the horizontal band $| {\hat{r}}_{i} | \leq 2$ and do not show any trend. This is consistent with the model assumption that the error term follows the normal distribution, i.e., $ε \sim N (0, σ^{2} I_{n})$ . Therefore, we consider the normality assumption on the error term to be basically reasonable.

Figure 4.

Studentized Residuals of the Spatial Model.

Figure 5 shows the fitted values of the IVSEM, where ${\hat{c}}_{Y}$ and ${\hat{r}}_{Y}$ represent the center and radius of the temperatures. The line representing ${\hat{c}}_{Y}$ has a downward trend, which means that the temperature and latitude of the 31 cities in China are negatively correlated. At the same time, the line representing ${\hat{r}}_{Y}$ has an upward trend, i.e., the temperature difference (the difference between the maximum and minimum temperatures) tends to widen as latitude increases. This is consistent with the large diurnal temperature difference in the northwest and northeast of China and the small diurnal temperature difference in the central, southeastern and southwestern regions of China.

Figure 5.

Fitted Lines of IVSEM.

5.2. AQI Data Set

Inspired by Qingqing et al. (2025), we collect air quality index (AQI) and real estate data sets in a way similar to theirs. The AQI data set, collected from the website https://www.weather.com.cn/air/, is presented in Table 8. We collect hourly data on the AQI, air humidity, and wind speed for 31 cities on 31 July 2025; we then take the maximum and minimum values for the AQI and the average value for the other indicators.

We select a distance-based spatial weight matrix with a specific threshold, which is set to the distance between Wulumuqi and its nearest neighboring city. This threshold is chosen deliberately to ensure that each row of the spatial weight matrix can be normalized to sum to 1. Based on the spatial weight matrix, we conduct global Moran’s I tests for the minimum and maximum AQIs respectively. Table 9 shows that the AQIs of the cities are spatially correlated. Therefore, we build the IVSEM for the data set and compare it with the LRM in Table 10. The result indicates that it is better to take the spatial information into account.

Table 8.
AQI Data Set.

City	Minimum AQI	Maximum AQI	mean Relative Humidity	mean Wind Force
Hefei	12	29	93.21	3.00
Beijing	24	43	73.67	1.21
Chongqing	27	42	50.25	1.67
Fuzhou	27	42	70.38	1.75
Lanzhou	30	105	50.04	1.29
Guangzhou	16	45	90.08	0.88
Nanning	25	53	85.79	1.00
Guiyang	19	33	71.88	1.67
Haikou	17	33	72.38	2.00
Shijiazhuang	39	52	72.30	1.58
Haerbin	15	54	69.21	1.88
Zhengzhou	36	49	55.42	1.83
Wuhan	25	40	63.83	1.54
Changsha	23	51	75.92	1.29
Nanjing	9	15	100.00	2.71
Nanchang	10	32	58.42	1.79
Changchun	15	31	58.75	1.71
Shenyang	28	55	82.38	0.92
Huhehaote	18	52	68.83	1.83
Yinchuan	23	33	85.29	1.33
Xining	58	97	61.96	2.21
Xian	30	49	35.92	1.38
Jinan	29	45	71.92	2.13
Shanghai	9	44	90.25	0.75
Taiyuan	42	70	62.96	1.63
Chengdu	34	63	69.29	0.54
Tianjin	24	64	59.33	1.21
Wulumuqi	21	53	57.38	2.04
Lasa	12	33	45.54	1.42
Kunming	18	35	79.42	1.63
Hangzhou	8	19	76.83	2.54

Table 9.

Global Moran’s I Test Results of the AQI Data Set.

	Minimum AQI	Maximum AQI
p-value	0.0013	0.0001

Table 10.

SSR of Two Models for the AQI Data Set.

	LRM	IVSEM
SSR	446.2418	392.4215

5.3. Real Estate Data Set

The data set, collected from the website https://data.stats.gov.cn/, is shown in Table 11. The units of the four indicators are, respectively, ten thousand yuan per square meter, ten thousand yuan per square meter, hundred million yuan, and yuan. For the maximum and minimum prices, we first calculate the average monthly sales of commercial housing in each city, and then take the highest and lowest values among these averages. We choose the Queen contiguity matrix as the spatial weight matrix. Based on it, we carry out global Moran’s I tests for the minimum and maximum prices respectively and build the two models for the data set. Table 12 shows that the real estate prices are spatially correlated. Table 13 indicates that, compared to LRM, IVSEM is somewhat more suitable for the data set.

Table 11.
Real Estate Data Set.

City	Minimum Price	Maximum Price	Development Investments	Per Capita Disposable Income
Anhui Province	0.6355	0.9144	4016.1000	36782
Beijing City	2.3710	3.8895	3758.2900	85415
Chongqing City	0.6397	0.7814	2565.7900	39713
Fujian Province	0.8948	1.2336	3474.0800	47857
Gansu Province	0.5573	0.6149	1166.7500	26612
Guangdong Province	1.3850	1.8223	11197.9700	51474
Guangxi Zhuang AR	0.4977	0.5912	1163.0200	31125
Guizhou Province	0.5324	0.5917	1191.3200	28561
Hainan Province	1.3017	1.7290	1207.6300	34829
Hebei Province	0.7846	0.8540	2885.4000	34665
Heilongjiang Province	0.5919	0.6918	330.3600	31269
Henan Province	0.6018	0.6600	3908.4100	31552
Hubei Province	0.7152	0.9050	5146.6800	36947
Hunan Province	0.5639	0.7030	3350.5900	37679
Jiangsu Province	0.9809	1.1152	10701.8600	55415
Jiangxi Province	0.6386	0.7321	1470.1500	36007
Jilin Province	0.5092	0.7782	640.5300	31318
Liaoning Province	0.6997	0.8161	1393.6700	39844
Neimenggu AR	0.6115	0.7471	937.3800	40077
Ningxia Hui AR	0.6106	0.7538	422.5200	33355
Qinghai Province	0.5570	0.7703	146.1000	30117
Shaanxi Province	0.8739	1.2557	3294.5200	33905
Shandong Province	0.7542	0.8498	7544.1500	42077
Shanghai City	3.2398	5.0993	6228.9100	88366
Shanxi Province	0.6787	0.7219	1670.9200	32441
Sichuan Province	0.8446	1.0116	4793.5300	34325
Tianjin City	1.1506	1.7942	1262.2300	53581 1
Xinjiang Uygur AR	0.5507	0.6608	1154.8600	30899
Xizang AR	0.7686	1.1397	45.2500	31358
Yunnan Province	0.6524	0.7404	1228.5900	29932
Zhejiang Province	1.3228	2.0797	11982.6700	67013

Table 12.

Global Moran’s I Test Results of the Real Estate Data Set.

	Minimum Price	Maximum Price
p-value	0.0195	0.0242

Table 13.

SSR of Two Models for the Real Estate Data Set.

	LRM	IVSEM
SSR	0.1921	0.1893

6. Conclusion and Discussion

This paper proposes a novel model, the interval-valued spatial error model, for modeling and analyzing interval-valued data with spatial dependence. The model has two advantages. First, it can handle interval-valued data instead of traditional point-valued data. Second, it accounts for the interdependence between spatial units. It thus considers data uncertainty and spatial dependence simultaneously. We give an estimation method for the unknown parameters and prove properties of the estimator. The simulation study and the real data examples show the advantages of the interval-valued spatial error model, and the simulation comparison with the non-spatial model shows that the proposed model is effective. When $λ = 0$ , the model degenerates into an interval-valued linear model. In particular, when $X$ and $Y$ are single-valued, the model becomes a classical spatial linear model.

The model in this paper applies only to some types of interval-valued data, namely those in which the response variable $Y$ is interval-valued and the explanatory variable $X$ is single-valued. For example, $Y$ and $X$ may represent temperature and latitude, respectively, or human blood pressure and time, respectively. Other types of interval data arise in real life, for instance when the response variable $Y$ and the explanatory variable $X$ are both interval-valued. This type of data is not considered in this article, so there is still much work to be done in modeling and analyzing interval-valued data.

As a quantitative expression of uncertainty, interval-valued data have wide applications in fields such as financial risk assessment, environmental monitoring, and social surveys. However, there are still significant gaps in current research in areas such as the statistical analysis of interval-valued data with missing values, interval-valued nonlinear models, and interval-valued nonparametric models. The statistical modeling and analysis of interval-valued data is therefore of theoretical significance and is also important for applications in other fields.

Acknowledgment

The authors are grateful to the anonymous editors and reviewers for their valuable comments and suggestions, which improved this paper. The authors would like to thank the National Social Science Fund of China No.19BTJ017 for its financial support.

ORCID iD

Wei Zhang

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Conflict of Interest Statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Anselin (1988). Spatial econometrics: Methods and models. Dordrecht: Kluwer Academic Publishers.

Aumann, R. J. (1965). Integrals of set-valued functions. Journal of Mathematical Analysis and Applications, 12(1), 1–12.

Billard, Diday (2000). Regression analysis for interval-valued data. In Conference of the international federation of classification societies (pp. 369–374). Springer-Verlag.

Billard, Diday (2002). Symbolic regression analysis. Studies in Classification Data Analysis and Knowledge Organization, 10, 281–288.

Blanco Fernandez, Corral, Gonzalez-Rodriguez, Lubiano, M. A. (2008). Some properties of the dk-variance for interval-valued sets. In D. Dubois, et al. (Eds.), Soft methods for handling variability and imprecision (Vol. 48, pp. 331–337). ASC.

Guan (2024). Interval-valued linear regression model with an asymmetric Laplace distribution. Journal of the Korean Statistical Society, 14, 377.

Hess (1991). On multivalued martingales whose values may be unbounded: Martingale selectors and Mosco convergence. Journal of Multivariate Analysis, 39, 175–201.

Hiai, Umegaki (1977). Integrals, conditional expectations, and martingales of multivalued functions. Journal of Multivariate Analysis, 7(1), 149–182.

Hillier, Martellosio (2018). Exact and higher-order properties of the MLE in spatial autoregressive models, with applications to inference. Journal of Econometrics, 205(2), 402–422.

Ida (2003). Portfolio selection problem with interval coefficients. Applied Mathematics Letters, 16, 709–713.

Ida (2004). Solutions for the portfolio selection problem with interval and fuzzy coefficients. Reliable Computing, 1, 389–400.

(2010). Stochastic integral with respect to set-valued square integrable martingales. Journal of Mathematical Analysis and Applications, 370, 659–671.

Ogura (1998). Convergence of set valued sub- and supermartingales in the Kuratowski-Mosco sense. The Annals of Probability, 26, 1384–1402.

Ogura (1999). Convergence of set-valued and fuzzy-valued martingales. Fuzzy Sets and Systems, 101, 453–461.

Ogura, Kreinovich (2002). Limit theorems and applications of set-valued random variables. Netherlands: Kluwer Academic Publishers (Springer).

Zhang, Wang (2017). Interval-valued risk measure models and empirical analysis. In Fuzzy systems association, international conference on soft computing & intelligent systems IEEE.

Lima Neto, de Carvalho (2008). Centre and range method for fitting a linear regression model to symbolic interval data. Computational Statistics and Data Analysis, 52, 1500–1515.

Lima Neto, de Carvalho (2010). Constrained linear regression models for symbolic interval-valued variables. Computational Statistics and Data Analysis, 54, 333–347.

Lyashenko, N. N. (1982). Limit theorems for sums of independent compact random subsets of a Euclidean space. Journal of Mathematical Sciences, 20(3), 2187–2196.

Lyashenko, N. N. (1983). Statistics of random compacts in Euclidean space. Journal of Mathematical Sciences, 21(1), 76–92.

Molchanov, I. S. (2005). Theory of random sets. Berlin: Springer.

Neto, E. D. A. L., de Carvalho, F. D. A. (2018). An exponential-type kernel robust regression model for interval-valued variables. Information Sciences, 445, 419–442.

Ningchuan (2016). GIS algorithms: Theory and applications for geographic information science & technology. London: SAGE Publications Ltd.

Papageorgiou, N. S. (1985). On the theory of Banach space valued multifunction. 2. Set valued martingales and set valued measures. Journal of Multivariate Analysis, 17, 207–227.

Papageorgiou, N. S. (1995). On the conditional expectation and convergence properties of random sets. Transactions of the American Mathematical Society, 347(8), 2495–2515.

Prucha, K. I. R. (2010). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40(2), 509–533.

Qingqing, Ruizhuo, Aibing, Hongyan (2025). Fixed effects spatial panel interval-valued autoregressive models and applications. Spatial Statistics, 65, 1–20.

Sinnott, R. W. (1984). Virtues of the haversine. Sky and Telescope, 68(2), 158–159.

Songgui, Jianhong, Yin (2004). Introduction to Linear Models. Beijing: Science Press.

Souza, Amaral, Filho (2017). A parametrized approach for linear regression of interval data. Knowledge-Based Systems, 131, 149–159.

Thierry, Prakash, P. S. (2020). An interval-valued utility theory for decision making with Dempster-Shafer belief functions. International Journal of Approximate Reasoning, 124, 194–216.

Vitale (1985). Lp metrics for compact, convex sets. Journal of Approximation Theory, 45(3), 280–287.

Wang, Guan (2012). Linear regression of interval-valued data based on complete information in hypercubes. Journal of Systems Science and Systems Engineering (English Edition), 21(4), 422–442.

Wang, Denoeux (2015). Interval-valued linear model. International Journal of Computational Intelligence Systems, 8(1), 114–127.

Yang (2005). The dp-metric space of set-valued random variables and its application to covariances. International Journal of Innovative Computing, Information and Control, 1, 73–82.

Yildirim, Kantar, Y. M. (2020). Robust estimation approach for spatial error model. Journal of Statistical Computation and Simulation, 90(3), 1–21.
