Separating Biological Variance from Noise by Applying Expectation–Maximization Algorithm to Modified General Linear Model

Abstract

The general linear model (GLM) has been widely used in research, where the error term has been treated as noise. However, compelling evidence suggests that in biological systems, the target variables may possess their innate variances. A modified GLM was proposed to explicitly model biological variance and nonbiological noise. Using the expectation and maximization (EM) scheme can distinguish biological variance from noise, termed EMSEV (EM for separating variances). The performance of EMSEV was evaluated by varying noise levels, dimensions of the design matrix, and covariance structures of the target variables. The deviation between EMSEV outputs and the predefined distribution parameters increased with noise level. With a proper initial guess, when the noise magnitude and the variance of the target variables were similar, there were deviations of 3% and 10%–16% in the estimated mean and covariance of the target variables, respectively, along with a 1.7% deviation in noise estimation. EMSEV appears promising for distinguishing signal variance from noise in biological systems. The potential applications and implications in biological science and statistical inference are discussed.

Keywords

design matrix expectation–maximization algorithm general linear model global optimum local optimum

1. INTRODUCTION

The general linear model (GLM), typically formulated as Y = Λβ + ε, has been extensively utilized in research. Y and Λ represent the dependent and independent variables, while β and ε denote the unknown mean effects and the unexplained errors, respectively. Despite its concise formulation, the GLM has spawned numerous variations to embrace diverse analytical needs. One such extension is the generalized linear model (McCullagh, 2019), which accommodates response variables with error distributions that deviate from the Gaussian. Another important variant is the mixed-effects model (McCulloch et al., 2001), which integrates both fixed and random effects, making it particularly suitable for hierarchical or nested data structures. These adaptations have substantially broadened the applicability of GLM techniques across various statistical modeling scenarios, thereby enhancing the ability to analyze complex data structures with greater accuracy and precision.

Notably, the treatment of the error term ε reveals a significant distinction between biological and nonbiological systems. In engineering contexts (nonbiological), it is conventional to attribute all unaccounted variance to noise. Consequently, under the assumption that ε is independently and identically distributed (i.i.d.), the variance of β_{_est} (estimate of β) can be derived as var(ε)(Λ^TΛ)⁻¹ (hereafter, the suffix “ _{_est} ” is used to denote an estimate of a parameter). However, in biological systems, compelling evidence suggests that biopsychological indicators (e.g., β) possess intrinsic variances that warrant separate consideration. In psychophysics, the principle of scale invariance relating the mean to the variance is encapsulated by Weber’s law (e.g., Fig. 2 in Chater and Brown, 2008). Neurophysiological research has demonstrated that stimulus onset can induce a reduction in neural variability (Churchland et al., 2010). These findings (among many others) underscore that ε in biological systems is not merely “noise” but also includes physiologically meaningful components that have historically been overlooked within the GLM framework.

This investigation commenced with an explicit modeling of the variance components in the GLM, expressed as Y = Λβ + ε, where β ∼ N(m,Ф) and ε ∼ N(0,Ψ). In this formulation, m and Φ represent the mean vector and covariance matrix of β, respectively, while Ψ denotes the covariance matrix of ε, under the assumption of multivariate normal distributions and i.i.d. This approach diverges from the conventional GLM by explicitly accounting for the variability in both the coefficients (β) and the error term (ε). To estimate the parameters of this modified GLM, the expectation–maximization (EM) algorithm was used. The EM algorithm, an iterative method for finding maximum likelihood estimates, was particularly suitable due to its ability to handle the multiple hidden variables introduced in this model, named EMSEV for convenience hereafter (EM for separating variances).

Subsequently, simulations using a public data set were conducted to evaluate the validity of EMSEV. A strategy of “sliding design matrix” was utilized that leveraged existing psychological or physiological experimental structures as a basis, thus eliminating the need for extensive repetitive experiments. The performance of EMSEV was examined by testing it across various degrees of noise, titrating the ratios between the variances of ε and β from 0.1 to 10. In addition, different covariance structures of β were explored. The contexts in which this modified GLM finds application are also discussed, providing insights into its potential utility across diverse research scenarios.

2. MATERIALS AND METHODS

2.1. Mathematical derivation for solving EMSEV

The detailed derivation is provided in the first part of the Supplementary Data. The modified GLM is formulated as follows: Y_n = Λβ + ε, where β ∼ N(m,Ф), ε ∼ N(0,Ψ), and n ∈ 1∼N (N samples). Let ϴ denote parameter space {m,Ф,Ψ}. The key equations are Equations (14) and (15). Ф can either be a full matrix or a diagonal matrix, depending on whether the elements of β are assumed to be correlated or independent. Similarly, Ψ shares the same flexibility in structure; however, it is typically assumed to be diagonal, reflecting the common assumption that noise sources are independent of each other. The elements on the diagonal of Ψ can either be identical (isotropic noise) or distinct (heteroscedastic noise), depending on the assumed characteristics of the noise sources.

E step: Calculating the posterior distribution of β at sample n (ϴ is obtained from a previous M step)

Q_{n} (β) = P (β | Y_{n}, ϴ) = P (β, Y_{n} | ϴ) / P (Y_{n} | ϴ); Q_{n} (β) \sim N (μ_{n}, Ϭ_{n})

The denominator P(Y_n,ϴ) is irrelevant to β, so just focus on the nominator:

\begin{array}{l} P (β, Y_{n} | ϴ) = P (β | ϴ) P (Y_{n} | β, ϴ) = C_{1} {| Ф |}^{- 0.5} {| Ψ |}^{- 0.5} exp {- 1 / 2 {(β - m)}^{⊤} inv (Ф) (β - m)} \\ exp {- 1 / 2 {(Y_{n} - Λ β)}^{⊤} inv (Ψ) (Y_{n} - Λ β)} \end{array}

(1)

Since the product of Gaussian probability density functions retains a Gaussian distribution, the focus shifts to the exponential component, and rewrite Equation (1) to obtain the following:

\begin{array}{l} Q_{n} (β) = C_{2} exp {- 1 / 2 [β^{⊤} (inv (Ф) + Λ^{⊤} inv (Ψ) Λ) β - 2 β^{⊤} (inv (Ф) m + Λ^{⊤} inv (Ψ) Y_{n})]} \\ \sim N (μ_{n}, Ϭ_{n}) = C_{3} exp {- 1 / 2 [β^{⊤} inv (Ϭ_{n}) β - 2 β^{⊤} inv (Ϭ_{n}) μ_{n} + {μ_{n}}^{⊤} inv (Ϭ_{n}) μ_{n}]} \end{array}

\Rightarrow inv (Ϭ_{n}) = (inv (Ф) + Λ^{⊤} inv (Ψ) Λ),

(2)

{and μ}_{n} = Ϭ_{n} (inv (Ф) m + Λ^{⊤} inv (Ψ) Y_{n})

(3)

The distribution of Q_n(β) is thus derived. For a fixed design matrix Λ, Ϭ_n = Ϭ since it is irrelevant to sample Y_n. Ϭ is used hereafter. In addition, it is evident that

E_{Qn} [β] = μ_{n}

(4)

E_{Qn} [β β^{⊤}] = μ_{n} {μ_{n}}^{⊤} + Ϭ

(5)

Based on Jensen’s inequality, a lower bound of the log-likelihood L(ϴ) for the entire data set,

\sum \log (P (Y_{n} | ϴ)), is set as \sum E_{Qn} [\log (P (β | ϴ) P (Y_{n} | β, ϴ))] + constant,

(6)

where ∑ denotes summation over N samples (the same applies below).

M step: Taking derivative of Equation (6) with respect to ϴ.

For now, focus on E_Qn[log(P(β|ϴ)P(Y_n|β,ϴ)] for a particular sample n, that is, taking expectation of log(P(β|ϴ)P(Y_n|β,ϴ)) over Q_n(β).

From Equations (1) and (2):

\begin{array}{l} log (P (β | ϴ) P (Y_{n} | β, ϴ)) = C_{4} - 0.5 log (| Ф |) - 0.5 log (| Ψ |) - 1 / 2 [β^{⊤} inv (Ф) β - 2 β^{⊤} inv (Ф) m + m^{⊤} inv (Ф) m] \\ - 1 / 2 [Y_{n}^{⊤} inv (Ψ) Y_{n} - 2 β^{⊤} Λ^{⊤} inv (Ψ) Y_{n} + β^{⊤} Λ^{⊤} inv (Ψ) Λ β] \end{array}

(7)

Equation (7) can be divided into three parts. Integrating (taking expectation) over Q_n(β) for each part with the assistance of Equations (4) and (5) generates:

a . 0.5 log (| Ф |) + 0.5 log (| Ψ |)

(8)

b . [tr (inv (Ф) (μ_{n} μ_{n}^{⊤} + Ϭ)) - 2 tr (inv (Ф) {mμ}_{n}^{⊤}) + tr (inv (Ф) {mm}^{⊤})]

(9)

c . [Y_{n}^{⊤} inv (Ψ) Y_{n} - 2 tr (inv (Ψ) Y_{n} μ_{n}^{⊤} Λ^{⊤}) + tr (inv (Ψ) Λ (μ_{n} μ_{n}^{⊤} + Ϭ) Λ^{⊤})]

(10)

Summation over N samples obtain: F = C − NEq(8) $- 1 / 2 \sum$ Eq(9) $- 1 / 2 \sum$ Eq(10).

Ф and Ψ are treated as covariance matrices, with the standard property of symmetry. Now, first, taking derivative of F [or Eq. (6)] with respect to inv(Ψ) and set it to zero [from Eqs. (8) and (10)]:

\begin{array}{l} \partial F / \partial inv (Ψ) = 0 \Rightarrow 2 Ψ_{_est} - diag (Ψ_{_est}) = 2 Λ Ϭ Λ^{⊤} - diag (Λ Ϭ Λ^{⊤}) + 2 / N \sum [(Y_{n} - Λ μ_{n}) {(Y_{n} - Λ μ_{n})}^{⊤}] \\ - 1 / N \sum [diag ((Y_{n} - Λ μ_{n}) {(Y_{n} - Λ μ_{n})}^{⊤})] \end{array}

(11)

Second, take the derivative of F with respect to inv(Ф) and set it to zero [Eqs. (8) and (9)]:

\begin{array}{l} \partial F / \partial inv (Ф) = 0 \Rightarrow 2 Ф_{_est} - diag (Ф_{_est}) = 2 Ϭ - diag (Ϭ) + 1 / N \sum [2 μ_{n} {μ_{n}}^{⊤} - diag (μ_{n} {μ_{n}}^{⊤}) \\ - 2 μ_{n} m^{⊤} - 2 {mμ}_{n}^{⊤} + 2 diag (μ_{n} m^{⊤}) + 2 {mm}^{⊤} - diag ({mm}^{⊤})] \end{array}

(12)

Note that the optimal solution for F with respect to Ψ (or Φ) remains the same whether the derivative is taken with respect to Ψ or its inverse, and similarly for Φ and its inverse (Magnus and Neudecker, 2019). Last, take derivative of F with respect to m and set it to zero [Eq. (9)]:

\partial F / \partial m = 0 \Rightarrow m_{_est} = 1 / N \sum μ_{n}

(13)

Since μ_n and Ϭ have been obtained from the preceding E step, by computing m in Equation (13), Ф in Equation (12) can be determined. Thus, Ψ, m, and Ф (the three parameters in set ϴ) that maximize the lower bound of L(ϴ) are all derived.

Note that if Λ is allowed to change with Y_n, it will become Λ₁, Λ₂, Λ₃ … coupled with Y₁, Y₂, Y₃ …; and each μ_n has its correspondent Ϭ_n, referring to Equations (2) and (3). In this situation, Equations (11) and (12) need to be updated as follows:

\begin{array}{l} 2 Ψ_{_est} - diag (Ψ_{_est}) = 2 / N \sum [Λ_{n} Ϭ_{n} {Λ_{n}}^{⊤} + (Y_{n} - Λ_{n} μ_{n}) {(Y_{n} - Λ_{n} μ_{n})}^{⊤}] \\ - 1 / N \sum [diag (Λ_{n} Ϭ_{n} {Λ_{n}}^{⊤}) + diag ((Y_{n} - Λ_{n} μ_{n}) {(Y_{n} - Λ_{n} μ_{n})}^{⊤})] \end{array}

(14)

\begin{array}{l} 2 Ф_{_est} - diag (Ф_{_est}) = 2 / N \sum [Ϭ_{n} + (μ_{n} - m_{_est}) {(μ_{n} - m_{_est})}^{⊤}] \\ - 1 / N \sum [diag (Ϭ_{n}) + diag ((μ_{n} - m_{_est}) {(μ_{n} - m_{_est})}^{⊤})] \end{array}

(15)

In the subsequent simulations, Ψ was constrained to be diagonal, and two conditions were examined for Ф: a full (unrestricted) covariance matrix and an independent structure where the off-diagonal elements were set to zero, implying independence among the β components. When either Ψ or Ф was assumed to be diagonal, an additional diagonalization step was incorporated into each iteration of the EM computation.

2.2. Design of simulations to evaluate EMSEV solution performance

The distributions of β are arbitrarily set as follows: mean(β) = [2.5 1.5 0.5 −1] and cov(β) = [2 0.5 0.3 0.2; 0.5 1.5 0.4 0.3; 0.3 0.4 1.8 0.6; 0.2 0.3 0.6 1.7] (cov indicates covariance matrix). The off-diagonal elements indicate dependence between the elements of β. A simpler scenario by setting the off-diagonal elements of cov(β) to zeros was also examined. The mean of ε was zero and its variance was titrated from 1/10 to 10 times the mean of the diagonal of cov(β), that is, (2 + 1.5 + 1.8 + 1.7)/4. To achieve evenly spaced steps in log scale, the variance scaling factors were log-transformed, resulting in a range from −1 to 1 with an interval of 0.2, yielding a total of 11 distinct signal-to-noise ratios (SNRs). The design matrix Λ was loaded from the built-in sample data set “airlinesmall.csv” (columns 5 to 8; with Gram–Schmidt orthonormalization) in MATLAB (The MathWorks, Inc., Version R2023a), and simulations were subsequently executed using this software platform.

As for the design matrix, there are specific considerations. Simulations based on EMSEV with a fixed Λ over hundreds of iterations are not inherently problematic but may not be feasible or practical. In most psychological or physiological experiments, subjects undergo examination in only one or a few sessions (which typically comprise hundreds to thousands of trials). In addition, the dimension of β is usually much smaller than that of Y. Consequently, a strategy involving “sliding design matrix” Λ_n (unit of mini-experiment, see Fig. 1 and Section 4), which is a submatrix of Λ, was devised and is detailed below.

FIG. 1.

Illustration of data set partitioning and sliding strategies for generating mini-experiments. (a) Schematic showing how the entire data set is divided into sequential mini-experiments. (b) Comparison of two sliding strategies for defining each mini-experiment. In Sliding Strategy I, each window contains the same number of data points as regressors (i.e., square design matrices). In Sliding Strategy II, each window contains more data points than regressors, with flexible window lengths allowed.

In the simulation, the dimension of β is 4 × 1 and the required sample number is 500. Using a sliding design matrix Λ_n (4 × 4) and associated data Y_n (4 × 1)— “sliding strategy I” where Y_n, Λ_n, and β_n have the same row number—the dimension of Y 2000 × 1 would suffice. It is imperative to note that “sample number” N and “data point number” N × dimension of Y_n are distinct. Since EM iterations can become trapped in local optimum (Dempster et al., 1977), initializing the EM algorithm with a guess approximating the global optimum is highly desirable. A nice feature of sliding design matrix is that it facilitates deriving effective initial estimates of mean and variances (for both β and ε) by resolving the GLM through conventional methods for N times (N = 500 in this case). Strategy I’s limitation emerges when Λ_n is square and full-rank, causing the error term for each ε = Y_n − Λ_nβ_n to consistently diminish (close to zeros), resulting in suboptimal starting points and increased likelihood of converging to local optimum.

To tackle this issue, an alternative approach termed “sliding strategy II” used a Y_n with vector length of 6 × 1 and 8 × 1, representing a 50% and 100% increase of the data length in sliding strategy I, respectively. A key advantage of sliding strategy II lies in its ability to provide a suitable starting point while at the expense of incurring 50% to 100% extra data points compared with its predecessor (the sample number or the number of mini-experiments remained unchanged). Based on the above inferences, three sets of simulations with different sliding design matrices were explored as follows: (1) dimensions Y_n 4 × 1 and Λ_n 4 × 4, and a step size of 4 (Y_n to β_n dimension ratio = 1.0); (2) dimensions Y_n 6 × 1 and Λ_n 6 × 4, with a moving step 6 (Y_n to β_n dimension ratio = 1.5); and (3) dimensions Y_n 8 × 1 and Λ_n 8 × 4, with a moving step 8 (Y_n to β_n dimension ratio = 2.0). Compared with (2), (3) may provide insights into the potential impact of design matrix size on the results. Regarding “sliding strategy I,” the initial guesses for cov(β) and var(ε) were set to be identical, specifically a half of the covariance of the estimated β_n.

Five hundred samples of β and ε, based on predefined Gaussian parameters, were generated using the MATLAB function “mvnrnd.” It is noticed that at sample number 500, the covariance of 500 ε samples may not conform to i.i.d. assumption (i.e., the off-diagonal elements are not zeros) and the mean may be different from zeros. To address this issue, the error matrix was centered around the mean, and a transformation guided by Cholesky decomposition was applied, as follows. Assuming the 500 samples of ε generated by “mvnrnd” are denoted by RN_N [already mean-centered and hence mean (RN_N)=0] with covariance RV (not necessarily diagonal), and the desired covariance matrix is DV (a diagonal matrix). The Cholesky decomposition of RV and DV, respectively, yields RV_upper and DV_upper. To update RN_N to RN_New with its covariance conforming to DV, the transformation is fulfilled by RN_New = RN_Ninv(RV_upper)DV_upper. It can be verified that cov(RN_New) = (DV_upper)^⊤DV_upper = DV. This procedure ensured consistency in the evaluation of simulations.

The same preprocessing was applied to the mvnrnd-generated β samples. With the β and ε samples following their respective desired distributions, Y_n can be constructed using the GLM formula. Then, Y_n and Λ_n were input into EMSEV to derive the statistical estimates of β and ε. Meanwhile, in the conventional solution to the GLM, β_n = inv(Λ_n^⊤Λ_n)Λ_n^⊤Y_n and ε_n = Y_n − Λ_nβ_n. These conventional solutions (n = 1–500) not only serve as initial guesses but also provide another set of distribution for β and ε, which will be compared with the outcomes of EMSEV. The quality of the EMSEV solution was evaluated using the “Frobenius norm relative deviation,” specifically, by computing the Frobenius norm of the difference between the EMSEV-derived and ideal (predefined) vector or matrix, divided by the Frobenius norm of the latter. The EM iteration terminated when the difference between the relative deviation ratios of successive iterations fell below 10⁻⁷. The above simulations were repeated 100 times, and the reported results were based on the averaged statistics. The implementation of the EMSEV algorithm and the simulation procedures in MATLAB are presented in the second part of the Supplementary Data.

3. RESULTS

There are six sets of simulation results, covering two covariance structures for β (full matrix vs. diagonal matrix) combined with three different sliding design matrices. For each setting, EMSEV was tested across 11 distinct ε-to-β relative variance ratios. The predefined parameters used to generate the data were compared against EMSEV-derived estimates using relative deviation metrics. The performance of EMSEV was systematically influenced by multiple factors, including the number of rows in Λ_n (i.e., the dimension of Y_n), the SNR, the availability of an informative initial guess, and the structural complexity of cov(β). Notably, EMSEV performed better when the covariance structure of β was simpler (diagonal) and when larger Λ_n matrices were used. Across all tested conditions, the estimates of β remained stable and consistent. Given this consistency, the following discussion focuses on cov(β_{_est}) and var(ε), which showed more pronounced sensitivity to the tested conditions.

Most importantly, EMSEV substantially reduced the percentage deviation of distribution parameters compared with conventional GLM estimates. For example, at SNR = 1, EMSEV reduced the estimation error by approximately 6–10 times for cov(β_{_est}) and 23–29 times for var(ε), relative to GLM-derived values (see lower half of Tables 1 and 2). Furthermore, the reliability of the initial guess markedly affected EMSEV performance. Specifically, results from the Λ_n 4 × 4 design were inferior to those from the 6 × 4 and 8 × 4 designs, with relative deviation metrics for cov(β_{_est}) and var(ε) approximately 1.75–2.75 times and 9–15 times higher, respectively, when comparing 4 × 4 against 8 × 4 (see upper half of Tables 1 and 2). Focusing on the condition of SNR = 1 and using “sliding strategy II,” EMSEV achieved deviations of approximately 3% for mean(β), 10%–16% for cov(β), and 1.7% for var(ε). Detailed numerical results are presented in Tables 1 and 2, and visually summarized in Figure 2.

FIG. 2.

Relationship between signal-to-noise ratios [abscissa: –log10 (SNR), ranging from 10 to 1/10] and relative deviation from the underlying distribution (ordinate: %). At –log10 (SNR) = 0, the variances of β and ε are at the same level. Left subplot: Mean of β, with black segments indicating that the values of the four lines were very close. Right subplot: Covariance of β. “Dim” denotes the dimension of the mini-experiment matrix Λ_n (e.g., Dim6: 6 × 4; Dim8: 8 × 4). “DiagCov” refers to simulations where the covariance of β was diagonalized, while “FullCov” indicates that the covariance structure of β remained unchanged.

Table 1.

Percentage of Relative Deviation in Three Distribution Parameters Across Different Signal-to-Noise Ratio Levels, Using the Full (Nondiagonalized) Underlying Covariance Structure of β

	Dim(Λ_n)	4 × 4		6 × 4			8 × 4
SNR	β	cov(β)	var(ε)	β	cov(β)	var(ε)	β	cov(β)	var(ε)
0.1	9.39	175.42	14.27	9.33	87.62	2.18	9.89	88.79	1.64
1.0	3.12	28.15	24.80	3.01	16.06	2.89	3.07	15.88	1.75
10	1.02	5.15	31.32	0.99	4.25	2.84	0.96	4.00	1.61
0.1	N/A			9.33	927.08	66.66	9.89	931.96	50.24
1.0				3.01	93.53	66.91	3.07	93.49	50.21
10				0.99	10.14	66.74	0.96	10.14	49.97

Upper half: Results obtained from EM for separating variances (EMSEV).

Lower half: Results obtained by directly solving the general linear model (GLM) 500 times (sample mean and covariance).

cov, covariance; Dim, dimension of Λ_n; EM, expectation and maximization; N/A, not applicable because the GLM is full-rank, leading to zero error; var, variance.

Table 2.

Percentage of Relative Deviation in Three Distribution Parameters Across Different Signal-to-Noise Ratio Levels, Using a Diagonalized Underlying Covariance Structure of β

	Dim(Λ_n)	4 × 4		6 × 4			8 × 4
SNR	β	cov(β)	var(ε)	β	cov(β)	var(ε)	β	cov(β)	var(ε)
0.1	9.41	155.19	13.56	9.34	60.83	2.18	9.91	59.35	1.60
1.0	3.11	28.43	26.23	3.01	10.79	2.89	3.06	10.29	1.75
10	1.03	4.34	32.22	0.99	2.83	2.84	0.96	2.67	1.61
0.1	N/A			9.34	994.93	66.66	9.91	999.37	50.24
1.0				3.01	99.77	66.91	3.06	99.48	50.21
10				0.99	10.24	66.74	0.96	10.36	49.97

Upper half: Results obtained from EMSEV.

Lower half: Results obtained by directly solving the GLM 500 times (sample mean and covariance).

cov, covariance; Dim, dimension of Λ_n; N/A, not applicable because the GLM is full-rank, leading to zero error; var, variance.

As expected, increasing noise levels adversely affected estimation accuracy. Both the mean and covariance of β showed greater deviations at lower SNRs, with covariance estimates being more sensitive. For instance, at a noise level 10 times higher than the signal (SNR = 0.1), deviations in β mean reached approximately 10%, while deviations in cov(β_{_est}) escalated 60%–90%. This amplification in covariance error can be attributed to error propagation from the β mean estimation to its covariance.

Regarding computational performance, simulations using “sliding strategy II” and data-driven initial guesses converged efficiently. The total runtime varied from a few seconds to just over 10 minutes, depending on the noise level, when executed on a system equipped with an Intel Xeon E3-1200 v3/4th Gen Core Processor (Clock Speed: 3.3 GHz) and 8 GB RAM.

4. DISCUSSION

Investigating the variance of biological signals has a long and established history. The general EM framework proposed by Dempster et al. laid the core theoretical foundation for maximum likelihood estimation of variance components in linear mixed models (Dempster et al., 1977), which has since been extended to broader contexts (Laird and Ware, 1982; Vaida and Meng, 2005). Nevertheless, a robust algorithm capable of reliably distinguishing signal variance from noise had long been pending (Churchland et al., 2010). This study proposed EMSEV, which applies the EM algorithm to a modified GLM framework, as a platform to address this long-standing issue in biological science. The contribution of this work does not lie in algorithmic innovation, but rather in the application of a mature EM framework to separate distinct sources of variance. This approach is applicable to various types of biological signals, particularly psychophysiological metrics and neural signals obtained from event designs. The simulation results demonstrate that with appropriate initialization and moderate noise levels, EMSEV shows promise in distinguishing between signal and noise variances. Under the conservative assumption of i.i.d., the deviation of error terms remained below 3% even at SNR 10. When noise and signal levels were comparable, deviations in the mean and covariance of β were approximately 3% and 10%–16%, respectively. The “sliding strategy II,” which incorporates more data points in the mini design matrix and utilizes an initial guess, yielded the best outcomes. It is important to note that the cov(β) and var(ε) directly obtained through the conventional GLM solving were highly biased and are not recommended. Notably, Eq(3) provides a mathematical basis for Weber’s law by showing that the mean and variance scale proportionally.

In terms of methodology development, McIntosh et al. introduced a covariance-oriented approach, partial least squares [PLS, invented by econometrician and statistician Wold (1966)], to the field of brain science (McIntosh et al., 1996). Summary scores indicating the relationship between brain activity and design (or behavior) can be derived, and statistical significance can be assessed using permutation and bootstrap procedures. This technique has been applied to various neuroscientific issues, such as exploring the influence of baseline neural power patterns on event-related activities (Lee et al., 2011). Pascual-Marqui was perhaps the first researcher who treated the variance of signal and noise separately and explicitly in neural models (Pascual-Marqui, 2007). In his formulation of “exact low-resolution brain electromagnetic tomography” (eLORETA), the regularization parameter represents the ratio of measurement noise to biological variance. Both the above canonical methods contribute significantly to the application of (co)variance structures in neuroimaging data. However, they have limitations in effectively quantifying signal variance. EMSEV provides a novel pipeline aimed at potentially distinguishing between nonbiological noise and biological signal variances.

Quantifying the variance of biologically or psychologically meaningful indices has broad applications. While variability traditionally implied uncertainty (with a negative connotation), recent studies have recognized that variability in biological systems may have “functional” significance. For instance, a certain level of variability in heart rate is linked to better cardiovascular health and increased autonomic flexibility (Thayer et al., 2010). PLS-based analysis has indicated that younger, faster, and more consistent performers exhibit higher brain variability across cognitive tasks such as perceptual matching, attention cueing, and delayed match-to-sample (Garrett et al., 2011). Notably, the impact of altering variance is a complex and context-dependent matter. Some research suggests that increased variance in neural signals across specific brain regions may be associated with neuropsychiatric conditions (Scarapicchia et al., 2018). EMSEV could prove beneficial in this relatively new research domain. For example, studies investigating how response time is influenced by brain waves at different electrode positions could benefit from EMSEV by providing estimates of the covariance structure among these dependent variables (assuming isotropic noise, as in the simulations). Incorporating physiological measures such as heart rate variability and skin conductance is also feasible but would require treating noise variance separately for each variable (heteroscedastic noise modeling). These features, in turn, could facilitate associations with relevant clinical profiles, such as impulsivity or hyperactivity.

In addition, EMSEV offers potential applications for statistical modeling. Unlike current methods that often attribute variance solely to noise under the null hypothesis with a false-positive cutoff of p-value 0.05 (where an estimated indicator is compared against a null distribution), EMSEV calculates variances for both target and noise. This enables statistical analysis by comparing two distributions, providing a clear definition of both false positives and negatives (Benjamin et al., 2018). Simultaneous assessment of false-positive and false-negative outcomes in statistics offers practical advantages over traditional methods that focus primarily on false positives. This approach enables researchers to evaluate errors comprehensively and may improve the reliability of findings. Considering both types of errors promotes nuanced interpretation and enhances confidence in statistical conclusions.

To retrieve the distribution of β and ε, each Λ_n must contain the necessary data to inform all elements in β. Missing element will bias the estimation of the statistics in Q_n(β) and the parameter set ϴ. Therefore, Λ_n is regarded as a unit of a mini-experiment that encompasses all the factors of interest and may represent a miniature of the entire experiment. This presents a particularly stringent constraint when Λ is a categorical design matrix made of dummy variables, whereas for continuous-variable design matrices, this is generally not a concern as long as the matrix is not singular. The limitation of EMSEV in this “categorical context” is evident: it operates under the assumption of ergodicity and necessitates substantial regularity in experimental designs. Fortunately, such regularity is common in the existing experimental designs in psychophysics, neurophysiology, neuroimaging, and cognitive psychology. Moreover, if temporal correlation (order of data) is not a concern, the rows of the GLM (observed data and the design matrix) can be permuted and reorganized to ensure that each new Λ_n carries adequate information about every element of β. Notably, as demonstrated by the two sliding strategies, constructing the design matrix to offer a reasonable initial estimate for the EM algorithm is recommended. In summary, it is suggested that the model matrix Λ be structured as a concatenation of mini-experiments (Λ_n). In this mini-experiment, all target variables should be present, and the number of data points (rows in Λ_n) should exceed the number of target variables (columns in Λ_n). Whether overlapping sliding and variable-length sliding affect the performance of EMSEV remains an interesting question that warrants further investigation.

Despite being more economical in data usage, the “sliding strategy I” yields subpar results compared with its counterparts, which could be partially attributed to the problem of local optima inherent to the EM algorithm. To cope with this issue, multiple initializations with diverse starting parameters can be used, selecting the model with the highest likelihood or lowest loss function. Recent work has increasingly emphasized the algorithmic efficiency of EM-based variance component estimation and its integration with advanced regularization strategies to address high-dimensional and complex modeling scenarios. For example, Ghosh et al. developed an algorithmic framework for systematically searching optimal variance component estimators within linear mixed models, thereby improving both estimation accuracy and computational tractability (Ghosh et al., 2023). Oliveira et al. further extended this direction by introducing the EMLMLasso algorithm, which seamlessly incorporates lasso-type regularization into the EM framework for linear mixed-effects models (Oliveira et al., 2025), enabling robust variable selection and variance estimation in high-dimensional settings. These developments suggest that combining efficient EM algorithms with modern computational techniques can substantially broaden the applicability and scalability of EMSEV.

5. CONCLUSION

Historically, signal variability has often been attributed to noise in mathematical modeling, despite its acknowledged significance. The application of the EM algorithm within a modified GLM, known as EMSEV, offers a promising approach to estimating both the variances of the target variable and the error. This method can effectively address the challenges of signal variability and improve the precision of statistical inferences. EMSEV may possess the potential to reform both neurobiological and psychophysiological modeling, given the abundance of event designs in these fields.

Footnotes

AUTHOR DISCLOSURE STATEMENT

The author declares no conflict of interest.

FUNDING INFORMATION

This work was supported by the NeuroCognitive Institute (NCI) and the NCI Clinical Research Foundation Inc.

AUTHOR’S CONTRIBUTIONS

T.-W.L. is the sole contributor: Conceptualization, methodology, formal analysis, investigation, visualization, writing—original draft, and writing—review and editing.

Supplemental Material

References

Benjamin

, Berger

, Johannesson

, et al. Redefine statistical significance. Nat Hum Behav, 2018; 2(1):6–10; doi: 10.1038/s41562-017-0189-z

Chater

, Brown

. From universal laws of cognition to specific cognitive models. Cogn Sci, 2008; 32(1):36–67.

Churchland

, Yu

, Cunningham

, et al. Stimulus onset quenches neural variability: A widespread cortical phenomenon. Nat Neurosci, 2010; 13(3):369–378; doi: 10.1038/nn.2501

Dempster

, Laird

, Rubin

. Maximum likelihood from incomplete data via the EM algorithm. J Royal Statis Soci: series B (Methodological), 1977; 39(1):1–22.

Garrett

, Kovacevic

, McIntosh

, et al. The importance of being variable. J Neurosci, 2011; 31(12):4496–4503; doi: 10.1523/jneurosci.5641-10.2011

Ghosh

, Peng

, Piepho

H-P

. An algorithm for searching optimal variance component estimators in linear mixed models. J Statistical Plan Inference, 2023; 227:34–56.

Laird

, Ware

. Random-effects models for longitudinal data. Biometrics, 1982; 38(4):963–974.

Lee

, Yu

, Wu

, et al. Do resting brain dynamics predict oddball evoked-potential? BMC Neurosci, 2011; 12:121; doi: 10.1186/1471-2202-12-121

Magnus

, Neudecker

. Matrix differential calculus with applications in statistics and econometrics. John Wiley & Sons: 2019.

10.

McCullagh

. Generalized linear models. Routledge, 2019.

11.

McCulloch

, Searle

, Neuhaus

. Generalized, Linear, and Mixed Models. Wiley Online Library: 2001.

12.

McIntosh

, Bookstein

, Haxby

, et al. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage, 1996; 3(3 Pt 1):143–157; doi: 10.1006/nimg.1996.0016

13.

Oliveira

, Schumacher

, Oliveira

, et al. Revolutionizing high-dimensional regularization: EMLMLasso algorithm for linear mixed-effects models. Proc Series of the SBMAC, 2025; 11(1):1–7.

14.

Pascual-Marqui

. Discrete, 3D distributed, linear imaging methods of electric neuronal activity. Part 1: Exact, zero error localization. arXiv Preprint arXiv, 2007.

15.

Scarapicchia

, Mazerolle

, Fisk

, et al. Resting State BOLD Variability in Alzheimer’s Disease: A Marker of Cognitive Decline or Cerebrovascular Status? Front Aging Neurosci, 2018; 10:39; doi: 10.3389/fnagi.2018.00039

16.

Thayer

, Yamamoto

, Brosschot

. The relationship of autonomic imbalance, heart rate variability and cardiovascular disease risk factors. Int J Cardiol, 2010; 141(2):122–131; doi: 10.1016/j.ijcard.2009.09.543

17.

Vaida

, Meng

X-L

. Two slice-EM algorithms for fitting generalized linear mixed models with binary response. Statistical Modelling, 2005; 5(3):229–242.

18.

Wold

. Estimation of principal components and related models by iterative least squares. Multivariate Analysis, 1966:391–420.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.06 MB