Multinomial, Poisson and Gaussian statistics in count data analysis

Abstract

It is generally known that counting statistics are not correctly described by a Gaussian approximation. Nevertheless, in neutron scattering, it is common practice to apply this approximation to the counting statistics, even at low count numbers. We show that applying this approximation leads to skewed results not only for low-count features, such as background level estimation, but also for the same estimation at double-digit count numbers. In effect, this approximation is shown to be imprecise at all count levels. Instead, a Multinomial approach is introduced as well as a more standard Poisson method, which we compare with the Gaussian case. These two methods originate from a proper analysis of a multi-detector setup and a standard triple axis instrument.

We devise a simple mathematical procedure to produce unbiased fits using the Multinomial distribution and demonstrate this method on synthetic and actual inelastic scattering data. We find that the Multinomial method provides almost unbiased results, and in some cases outperforms the Poisson statistics. Although significantly biased, the Gaussian approach is in general more robust in cases where the fitted model is not a true representation of reality. For this reason, a proper data analysis toolbox for low-count neutron scattering should contain more than one model for counting statistics.

Keywords

Poisson statistics; Multinomial statistics; data analysis; neutron scattering

1. Introduction

The nature of the physical sciences is to apply a hypothesis to a system, such that it is possible to either confirm its accuracy, or falsify it, based on observation [2]. Usually, this observation consists of physically measured data which necessitates a statistical analysis, the type of which depends on the observation in question. In this article, we investigate analysis methods for low-statistics counting measurements, in particular inelastic neutron scattering data. Here, the current common practice is, for convenience, to utilize the Gaussian limit of the Poisson statistics. This limit allows for the evaluation of fits by using the least squares method, for which many algorithms are readily available, and enables easier data transformation and normalisation. The approximative nature of the Gaussian treatment is well known and some software libraries are equipped to perform both the least squares method as well as the statistically correct Poisson treatment, e.g. MANTID [1].

Numerous previous studies of counting statistics and their influence on Poisson parameter estimation have been published both in the statistical case, see e.g. Ref. [12], and in the case of both single crystal and powder diffraction [5]. In the latter case, both the low and high count limits are of concern, with the high limit being more common in the elastic case. The low limit results in a wrong estimation of the counting uncertainty when using the fitting method of Gaussian least squares. However, in the high count regime, the counting uncertainty no longer provides the main source of error and thus, counting “too” long results in an underestimation of the uncertainties. This, in turn, obscures and possibly falsifies the parameter uncertainty in the presence of systematic errors originating from the experimental setup, an oversimplification in the model utilized, or other sources [5]. We will here only be concerned with the question of statistical uncertainty, which will interchangeably be denoted as uncertainty and error.

In this article, we deal with the low-count limit of the Gaussian approximation, which we denote the Poisson regime. This is usually taken to be the regime with $N = 10$ or fewer counts [2]. However, we show that the inaccuracies in the Gaussian parameter fitting in fact extend well outside this Poisson regime, their relative systematic error in the case of a constant background diminishing only as $1 / N$ . We discuss the merit of using alternative true Poisson and multinomial fitting methods and pinpoint the advantages and drawbacks of all methods.

2. Model fitting by Gaussian and Poisson statistics

Parameter estimation of a suggested model given a data set can be seen as a problem particularly well suited for the Bayesian approach. Using this method, it is possible to update the estimates of model parameters, given particular observations. That is, given the initial, or a priori, information I for a model and set of parameters M, one updates their probabilities given the measured data D, according to Bayes' theorem [2] $\begin{matrix} (1) & P (M | D I) = \frac{P (D | M I)}{P (D | I)} P (M | I) . \end{matrix}$ Here $P (M | I)$ represents the initial probabilities of the model and its parameters, $P (D | I)$ is the probability of obtaining the observed data, $P (D | M I)$ gives the probability of the observation assuming a specific model and parameters, and finally $P (M | D I)$ is the posterior estimation of probabilities for the model and parameters. In order to apply this formula in practice, the probability of obtaining the data given the model needs to be found. As the model parameters are the intended result of the experiment, one needs to perform a fit that obtains these. This can be done in the Bayesian formalism by updating the parameter estimates with the new data as described in Eq. (1). However, in the case where no prior parameter values are more likely than others, one models this with a top hat prior. This requires the parameter to be finite, and as this usually is the case the prior can be set to be flat within the range of sensible values. This, in turn, makes the term $P (M | I)$ constant for all plausible parameter values. As $P (D | I)$ can be seen as a normalization constant independent of the model, we have $\begin{matrix} (2) & P (M | D I) \propto P (D | M I) . \end{matrix}$ To optimize the posterior probability, one simply optimizes the so-called likelihood term, $P (D | M I)$ . This is in practice done by minimizing the negative log-likelihood, since the last two terms of $\begin{matrix} (3) & - \ln (P (M | D)) = - \ln (P (D | M)) - \ln (P (M)) + \ln (P (D)) \end{matrix}$ are constant for a flat prior. For both brevity and clarity I has been removed from the above equation, as the symbol will later be used to denote the measured neutron count.

2.1. The Poisson and Gaussian distributions

Because of the discrete and uncorrelated nature of counting statistics, it is known that it follows the Poisson distribution [2] with the probability of observing n counts for a process that has an expected mean count of λ, $\begin{matrix} (4) & P_{P} (n | λ) = \frac{λ^{n} e^{- λ}}{n!}, \end{matrix}$ with a standard deviation given by $\sqrt{λ}$ [2]. In the case of large mean counts, the Poisson distribution tends towards a Gaussian distribution, which also has mean λ and standard deviation $σ = \sqrt{λ}$ , i.e. $\begin{matrix} (5) & P_{P} (n = x | λ) \approx \frac{1}{\sqrt{2 π λ}} e^{- \frac{{(x - λ)}^{2}}{2 λ}} . \end{matrix}$
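How quickly the Gaussian limit of Eq. (5) approaches the Poisson distribution of Eq. (4) can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names and the chosen values of λ are illustrative assumptions.

```python
import math

def poisson_pmf(n, lam):
    """Eq. (4): probability of observing n counts for a mean count lam."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def gaussian_pdf(x, lam):
    """Eq. (5): Gaussian with mean lam and standard deviation sqrt(lam)."""
    return math.exp(-(x - lam) ** 2 / (2.0 * lam)) / math.sqrt(2.0 * math.pi * lam)

def relative_gap(lam):
    """Largest pointwise difference between Eqs. (4) and (5),
    relative to the Gaussian peak height, over n = 0 .. lam + 10 sqrt(lam)."""
    n_max = int(lam + 10.0 * math.sqrt(lam))
    gap = max(abs(poisson_pmf(n, lam) - gaussian_pdf(n, lam))
              for n in range(n_max + 1))
    return gap / gaussian_pdf(lam, lam)

# Illustrative mean counts (assumed values, not taken from the text)
for lam in (2.0, 20.0, 200.0):
    print(f"lambda = {lam:6.1f}: relative gap = {relative_gap(lam):.4f}")
```

The gap shrinks with increasing λ but does so slowly, which is in line with the statement that the Gaussian approximation is only a limit.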

2.2. Statistics on scattering

For simplicity, let us limit our discussion to reactor-based instruments with a monochromatic incoming beam. In most triple axis instruments, the process of measuring the scattering intensities, or more correctly the scattering cross section, for different processes in a material is either performed through a series of scans or with a multi-detector setup. Multi-detectors are also used for SANS, imaging, and powder diffraction. Here each detector (or detector pixel) corresponds to a specific momentum transfer Q (and possibly energy transfer E). Despite the apparent differences the resulting statistics is the same. This can be seen by first considering the case of a point by point measurement. At each setting, one of two things can happen; either the neutron ends in the detector or it does not. This gives two pixels. At the next instrument setting, the same outcomes are possible. If the neutrons hit the detector, they are collected in pixel number 2, while neutrons missing are added to the missing neutrons from the previous setting. This goes on throughout the scan. Alternatively, if multiple detectors are used simultaneously, one splits all neutrons into the neutrons hitting individual detectors plus one for the neutrons that do not hit any detector.

Although these two methods might appear to be completely equivalent, they are in fact not. Even though the end spectra seem equivalent there is one key difference. When all data points are measured at the same time, it is known that any neutron entering the instrument had the same probability distribution of being detected, and the total number of neutrons was fixed. When a single point at a time is being measured for a certain amount of time, or equivalently number of neutrons released from the source, it is not known that each detector setting had the exact same number of incoming neutrons, only that the total spectrum had a certain number. This is a difference, albeit a small one, between the two measurement styles. When the multi-detector instrument is used in a scanning setup, the knowledge of the same total incoming neutron count is lost and one has to revert to the same analysis as for the scanning setup. An example where these two setups are in use is a time-of-flight spectrometer measuring a powder sample and a single crystal. In the former case nothing is moved or scanned over during a spectrum acquisition, whereas for a single crystal the sample is usually rotated.

Looking at the case of many pixels being measured simultaneously, these are denoted $n_{1}, n_{2}, n_{3}, \dots, n_{m}$ , such that there are m different pixels. In addition, all neutrons not measured in these pixels (neutrons that do not reach any detector) are collected into $n_{0}$ . That is, $\begin{array}{ll} (6) & Δ N = \sum_{i = 1}^{m} n_{i}, \\ (7) & N = \sum_{i = 0}^{m} n_{i} . \end{array}$ Thus, in total N neutrons hit the sample where $Δ N$ of these hit the detectors and consequently $N - Δ N$ hit outside of the detectors or are absorbed. The probabilities of a general neutron being detected in the individual pixels are denoted $p_{i}$ , yielding $\begin{array}{ll} (8) & \sum_{i = 0}^{m} p_{i} = p_{0} + \underbrace{\sum_{i = 1}^{m} p_{i}}_{Δ p} = 1, \\ (9) & p_{0} = 1 - Δ p . \end{array}$ It is these $p_{i}$ ’s that are of interest to the physical properties of the system and their values are correlated through the models of the scattering cross section. That is, in a simple case where the model is given by $p_{i} = A e^{- {(μ - x_{i})}^{2} / (2 σ^{2})} + B$ , i.e. a Gaussian peak on a flat background, the probabilities depend on each other through their $x_{i}$ position (which could represent Q or E) and the model.

Multinomial distribution. In order to optimize these parameters, one needs to maximize the likelihood, which is given by a Multinomial distribution $\begin{matrix} (10) & L = N! \prod_{i = 0}^{m} \frac{p_{i}^{n_{i}}}{n_{i}!} = \frac{N!}{n_{1}! n_{2}! \dots n_{m}! (N - Δ N)!} p_{1}^{n_{1}} p_{2}^{n_{2}} \dots p_{m}^{n_{m}} {(1 - Δ p)}^{N - Δ N} . \end{matrix}$ By applying Stirling’s approximation and introducing the normalized quantities, see Appendix A, $\begin{array}{ll} (11) & q_{i} = \frac{n_{i}}{Δ N}, \quad \sum_{i = 1}^{m} q_{i} = 1, \\ (12) & {\tilde{p}}_{i} = \frac{p_{i}}{Δ p}, \quad \sum_{i = 1}^{m} {\tilde{p}}_{i} = 1, \end{array}$ one arrives at the log-likelihood $\begin{matrix} (13) & \ln (L) = N \left[ \frac{Δ N}{N} \sum_{i = 1}^{m} q_{i} \ln \left( \frac{{\tilde{p}}_{i}}{q_{i}} \right) + \frac{Δ N}{N} \ln \left( \frac{Δ p}{Δ N / N} \right) + \left( 1 - \frac{Δ N}{N} \right) \ln \left( \frac{1 - Δ p}{1 - Δ N / N} \right) \right] . \end{matrix}$

The above log-likelihood is found when considering a collection of measurement data with a fixed total number of neutrons, i.e. N.

Poisson distribution. Taking one step back from the above derivation, what is usually performed is an analysis dealing with a data set where the total number of counts is not fixed, i.e. corresponding to the standard triple axis setup. This corresponds to removing the $p_{0}$ pixel. With this relaxation, the likelihood is given by the product of binomial terms for each detector, as $\begin{matrix} (14) & P (D | M) = \prod_{i = 1}^{m} P (D_{i} | M) = \prod_{i = 1}^{m} p_{i}^{n_{i}} {(1 - p_{i})}^{N - n_{i}} \frac{N!}{n_{i}! (N - n_{i})!}, \end{matrix}$ where $n_{i}$ is the number of neutrons hitting detector i, which has a probability of $p_{i}$ , and the total number of neutrons is N. Taking this as a starting point, and going to the limit $n_{i} ≪ N$ , one readily finds the likelihood to be a product of Poisson distributions [2] $\begin{matrix} (15) & P (D | M) \approx \prod_{i = 1}^{m} \frac{λ_{i}^{n_{i}} e^{- λ_{i}}}{n_{i}!}, \end{matrix}$ where $λ_{i} = N p_{i}$ is the average number of counts. The largest possible probabilities are found when all $n_{i} \approx λ_{i}$ . If the data are given as a vector of counts $n_{i}$ as a function of the index i, then using equation (15) yields $\begin{matrix} (16) & - \ln (P (D | M)) = \sum_{i = 1}^{m} [- n_{i} \ln (λ_{i}) + λ_{i} + \ln (n_{i}!)] . \end{matrix}$ Now, applying a model to the data is equivalent to demanding that the “true” values, $λ_{i}$ , follow a particular functional form $\begin{matrix} (17) & λ_{i} = M_{i} (x_{1}, x_{2}, \dots) = M_{i} (x_{α}), \end{matrix}$ where $x_{j}$ are the model parameters, shortened to the vector $x_{α}$ , that are to be optimized in the fitting procedure. Examples for data sets and fitting are given in Section 3.
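The negative log-likelihood of Eq. (16) is simple to evaluate directly. The sketch below is illustrative only: the function name, the five example counts and the coarse grid scan are assumptions made for the demonstration, and a real fit would use a proper minimizer. For a constant model the minimum lies at the arithmetic mean of the counts, which the scan reproduces.

```python
import math
import numpy as np

def poisson_nll(counts, lam):
    """Eq. (16): negative log-likelihood for Poisson distributed counts.
    `lam` is either a scalar or an array of model values lambda_i."""
    counts = np.asarray(counts, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), counts.shape)
    log_factorial = np.array([math.lgamma(n + 1.0) for n in counts])
    return float(np.sum(-counts * np.log(lam) + lam + log_factorial))

# Hypothetical low-count data, including a zero count
counts = np.array([3, 0, 5, 2, 4])

# Coarse scan of a constant model lambda_i = B on a grid with step 0.1
trial_b = np.linspace(0.5, 6.0, 56)
nll = [poisson_nll(counts, b) for b in trial_b]
best_b = float(trial_b[int(np.argmin(nll))])

print("grid minimum:", best_b, " mean of counts:", counts.mean())
```

Note that a zero count poses no problem here, in contrast to a least squares fit that takes the uncertainty from the data point itself.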

2.3. Gaussian distribution

It is instructive to repeat the same calculation for data governed by Gaussian statistics, which can be found as an expansion of the Poisson result (15) around a large value of $λ_{i}$ [2]: $\begin{matrix} (18) & - \ln {(P (D | M))}_{Gauss} = \sum_{i = 1}^{m} \left[ \frac{{(n_{i} - λ_{i})}^{2}}{2 σ_{i}^{2}} + \frac{1}{2} \ln (2 π σ_{i}^{2}) \right] . \end{matrix}$ As the last term is independent of the model, $λ_{i}$ , it merely represents a constant and is often removed. The same is true for the factor of 2 in the denominator of the first term. Maximizing the log-likelihood is thus equivalent to minimizing the quantity often denoted the chi-square, $\begin{matrix} (19) & χ^{2} = \sum_{i = 1}^{m} \frac{{(n_{i} - λ_{i})}^{2}}{σ_{i}^{2}} . \end{matrix}$ The whole procedure of minimizing this equation is often known as least squares fitting [2]. However, to apply this Gaussian statistical treatment, a relation between $σ_{i}$ and the intensity is needed.

2.4. Fitting experimental data

Fitting a model using the likelihoods found above then consists of optimizing $\ln (L)$ where the model parameters, $x_{α}$ , give the values for $p_{i}$ or $λ_{i}$ . It is important to note that only the dependence of the log-likelihood on these parameters matters; everything else is constant and can be discarded.

The Multinomial log-likelihood can be split into two parts; one concerning the zeroth pixel, the other the rest. The parameters only change the latter part, which can be found to be proportional to $\begin{matrix} (20) & - \ln (P (D | M)) = - Δ N \sum_{i = 1}^{m} q_{i} \ln ({\tilde{p}}_{i}), \end{matrix}$ see Appendix A for details.
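A direct transcription of Eq. (20) is sketched below, with the normalized quantities of Eqs. (11) and (12) formed inside the function. The function name and the example numbers are illustrative assumptions. The example shows the two properties used later: an overall scaling of the model leaves the value unchanged, and the minimum is reached when the normalized model equals the normalized data.

```python
import numpy as np

def multinomial_nll(counts, model_values):
    """Eq. (20): parameter dependent part of the Multinomial negative
    log-likelihood. `model_values` may carry any overall scale, since
    only the normalized probabilities p_tilde enter."""
    counts = np.asarray(counts, dtype=float)
    model_values = np.asarray(model_values, dtype=float)
    delta_n = counts.sum()                          # Eq. (6)
    q = counts / delta_n                            # Eq. (11)
    p_tilde = model_values / model_values.sum()     # Eq. (12)
    return float(-delta_n * np.sum(q * np.log(p_tilde)))

# Hypothetical counts and a hypothetical (positive) model
counts = np.array([2, 7, 15, 6, 1])
model = np.array([1.5, 6.0, 14.0, 6.0, 1.5])

print(multinomial_nll(counts, model))
print(multinomial_nll(counts, 10.0 * model))   # identical: no absolute scale
print(multinomial_nll(counts, counts))         # smallest possible value
```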

For the Poisson distributed data, the negative log-likelihood contains the term $\ln (n_{i}!)$ which is independent of the model parameters. In effect, one has to optimize $\begin{matrix} (21) & - \ln (P (D | M)) = \sum_{i = 1}^{m} [- n_{i} \ln (λ_{i}) + λ_{i}] . \end{matrix}$ Comparing the two log-likelihoods above, they are almost equivalent; they differ only by the $λ_{i}$ term in Eq. (21) and by the use of normalized quantities in Eq. (20). This is exactly the difference between the two measurement techniques: in the Poisson case this term sums to $\sum_{i = 1}^{m} λ_{i} = \sum_{i = 1}^{m} N p_{i} = Δ N$ , but there is no constraint on $Δ N$ relative to N. In the Multinomial case, a term $\sum_{i = 1}^{m} {\tilde{p}}_{i}$ was present, but is known to always sum to unity.

Lastly, for the Gaussian distribution the log-likelihood is simply proportional to $χ^{2}$ and thus does not need to be reformulated.

In the Gaussian log-likelihood, it would also be possible to use the model value for the uncertainty, i.e. $σ_{i} = \sqrt{λ_{i}}$ . However, a lot of computational flexibility (e.g. in normalization and background subtraction) is gained if $σ_{i}$ can be determined in a model-free way, i.e. directly from the individual data point. Hence, the approximation used in almost any fitting program is $σ_{i} \approx \sqrt{n_{i}}$ .

Difficulties arise from using this equation in the extreme low-count limit. In particular, when a counting number of 0 is measured, we have $σ_{i} = 0$ , corresponding to a (physically unreasonable) zero uncertainty on the data point. Statistically, this would mean that it is known with certainty that the true value, $λ_{i}$ , equals zero. This will, in turn, result in the model fits being forced through zero at these points.

For these reasons, practical applications of modeling of scattering data use different tactics to accommodate zero count values. The most often used way to circumvent the zero-count problem is by increasing the uncertainty of the zero-measurement to unity [1,3]. This, however, allows for the unphysical situation where $λ_{i}$ is just as likely to be positive as negative. One could devise another method where zero-measurements are removed altogether. This of course introduces a strong bias, as measuring a point with zero counts contains a lot of information being ignored. Alternatively, one can shift the intensity of zero counts to 0.5 and use this value also as the uncertainty. Table 1 shows these three different tactics.

Table 1
Three different tactics for dealing with zero count values in Gaussian statistics. The third method simply removes zero count points

Method	Intensity₀	$σ_{0}$
BG1	0	1
BG2	0.5	0.5
BG3	Remove	Remove

Similarly, when a count of unity is found, the corresponding uncertainty is $σ_{i} = 1$ . This means that a negative value is only ‘1 σ away’, corresponding to the true value being positive with a probability of 84.1%, leaving an almost 16% probability of it being negative, which is again unreasonable. We do not here consider modifications of the errorbar of count values of 1. However, we can state that if a data set contains many low-count numbers, the use of Gaussian statistics is certainly imprecise.
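The three zero-count tactics of Table 1 can be sketched as a small helper that prepares intensities and uncertainties for a least squares fit with $σ_{i} \approx \sqrt{n_{i}}$. The function name and return convention are illustrative assumptions and do not describe any particular fitting package.

```python
import numpy as np

def gaussian_inputs(counts, tactic="BG1"):
    """Return (intensity, sigma, keep) for a chi-square fit.
    `keep` is a boolean mask of the points that enter the fit, so that
    the model can be evaluated on the same points."""
    counts = np.asarray(counts, dtype=float)
    intensity = counts.copy()
    sigma = np.sqrt(counts)              # model-free estimate sqrt(n_i)
    zero = counts == 0
    keep = np.ones(counts.shape, dtype=bool)
    if tactic == "BG1":                  # zero counts get unit uncertainty
        sigma[zero] = 1.0
    elif tactic == "BG2":                # zero counts shifted to 0.5 +/- 0.5
        intensity[zero] = 0.5
        sigma[zero] = 0.5
    elif tactic == "BG3":                # zero counts removed from the fit
        keep = ~zero
    else:
        raise ValueError(f"unknown tactic: {tactic}")
    return intensity[keep], sigma[keep], keep

# Hypothetical spectrum containing one zero count
for tactic in ("BG1", "BG2", "BG3"):
    print(tactic, *gaussian_inputs([0, 1, 4], tactic))
```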

In the rest of this paper, we will quantify how these introduced imprecisions affect the data analysis in a few simple examples, where we also compare with the more accurate Poisson and multinomial treatments.

3. Fits of two simple models

We here set out to investigate the difference between minimizing the three different log-likelihoods, when used on simple, synthetic counting data. We first show a study of a data set of no features, i.e. a flat background. Later, we discuss the case of one simple Gaussian peak on a flat background.

For the flat background, 1000 individual spectra are generated using the numpy.random.poisson method, implemented in Python [11], where the mean count is calculated from the model. All $x_{i}$ lie within −1 to 1. For the peak shape on a flat background, 10000 individual spectra with a total of 1000 counts in each were used, once again with $x_{i}$ -values between −1 and 1. As the total neutron count is fixed, the spectra are generated by the numpy.random.multinomial method. In both cases, each spectrum is fitted for a series of model parameters using (a) the Multinomial log-likelihood, (b) the Poisson log-likelihood, and (c) the Gaussian least squares method, where $σ_{i} = \sqrt{I_{i}}$ . In the latter case, the three different tactics of dealing with zero counts from Table 1 were used in turn. In order to ensure physical convergence, a bound has everywhere been imposed on the background and amplitude variables, B and A, so that $B ⩾ 0$ and $A ⩾ 0$ .
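The generation of the two kinds of synthetic spectra can be sketched as follows. This is an illustration under stated assumptions, not the original script: it uses the `numpy.random.Generator` counterparts of the two NumPy methods named above, the seed and the background value of 5 for the flat case are arbitrary, and the peak parameters are those quoted in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(seed=1)      # arbitrary seed, for repeatability

def peak_model(x, amplitude, mu, sigma, background):
    """Gaussian peak on a flat background."""
    return amplitude * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) + background

# Flat background: independent Poisson counts, total count not fixed
x_flat = np.linspace(-1.0, 1.0, 21)
flat_spectra = rng.poisson(lam=5.0, size=(1000, x_flat.size))

# Peak on background: total count fixed to 1000, hence Multinomial draws
x_peak = np.linspace(-1.0, 1.0, 61)
shape = peak_model(x_peak, amplitude=20.0, mu=0.0, sigma=0.1, background=15.0)
p_tilde = shape / shape.sum()            # normalized probabilities
peak_spectra = rng.multinomial(1000, p_tilde, size=10000)

print(flat_spectra.shape, peak_spectra.shape)
print("counts per peak spectrum:", np.unique(peak_spectra.sum(axis=1)))
```

Every Multinomial spectrum sums to exactly 1000 counts, whereas the total of a Poisson spectrum fluctuates from draw to draw; this is the distinction between the two measurement styles discussed in Section 2.2.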

3.1. Constant background

We here consider the simplest model $\begin{matrix} (22) & λ_{i} (x_{i} | B) = B . \end{matrix}$ We estimate the background value, $B_{est}$ , for true B values lying in the range 0 to 20, using the different schemes discussed above and 21 data points per series. As the Multinomial log-likelihood requires an optimization of the normalized probabilities, there remains no parameter to fit. Thus, the Multinomial log-likelihood is not fitted to the featureless background.

Figure 1 shows the mean estimated parameter and its standard deviation, the size of which is described in Section 4. While the Poisson fit shows a striking agreement with the underlying model, we observe a clear underestimation of the background parameter in all the Gaussian least squares fits. This is visible for all medium and large values of the background parameter, $B ⩾ 1.5$ , i.e. also outside the Poisson regime. This is a feature of Gaussian statistics, caused by the fact that lower count numbers are ascribed smaller error bars and therefore have higher relative weights in the chi-square fit (19). At large values of B, it can be shown that the deviation tends to a constant $Δ B = ⟨ B_{est} ⟩ - B = - 1$ , see Appendix C. This means that the relative error is $Δ B / B \sim 1 / B$ in magnitude and is thus still around 10% when we leave the Poisson regime.
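The constant offset can be reproduced with a few lines, because both fits have closed-form solutions for a constant model. Minimizing Eq. (21) gives the arithmetic mean of the counts, while minimizing the chi-square of Eq. (19) with $σ_{i}^{2} = n_{i}$ gives the weighted mean $\sum_{i} (n_{i} / σ_{i}^{2}) / \sum_{i} (1 / σ_{i}^{2})$, i.e. the harmonic mean of the counts. The sketch below is illustrative; the seed is arbitrary and zero counts, which are very rare at $B = 20$, are given unit uncertainty as in tactic BG1.

```python
import numpy as np

rng = np.random.default_rng(seed=2)      # arbitrary seed
true_b, n_points, n_spectra = 20.0, 21, 1000
spectra = rng.poisson(true_b, size=(n_spectra, n_points)).astype(float)

# Poisson log-likelihood: for lambda_i = B the optimum is the plain mean
b_poisson = spectra.mean(axis=1)

# Least squares with sigma_i^2 = n_i (zero counts: sigma_i = 1, tactic BG1)
variance = np.where(spectra == 0, 1.0, spectra)
b_gauss = (spectra / variance).sum(axis=1) / (1.0 / variance).sum(axis=1)

bias_poisson = float(b_poisson.mean() - true_b)
bias_gauss = float(b_gauss.mean() - true_b)
print(f"Poisson bias:  {bias_poisson:+.3f}")
print(f"Gaussian bias: {bias_gauss:+.3f}")
```

The Poisson estimate scatters around the true value, whereas the least squares estimate is shifted downwards by roughly one count.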

Turning to the case of low background rates, $B ⩽ 1$ , we note that the Gaussian methods produce larger results than the true value ( $Δ B > 0$ ) for methods BG2 and BG3; with the worst results coming from BG3. This overestimation of the background as compared to the other chi-square fits is a natural consequence of modifying the zero-count observations.

In the medium range $B = 1.5$ to 3.0, the methods BG1 and BG2 are systematically too low, and here BG3 becomes more precise. We note that the BG2 method (setting zero counts to 0.5 and the same value for the corresponding error) is everywhere worse than the BG1 method (setting the zero-count error bar to unity). The BG3 method (ignore zero counts) is found to be the most precise method in the range $1 ⩽ B ⩽ 6$ . Overall, however, for the flat background case the BG1 method can be judged to be the best of the Gaussian methods across all scales of background amplitude. This result justifies the frequent use of the BG1 tactics.

None of the Gaussian methods, however, come anywhere near the Poisson method in fitting precision for this simplest of models.

Fig. 1.

Left: average estimated background value obtained for fits to 1000 random data sets, fitted with the Poisson log-likelihood method and the three least squares methods, as explained in the text. Black lines signify the true parameter value, $λ_{i}$ . The standard deviation of the background estimates is shown as an “errorbar” on each point. Note that the standard deviation of the plotted mean value is therefore a factor $\sqrt{1000}$ lower than the plotted standard deviation. Right: example of a single dataset for a background value of $B = 20$ ; the confidence interval of $1 σ$ is shown for all fits, as explained in Section 4. All least squares methods result in the same fit and uncertainty area.

3.2. A single Gaussian peak on a constant background

An example very relevant to scattering is that of a peak with the shape of a Gaussian on a constant background. This model is given by $\begin{matrix} (23) & λ_{i} (x_{i} | A, μ, σ, B) = A e^{- \frac{{(x_{i} - μ)}^{2}}{2 σ^{2}}} + B, \end{matrix}$ where A is the amplitude of the peak, μ is the mean value, σ is the peak width (which should not be confused with the statistical standard deviation of the counting data), and B is the constant background.

In contrast to the above data, it here makes sense to also do parameter optimization using the Multinomial log-likelihood, where $\begin{array}{lll} (24) & {\tilde{p}}_{i} & = \frac{p_{i}}{\sum_{j = 1}^{m} p_{j}} = \frac{A e^{- \frac{{(x_{i} - μ)}^{2}}{2 σ^{2}}} + B}{\sum_{j = 1}^{m} \left( A e^{- \frac{{(x_{j} - μ)}^{2}}{2 σ^{2}}} + B \right)} \\ (25) & & = \frac{\frac{A}{B} e^{- \frac{{(x_{i} - μ)}^{2}}{2 σ^{2}}} + 1}{\sum_{j = 1}^{m} \left( \frac{A}{B} e^{- \frac{{(x_{j} - μ)}^{2}}{2 σ^{2}}} + 1 \right)}, \end{array}$ which is a model only depending on three parameters: $A / B$ , μ, and σ, and not the four present in the other fitting schemes.
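That the normalized model depends on A and B only through their ratio can be verified numerically. In this sketch the function names are illustrative assumptions; the parameter values are those used for the synthetic peak data.

```python
import numpy as np

def p_tilde_four(x, amplitude, mu, sigma, background):
    """Eq. (24): normalized model built from all four parameters."""
    shape = amplitude * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) + background
    return shape / shape.sum()

def p_tilde_three(x, ratio, mu, sigma):
    """Eq. (25): the same quantity written with ratio = A / B only."""
    shape = ratio * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) + 1.0
    return shape / shape.sum()

x = np.linspace(-1.0, 1.0, 61)
reference = p_tilde_four(x, 20.0, 0.0, 0.1, 15.0)
doubled = p_tilde_four(x, 40.0, 0.0, 0.1, 30.0)    # same A/B, twice the scale
reduced = p_tilde_three(x, 20.0 / 15.0, 0.0, 0.1)

print(np.allclose(reference, doubled), np.allclose(reference, reduced))
```

A fit of the Multinomial log-likelihood with A and B as separate free parameters would therefore be degenerate, which is why only $A / B$ is reported for this method.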

For this model, different values of the background, B, were used with fixed center, $μ = 0$ , amplitude, $A = 20$ , width, $σ = 0.1$ , and range of x values, $N = 61$ points from $x = - 1$ to $x = 1$ . Although the background level is set to e.g. 15, the requirement of a maximum of 1000 counts in the spectrum still applies. That is, background and amplitude levels do not correspond directly to mean count numbers but rather to relative intensities. One typical data set is shown in Fig. 2, as well as the statistics of the fits of the five fitting methods to each of the synthetic spectra. Their confidence intervals, corresponding to 68.27%, are plotted on top. As the Multinomial log-likelihood does not provide an amplitude measure directly, the distance away from the true $A / B$ is plotted, see top left of Fig. 2.

In the data, we observe the same tendency of the three Gaussian least squares fits overestimating the $A / B$ parameter. Especially at small values of the background they diverge substantially from the true value. For larger values, an offset seems to be present, which could be expected from the analysis of the optimal fitting parameters for the featureless fit in Section 3.1. Both the Poisson and Multinomial methods are quite accurate at small values of background and continue to be up to a background value of around 25. Above this, there seems to be a constant offset for all higher background values. This is, however, an artefact of the limited total count in the spectrum. If this is increased to 10000, the Multinomial method improves its mean while the Poisson method does not. Increasing the total counts also reduces the errorbars as expected (not shown).

Fig. 2.

Distance away from the truth for estimated values of A/B, μ, σ, and a sample data set with a background level $B = 15$ . This background gives a signal to noise of about 1, which with 1000 counts over 61 pixels results in about 14 counts in background and 28 in signal.

When it comes to the center position of the peak, μ, all methods agree across all levels of background, accurately finding the true mean value, $⟨ μ_{est} ⟩ = 0$ , as could be expected from symmetry arguments, since the counting error is treated equally for positive and negative values of $x_{i}$ . All methods share a general trend of larger standard deviation of the estimator for larger values of background, simply reflecting the larger level of noise on each data point when a background is added. Especially the Multinomial method has small uncertainties on the mean as compared to the other methods. Around the background value of ∼ 22 the errorbars from the Poisson method start to increase in size. In contrast, the error of the Multinomial method only grows slowly for increasing background values.

There is no doubt that the Multinomial method out-performs the other 4 methods when the peak width is to be determined. When the background to amplitude reaches a ratio of 1, all but the Multinomial method, on average, overestimate the peak width. The Multinomial method not only finds the correct width but also with significantly smaller standard deviation.

4. Model normalization, visualization, and uncertainty

In most neutron experiments, the acquired raw count is normalized in some way. Often this is done with respect to monitor count, resolution volume and detector sensitivity, just to mention a few. The standard procedure is to normalize the measured intensity and then find the estimated uncertainty on the data points, that is $\begin{matrix} (26) & I_{i} \to \frac{I_{i}}{N_{i}}, \quad σ_{i} = \sqrt{I_{i}} \to \frac{\sqrt{I_{i}}}{N_{i}} . \end{matrix}$ This introduces a further uncertainty on the counting number from the measurement of the monitor value. Propagating the errors by adding the uncertainties in quadrature, with an uncertainty of $\sqrt{N_{i}}$ on the monitor count, one gets $\begin{matrix} (27) & σ_{i} = \sqrt{\frac{I_{i}}{N_{i}^{2}} + \frac{I_{i}^{2}}{N_{i}^{3}}} . \end{matrix}$ As the monitor count is often orders of magnitude above the actual detector counts, we here drop the latter term.

However, the cleanest way to perform this transformation is to transform the model instead of the data, for example by using that the expected count is proportional to the counting time: $\begin{matrix} (28) & λ_{i} \to λ_{i} α_{i}, \end{matrix}$ where $α_{i}$ is the (point dependent) normalization constant. In a least squares fit, the value of the variance weighted square deviations, $χ^{2}$ , will now be given as $\begin{matrix} (29) & χ^{2} = \sum_{i} \frac{{(λ_{i} - I_{i})}^{2}}{σ_{i}^{2}} \to \sum_{i} \frac{{(λ_{i} α_{i} - I_{i})}^{2}}{σ_{i}^{2}} = \sum_{i} \frac{{(λ_{i} - \frac{I_{i}}{α_{i}})}^{2}}{\frac{σ_{i}^{2}}{α_{i}^{2}}} . \end{matrix}$ Thus, transforming the model by $α_{i}$ is identical to the transformations on the individual data points: $I_{i} \to I_{i} / α_{i}$ and $σ_{i} \to σ_{i} / α_{i}$ . This is the way that this normalization is usually implemented in practice, when using Gaussian statistics. However, when the Poisson log-likelihood method is used, the normalization belongs only to the model, since a scaling of the number of counts will interfere with the Poisson counting statistics. The Multinomial method, on the other hand, re-normalizes all of the data, in contrast to the Poisson method, such that the absolute scale is irrelevant. This is, however, only the case for an overall scaling, as relative normalizations between data points still have to be taken into account, as is the case for the $χ^{2}$ methods. This lack of absolute scale also impacts the visualization of the result. Plotting the optimal parameters on top of the fitted data requires the scale of the data to be found. It is, however, simply given as the sum of all counts and a re-scaling is trivial.
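The equivalence expressed in Eq. (29) is easy to confirm numerically. All numbers below are hypothetical and chosen only for illustration; `alpha` plays the role of the point dependent normalization constant.

```python
import numpy as np

def chi_square(model, data, sigma):
    """Eq. (19): variance weighted sum of squared deviations."""
    return float(np.sum((model - data) ** 2 / sigma ** 2))

counts = np.array([4.0, 9.0, 16.0, 25.0])      # hypothetical raw counts
sigma = np.sqrt(counts)                        # sigma_i = sqrt(I_i)
alpha = np.array([1.0, 2.0, 2.0, 4.0])         # hypothetical normalization
lam = np.array([3.5, 5.0, 7.5, 6.0])           # hypothetical normalized model

# Left: scale the model up to the raw counts.
scaled_model = chi_square(lam * alpha, counts, sigma)
# Right: scale the data and their uncertainties down to the model.
scaled_data = chi_square(lam, counts / alpha, sigma / alpha)

print(scaled_model, scaled_data)
```

No such identity holds for the Poisson log-likelihood, since dividing the counts by a constant destroys their integer Poisson character; there the factor has to stay on the model side.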

When visualizing data, it is common practice to display an errorbar on the individual data points, representing the statistical uncertainty. However, following the discussion on the Gaussian and Poisson statistics above, this is formally a wrong presentation of counting data. In principle, there is no uncertainty on the actual measurement in a given point. Rather, the uncertainty lies in the estimation of the underlying true scattering intensity $λ_{i}$ , in other words: on the model parameters. With this in mind, a more statistically consistent way of visualizing data would be to show data points without errorbars, while showing the refined models with “error intervals”, which could be shaded areas corresponding to the regular 1σ confidence interval. This is the method we have used to display our data above, Fig. 2. The way of visualizing error does of course not change the underlying analysis, e.g. the estimation of model parameters, but is merely a visual change. Further, it also highlights the fact that the model extrapolates from the fitted data and predicts the true underlying $λ_{i}$ for all possible values of $x_{i}$ , even though only a limited number of $x_{i}$ values have been observed.

Finding the confidence intervals for the Multinomial and Poisson methods requires a little work. By the notion of uncertainty on the model it is meant that it is independent of the uncertainty on the fitted parameters and represents the statistical uncertainty in drawing counts from its distribution. This is an alternative to providing an error estimate on the data points with the best fitting model plotted on top. In the case of the least squares fit, the region of model uncertainty is the count values corresponding to $\pm 1 σ$ as found from solving $\begin{matrix} (30) & \int_{- a}^{0} \frac{1}{\sqrt{2 π}} e^{\frac{- x^{2}}{2}} d x = α \quad \text{and} \quad \int_{0}^{b} \frac{1}{\sqrt{2 π}} e^{\frac{- x^{2}}{2}} d x = β, \end{matrix}$ for a and b with $α = β = 0.3413$ , yielding the usual $a = b = 1$ . However, for the Poisson log-likelihood statistics, the corresponding procedure is less obvious, in particular due to the discrete nature of the Poisson statistics. A number of different approaches have been discussed in the literature [12] that aim to cover the wanted area as tightly as possible and without skew. The main discussion issues are (1) Does the error estimate have to be integer or can it be relaxed to be non-integer? (2) Should the range covered above and below the mean value be symmetric? In the present case of scattering data, we will usually normalize the underlying model with (at least) the counting time or the monitor counts, thus allowing for a loosening of the discrete nature. Regarding skewness, it is of greater scientific value to have a statistically true representation than an aesthetically pretty figure.

Thus, one can define the confidence interval limits a and b analogously to the Gaussian case with $\begin{matrix} (31) & \int_{a}^{λ} \frac{e^{-λ} λ^{x}}{x!} \, dx = α \quad \text{and} \quad \int_{λ}^{b} \frac{e^{-λ} λ^{x}}{x!} \, dx = β, \end{matrix}$ for $α = β = 0.3413$, corresponding to the integral of a Gaussian from 0 to σ. An example is shown in Fig. 2. As the Poisson distribution is skewed, the distance from λ to a differs from the distance from λ to b.
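A minimal numerical sketch of how the limits in Eq. (31) could be evaluated is given below. It assumes that the factorial is relaxed to the Gamma function, $x! \to Γ(x + 1)$, so that the integrand is defined for non-integer x, and it sets α and β to the integral of a unit Gaussian from 0 to 1σ. The function names, the Simpson integration, the bisection solver, and the clamping of a at zero are illustrative assumptions and not necessarily the procedure used for Fig. 2.

```python
import math

# Integral of a unit Gaussian from 0 to 1 sigma (about 0.3413).
HALF_ONE_SIGMA = 0.5 * math.erf(1.0 / math.sqrt(2.0))


def poisson_density(x, lam):
    """Poisson probability with x! relaxed to Gamma(x + 1) (assumption)."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1.0))


def integrate(f, lo, hi, steps=400):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def bisect(g, lo, hi, tol=1e-9):
    """Root of a monotonic function g on [lo, hi] by bisection."""
    sign_lo = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_interval(lam, alpha, beta):
    """Limits (a, b) around lam enclosing areas alpha below and beta above."""
    f = lambda x: poisson_density(x, lam)
    if integrate(f, 0.0, lam) <= alpha:
        a = 0.0  # not enough area below lam: clamp at zero (assumption)
    else:
        a = bisect(lambda t: integrate(f, t, lam) - alpha, 0.0, lam)
    search_max = lam + 10.0 * math.sqrt(lam) + 10.0  # generous upper bracket
    b = bisect(lambda t: integrate(f, lam, t) - beta, lam, search_max)
    return a, b


a, b = poisson_interval(20.0, HALF_ONE_SIGMA, HALF_ONE_SIGMA)
print(f"lambda - a = {20.0 - a:.2f}, b - lambda = {b - 20.0:.2f}")
```

For a model normalized by monitor or counting time, the limits would be computed on the count scale and divided by the same normalization before plotting.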

For the Multinomial distribution, one can use the confidence interval methods for the binomial distribution. At each point along the fitted curve the success probability is simply the estimated value, while all other outcomes are regarded as failures. As was the case for the Poisson distribution, many different procedures for calculating confidence intervals exist [13]. Weighing computational complexity against correctness, we have chosen to use the Wilson score interval with continuity correction. Specifically, the confidence interval is found from $\begin{array}{ll} (32) & \mathrm{CI}_{upper} (p, n, z) = \min \left\{ \frac{2 n p + z^{2} + \left[ z \sqrt{z^{2} - \frac{1}{n} + 4 n p (1 - p) - (4 p - 2)} + 1 \right]}{2 (n + z^{2})}, 1 \right\}, \\ (33) & \mathrm{CI}_{lower} (p, n, z) = \max \left\{ \frac{2 n p + z^{2} - \left[ z \sqrt{z^{2} - \frac{1}{n} + 4 n p (1 - p) + (4 p - 2)} + 1 \right]}{2 (n + z^{2})}, 0 \right\}, \end{array}$ for a total of n counts, where z is the probit corresponding to the wanted confidence interval. In the case of 1σ, $z = 0.952$.
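As an illustration, the Wilson score interval with continuity correction can be sketched in a few lines of Python. The code follows the standard textbook form of the interval, in which the continuity term is added to the square-root term before it is added to or subtracted from the centre. The function name and the guard against a negative radicand at extreme p are assumptions made for this sketch.

```python
import math


def wilson_cc_interval(p, n, z):
    """Wilson score interval with continuity correction.

    p: estimated success probability, n: total number of counts,
    z: probit of the wanted confidence level.
    Returns (lower, upper), both clipped to [0, 1].
    """
    denom = 2.0 * (n + z * z)
    centre = 2.0 * n * p + z * z
    spread = z * z - 1.0 / n + 4.0 * n * p * (1.0 - p)
    # Guard against a slightly negative radicand for extreme p and small n
    # (assumption; the clipping below then fixes the limit at 0 or 1).
    root_lower = math.sqrt(max(spread + (4.0 * p - 2.0), 0.0))
    root_upper = math.sqrt(max(spread - (4.0 * p - 2.0), 0.0))
    lower = max((centre - (z * root_lower + 1.0)) / denom, 0.0)
    upper = min((centre + (z * root_upper + 1.0)) / denom, 1.0)
    return lower, upper


# Hypothetical point on a fitted curve: 5% of 400 counts fall in this bin.
lower, upper = wilson_cc_interval(0.05, 400, 0.952)
print(lower * 400, upper * 400)  # interval expressed in counts
```

Multiplying the two limits by n converts the interval from probabilities back to counts, which is the scale on which the shaded model band would be drawn.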

5. Error estimate on parameters

An experimentally determined parameter has little scientific value without a corresponding uncertainty. That is, when tabulating fitting parameters or other extracted variables, one needs to quantify the degree to which each value represents the true underlying numerical value. In general, two different ways of estimating the error exist: (1) change only the parameter in question until the log-likelihood value has changed by a certain amount, or (2) change the parameter in question while re-optimizing the others until the log-likelihood has changed by the given amount.

For the case of a normally distributed variable being fitted by a single parameter, the uncertainty on the parameter is given by a change in the chi-square value of unity or, when the log-likelihood method is used, by a change of this value by 0.5. This, in turn, corresponds to a confidence interval of 68.27%, usually denoted the $1σ$ interval [2,7]. However, in the multi-dimensional case with many parameters, a change of 0.5 in the log-likelihood no longer represents the $1σ$ interval. Instead, the task is to invert the cumulative distribution such that the 68.27% confidence interval is found. All of this has already been implemented in the software package Minuit [8]. The two methods of acquiring the uncertainties described above still apply and both are available in Minuit: one through the regular minimization and one through the Minos algorithm. The latter is computationally heavier and will in general return non-symmetric errors.
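For the single-parameter case, the rule of a 0.5 change in the log-likelihood can be illustrated with a short sketch. It assumes a flat Poisson model with one parameter b and a hypothetical list of counts, and locates the two points where the negative log-likelihood has risen by 0.5 above its minimum. This is only a toy version of what Minuit does for general models; the helper names are made up for the example.

```python
import math


def neg_log_likelihood(b, counts):
    """Poisson negative log-likelihood of a flat model b (constants dropped)."""
    return sum(b - n * math.log(b) for n in counts)


def crossing(f, target, lo, hi, tol=1e-10):
    """Bisection for f(x) = target on [lo, hi]; f is assumed monotonic there."""
    rising = f(hi) > f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == rising:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def one_sigma_interval(counts, delta=0.5):
    """Return (b_hat, lower, upper) from a change of delta in -ln(L)."""
    b_hat = sum(counts) / len(counts)  # analytic maximum-likelihood estimate
    f = lambda b: neg_log_likelihood(b, counts)
    target = f(b_hat) + delta
    lower = crossing(f, target, 1e-6 * b_hat, b_hat)  # f falls towards b_hat
    upper = crossing(f, target, b_hat, 10.0 * b_hat)  # f rises beyond b_hat
    return b_hat, lower, upper


counts = [3, 5, 2, 4, 6, 1, 3, 4]  # hypothetical low-count data
b_hat, lower, upper = one_sigma_interval(counts)
print(b_hat, b_hat - lower, upper - b_hat)
```

The two distances from the estimate differ slightly, which is the same kind of non-symmetric error that a likelihood scan returns in the general case.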

Further, curves of constant log-likelihood can also be plotted; examples for the Multinomial, Poisson and $χ^{2}$ methods with error schemes 1 and 2 are shown in Fig. 3 for the template data in Fig. 2. That is, the signal-to-background level is $20.0 / 21.6 \approx 0.93$ and σ is 0.1. Error scheme 3 is not shown as it closely resembles schemes 1 and 2. The constant log-likelihood curves are plotted as a function of the estimated $A / B$ and σ. Naively, one would conclude that the Multinomial confidence interval is larger than those of the other methods, but this does not take into account the uncertainty in the determination of B for the other methods. For this effect to be visible, multiple different spectra must be generated and fitted.

Fig. 3.

Estimated $1σ$, $2σ$, and $3σ$ confidence intervals for the signal-to-background value ($A / B$) and peak width (σ) fitting parameters for the synthetic data set in Fig. 2 of a Gaussian peak on a constant background, as described in Section 3. The true model parameters are displayed as a cross in each panel at the point $(A / B, σ) = (0.926, 0.1)$.

Looking at the distribution of parameter estimates for the three different statistics types, it is seen that on average the Multinomial distribution is both the most accurate and the most precise, followed in precision by the Poisson statistics. Comparing the extent of the 1, 2, and 3 σ intervals for the Multinomial distribution with the error estimate in Fig. 3, it can be argued that its error estimate is too large. A true correspondence between the change in log-likelihood and the error estimate might not have been achieved, resulting in an overestimation of the uncertainty on single parameters.

Diving into the spectra for the peak-to-background level of 0.926, depicted in Fig. 4, the spread of the parameters differs between the methods. Except for a few outliers, both the Multinomial and Poisson methods are centred close to the correct value, while both Gaussian methods deviate somewhat. Plotted together with the distribution of fits are covariance ellipses signifying 1, 2, and 3 σ. Again the Multinomial and Poisson methods are superior, and a clear advantage can be seen for the Multinomial method, as its covariance ellipses are close together.

Fig. 4.

Scatter plot of parameters determined from a series of synthetic data sets using the Multinomial, Poisson and $χ^{2}$ (error schemes 1 and 2) methods, plotted together with histograms of the distributions of the fitted values. The true parameters are $A / B = 0.926$ and $σ = 0.1$. The 1, 2, and 3 σ covariances are plotted as ellipses, together with the true and the average fitted parameters. The mean value is marked by an empty circle, while the true value is marked by a cross.

6. Example: Fitting normalized, low-count neutron scattering data

Our exploration of synthetic data from simple models gave rather clear results in favour of both the Multinomial and Poisson fitting methods. However, the litmus test is the influence of the methods on real-world scattering data.

To investigate this, we use an inelastic neutron scattering data set from a measurement of spin waves in MnF₂. We chose this system as a demonstration case, because MnF₂ has simple inelastic features consisting of only one spin wave branch, as well as the fact that a large single crystal of great quality was available. Important in this context is that since MnF₂ orders in an antiferromagnetic structure, different parts of the spin wave spectrum have different intensities. In particular, the magnon intensity around the magnetic Bragg peaks with Miller indices $H + K + L$ being odd is high, as opposed to low close to the structural Bragg peaks, for $H + K + L$ even. This allows for a systematic change of peak intensity when performing 1D cuts for constant energy in a given Q direction.

All data presented were taken during the early commissioning of the new cold-neutron multiplexing spectrometer CAMEA (PSI) in November-December 2018 [4,9]. We used a 6.2 g single crystal sample, held at a temperature of 2 K. The measurements were taken as pure sample rotation scans, using two settings of the analyzer-detector tank and four values of the incoming energy, $E_{i}$. All conversions, normalizations, and visualizations are performed by the novel MJOLNIR software [10], developed especially for data from CAMEA-type spectrometers. Because of the binning applied to the data, measurement points close to each other in reciprocal space are added together, and thus a completely true normalization is impossible. Instead, the normalization is found as the average of the normalizations of all detected points binned into the specific pixel. That is, if 15 different detector pixels are binned together into a single point, the normalization is the average of these.

Fig. 5.

Left: cut along $(H 0 1)$ for MnF₂ as measured at CAMEA. Inset: the full dispersion, with a color scale a factor of ten larger. Right: excerpt of the cut at 3.85 meV with a width of 0.15 meV, with corresponding fits. Top: normalized intensity, where sensitivity and monitor/counting time are included. Bottom: raw neutron counts.

The main data is shown as a color plot in Fig. 5 (left). A full view of all of the cuts is found in Appendix B. We observe a smooth and sharp spin wave dispersion with maximum intensity at the single ion anisotropy gap at 1.0 meV at $(0 0 1)$ and vanishing intensity close to $(- 1 0 1)$ , as known from earlier studies [14]. We analyse the data by one-dimensional constant-E cuts along $(h 0 1)$ , as shown in Fig. 5 (right). Analyzing the cuts for different energies shows that the intensity increases with distance to the structural Bragg peak $(- 1 0 1)$ .

Comparing the raw counts with the normalized intensity (Fig. 5 (right)) illustrates that one cannot use the (unnormalized) raw counts directly to fit the data. Instead a model is imposed on the data consisting of three parts: First the peak model, $λ_{i}^{*}$ , which is here assumed to be a Gaussian peak on a constant background described by Eq. (23). Second, the sensitivity of the individual analyser-detector pairs, measured by vanadium scattering, and here denoted the Normalization, N. Lastly, the dependence on counting time through the Monitor value, M.

In total the model reads $\begin{matrix} (34) & λ_{i} = λ_{i}^{*} N_{i} M_{i} . \end{matrix}$

This combined model has been fitted to the raw neutron counts with the fitting methods discussed above using Minuit, and the parameter errors are found using the Minos algorithm. For the human eye, the validity of the fit is very hard to evaluate from the raw counts, see the bottom right of Fig. 5, because of the erratic nature of the normalization, $N_{i}$. It is much easier to judge the model, $λ_{i}$, against the normalized data, top right of the figure. To the minimization algorithm, however, this odd-looking model is straightforward to evaluate.
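The way the combined model of Eq. (34) enters a fit to raw counts can be sketched as follows. The code assumes a particular parametrization of the Gaussian peak on a constant background (amplitude, centre, width, background), which may differ from the one in Eq. (23), and it uses made-up, noise-free example data. It only evaluates the Poisson negative log-likelihood; in practice this function would be handed to a minimizer such as Minuit.

```python
import math


def peak_model(x, amplitude, mu, sigma, background):
    """Assumed peak model: Gaussian peak on a constant background."""
    return amplitude * math.exp(-0.5 * ((x - mu) / sigma) ** 2) + background


def poisson_nll(params, x, counts, norm, monitor):
    """Poisson negative log-likelihood of raw counts.

    The expected count in bin i is lambda_i = lambda*_i * N_i * M_i,
    so the raw counts are never rescaled; only the model is.
    """
    total = 0.0
    for xi, ni, Ni, Mi in zip(x, counts, norm, monitor):
        lam = peak_model(xi, *params) * Ni * Mi
        total += lam - ni * math.log(lam)  # ln(n_i!) is constant and omitted
    return total


# Hypothetical example data with an erratic normalization N_i.
x = [-1.0 + 0.1 * i for i in range(21)]
norm = [0.5 + 0.1 * (i % 5) for i in range(21)]
monitor = [10.0] * 21
true_params = (20.0, 0.0, 0.2, 2.0)
counts = [round(peak_model(xi, *true_params) * Ni * Mi)
          for xi, Ni, Mi in zip(x, norm, monitor)]

print(poisson_nll(true_params, x, counts, norm, monitor))
print(poisson_nll((10.0, 0.0, 0.2, 2.0), x, counts, norm, monitor))
```

The second call, with half the amplitude, returns a larger value than the first, as expected for a worse description of the counts.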

Fig. 6.

Parameter values and uncertainty as fitted to the MnF₂ data using the 5 different methods and Minuit.

Figure 6 shows the outcome of the data analysis for the four different fitting methods. From the fitted parameters it can be seen that all methods largely agree on the determination of μ, while some spread is present for the other parameters. As a trend, the Poisson method predicts the largest values for the width of the peak. All methods agree that the amplitude of the peak grows with energy, but the exact value is disputed. As the background level is in general low, it is expected that the least squares method where zero counts are excluded overestimates the background and in turn underestimates the amplitude-to-background ratio. This is indeed the case: this estimate lies below those of the other methods. The Poisson and Multinomial fits agree across all three parameters and at all energies. In order for the fits to converge properly, special care had to be taken not to include any intensity in the data that is not described by the fitting function. In particular, the background estimates of the Poisson and Multinomial fits were sensitive to any other structures in the data, see Appendix D. Looking at the estimated uncertainties on the parameters, it is seen that the Multinomial distribution provides larger error estimates than the Poisson. This was also seen in Section 4, where the conclusion is that the error estimates might be a little too large when compared to the standard deviation over a large sample of spectra.

This example of inelastic scattering data has been measured using a multi-detector setup, but with a scan over sample rotation. The requirements for using the Multinomial method are therefore not met, and one should rather use the Poisson formalism.

7. Discussion and conclusion

Three different log-likelihoods have been presented having their origin in the Multinomial, Poisson and Gaussian distributions. We have reviewed the formalism to perform data analysis through parameter estimation using these in order to tackle the difficulties arising when performing simple $χ^{2}$ fits on counting data.

When treating a featureless spectrum, we have clarified analytically and by the use of synthetic data that the Gaussian approximation of the Poisson distribution, both inside and outside the Poisson regime, will result in a clear bias. The simple reason is that Gaussian statistics weigh low-count data points higher. In contrast, the Poisson fits yield unbiased results, while the Multinomial method simply (and correctly) provides the mean count value as a constant value.
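The bias for a featureless spectrum can be reproduced with a small Monte Carlo sketch. It assumes a flat background of 20 counts per bin, draws Poisson counts with a simple textbook sampler, and compares the Poisson maximum-likelihood estimate (the mean) with the least squares estimate obtained for $σ_{i} = \sqrt{n_{i}}$ with zero-count bins removed. The seed, the number of bins, and the function names are arbitrary choices for the illustration.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def flat_background_estimates(counts):
    """Return (Poisson ML estimate, least squares estimate).

    The least squares estimate uses sigma_i = sqrt(n_i) and drops bins
    with zero counts, which gives (m - z) / sum(1 / n_i).
    """
    poisson_ml = sum(counts) / len(counts)
    nonzero = [n for n in counts if n > 0]
    chi2 = len(nonzero) / sum(1.0 / n for n in nonzero)
    return poisson_ml, chi2


rng = random.Random(1)  # fixed seed so the sketch is reproducible
true_b = 20.0
counts = [draw_poisson(true_b, rng) for _ in range(20000)]
poisson_ml, chi2 = flat_background_estimates(counts)
print(f"Poisson ML: {poisson_ml:.2f}, least squares: {chi2:.2f}")
```

With these settings the Poisson estimate lands close to the true value, while the least squares estimate comes out roughly one count too low, because each bin is weighted by the inverse of its own count.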

When fitting synthetic data with a peak on a constant background, it was shown that the Multinomial and Poisson fitting methods produced much better parameter estimates than the three different least squares methods. All of the methods had, on average, a good estimate of the peak centre, with the Multinomial having the smallest standard deviation. In addition, the $χ^{2}$ methods were biased in both the signal-to-noise and width parameters. When the signal-to-noise level decreased below 1, the Poisson method also became biased. A small bias was also found for the Multinomial in $A / B$, but it was reduced by increasing the number of counts in the spectrum. By investigating a single spectrum, it was found that a cross correlation between peak width and signal-to-background level was present. It was further found that the log-likelihood of the Multinomial distribution increased the slowest, signifying a larger area of uncertainty. This is rather artificial, as the signal-to-background parameter for all but the Multinomial distribution is a combination of two fitted parameters, where the background uncertainty was not propagated. Further, the spread of the estimated parameters over many spectra yields a smaller standard deviation for the Multinomial and Poisson methods than for the Gaussian methods.

One of the drawbacks of using Multinomial and Poisson statistics is the need for maintaining the original count values of all data, for example in case of efficiency and monitor normalization as well as background subtraction. This complication makes development of data analysis software one level more complex. Nevertheless, we have implemented such a framework in the MJOLNIR analysis package and used this to compare the Poisson and Gaussian methods on simple, but real, data on spin waves in MnF₂. Our findings show that the Multinomial and Poisson methods are less stable than the Gaussian methods with regards to the case where the fitting function does not fully describe the data. This necessitated masking away data regions containing other signals than the peak being fitted. For none of the Gaussian methods, such a masking procedure was necessary in order to get an acceptable fitting result.

It is only in the case of a one-shot acquisition, i.e. when all data points are measured at the same time, that the Multinomial distribution is correct. If a scan is performed, it is actually the Poisson distribution that is to be used. Although the Multinomial and Poisson statistics are the correct methods in their respective settings, the sturdiness and reliability of the Gaussian least squares method certainly count in favour of this well-established method.

In conclusion, the Multinomial, Poisson and Gaussian methods each have their strengths and justification, and we advocate that future full-fledged analysis programs should be equipped with more than one fitting method for their data analysis algorithms. A process could consist of first a $χ^{2}$ fit, optimizing the user-provided initial guesses, followed by a log-likelihood fit using Poisson or Multinomial statistics as needed.

Acknowledgements

It is a pleasure to thank Toby Perring, Tobias Weber and Dmitry Gorkov for valuable discussions regarding understanding of error estimates and visualization of model and parameter uncertainties. We would also like to thank Christof Niedermayer for the assistance while obtaining the MnF₂ data set at CAMEA.

Multinomial Log-likelihood derivation

Taking the logarithm and applying Stirling’s approximation to all factorial terms, one gets $\begin{array}{lrl} (35) & \ln (L) \approx & N \ln (N) - N + \sum_{i = 1}^{m} [n_{i} \ln (p_{i}) - n_{i} \ln (n_{i}) + n_{i}] + (N - Δ N) \ln (1 - Δ p) \\ & & - (N - Δ N) \ln (N - Δ N) + (N - Δ N) \\ (36) & = & N \ln (N) - N + (N - Δ N) - (N - Δ N) \ln (N - Δ N) + \sum_{i = 1}^{m} [n_{i} + n_{i} \ln (\frac{p_{i}}{n_{i}})] \\ & & + (N - Δ N) \ln (1 - Δ p) \\ (37) & = & N \ln (N) + \sum_{i = 1}^{m} [n_{i} \ln (\frac{p_{i}}{n_{i}})] + (N - Δ N) \ln (\frac{1 - Δ p}{N - Δ N}) \\ (38) & = & \sum_{i = 1}^{m} [n_{i} \ln (\frac{p_{i} N}{n_{i}})] + (N - Δ N) \ln (\frac{1 - Δ p}{1 - \frac{Δ N}{N}}), \end{array}$ where the linear terms cancel in Eq. (37) because $\sum_{i = 1}^{m} n_{i} = Δ N$, and the last equality follows from rewriting $N \ln (N) = (N - Δ N) \ln (N) + \sum_{i = 1}^{m} n_{i} \ln (N)$.

Instead of working directly with these quantities, normalized ones are introduced, $\begin{array}{ll} (39) & q_{i} = \frac{n_{i}}{Δ N}, \quad \sum_{i = 1}^{m} q_{i} = 1, \\ (40) & {\tilde{p}}_{i} = \frac{p_{i}}{Δ p}, \quad \sum_{i = 1}^{m} {\tilde{p}}_{i} = 1 . \end{array}$ Inserting these into the log-likelihood gives $\begin{matrix} (41) & \ln (L) = N [\frac{Δ N}{N} \sum_{i = 1}^{m} (\frac{n_{i}}{Δ N} \ln (\frac{p_{i}}{Δ p} \frac{Δ p}{\frac{n_{i}}{Δ N}} \frac{N}{Δ N})) + (1 - \frac{Δ N}{N}) \ln (\frac{1 - Δ p}{1 - \frac{Δ N}{N}})] . \end{matrix}$ Reducing this, one finally reaches $\begin{matrix} (42) & \ln (L) = N [\frac{Δ N}{N} \sum_{i = 1}^{m} (q_{i} \ln (\frac{{\tilde{p}}_{i}}{q_{i}})) + \frac{Δ N}{N} \ln (\frac{Δ p}{\frac{Δ N}{N}}) + (1 - \frac{Δ N}{N}) \ln (\frac{1 - Δ p}{1 - \frac{Δ N}{N}})], \end{matrix}$ which is maximal with respect to the model parameters $x_{α}$ when $\begin{matrix} (43) & \frac{\partial}{\partial x_{α}} \ln (L) = \frac{\partial Δ p}{\partial x_{α}} \frac{\partial \ln (L)}{\partial Δ p} + \sum_{i = 1}^{m} [\frac{\partial {\tilde{p}}_{i}}{\partial x_{α}} \frac{\partial \ln (L)}{\partial {\tilde{p}}_{i}}] = 0 . \end{matrix}$

Looking at the derivative of $\ln (L)$ with respect to $Δ p$, $\begin{matrix} (44) & \frac{\partial \ln (L)}{\partial Δ p} = N (\frac{\frac{Δ N}{N}}{Δ p} - \frac{1 - \frac{Δ N}{N}}{1 - Δ p}) . \end{matrix}$ The summation term in Eq. (43) is equivalent to $\begin{array}{ll} (45) & \sum_{i = 1}^{m} [\frac{\partial {\tilde{p}}_{i}}{\partial x_{α}} \frac{\partial \ln (L)}{\partial {\tilde{p}}_{i}}] = \frac{\partial \ln (\tilde{L})}{\partial x_{α}}, \\ (46) & \ln (\tilde{L}) = Δ N \sum_{i = 1}^{m} q_{i} \ln (\frac{{\tilde{p}}_{i}}{q_{i}}) . \end{array}$ This follows from $\begin{array}{ll} (47) & \frac{\partial \ln (L)}{\partial {\tilde{p}}_{i}} = Δ N \frac{\partial}{\partial {\tilde{p}}_{i}} q_{i} \ln (\frac{{\tilde{p}}_{i}}{q_{i}}) = Δ N \frac{q_{i}}{{\tilde{p}}_{i}}, \\ (48) & \sum_{i = 1}^{m} [\frac{\partial {\tilde{p}}_{i}}{\partial x_{α}} Δ N \frac{q_{i}}{{\tilde{p}}_{i}}] = Δ N \sum_{i = 1}^{m} [\frac{q_{i}}{{\tilde{p}}_{i}} \frac{\partial {\tilde{p}}_{i}}{\partial x_{α}}] = \frac{\partial \ln (\tilde{L})}{\partial x_{α}} . \end{array}$

Solving the above equation can be split into two steps. Firstly, if Eq. (44) is zero, the first term in Eq. (43) vanishes independently of $x_{α}$, which gives $\begin{matrix} (49) & 0 = N (\frac{\frac{Δ N}{N}}{Δ p} - \frac{1 - \frac{Δ N}{N}}{1 - Δ p}) \Rightarrow Δ p^{*} = \frac{Δ N}{N}, \end{matrix}$ where $Δ p^{*}$ is introduced as the optimal parameter. This is a natural result, stating that $\ln (L)$ is maximal when the modelled number of neutrons not hitting the detector coincides with the real-world number. Inserting this value of $Δ p^{*}$ in $\ln (L)$, both logarithms outside the sum in Eq. (42) reduce to $\ln (1) = 0$, which yields $\begin{array}{lrl} (50) & \ln {(L)}_{Δ p = Δ p^{*}} & = N [\frac{Δ N}{N} \sum_{i = 1}^{m} [q_{i} \ln (\frac{{\tilde{p}}_{i}}{q_{i}})] + \frac{Δ N}{N} \ln (1) + (1 - \frac{Δ N}{N}) \ln (1)] \\ (51) & & = Δ N \sum_{i = 1}^{m} q_{i} \ln (\frac{{\tilde{p}}_{i}}{q_{i}}) . \end{array}$ Secondly, the best-fitting parameters are found by optimizing $\begin{matrix} (52) & \sum_{i = 1}^{m} q_{i} \ln (\frac{{\tilde{p}}_{i}}{q_{i}}) = \sum_{i = 1}^{m} q_{i} \ln ({\tilde{p}}_{i}) - \underbrace{\sum_{i = 1}^{m} q_{i} \ln (q_{i})}_{\text{const}}, \end{matrix}$ where $q_{i} \ln (q_{i})$ is independent of the optimization parameters $x_{α}$, which only influence ${\tilde{p}}_{i}$. Applying this to fitting, one has to minimize $\begin{matrix} (53) & - \ln (P (D | M)) = - Δ N \sum_{i = 1}^{m} q_{i} \ln ({\tilde{p}}_{i}) . \end{matrix}$
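A minimal sketch of the quantity in Eq. (53) is given below. It takes the raw counts and any non-negative model curve evaluated at the same points, normalizes the model to ${\tilde{p}}_{i}$, and returns the negative log-likelihood. The function name and the example numbers are hypothetical. Because only the normalized shape enters, an overall scale factor on the model drops out.

```python
import math


def multinomial_nll(counts, model_values):
    """Eq. (53): -Delta N * sum_i q_i * ln(p~_i).

    counts:       raw counts n_i (q_i = n_i / Delta N)
    model_values: model curve at the same points; only its shape matters,
                  since it is normalized to p~_i = model_i / sum(model).
    """
    delta_n = sum(counts)
    model_sum = sum(model_values)
    nll = 0.0
    for n_i, model_i in zip(counts, model_values):
        if n_i > 0:  # bins with q_i = 0 contribute nothing
            q_i = n_i / delta_n
            nll -= delta_n * q_i * math.log(model_i / model_sum)
    return nll


counts = [2, 7, 15, 6, 1]           # hypothetical spectrum
shape = [1.0, 3.0, 8.0, 3.0, 1.0]   # hypothetical model curve

print(multinomial_nll(counts, shape))
print(multinomial_nll(counts, [5.0 * s for s in shape]))  # same value
```

The scale invariance is the reason why the overall amplitude is not a free parameter in the Multinomial fit: it is fixed by $Δ p^{*} = Δ N / N$ in Eq. (49).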

The full MnF₂ data set

For completeness, Figs. 7, 8, and 9 present the full suite of fits to the MnF₂ data discussed in the main text.

Proof of the systematic errors in the values of background estimators

In the following, the deviation of the background estimate obtained with the least squares method on scattering data is investigated. The starting point is the chi-square, $\begin{matrix} (54) & χ^{2} = \sum_{i} {(\frac{n_{i} - λ_{i}}{σ_{i}})}^{2}, \end{matrix}$ where $n_{i}$ is the count number in the i’th bin, $σ_{i}$ is the corresponding uncertainty estimate, and $λ_{i}$ is the model prediction at i, which in the case of a flat background is simply b. To find the stationary point of this function, the first derivative with respect to the model parameter b is set to zero, $\begin{matrix} (55) & 0 = \frac{\partial χ^{2}}{\partial b} = \sum_{i} - 2 \frac{n_{i} - b}{σ_{i}^{2}} . \end{matrix}$ Splitting the sums and isolating b yields $\begin{matrix} (56) & b = {(\sum_{i} \frac{1}{σ_{i}^{2}})}^{- 1} (\sum_{i} \frac{n_{i}}{σ_{i}^{2}}) . \end{matrix}$ The next step is to split the two sums into a part containing the bins with zero counts and a part containing all the rest. It is here assumed that there are z bins with zero counts and $m - z$ non-zero bins. This yields $\begin{matrix} (57) & b = {(\sum_{i = 1}^{z} \frac{1}{{\tilde{σ}}_{i}^{2}} + \sum_{i = z + 1}^{m} \frac{1}{σ_{i}^{2}})}^{- 1} (\sum_{i = 1}^{z} \frac{{\tilde{n}}_{i}}{{\tilde{σ}}_{i}^{2}} + \sum_{i = z + 1}^{m} \frac{n_{i}}{σ_{i}^{2}}), \end{matrix}$ where ${\tilde{n}}_{i}$ and ${\tilde{σ}}_{i}$ denote the values used for zero bins. These differ depending on the background strategy, as presented in Table 1.

First, the BG3 case is followed, where zero counts are removed and the uncertainty estimate is simply $σ_{i} = \sqrt{n_{i}}$. This results in $\begin{matrix} (58) & b_{3} = {(\sum_{i = z + 1}^{m} \frac{1}{n_{i}})}^{- 1} (\sum_{i = z + 1}^{m} \frac{n_{i}}{n_{i}}) = \frac{m - z}{\sum_{i = z + 1}^{m} \frac{1}{n_{i}}}, \end{matrix}$ as the second sum is merely a sum of $m - z$ ones. If one assumes that the number of measured bins is large, the sum in the denominator can be approximated using the Poisson distribution of counts. That is, the number of bins containing a certain number of counts n is given by the Poisson distribution with expectation value b, multiplied by the total number of bins m, $\begin{matrix} (59) & \sum_{i = z + 1}^{m} \frac{1}{n_{i}} \approx \sum_{n = 1}^{\infty} P (n | b) m \frac{1}{n} = m \sum_{n = 1}^{\infty} \frac{e^{- b} b^{n}}{n! n} . \end{matrix}$ In the above, the sum used in the approximation starts at 1 instead of 0, as the bins containing zero counts have already been taken care of. The sum can be performed, yielding $\begin{matrix} (60) & \sum_{n = 1}^{\infty} \frac{e^{- b} b^{n}}{n! n} = - e^{- b} (- E_{i} (b) + \ln b + γ), \end{matrix}$ where $E_{i}$ is the exponential integral and $γ \approx 0.5772157 \dots$ is the Euler constant. Then, the background estimate becomes $\begin{matrix} (61) & b_{3} = \frac{m - z}{m (- e^{- b} (- E_{i} (b) + \ln b + γ))} = \frac{1 - \frac{z}{m}}{- e^{- b} (- E_{i} (b) + \ln b + γ)} . \end{matrix}$ Following the same procedure for the two other strategies, Eq. (57) becomes $\begin{array}{ll} (62) & b_{1} = {(z + \sum_{i = z + 1}^{m} \frac{1}{n_{i}})}^{- 1} (0 + (m - z)) = \frac{1 - \frac{z}{m}}{\frac{z}{m} + (- e^{- b} (- E_{i} (b) + \ln b + γ))}, \\ (63) & b_{2} = {(4 z + \sum_{i = z + 1}^{m} \frac{1}{n_{i}})}^{- 1} (2 z + (m - z)) = \frac{1 + \frac{z}{m}}{4 \frac{z}{m} + (- e^{- b} (- E_{i} (b) + \ln b + γ))} . \end{array}$

To proceed, notice that $\frac{z}{m}$ is the fraction of bins with zero counts. This fraction is approximated by the probability of zero counts from the Poisson distribution, thus $\frac{z}{m} \approx P (0 | b) = e^{- b}$. Multiplying numerator and denominator by $e^{b}$, the values minimizing the least squares value using the three background strategies are $\begin{array}{ll} & b_{1} = \frac{e^{b} - 1}{1 - (- E_{i} (b) + \ln b + γ)}, \\ (64) & b_{2} = \frac{e^{b} + 1}{4 - (- E_{i} (b) + \ln b + γ)}, \\ & b_{3} = \frac{1 - e^{b}}{- E_{i} (b) + \ln b + γ}, \end{array}$ and they are plotted in Fig. 10, both as absolute values and relative to the true background. If one kept to the tactic of using $σ_{i} = \sqrt{n_{i}}$ independent of the value of n, it is immediately clear that the denominator in Eq. (57) becomes infinite due to its first term as soon as a single zero-count bin is present, forcing the estimate of b to zero.

The limit for large background values is found for all three estimators in (64) by identifying the dominant parts to be the exponential in the numerator and the exponential integral in the denominator. In the limit of large b, $\begin{matrix} (65) & E_{i} (b) \to_{b \to \infty} e^{b} (\frac{1}{b} + \frac{1}{b^{2}} + \dots), \end{matrix}$ which then makes the fraction $\begin{matrix} (66) & \frac{e^{b}}{E_{i} (b)} \approx \frac{1}{\frac{1}{b} + \frac{1}{b^{2}}} . \end{matrix}$ Looking at the deviation from the actual value, one finds $\begin{matrix} (67) & \frac{1}{\frac{1}{b} + \frac{1}{b^{2}}} - b = \frac{- b}{1 + b} = \frac{- 1}{1 + ϵ}, \end{matrix}$ where $ϵ = \frac{1}{b}$ goes to 0 when b goes to ∞. The limit is readily found by insertion, $\begin{matrix} (68) & \lim_{b \to \infty} b_{i} - b = \lim_{ϵ \to 0} \frac{- 1}{1 + ϵ} = - 1 . \end{matrix}$ It has thus been found that all three background strategies, $b_{i}$, in the limit of infinite background yield a value 1 too low as compared to the true Poisson mean.

Sensitivity of Multinomial and Poisson fits

During the fitting of the MnF₂ data it became apparent that the stability differed between the methods, see Fig. 11. When performing the parameter estimation using the different techniques and the same initial guess, only the Gaussian approach was robust against a model not fully capturing the data. Both the Multinomial and Poisson methods are influenced by the second feature in the data. This is seen as a lowering of the peak intensity and a broadening of the peak width, as the fit balances between the two peaks. Given the shallowness of the Poisson and Multinomial log-likelihoods as compared to the $χ^{2}$ methods, it is sensible that the Poisson and Multinomial methods are less stable against data not explained by the model. Two ways of overcoming this exist: mask the data not explained by the model, i.e. $H < - 1$, or extend the model to include both peaks. The latter suggestion introduces extra parameters to be fitted. However, in this particular case the positions of the peaks are to be centred around −1 and their integrated intensities are to be equal. Extending this analysis requires a global model and, preferably, a model of the instrument effect on the data. With such an instrument model, no excess parameters are introduced.

References

1. Arnold, J.C. Bilheux, J.M. Borreguero, Buts, S.I. Campbell, Chapon and Zikovsky, Mantid – data analysis and visualization package for neutron scattering and μSR experiments, Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 764 (2014), 156–166. doi:10.1016/j.nima.2014.07.029.

2. R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1999.

3. Farhi, Debab and Willendrup, IFit: A new data analysis framework. Applications for data reduction and optimization of neutron scattering instrument simulations with McStas, Journal of Neutron Research 17(1) (2014), 5–18. doi:10.3233/JNR-130001.

4. Groitl, Graf, J.O. Birk, Markó, Bartkowiak, Filges and H.M. Rønnow, CAMEA – a novel multiplexing analyzer for neutron spectroscopy, Review of Scientific Instruments 87(3) (2016), 035109. doi:10.1063/1.4943208.

5. R.J. Hill and I.C. Madsen, Effect of profile step counting time on the determination of crystal structure parameters by X-ray Rietveld analysis, Journal of Applied Crystallography 17 (1984), 297–306. doi:10.1107/S0021889884011547.

6. iminuit team, iminuit – A Python Interface to Minuit, 2019, https://github.com/scikit-hep/iminuit.

7. James, Statistical Methods in Experimental Physics, 2nd edn, World Scientific, 2012.

8. James and Roos, Minuit – a system for function minimization and analysis of the parameter errors and correlations, Computer Physics Communications 10(6) (1975), 343–367. doi:10.1016/0010-4655(75)90039-9.

9. Lass, Graf, Kägi, Müller, Bürge, Schild, M.S. Lehmann, Bollhalder, Keller, Bartkowiak, Filges, Herzog, Greuter, Theidel, Testa, Favre, H.M. Rønnow and Niedermayer, Design and performance of the multiplexing spectrometer CAMEA, 2021, in preparation.

10. Lass, Jacobsen, D.G. Mazzone and Lefmann, MJOLNIR: A software package for multiplexing neutron spectrometers, SoftwareX 12(2352-7110) (2020), 100600. doi:10.1016/j.softx.2020.100600.

11. Numpy 2019, https://numpy.org/.

12. V.V. Patil and H.V. Kulkarni, Comparison of confidence intervals for the Poisson mean: Some new aspects, Revstat Statistical Journal 10 (2012), 211–227.

13. Wallis, Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods, Journal of Quantitative Linguistics 20(3) (2013), 178–208. doi:10.1080/09296174.2013.799918.

14. Yamani, Tun and D.H. Ryan, Neutron scattering study of the classical antiferromagnet MnF2: A perfect hands-on neutron scattering teaching course, Canadian Journal of Physics 88 (2010), 771–797. doi:10.1139/P10-081.