A vision from a physical point of view and the information theory on the image segmentation

Abstract

Entropy has been used in many fields of computer vision, such as image restoration, edge detection, and pattern recognition, and as an evaluation method for image segmentation. The mean shift iterative algorithm (MSHi) was proposed in 2006, with the Shannon entropy used as a stopping criterion. Later, a theorem was introduced which ensures, with a new stopping criterion, the convergence of the MSHi and determines what happens with the entropy at the limit of the segmentation process. The goal of this paper is to carry out an analysis of the implications of this theorem and to highlight the relations that were found, from a physical point of view, between image segmentation and information theory. This last aspect is the novel part of this work.

Keywords

Shannon entropy, image segmentation, corollary

1 Introduction

The concept of entropy is essential in the foundation of statistical physics. It first appeared in thermodynamics through the second law of thermodynamics [3]. The notion of entropy was broadened by the advent of statistical mechanics and broadened still further by the later advent of information theory. In fact, in 1865 Rudolf Clausius presented the thermodynamic argument for the existence of entropy and introduced the unit for entropy, but it did not become fashionable [4].

Later, Boltzmann (one of the most important physicists of the nineteenth century) proposed a statistical explanation of the second law of thermodynamics through the formula E = k log(W), which expresses the relationship between the entropy E and the probability W. Boltzmann’s point of view was highly relevant for describing the precise relationship between the thermodynamical properties of macroscopic bodies and their microscopic constitution, and the role of probability in this relationship. However, Boltzmann equated the negative of his E-function with Clausius’s thermodynamic entropy and thus claimed to have proved the second law of thermodynamics via statistical mechanics, a statement that drew severe criticism from his contemporaries almost immediately after publication [5].

Nevertheless, many texts nowadays speak of the Boltzmann entropy and of a more general definition, the Gibbs entropy, together called the “Generalized Boltzmann-Gibbs Entropy”, where the Boltzmann entropy E is defined for a macroscopic state, whereas the Gibbs entropy is defined over a statistical ensemble. This last generalized definition can be reduced to the form of the Shannon entropy [6].

Indeed, in 1948 Claude Shannon introduced a formalism designed to solve certain specific technological problems in communication engineering [7], and redefined the entropy concept of Boltzmann-Gibbs as a measure of uncertainty regarding the information content of a system. He proposed a revolutionary new probabilistic way of thinking about communication and defined an expression for measuring quantitatively the amount of information produced by a process [8].

The Shannon entropy was a new concept in the field of information theory, and in computer vision it has been used mainly in edge detection, in image restoration, and as an evaluation method for some image segmentation algorithms [9, 10].

From the point of view of digital image processing, the entropy of an image is defined as: $E (I) = - \sum_{i = 0}^{2^{B} - 1} p_{i} \log_{2} (p_{i}),$ (1) where B is the number of bits per pixel of the digitized image I, and p_i is the probability of occurrence of the gray-level value i.
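As an illustration of expression (1), the following minimal sketch computes the entropy of a small gray-level image from its histogram. The function name `image_entropy` and the two toy images are hypothetical and are not taken from the cited works; the image is assumed to be a list of rows of integer gray levels, and the sum is written as p log₂(1/p), which is equivalent to expression (1).

```python
import math
from collections import Counter

def image_entropy(image):
    """Shannon entropy (bits/pixel) of a gray-level image given as a list of rows."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    counts = Counter(pixels)  # histogram: gray level -> number of pixels
    # Gray levels that never occur are absent from the histogram,
    # which matches the usual convention 0 * log(0) = 0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical toy images: a uniform region and a two-level pattern.
uniform = [[128] * 4 for _ in range(4)]
two_level = [[0, 255], [255, 0]]

print(image_entropy(uniform))
print(image_entropy(two_level))
```

The uniform image contains a single gray level, so its entropy vanishes, while the two-level image, with both levels equally likely, needs one bit per pixel.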

Within a uniform region, entropy reaches its minimum value. Theoretically speaking, the probability of occurrence of the gray-level value within a uniform region is always one, so the entropy is zero. In practice, when one works with real images the entropy does not, in general, reach zero. This is due to the noise present in the images. Therefore, if we consider entropy as a measure of the disorder within a system (image), it can be used as a good stopping criterion in image segmentation.

The mean shift iterative algorithm (MSHi) was proposed by Rodríguez and Suarez in 2006 [1], and it has been used in many segmentation works, in which the Shannon entropy was taken as the stopping criterion [11–14]. This first version of the MSHi algorithm stopped when the relative rate of change of the entropy from one iteration to the next fell below a given threshold. In 2013, Garcés et al. defined a new stopping criterion based on the entropy of the difference between two images [15], and later they formalized these ideas as a new similarity metric between images, called the Natural Entropy Distance (NED) [16].

In 2017, Rodríguez et al. introduced a theorem that ensures the convergence of the MSHi with NED as stopping criterion [2], and determined the behavior of the entropy at the limit of the segmentation process. The goal of this paper is to carry out an analysis of the implications of this theorem and to highlight the connections that were found, from a physical point of view, between image segmentation and information theory. This last aspect is the novel part of this work.

The remainder of the paper is organized as follows. In Section 2, a retrospective review of entropy and some theoretical aspects are given. In Section 3, we review the use of entropy in computer vision. Section 4 presents the mean shift iterative algorithm and the connections that were found, from a physical point of view, between image segmentation and information theory. Section 5 presents some experimental results and simulations. Finally, Section 6 gives the conclusions and some further topics for future works.

2 A retrospective about entropy. A review of some theoretical aspects

Although many textbooks have disputed whether Boltzmann ever explicitly presented the formula E = k log(W), the truth is that many of his works exemplify several of his most important contributions to modern physics. Indeed, the Boltzmann distribution has passed almost unchanged into the quantum world, and in [17] it was stated that, even without its connection to entropy, the Boltzmann distribution is of remarkably wide-ranging importance. His works were a key step in developing the fully probabilistic basis of entropy and the second law of thermodynamics.

In order to avoid all the philosophical discussion around the Boltzmann entropy, and to enter into the analysis of the Shannon entropy, the main goal of this work, we will briefly refer to the generalized Boltzmann-Gibbs entropy, which can be reduced to the form of Shannon entropy [6]. Indeed, the Boltzmann-Gibbs entropy is given by the following definition.

Definition 1. Let φ = {X₁, X₂, . . . , X_n} be a set of macroscopic or observable states of a system, $\mathbb{P}$ a σ-algebra of the elements of φ, and W (X), $(X \in \mathbb{P})$, a measure on $\mathbb{P}$. The measure W is uniquely defined by its values W (X_n), (n = 0, 1, 2, . . .). The measure W (X_n) is called the statistical weight or the structure function of the observational state X_n, which is the total number of microscopic complexions compatible with the observational state X_n in the definition of Boltzmann entropy. Let $\mathcal{P}_{n}$ be the class of all probability measures P (X) on $\mathbb{P}$ absolutely continuous with respect to W, $\mathcal{P}_{n} = \{ P (X), X \in \mathbb{P} \mid P (φ) = 1, P \ll W \},$ (2) and using the Radon-Nikodym theorem [6], it is possible to obtain: $P (X) = \sum_{X_{n} \in X} ρ (X_{n}) W (X_{n}),$ (3)

$\forall X \in \mathbb{P}$ and $P \in \mathcal{P}_{n}$,

where $ρ (X_{n}) = \frac{P (X_{n})}{W (X_{n})}, (n = 0, 1, 2, . . .)$ (4)

The expression (4) is analogous to the coarse density of microstates defined over the phase space in the usual definition of Gibbs entropy [6]. Then, the Boltzmann-Gibbs entropy is defined as, $E = - K \sum_{X_{n}} P (X_{n}) \ln \left( \frac{P (X_{n})}{W (X_{n})} \right)$ (5)

The expression (5) measures the total uncertainty or disorder associated with the microscopic and macroscopic states of the system, and it is, therefore, an example of the total entropy introduced in information theory [6]. This definition is very useful for image analysis, since a simpler (segmented) image has a lower degree of disorder, which means that its pixels carry less uncertainty.

In the case of a classical system, the Boltzmann-Gibbs entropy is given by the following definition.

Definition 2. Let a system consist of N elements (molecules, organisms, etc.) classified into n classes C_i, i = 1, 2, . . . , n (energy states, species, pixels, etc.). Let N_i be the occupation number of the ith class. The macroscopic state of the system is given by the set of occupation numbers X_i = (N₁, N₂, . . . , N_n). The statistical weight or degree of disorder of the macrostate X_i is given by [6], $W (X_{i}) = \frac{N!}{\prod_{i = 1}^{n} N_{i}!}$ (6)

The expression (6) represents the total number of microstates or complexions compatible with the constraints to which the system is subjected. In the case of images, we might consider the image as the macrostate, while the microscopic states are the classes, which have a certain correlation between the pixels (constraints).

For large N_i, we can reduce the Boltzmann-Gibbs entropy, with the statistical weight given by expression (6), to the form of the Shannon entropy [6], $E = - K N \sum_{i = 1}^{n} P_{i} \ln P_{i},$ (7) where P_i = N_i / N is the relative frequency of the ith class or energy state; that is, it is the probability that a molecule lies in the ith energy state, or similarly that a pixel belongs to the ith class. Here, in the case of statistical physics, K is Boltzmann’s constant, while in other contexts K is a constant depending on the unit of measurement of entropy (the proof of expression (7) appears in Appendix A).
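The reduction from expression (6) to expression (7) can be checked numerically. The sketch below is only an illustration under stated assumptions: K is set to 1, the occupation numbers are hypothetical, and ln W is evaluated through the log-gamma function (ln N! = lgamma(N + 1)) so that the large factorials never have to be formed.

```python
import math

def log_statistical_weight(occupations):
    """ln W for W = N! / prod(N_i!), computed with log-gamma: ln(N!) = lgamma(N + 1)."""
    n_total = sum(occupations)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in occupations)

def shannon_form(occupations):
    """-N * sum(P_i ln P_i) with P_i = N_i / N, i.e. expression (7) with K = 1."""
    n_total = sum(occupations)
    return -n_total * sum((n / n_total) * math.log(n / n_total)
                          for n in occupations if n > 0)

# Hypothetical occupation numbers (e.g. pixels per class) with large N_i.
occupations = [30000, 50000, 20000]
exact = log_statistical_weight(occupations)
approx = shannon_form(occupations)
relative_gap = abs(exact - approx) / exact

print(exact, approx, relative_gap)
```

For occupation numbers of this size the two quantities differ only by the logarithmic terms that Stirling's working approximation neglects, so the relative gap is small.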

Nowadays, it is well known that the Shannon entropy is the key concept of information theory [7]; it has found wide application in different fields of science and technology [18], and it provides a measure of the uncertainty associated with a probability distribution. From a computer vision point of view, the Shannon entropy has been one of the most widely used, due to the strong relationship that exists between information theory and digital image processing. From the physical point of view, we also established a good link, especially with image segmentation.

Returning to information theory, motivated by the possibility of obtaining an efficient transmission of information over a noisy communication channel, Shannon redefined the entropy concept of Boltzmann-Gibbs as a measure of uncertainty regarding the information content of a system. He introduced a new probabilistic way of seeing communication and defined an expression for measuring quantitatively the amount of information produced by a process, which has the form of expression (7). Note that with the Shannon entropy function it is possible to prove that:

E (P) is maximal for p₁ = p₂ = ··· = p_N = 1/N, where N is the total number of events.

E (P) = 0 just when one p_i is “1” and the rest are “0”.

The logarithm is to base 2: log₂ (x) = y ⇒ x = 2^y (for example, 8 bit/pixel ⇒ 2⁸ gray levels).
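The first two properties above can be checked on small distributions. The following is a minimal sketch; the function name `shannon_entropy` and the three example distributions are hypothetical.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability events contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # all four events equally likely
skewed = [0.7, 0.1, 0.1, 0.1]        # one event dominates
certain = [1.0, 0.0, 0.0, 0.0]       # one event has probability 1

print(shannon_entropy(uniform))
print(shannon_entropy(skewed))
print(shannon_entropy(certain))
```

With four events, the uniform distribution attains the maximum log₂(4) = 2 bits, the skewed one lies strictly below it, and the degenerate one gives zero.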

In information theory the above is of great importance, since it means that, at the message level, messages with K letters can be encoded using only K · E (P) bits; that is, there are only 2^{K·E(P)} typical messages with K letters. Therefore, E (P) represents the extent to which typical messages drawn from a given set can be compressed, in bits per letter [7]. According to Shannon this has an important implication since, “if one is trying to use a noise channel to send a message, then the conditional entropy specifies the number of bits per letter that would need to be sent by an auxiliary noiseless channel in order to correct all the errors due to noise”.

What was expressed in the previous paragraphs has a strong relationship with image segmentation and with the theorem presented in [2]. For example, if we draw an analogy between image segmentation and information theory, we can consider the original image as the transmitter and the result of the segmentation as the receiver. It is known that the receiver decreases in entropy (but by less than the increase at the transmitter) [21]; the same happens with the entropy of the segmented image, which will be a corollary of the theorem presented in [2].
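As a small worked example of the count of typical messages, consider a hypothetical binary source whose two letters occur with probabilities 0.9 and 0.1, and messages of K = 100 letters. The names `source`, `letters` and `entropy_bits` are illustrative assumptions.

```python
import math

def entropy_bits(probabilities):
    """Entropy in bits per letter of a memoryless source."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

source = [0.9, 0.1]   # hypothetical letter probabilities
letters = 100         # message length K

bits_per_letter = entropy_bits(source)
bits_needed = letters * bits_per_letter               # K * E(P): log2 of the number of typical messages
log2_all_messages = letters * math.log2(len(source))  # log2 of the number of all possible messages

print(bits_per_letter, bits_needed, log2_all_messages)
```

Under these assumptions fewer than half of the 100 bits of a naive encoding are needed for the typical messages, because the source is far from uniform.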

3 Use of entropy in computer vision. Some comments and examples

Many authors agree that the visualization process can be treated as an information channel; that is, a visual communication channel that attempts to communicate the information in the source data to the destination, the viewer [22–24]. For example, for a better interpretation, scenes often need to be transformed by a sequence of steps (algorithms) such as filtering, noise elimination, and segmentation (among others) before their projection. These steps can be analyzed from two points of view: from a physical point of view, as a problem of energy conservation (trying to make the algorithm cause the lowest possible loss of energy in each step), or from the point of view of information theory (preserving the maximum amount of information from the input and generating the output for the next stage of the flow [22]). When information loss is inevitable, as happens in image restoration or in projecting 3D data to 2D images, special care is needed in selecting appropriate parameters to preserve as much information (energy) as possible; otherwise the processing result will be disastrous.

The function that connects energy and information is the entropy. Indeed, without intending to study this matter in depth, let us take a look at the fundamental equation of thermodynamics [25], $dU = T dE - P dV + μ dN,$ (8) where T is the temperature, E is the entropy, P is the pressure, V is the volume, μ is the chemical potential (which is important in systems that can exchange particles with some reservoir), and N is the number of particles (pixels). U is the energy, which in this formulation is a function of E, V, and N, so we can write U (E, V, N). If we make E smaller at fixed V and N, then the energy U has to decrease. In other words, if dE < 0 then dU < 0 as well. Indeed, in that case dU = T dE; that is, the temperature is just the ratio of small changes in energy and entropy. Therefore, from expressions (7) and (8) we can see the interconnection between information and energy; in this work, however, we will refer fundamentally to the entropy.

On the other hand, Fig. 1 shows an analogy between message transmission and data visualization [22].

Fig.1

Schema that represents a transmitter-receiver device. (a) Message transmission, (b) data visualization.

To date, entropy has been widely used in computer vision (CV). To give only some examples, a deep review of the use of entropy for different applications was carried out in [8], which found that in CV it had been used mainly in image thresholding, which often represents a first step in image analysis. The same reference also showed that maximum entropy has been used to tackle various real-life problems: for example, in radio astronomical interferometry; in spectroscopy; in detecting abnormal activities in a video stream, such as accidents on an escalator, where the frames resulting in a higher error function have higher entropy; and in image reconstruction for positron emission tomography, among many other examples.

In [22], the Shannon entropy was used for modeling a scientific data set as a discrete random variable where each data point in the domain carries a value as the outcome. Here, the entropy function E (P) indicated how much information the data set contains. If the distribution in the histogram was uniform across all bins, it was difficult to predict the value of a voxel, because the entropy of the data set was high. On the contrary, when the histogram distribution was highly skewed into a few bins, it was easy to guess the value of a voxel, the entropy of the data set being low.

In [26] a new entropy-based evaluation method was proposed, taking into consideration that a good segmentation algorithm should maximize the uniformity of pixels within each segmented region and minimize the uniformity across the regions. In the proposed method, the entropy for region j was defined as, $E_{v} (R_{j}) = - \sum_{m \in v_{j}^{(v)}} \frac{L_{j} (m)}{S_{j}} \ln \frac{L_{j} (m)}{S_{j}},$ (9) where R_j is a region of the image, v is one of the features used to describe the pixels in region j, and $v_{j}^{(v)}$ is the set of all possible values associated with feature v in region j. For region j of the segmentation and value m of feature v in that region, L_j (m) denotes the number of pixels in region j that have a value of m for feature v (e.g. luminance) in the original image, and S_j denotes the area (number of pixels) of region j.

From an information theory point of view, $\frac{L_{j} (m)}{S_{j}}$ represents the probability that a pixel in region R_j has a luminance (or other feature) value of m. So E_v (R_j) is the number of bits/pixel needed to encode the luminance for region R_j. Finally, in [26] the expected region entropy of image I was defined as the expected entropy across all regions, where each region has a weight (or probability) proportional to its area. In other words, the expected region entropy of segmentation I was, $E_{r} (I) = \sum_{j = 1}^{N} \left( \frac{S_{j}}{S_{I}} \right) E_{v} (R_{j}),$ (10) where S_I is the area of the whole image. The expression (10) is used as a measure of the uniformity within the regions of I: when each region has very uniform luminance, E_r (I) will be small. Other details about this formulation and other aspects related to entropy can be consulted in [26, 27].
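Expressions (9) and (10) can be sketched as follows. This is only an illustrative reading of the formulas, not the implementation of [26]: the image and the label maps are hypothetical, a segmentation is assumed to be a label map of the same shape as the image, the feature is the gray level, and the natural logarithm is used as written in expression (9).

```python
import math
from collections import Counter

def region_entropy(values):
    """Expression (9): entropy of the feature values found inside one region."""
    area = len(values)                      # S_j
    counts = Counter(values)                # L_j(m) for every value m present
    return sum((c / area) * math.log(area / c) for c in counts.values())

def expected_region_entropy(image, labels):
    """Expression (10): area-weighted mean of the region entropies."""
    regions = {}
    for image_row, label_row in zip(image, labels):
        for value, label in zip(image_row, label_row):
            regions.setdefault(label, []).append(value)
    total_area = sum(len(values) for values in regions.values())   # S_I
    return sum(len(values) / total_area * region_entropy(values)
               for values in regions.values())

# Hypothetical 2 x 4 image and two candidate segmentations of it.
image = [[10, 10, 200, 210],
         [10, 10, 200, 210]]
two_regions = [[0, 0, 1, 1],
               [0, 0, 1, 1]]     # separates dark pixels from bright ones
one_region = [[0, 0, 0, 0],
              [0, 0, 0, 0]]      # no segmentation at all

good = expected_region_entropy(image, two_regions)
poor = expected_region_entropy(image, one_region)
print(good, poor)
```

The segmentation that separates the dark and bright pixels yields more uniform regions and therefore a smaller expected region entropy than the trivial single-region labeling.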

4 The mean shift iterative algorithm. An analysis from the physical and information theory points of view

The mean shift is a non-parametric procedure that has proved to be an extremely versatile tool for feature analysis. It can provide reliable solutions for many computer vision tasks [28]. The mean shift filtering (MSh) was proposed in 1975 by Fukunaga and Hostetler [29]. It was largely forgotten until Cheng's paper rekindled interest in it [30]. Unsupervised segmentation by means of the MSh is carried out in two steps: the first is a smoothing filter, and the second is the segmentation itself [28].

On the other hand, Rodríguez and Suarez [32] proposed the mean shift iterative algorithm (MSHi) in 2006, and it was employed to carry out image segmentation. The novelty of the proposed algorithm was the use of the Shannon entropy as a stopping criterion. The choice of the Shannon entropy as a measure of goodness deserved several observations, which were detailed in [2, 11–14].

However, it is very important to point out that the MSh by itself is a filtering process. For this reason, in [28] two steps were proposed for the segmentation process: the first was to apply the MSh and the second to carry out the segmentation. On the other hand, in [28] it was also stated that the segmentation step does not add a significant overhead to the filtering process. This issue was our principal motivation, i.e., to arrive at the segmented image from the filtering process without an additional step. What was the problem then? How to stop? The answer to that question was to use the entropy as a stopping criterion [1]. That was the origin of the MSHi. Therefore, the MSh is a filtering process, while the MSHi is a segmentation algorithm that uses, by default, the MSh in the iteration process.

As we stated in the introduction, in the first version of the MSHi we used as stopping criterion the difference of entropy (called the old criterion) between one iteration and the next [1]; later, Garcés et al. proposed a new stopping criterion taking into consideration the entropy of the difference between two images [15]. Although this could seem trivial, physically speaking it is not, and a detailed explanation appears in [2].

As just one example, Fig. 2 shows two different images; with the old criterion, the difference of entropy between them is practically zero, which is not correct. With the new criterion, $E (|I_{k + 1} - I_{k}|)$, the spatial information was taken into consideration (and implicitly the correlation among the pixels), and greater stability was obtained. Therefore, in real conditions it is not correct to consider the pixels as uncorrelated; that is, as independent random variables. This is one of the problems of the classical threshold operators: they assume that the pixels are statistically independent [31].
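The difference between the two criteria can be reproduced with two small synthetic images. They are hypothetical stand-ins for the images of Fig. 2, not the images themselves: both contain the same number of black and white pixels, arranged differently, so each has entropy 1.

```python
import math
from collections import Counter

def entropy(image):
    """Shannon entropy (bits/pixel) of an image given as a list of rows."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    return sum((c / total) * math.log2(total / c) for c in Counter(pixels).values())

def abs_difference(a, b):
    """Pixel-wise absolute difference |a - b| of two images of equal size."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Same histogram, different spatial arrangement (hypothetical test images).
left_right = [[0, 0, 255, 255] for _ in range(4)]
top_bottom = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

old_criterion = abs(entropy(left_right) - entropy(top_bottom))   # difference of entropies
new_criterion = entropy(abs_difference(left_right, top_bottom))  # entropy of the difference

print(old_criterion, new_criterion)
```

The difference of entropies sees only the histograms and cannot tell the two images apart, whereas the entropy of the difference image reacts to the pixels that actually changed.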

Fig.2

Dissimilar images with entropy = 1.

All this physical analysis was mathematically corroborated in [16] by using ring theory. Garcés et al. proved the following theorem [16].

Theorem 1. If two images A and B are strongly equivalent, then they are weakly equivalent.

The implication of Theorem 1 is important, since the images in Fig. 2 are weakly equivalent but not strongly equivalent, which means that A ≍ B ⇏ A ≅ B. Therefore, one should understand that two strongly equivalent images have the same frequency histogram, except for a uniform shift of all gray levels.

Note that what was expressed mathematically has a close relation with the physics of the digital image, since strong equivalence takes into consideration the spatial relationship among pixels (correlation), and therefore assumes that statistical dependency exists among them. In addition, we want to point out that this dependency among pixels is not necessarily the same throughout the image. So, Garcés et al. proposed a new similarity metric among images by using ring theory, called the “Natural Entropy Distance (NED)” [16]. The authors also proved that NED fulfills the properties related to the axioms of distance.

Theorem 1 establishes that the old and new criteria are very different, the new criterion having better properties and being more suitable for computer vision tasks. Moreover, Rodríguez et al. proved the following theorem, which has interesting implications [2]. These implications (corollaries) are the most outstanding results of this paper.

Theorem 2. When the entropy of the absolute difference between two obtained images from an iteration and the next iteration is taken as a stopping criterion in the mean shift iterative algorithm, whatever the chosen threshold, this condition is enough to achieve the convergence. In addition, at the limit the entropy will be zero.
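The stopping rule of Theorem 2 can be sketched as a short loop. The filter used here, `smooth_once`, is a deliberately simplified stand-in for the mean shift filter (a single pass that averages the neighbors close in both position and gray level); it is an assumption made for illustration and not the MSh of [28] or the implementation of [1, 2]. The parameters `hs`, `hr`, `max_iterations` and the tiny test image are likewise hypothetical.

```python
import math
from collections import Counter

def entropy(image):
    """Shannon entropy (bits/pixel) of an image given as a list of rows."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    return sum((c / total) * math.log2(total / c) for c in Counter(pixels).values())

def smooth_once(image, hs=1, hr=20):
    """Simplified mean-shift-like pass: each pixel becomes the rounded mean of the
    pixels within spatial radius hs whose gray level differs by at most hr."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            center = image[r][c]
            window = [image[i][j]
                      for i in range(max(0, r - hs), min(rows, r + hs + 1))
                      for j in range(max(0, c - hs), min(cols, c + hs + 1))
                      if abs(image[i][j] - center) <= hr]
            new_row.append(round(sum(window) / len(window)))
        result.append(new_row)
    return result

def mshi(image, threshold, max_iterations=100):
    """Iterate the filter until the entropy of |I_{k+1} - I_k| is at most threshold."""
    current = image
    for k in range(1, max_iterations + 1):
        following = smooth_once(current)
        difference = [[abs(a - b) for a, b in zip(row_a, row_b)]
                      for row_a, row_b in zip(following, current)]
        current = following
        if entropy(difference) <= threshold:
            return current, k
    return current, max_iterations   # safety guard for this sketch only

image = [[10, 12, 200, 204],
         [12, 10, 204, 200]]
segmented, iterations = mshi(image, threshold=0.5)
print(segmented, iterations, entropy(image), entropy(segmented))
```

On this toy image the loop stops after two iterations with two homogeneous regions, and the entropy of the result is lower than that of the input; a larger threshold stops the loop earlier.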

A first implication of Theorem 2 is the following corollary.

Corollary 1. The entropy of the segmented image will always be less than that of the original image, and this is always fulfilled independently of the chosen path (of the chosen segmentation method).

From an information theory point of view, considering the Shannon entropy as a measure of the disorder of a macrostate (for example, an image) helps to see that, when the MSHi algorithm is used, the entropy within each region diminishes as the region becomes more homogeneous. On the other hand, from the physical point of view, that of the second law of thermodynamics, it is known that the entropy is a state function: E (B) - E (A) is independent of the path, regardless of whether it is reversible or irreversible. For an irreversible path, the entropy of the environment changes, whereas for a reversible one it does not. Image segmentation can be considered an irreversible process.

Returning to information theory, it is known that in a transmission channel the receiver decreases in entropy with regard to the transmitter [21], which is what happens in the segmentation process (Corollary 1). In short, when the MSHi algorithm is used for image segmentation, Theorem 2 guarantees that Shannon’s theorem is fulfilled, and the second law of thermodynamics too. On the other hand, image segmentation can be considered an optimization process with no exact solution. Therefore, given a segmentation problem, what often matters is not finding the exact solution but the one closest to the optimum, which can be guaranteed when using the MSHi algorithm.

Corollary 2. The sensitivity of segmentation process when using the MSHi algorithm is given by the selection of the stopping threshold.

Note that Theorem 2 ensures convergence whatever the chosen threshold; therefore, we will have a finer or coarser segmentation depending on the selected value of the stopping threshold. In addition, from Theorem 2 it is possible to establish the following corollary.

Corollary 3. The MSHi algorithm at the limit produces a completely homogeneous image and its histogram will be a Dirac delta shifted to the gray level that corresponds.

Observe that Theorem 2 ensures that at the limit the entropy will be zero; then p_i = 1, which implies a single gray level.

In short, from an information theory point of view, -p_i ln(p_i) is the amount of information associated with pixel x_i (where $p_{i} = \frac{x_{i}}{N}$ ). Then, we can see image segmentation as a process of information reduction, since the segmented image is a simplification of the original image, which we can consider as another interpretation of Theorem 2 and its associated corollaries. Therefore, one way of quantitatively measuring the quality of a segmentation process could be to seek the least loss of information in the segmentation process, which, given an application, could be achieved with the best selection of the segmentation method.

What was expressed in the above paragraph is of great importance and is in line with the state of the art in the quantitative analysis of segmentation processes. Many methods exist for the quantitative validation of a segmentation method, and in many of them the procedures are very similar [33]. Our proposal could be a different way to validate and compare segmentation strategies. This will be tested in future works.
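The step from zero entropy to a single gray level can be spelled out as follows. This is a short sketch that uses only expression (1), the normalization of the probabilities, and the usual convention 0 log 0 = 0; the index i_0 and the histogram symbol h are introduced here for illustration.

```latex
% Zero entropy with normalized probabilities:
E(I) = -\sum_{i} p_i \log_2 p_i = 0, \qquad 0 \le p_i \le 1, \qquad \sum_{i} p_i = 1 .

% Every term -p_i \log_2 p_i is non-negative, so each term must vanish:
-p_i \log_2 p_i = 0 \;\Longrightarrow\; p_i \in \{0, 1\} \quad \text{for all } i .

% Normalization then leaves exactly one occupied gray level i_0:
p_{i_0} = 1, \qquad p_i = 0 \;\; (i \neq i_0),

% so the histogram of an image with N pixels is a single bin at i_0:
h(i) = N \, \delta_{i, i_0} .
```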

5 Some experimental results and simulations

The experimental results and simulations that will be presented were carried out with the standard images shown in Fig. 3.

Fig.3

Standard images [34]. (a) Cosmonaut, (b) Bird, (c) Barbara, (d).

Figure 4 shows three examples of the performance of the MSHi algorithm. The “ x ” axis shows the iterations of the MSHi and the “ y ” axis the entropy values obtained in each iteration of the algorithm.

Fig.4

The “ y ” axis shows the values of entropy, while the “ x ” axis represents the number of iterations.

The graphs in Fig. 4 show that the entropy decreases overall as the iterations proceed. It is interesting to observe that in some iterations an increase of the entropy is noticed, but from that iteration on the entropy falls quickly, and at the limit the entropy will be zero. Figure 4 is a graphical way of seeing Corollary 1. Note that in the first iteration the entropy is much greater than in subsequent iterations; that is, segmentation can be seen as a process of simplifying the original image, since the number of gray levels decreases in the process of homogenization.

Figure 5 presents a simulation of a segmentation process as the image becomes simpler (more homogeneous areas).

Fig.5

Simulation of decrease of entropy as the image becomes simpler.

The segmentation of the Astro image for different values of the stopping threshold is shown in Fig. 6 (simulation of Corollary 2). One can see that the number of iterations increased abruptly when the stopping threshold was reduced from 0.001 to 0.0001, and also that the segmentation becomes more refined (the degree of homogenization of the segmented image increased).

Fig.6

(a) Segmentation for stopping threshold (st)=0.1, 2 iterations, (b) Segmentation for st = 0.05, 2 iterations, (c) Segmentation for st = 0.01, 4 iterations, (d) Segmentation for st = 0.005, 5 iterations, (e) Segmentation for st = 0.001, 7 iterations, (f) Segmentation for st = 0.0001, 60 iterations.

A simulation for the case of Corollary 3 appears in Fig. 7. Observe that Theorem 2 ensures that at the limit the entropy will be zero; then p_i = 1, which implies a single gray level, that is, a completely homogeneous image. Therefore, the histogram will have a single bin, located at the gray level x_i.

Fig.7

Simulation of a completely homogeneous image (only a gray level).

6 Conclusions

In this paper, we carried out a retrospective review of entropy functions since their emergence, and we made a broad theoretical analysis from the points of view of information theory and physics, going deeper into the Shannon entropy. We showed that entropy has mainly been used for the purpose of image thresholding. We introduced and discussed the MSHi algorithm, in which the Shannon entropy was used as a stopping criterion and for which a theorem that ensures convergence was proposed. Finally, we carried out a wide analysis of that theorem and its implications from a physical point of view and its relationship with image segmentation. We propose some further topics for future works.

6.1 Some further topics for future works

What was written in the final paragraph of Section 4 can serve as the basis for another way of quantitatively evaluating the quality of one or several segmentation processes. This issue will be the subject of future research.

In physics, the notion of entropy is typically regarded as a measure of the degree of disorder and of the tendency of physical systems to become less organized. Does image segmentation, then, violate this principle? The answer to this question could also give rise to another criterion for the quantitative evaluation of image segmentation, which we will analyze in future works.

We will propose an experimental work with several of the theoretical aspects considered in this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional for the support to carry out this research. H. Sossa appreciates the economic support received from the SIP-IPN and CONACYT under grants 20170693, 20180730, 20190007 and 65 (Frontiers of Science), respectively, to conduct this investigation.

Yasel Garcés received a postdoctoral scholarship from the “Dirección General de Asuntos del Personal Académico de la UNAM” (DGAPA) in the “Instituto de Biotecnología” (IBt-UNAM). Esley Torres received scholarships from CONACYT with grant 596179.

Appendix A

Consider the statistical weight of expression (6), $W (X_{i}) = \frac{N!}{\prod_{i = 1}^{n} N_{i}!}$ (A.1)

Taking logarithms, $\ln \left( \frac{N!}{\prod_{i = 1}^{n} N_{i}!} \right) = \ln \left( \frac{N!}{N_{1}! N_{2}! \cdots N_{n}!} \right)$ (A.2) $\ln \left( \frac{N!}{N_{1}! N_{2}! \cdots N_{n}!} \right) = \ln (N!) - \ln (N_{1}!) - \cdots - \ln (N_{n}!)$ (A.3)

For large N, Stirling’s approximation establishes that, $\ln (N!) \approx N \ln (N) - N + \ln \sqrt{2 π N},$ (A.4)

but the last term in expression (A.4) is usually neglected, so that a working approximation is: $\ln (N!) \approx N \ln (N) - N$ (A.5)

Therefore, when substituting expression (A.5) in (A.3), we obtain, $\ln \left( \frac{N!}{N_{1}! N_{2}! \cdots N_{n}!} \right) \approx N \ln (N) - N - (N_{1} \ln (N_{1}) - N_{1}) - \cdots - (N_{n} \ln (N_{n}) - N_{n}) = N \ln (N) - N - \left( \sum_{i = 1}^{n} N_{i} \ln (N_{i}) - \sum_{i = 1}^{n} N_{i} \right)$

where $\sum_{i = 1}^{n} N_{i} = N$, (A.6) so that $\ln W (X_{i}) \approx N \ln (N) - N - \sum_{i = 1}^{n} N_{i} \ln (N_{i}) + N = N \ln (N) - \sum_{i = 1}^{n} N_{i} \ln (N_{i}) = \sum_{i = 1}^{n} N_{i} \ln (N) - \sum_{i = 1}^{n} N_{i} \ln (N_{i})$

$= \sum_{i = 1}^{n} N_{i} (\ln (N) - \ln (N_{i})) = \sum_{i = 1}^{n} N_{i} \ln \left( \frac{N}{N_{i}} \right) = - \sum_{i = 1}^{n} N_{i} \ln \left( \frac{N_{i}}{N} \right) = - \frac{N}{N} \sum_{i = 1}^{n} N_{i} \ln \left( \frac{N_{i}}{N} \right)$ (A.7)

$= - N \sum_{i = 1}^{n} \frac{N_{i}}{N} \ln \left( \frac{N_{i}}{N} \right) = - N \sum_{i = 1}^{n} p_{i} \ln (p_{i})$ (A.8)

where $p_{i} = \frac{N_{i}}{N}$. (A.9) Multiplying by K, with E = K ln W, gives expression (7).

References

1. Rodríguez, Suarez A.G., “An Image Segmentation Algorithm Using Iteratively the Mean Shift”, in Progress in Pattern Recognition, Image Analysis and Applications, Lecture Notes in Computer Science, vol. 4225, Springer, Berlin/Heidelberg, pp. 326–335, 2006.

2. Rodríguez, Torres, Sossa J.H., Garcés, “A new stopping criterion for the mean shift iterative algorithm. Its use in image segmentation”, International Journal of Imaging and Robotics 17(2), 2017.

3. Tong, Statistical Physics, Preprint typeset in JHEP style, Hyper Version, University of Cambridge Part II Mathematical Tripos, 2012. http://www.damtp.cam.ac.uk/user/tong/statphys/sp.pdf

4. Boltzmann’s Work in Statistical Physics (Stanford Encyclopedia of Philosophy), First published Wed Nov 17, 2004; substantive revision Sun Aug 17, 2004. https://plato.stanford.edu/entries/statphys-Boltzmann/

5. Gottwald G.A., Oliver, “Boltzmann’s Dilemma: An Introduction to Statistical Mechanics via the Kac Ring”, SIAM Review 51(3) (2009), 613–635. http://www.siam.org/journals/sirev/51-3/70579.html

6. Chakrabarti C.G., Kajal De, “Boltzmann-Gibbs Entropy: Axiomatic Characterization and Application”, Internat J Math & Math Sci 23(4) (2000), 243–251, S0161171200000375, © Hindawi Publishing Corp.

7. Shannon C.E., “A mathematical theory of communication”, Bell Syst Tech J 27 (1948), 379–423.

8. Chamoli, Kukreja, Semwal, “Survey and Comparative Analysis on Entropy Usage for Several Applications in Computer Vision”, International Journal of Computer Applications 97(16) (2014).

9. Zhang, Fritts J.E., Goldman S.A., “An Entropy-based Objective Evaluation Method for Image Segmentation”, Storage and Retrieval Methods and Applications for Multimedia, edited by Yeung, Minerva M.; Lienhart, Rainer W.; Li, Choung-Sheng, Proceedings of the SPIE 5307 (2003), 38–49.

10. Suyash P.A., Whitaker R.T., “Higher-Order Image Statistics for Unsupervised, Information-Theoretic, Adaptive, Image Filtering”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 28(3) (2006), 364–376.

11. Rodriguez, Torres, Sossa J.H., “Image Segmentation based on an Iterative Computation of the Mean Shift Filtering for different values of window sizes”, International Journal of Imaging and Robotics 6(A11) (2011), 1–19.

12. Rodriguez, Torres, Sossa J.H., “Image Segmentation via an Iterative Algorithm of the Mean Shift Filtering for Different Values of the Stopping Threshold”, International Journal of Imaging and Robotics 7(1) (2012).

13. Rodriguez, “Binarization of medical images based on the recursive application of mean shift filtering: Another algorithm”, Journal of Advanced and Applications in Bioinformatics and Chemistry, Dove Medical Press Ltd, I, 1–12 (2008).

14. Dominguez, Rodriguez, “Convergence of the Mean Shift using the Linfinity Norm in Image Segmentation”, International Journal of Pattern Recognition Research (2011), 32–42.

15. Garcés, Torres, Pereira, Pérez, Rodríguez, “Stopping Criterion for the Mean Shift Iterative Algorithm”, CIARP 2013, Part I, LNCS 8258, pp. 383–390, Springer-Verlag Berlin Heidelberg, 2013. http://link.springer.com/chapter/10.1007%2F978-3-642-41822-8_48

16. Garcés, Torres, Pereira, Rodríguez, “Application of the Ring Theory in the Segmentation of Digital Images”, International Journal of Soft Computing, Mathematics and Control (IJSCMC) 3(4) (2014).

17. Sharp, Matschinsky, “Translation of Ludwig Boltzmann’s Paper ‘On the Relationship between the Second Fundamental Theorem of the Mechanical Theory of Heat and Probability Calculations Regarding the Conditions for Thermal Equilibrium’”, Entropy 17 (2015), 1971–2009; doi:10.3390/e17041971. www-mdpi-com.web.bisu.edu.cn/journal/entropy

18. Chakrabarti C.G., Chakrabarty, “Shannon Entropy: Axiomatic Characterization and Application”, International Journal of Mathematics and Mathematical Sciences 17 (2005), 2847–2854; doi:10.1155/IJMMS.2005.2847.

19. Morales, Pardo, Vajda, “Uncertainty of Discrete Stochastic Systems: General Theory and Statistical Inference”, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 26(6) (1996).

20. Roventa, “A note on Schur-concave functions”, Journal of Inequalities and Applications 159 (2012). http://www.journalofinequalitiesandapplications.com/content//1/159

21. Schiller, Motion Mountain, The Adventure of Physics: Vol. III, Light, Charges and Brains, Copyright © 1990–2016 by Christoph Schiller, from the third year of the 24th Olympiad to the first year of the 31st Olympiad (p. 132). www.creativecommons.org/licenses/by-nc-nd/3.0/de

22. Wang, Shen H.W., “Information Theory in Scientific Visualization”, Entropy 13 (2011), 254–273; doi:10.3390/e13010254.

23. Purchase H.C., Andrienko, Jankun-Kelly T.J., Ward, “Theoretical foundations of information visualization”, in Information Visualization: Human-centered Issues and Perspectives; Kerren, A.; Stasko, J.T.; Fekete, J.D.; North, C., Eds.; Springer-Verlag: Berlin/Heidelberg, Germany, pp. 46–64 (2008).

24. Chen, Jänicke, “An Information-theoretic framework for visualization”, IEEE Trans Vis Comput Graph 16 (2010), 1206–1215.

25. Santra S.B.

26. Zhang, Fritts J.E., Goldman S.A., “An Entropy-based Objective Evaluation Method for Image Segmentation”, Storage and Retrieval Methods and Applications for Multimedia 2004, edited by Yeung, Minerva M.; Lienhart, Rainer W.; Li, Choung-Sheng, Proceedings of the SPIE 5307 (2003), 38–49.

27. Lai W.K., Khan I.M., Poh G.S., “Weighted Entropy-based Measure for Image Segmentation”, Procedia Engineering 41 (2012), 1261–1267, Elsevier.

28. Comaniciu, Meer, “Mean Shift: A Robust Approach toward Feature Space Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 2002.

29. Fukunaga, Hostetler L.D., “The Estimation of the Gradient of a Density Function”, IEEE Trans. Information Theory 21 (1975), 32–40.

30. Cheng, “Mean Shift, Mode Seeking, and Clustering”, IEEE Trans. Pattern Analysis and Machine Intelligence 17(8) (1995), 790–799.

31. Pal N.R., Pal S.K., “Entropy: A New Definition and its Applications”, IEEE Transactions on Systems, Man, and Cybernetics 21(5), September/October (1991).

32. Rodriguez, Suarez, Sossa J.H., “A segmentation algorithm based on an iterative computation of the mean shift filtering”, Journal of Intelligent & Robotic Systems 63 (2011), 447–463.

33. Rodríguez, Sossa J.H., “Mathematical Techniques for Biomedical Image Segmentation”, in R. Narayan (Ed.), Encyclopedia of Biomedical Engineering, 3 (2019), 64–78, Elsevier. ISBN: 9780128048290. https://www.com/books/encyclopedia-of-biomedical-engineering/narayan/978-0-12-29-0

34. https://www.google.com.mx/search?biw=&bih=468&tbm=isch&sa=1&ei=fJO0XMKVMsjb5gLmgKWADg&q=standard+images%2C+lena%2C+pippers%2C+mandril&oq=standard+images%2C+lena%2C+pippers%2C+mandril&gs_l=img.3...0.0..24144...0.0..0.0.0.......1......gws-wiz-img.izU0sJTZBnk.