Abstract
This paper deals with a new and efficient collective optimization approach, based on DC (Difference of Convex functions) programming and DCA (DC Algorithm), powerful tools of nonconvex programming. Exploiting the efficiency and the flexibility of DCA we develop the so-called collaborative DCA in which divers DCA based algorithms are cooperated in an effective way. Two versions of collaborative DCA are proposed and their applications on clustering, a fundamental problem in unsupervised learning, are studied. Numerical experiments are performed on several datasets. The comparative results with three DCA component algorithms show that the collaborative DCA outperforms them on quality and it realizes a good trade-off between the quality of solutions and the running time.
Introduction
Nonconvex (differentiable/nondifferentiable) programming and global optimization have known, during the past three decades, dramatic developments around the world. The absence of convexity creates a source of difficulties of all kinds, in particular the distinction between the local and global minima, the nonexistence of verifiable characterizations of global solutions, etc. Many current approaches for nonconvex programming are not generally effective in large-scale problems from real-world applications. A great challenge of nonconvex programming is finding efficient algorithms that realize a compromise between the quality and the scalability to tackle large-scale problems.
DC (Difference of Convex functions) and DCA (DC Algorithms) constitute the backbone of nonconvex programming and global optimization. Having more than thirty years of history, these efficient tools have been exploited by many researchers and practitioners in the world, to model and solve nonconvex programs from many fields of applied sciences (see e.g. [13, 16–18] and references quoted therein). DC programming and DCA address the problem of minimizing a function f which is a difference of two convex functions on the whole space
The construction of DCA involves DC components g and h but not the function f itself. Hence, for a DC program, each DC decomposition corresponds to a different version of DCA. Since a DC function f has an infinite number of DC decompositions which have crucial impacts on the qualities (speed of convergence, robustness, efficiency, globality of computed solutions, etc) of DCA, the search for a “good” DC decomposition is important from an algorithmic point of view. Exploiting the nice effect of DC decompositions of the same DC function as well as the one of various equivalent DC formulations of the same problem are challenging issues in DC programming and DCA. For this purpose, we propose in this paper a collective optimization scheme named Collaborative DCA or CoDCA for short. The idea is simple, but attractive: at each running cycle of CoDCA we run parallel some iterations (typically one iteration) of several DCA based algorithms (called DCA component) from their best current solution, and then exchange the obtained solution between the component algorithms to update the best solution to each of them. The CoDCA is terminated when all DCA components are stopped. Such a collective intelligent optimization scheme strongly contributes to the transfer of knowledge and power from the individual to the collective.
Collective optimization has been investigated in the literature, for instance the practical Swarm optimization (see the recent survey [19]) or the Bee algorithms (see e.g. [15]). However these approaches are completely different to our collaborative framework in which divers algorithms are collaborated while applying on different formulations of the same problem. A recent collaborative approach (in the same sense of our work) is the metastorming [22] which collaborates some metaheuristic methods. Nonetheless, the collaboration between iterative deterministic algorithms has been never considered. Perhaps that is because there are a very few efficient iterative deterministic algorithms for the same nonconvex problem. The idea of CoDCA is motivated by the efficiency and the flexibility of DCA, more precisely DCA is a descent method which often gives a global solution from a good starting point. There are several DC formulations, and consequently several DCA schemes for the same DC problem.
CoDCA aims to exploit the efficiency of all DCA components at each running cycle to get a good starting point for each of them.
Various versions of the collaborative DCA can be investigated for several applications. The following questions are crucial for designing CoDCA: Search of various DC decomposition of the same objective function or equivalent DC formulation of the considered problem? Design of corresponding DCA schemes? How long (or how many iterations of) each DCA component would be applied in each running cycle? How to define “the best current solution” for each DCA component?
These questions should be considered in the particular context of the considered problem for well exploiting its special structure.
In this paper, we propose two collaborative DCA schemes for hard clustering. Our work is motivated by the facts that there are some efficient, fast and scalable DCA based algorithms for clustering. We will address all the about question while designing CoDCA.
The paper is organized as follows. The generic collaborative DCA schemes are presented in Section 2 codca while collaborative DCA for clustering is discussed in Section 3 sec:clustering. We start this section by introducing different formulations of the clustering problem and their corresponding DCA, and then indicate how CoDCA works for clustering. The computational experiments are reported in Section 4 experimentalResults. Finally, Section 5 conclusion concludes the paper.
Collaborative DCA: a generic scheme
Consider now the optimization problem of the form
execute parallel a certain number of iterations of each DCA component; exchange and update the best current solution to restart each DCA component in the next cycle.
The efficiency of each scheme depends mainly on the three following elements: the efficiency of DCA components - this is the most important element of CoDCA which results to the quality of the shared solution; the number of iterations in each component algorithm at each step (the moment when the exchange information occurs); the way to moderate the exchange information, more precisely, the way to update the best current solution for restarting each DCA component.
By changing at least one of the above elements we get various CoDCA algorithms for the problem (1).
We propose here two versions of CoDCA that differs from one to other by the way to update “the best solution” for each DCA component. In the first scheme, named
The two CoDCAs are described in the Figure 1.

Generic scheme of CoDCA.
Clustering, which aims at dividing a dataset into groups or clusters containing similar data, is a fundamental problem in unsupervised learning and has many applications in various domains. In recent years, there has been significant interest in developing clustering algorithms to massive datasets (see e.g. [1–5, 20] and the references therein.
The general term “clustering” covers many different types of problems. All consist of subdividing a dataset into groups of similar elements, but there are many measures of similarity, many ways of measuring, and various concepts of subdivision (see [6, 7] for a survey). DCA has been extensively investigated for various kinds of clustering problems: hard/fuzzy/partitional/hierarchical/weighted clustering (refer to [14] for a short review). In the sequel, we will use the term “clustering” to indicate the hard partitional clustering.
An instance of the clustering problem consists of a dataset
DC formulations and DCA based algorithms for clustering
The formulation of MSSC based on bilevel programming was first introduced in [21]:
The kernel version of (3) via Gaussian kernel function
Another equivalent formulation of MSSC, which is a DC integer problem, was proposed in [12]. Let U = (u
li
) ∈ mbrk ×n with l = 1, …, k and i = 1, …, n be the matrix defined by
Then a straightforward mixed integer formulation of MSSC is
The problem (5) was reformulated as a continuous optimization problem in [12] thanks to an exact penalty technique, where the resulting problem was proved to be a DC program.
DC programming and DCA have been developed in Le Thi et al. [10] to (3) and Le Thi et al. [12] to (4) as well as to (5). The proposed DCA schemes are original and very inexpensive because either DCA is explicitly computed [10, 12] or it amounts to computing, at each iteration, the projection of points onto a simplex and/or onto a ball, and/or onto a box, that are all determined in the explicit form [12]. We investigate in this work collaborative DCA schemes using these DCA based algorithms as components. We shortly describe below the DCA component algorithms, the reader is referred to [10, 12] for more details about the DC formulations of these problems and the construction of DCA for them.
1. For i = 1, …, n, choose
2. For l = 1, …, k, compute
3. t ← t + 1
1. For i = 1, …, n, choose
2. Compute Vt+1 by setting: ∀l = 1, …, k,
3. t ← t + 1.
Let Δ
k
be the (k - 1)–simplex in mbrk defined by
Considering the penalty function P defined on mbrk ×n by
1. Define (Y
t
, Z
t
) by setting: ∀i = 1, …, n; l = 1, …, k,
2. Define (Ut+1, Vt+1) by setting: ∀i = 1, …, n; l = 1, …, k,
3. t ← t + 1
We will show how CoDCA is performed for clustering by indicating the parameters to be used in CoDCA schemes. As we have three DCA components, p is equal to 3. The common objective function in
From the convergence properties of DCA we have
Numerical Experiments
Results We have implemented the algorithms in the Visual C++ 2010 environment, with OpenMP enabled for parallel programming. Numerical experiments are conducted on an Intel (R) Xeon (R) CPU E5-2630 v2 @2.60 GHz with 32 GB of RAM.
Experiment setting and testing datasets
All the datasets for testing are described in the Table 1.
The description of the datasets
The description of the datasets
Parameters used in our experiment are set as following: the penalty parameter τ in
In the Table 2, we present the accuracy of clustering and the CPU time (measured in seconds) of the five comparative algorithms: the three DCA component (
Accuracy (in percent) and CPU time of the comparative algorithms. Bold characters indicate the best results.
Accuracy (in percent) and CPU time of the comparative algorithms. Bold characters indicate the best results.
From the numerical experiments we observe that: Not surprisingly, the two collaborative DCAs improve the accuracy of clustering of all three algorithms
About the two versions of CoDCA: in terms of accuracy,
We have proposed an efficient collective optimization approach based on DCA for solving DC programs. The idea is to cooperate several DCA versions applied on different DC formulations of the considered problem to exploit the power of each of them. Two collaborative DCA schemes were developed and applied on clustering, an important and hard topic of data mining. Numerical experiments are performed on several datasets. The results indicated the effectiveness of the cooperative procedure and their superiority in comparison with component algorithms. This work opens the door for an attractive research direction in intelligence collective optimization: the nice effect of DC decomposition in the design of DCA should be exploited to develop collaborative DCA for several nonconvex optimization problems. Work in this direction is in progress.
Footnotes
Acknowledgment
This research is funded by Foundation for Science and Technology Development of Ton Duc Thang University (FOSTECT), website: http://fostect.tdtu.edu.vn, under Grant FOSTECT.2017.BR.10.
