In contrast to the closest string problem (CSP), it is not immediately obvious how the feasible solution set should be defined for the FSP. In the former case, one seeks a sequence that is as close as possible to all sequences of a set Σ. It is clear that the element xj of a solution candidate
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$x = ( x_1 , \ldots , x_m )$$
\end{document}
should be selected from the set Vj; that is, the feasible solution set is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}X = V_1 \times \ldots \times V_m. \tag{3.1}\end{align*}
\end{document}
The situation is different in the case of the FSP. Since one seeks a sequence as far as possible from all strings in Σ, it is in general not clear, from which character sets the elements of a solution sequence should be chosen. This depends on the respective application.
3.1. Restricted feasible solution set
In analogy to the CSP model (5) in Zörnig (2011, p. 6), the FSP with feasible solution set (3.1) can be modeled by the integer linear programming problem:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}& {\rm max} \ d \\& \rm s.t. \\ & m - \sum_{j = 1}^k y_{r_{i , j}} , j \ge d \quad
{\rm for} \ i = 1 , \ldots , n , \qquad\qquad\qquad\qquad\qquad (3.3) \\ & \quad \sum_{i = 1}^{v_{j}} y_{i , j} = m_j \quad {\rm for} \ j = 1 , \ldots , k , \\ & d, y_{i,j} {\rm nonnegative \ integers}.\end{align*}\end{document}
The variable yi,j represents the frequency of the character i in the positions of the solution sequence
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$t = ( t_1 , \ldots , t_m )$$
\end{document}
that correspond to the jth representative vector. The parameters k, mj, vj denote the number of such vectors, their multiplicities, and the number of different characters in the representative vectors. The length of the sequences is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m = m_1 + \ldots + m_k$$
\end{document}
.
The left side of the inequalities represents the Hamming distance between ti of T and the sequence encoded by the yi,j. (Note that the first index ri,j of the variables in the inequalities denotes the ith element of the jth representative vector.) The equations express the fact that the frequencies of characters
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$1 , 2 , \ldots , v_j$$
\end{document}
in the jth isomorphism class sum up to mj.
The practical utility of the above model becomes clear from the following proposition.
Definition 3.1
Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bar{y}_{i , j}$$
\end{document}
denote the optimal solution values of the LP-relaxation of Equation (3.3) and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$f_{i , j} : = \bar{y}_{i , j} - \lfloor \bar{y}_{i , j} \rfloor$$
\end{document}
, the fractional parts (0≤fi,j<1 for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , v_j , j = 1 , \ldots , m$$
\end{document}
). We define
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$F_j : = \sum\nolimits_{i = 1}^{v_{j}} f_{i , j}$$
\end{document}
, which is an integer from the set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{0 , \ldots , v_j - j \} $$
\end{document}
, since the equations of (3.3) are satisfied. A standard rounding rule for problem (3.3) is defined as follows. Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$j \in \{1 , \ldots , m \} $$
\end{document}
be any fixed integer. Select the Fj values from the numbers
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bar{y}_{i , j}$$
\end{document}
with the largest fractional parts (which need not be uniquely determined) and round them up. Round down the remaining values
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bar{y}_{i , j}$$
\end{document}
.
Obviously, this rounding procedure results in a feasible solution of the ILP problem (3.3), which will be called the standard rounding solution.
Proposition 3.1
Let
d
be the optimal value of the LP-relaxation of problem (3.3). Then, the objective value
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bar{d}$$
\end{document}
of a standard rounding solution satisfies
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bar{d} \ge d - k.$$
\end{document}
Proof. By substituting the values
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\bar{y}_{i , j}$$
\end{document}
for its rounded value, each summand in the sums of the inequalities in problem (3.3) increases at most by 1.
The result implies that for a small number k of representative vectors, the standard rounding solution is a very good approximate solution for the FSP when d is large in comparison with k (see Table 1).
Example 3.1
For the problem in Example 2.3, it holds k=4, v1=v2=v3=2, and v4=3. The ILP model (3.3) takes the form
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}{\rm max} \ d \\& {\rm s.t.} \\ & 10 - y_{1 , 1} - y_{1 , 2} - y_{1 , 3} - y_{1 , 4} \ge d , \\& 10 - y_{1 , 1} - y_{2 , 2} - y_{2 , 3} - y_{2 , 4} \ge d , \\& 10 - y_{2 , 1} - y_{1 , 2} - y_{2 , 3} - y_{3 , 4} \ge d , \\ & \qquad y_{1 , 1} + y_{2 , 1} = m_1 , \qquad\qquad\qquad (3.4) \\ & \qquad y_{1 , 2} + y_{2 , 2} = m_2 , \\ & \qquad y_{1 , 3} + y_{2 , 3} = m_3 , \\ & \qquad y_{1 , 4} + y_{2 , 4} + y_{3 , 4} = m_4 ,\end{align*}\end{document}
d, yi,j nonnegative integers,
and the multiplicities are
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( m_1 , \ldots , m_4 ) = ( 3 , 2 , 2 , 3 )$$
\end{document}
. The solution of the LP-relaxation is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}y_{1 , 1} = 0 , \quad y_{1 , 2} = 0 , \quad y_{1 ,
3} = 2 , \quad y_{1 , 4} = 1. \bar{3} ,\end{align*}
\end{document}
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}y_{2 , 1} = 3 , \quad y_{2 , 2} = 2 , \quad y_{2 ,
3} = 0 , \quad y_{2 , 4} = 1. \bar{3} , \tag{3.5}\end{align*}
\end{document}
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\qquad \enspace \ y_{3 , 4} = 0. \bar{3}\end{align*}
\end{document}
with optimal value
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$d = 6. \bar{6}$$
\end{document}
. One of the standard rounding solutions is therefore given by y1,4=1, y2,4=1, and y3,4=1 [where the other values in system (3.5) remain unchanged]. The corresponding objective value is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$6 = \lfloor 6. \bar{6} \rfloor$$
\end{document}
; thus, an optimal solution of the integer problem (3.4) is encountered.
The optimal solution corresponding to the string matrix
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\overline{T}$$
\end{document}
in Example 2.3 is therefore
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\tilde{t} = ( 2 , 2 , 2 \mid 2 , 2 \mid 1 , 1 \mid 1 , 2 , 3 )$$
\end{document}
, corresponding to the solution t=(1, 1, 2, 2, 2, 2, 1, 2, 3, 2) of T and to the solution s=(1, 4, 1, 1, 2, 4, 4, 3, 3, 1) of S.
Since problem (3.4) is a very specific case, we still consider the following somewhat more complex example.
Example 3.2
Consider the problem (3.3) with 6 binary sequences (n=6, ω=2). The representative matrix is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}R = \left( \begin{matrix}r_{1 , 1} & \cdots & r_{1
, 31} \\ \vdots & & \vdots \\ r_{6 , 1} & \cdots & r_{6 ,
31}\end{matrix} \right) \tag{3.6}\end{align*}
\end{document}
where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\left( \begin{matrix}r_{1 , 1} & \cdots & r_{1 , 6}
\\ \vdots & & \vdots \\ r_{6 , 1} & \cdots & r_{6 , 6}\end{matrix}
\right) = \left( \begin{matrix}1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 1
& 1 & 1 & 1 \\ 2 & 1 & 2 & 1 & 1 & 1 \\ 2 & 1 & 1 & 2 & 1 & 1 \\ 2
& 1 & 1 & 1 & 2 & 1 \\ 2 & 1 & 1 & 1 & 1 & 2\end{matrix}
\right)\end{align*}
\end{document}
corresponds to partitions into two parts of size 1 and 5, respectively;
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*} \left(\begin{matrix}r_{1 , 7} & \cdots & r_{1 , 21} \\
\vdots & & \vdots \\ r_{6 , 7} & \cdots & r_{6 ,
21}\end{matrix}\right) = \left( \begin{matrix}1 \quad 1 \quad 1 \quad 1 \quad 1 \quad
1 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \\ 1 \quad 2 \quad 2 \quad 2 \quad 2 \quad 2 \quad 2
\quad 2 \quad 2 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \\ 2 \quad 1 \quad 2 \quad 2 \quad 2 \quad 2 \quad 1 \quad 1 \quad
1 \quad 2 \quad 2 \quad 2 \quad 1 \quad 1 \quad 1 \\ 2 \quad 2 \quad 1 \quad 2 \quad 2 \quad 1 \quad 2 \quad 1 \quad 1 \quad 2
\quad 1 \quad 1 \quad 2 \quad 2 \quad 1 \\ 2 \quad 2 \quad 2 \quad 1 \quad 2 \quad 1 \quad 1 \quad 2 \quad 1 \quad 1 \quad 2 \quad
1 \quad 2 \quad 1 \quad 2 \\ 2 \quad 2 \quad 2 \quad 2 \quad 1 \quad 1 \quad 1 \quad 1 \quad 2 \quad 1 \quad 1 \quad 2 \quad 1
\quad 2 \quad 2\end{matrix} \right) \end{align*}
\end{document}
corresponds to partitions into two parts of size 2 and 4, respectively; and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\left( \begin{matrix}r_{1 , 22} & \cdots & r_{1 ,
31} \\ \vdots & & \vdots \\ r_{6 , 22} & \cdots & r_{6 ,
31}\end{matrix} \right) = \left( \begin{matrix}1 & 1 & 1 & 1 & 1 &
1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & 2 & 2 \\ 1 &
2 & 2 & 2 & 1 & 1 & 1 & 2 & 2 & 2 \\ 2 & 1 & 2 & 2 & 1 & 2 & 2 & 1
& 1 & 2 \\ 2 & 2 & 1 & 2 & 2 & 1 & 2 & 1 & 2 & 1 \\ 2 & 2 & 2 & 1
& 2 & 2 & 1 & 2 & 1 & 1\end{matrix} \right)\end{align*}
\end{document}
corresponds to partitions into two parts of size 3. Problem (3.3) has now the form
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}& {\rm max} \ d \\& {\rm s.t.} \\& y_{1 , 1} + y_{1 , 2} + y_{1 , 3} \ldots + y_{1 , 31} + d \le m , \\& y_{2 , 1} + y_{2 , 2} + y_{1 , 3} \ldots + y_{2 , 31} + d \le m , \\& y_{2 , 1} + y_{1 , 2} + y_{2 , 3} \ldots + y_{2 , 31} + d \le m , \\& y_{2 , 1} + y_{1 , 2} + y_{1 , 3} \ldots + y_{2 , 31} + d \le m , \qquad\qquad\qquad\qquad\qquad (3.7) \\& y_{2 , 1} + y_{1 , 2} + y_{1 , 3} \ldots + y_{1 , 31} + d \le m , \\& y_{2 , 1} + y_{1 , 2} + y_{1 , 3} \ldots + y_{1 , 31} + d \le m , \\& y_{1 , j} + y_{2 , j} = m_j \quad {\rm for} \ j = 1 , \ldots , 31 ,\end{align*} \end{document}
d, yi,j nonnegative integers,
where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m = m_1 + \ldots + m_{31}$$
\end{document}
, k=S(6, 2)=31 (S(n, k) denote the Stirling numbers of the second kind, and the first index of the yi,j is the corresponding element in the matrix (3.6). In the present binary case, one can reduce the number of variables further by substituting y2,j for mj – y1,j and setting yj:=y1,j. One can verify that this yields the problem
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}&{\rm max} \ d \\& {\rm s.t.} \\& \quad Q \left( \begin{matrix}y_1 \\ \vdots \\ y_{31}\end{matrix} \right) \le \left( \begin{matrix}M_{1}-d \\ & \vdots & \\ M_{6}-d\end{matrix} \right) \tag{3.8}\end{align*}
\end{document}
yj ≤ mj for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$j = 1 , \ldots , 31$$
\end{document}
d, yj nonnegative integers,
where the 6 × 31 matrix Q=(qi,j) has entries 1 and −1 such that qi,j=1 if ri,j=1and qi,j=−1 if
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$r_{i , j} = 2 ( i = 1 , \ldots , 6 , j = 1 , \ldots , 31 )$$
\end{document}
. The constants Mi are defined by
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$M_i = \sum\nolimits_{j:q_{i , j} = 1}m_j$$
\end{document}
for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , 6$$
\end{document}
.
Consider, for example, the randomly generated parameter values
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( m_1 , \ldots , m_{31} )$$
\end{document}
=(633, 291, 813, 91, 889, 501, 375, 265, 218, 599, 326, 259, 128, 572, 515, 295, 359, 195, 459, 315, 215, 651, 143, 141, 378, 194, 604, 215, 230, 148, 246), summing up to m=11,263. The LP-relaxation has the solution (y10, y14, y22, y24, y25, y27, y29, y31)=(41.16667, 14.33333, 290.1667, 141, 250.1667, 604, 79.66667, 246) (where only nonzero values are listed) with optimal value 7534.833. The corresponding standard rounding solution (y10, y14, y22, y24, y25, y27, y29, y31)=(41, 14, 290, 141, 250, 604, 80, 246) has the objective value 7534 and is therefore optimal for problem (3.8).
Note that a nontrivial FSP with 6 “long” sequences has been solved without needing any technique to ensure the integrality requirements.
The model (3.3) has
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$1 + \sum\nolimits_{j = 1}^k v_j$$\end{document}
variables and k + n linear constraints, independent of the length m of the sequences. Therefore, its size is generally much smaller than that of the conventional models of Festa and Pardalos (2012, Sec. 1.2) and Meneses et al. (2005, sec. 4.2), where the size increases linearly with the length of the sequences. In particular for an FSP with 3 sequences (see problem (3.4)), we get k=4, v1=v2=v3=2, v4=3, and the model has 10 variables and 7 linear restrictions, whatever the length of the sequences may be. For example, the standard rounding solution is the exact solution of problem (3.4) for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m_1 = \ldots = m_4 = 500$$
\end{document}
corresponding to the sequence length m=2000. But the just mentioned models would require several thousands of restrictions and binary variables to solve an FSP with 3 sequences of that length.
If some possible representative vectors do not occur in the normalized problem, the model size can be even further reduced. For example, if the third column group in problem (3.4) is empty, it follows that m3=0. Thus, y1,3=y2,3=0; that is, one can eliminate these variables and the third equation from problem (3.4).
We now determine the standard rounding solution for test examples with randomly generated multiplicities. It turns out that the corresponding objective value is always very close to the optimal value of the FSP. In fact, the maximal possible error observed in the 40 test examples of Table 1 is 3. The standard rounding solution can be easily determined, since always only a small fraction of the optimal values of the LP-relaxation is noninteger.
For each test instance the following information is provided:
n: number of sequences
ω: alphabet size
k: number of representative vectors
m: sequence length
N: number of noninteger solution values
drelax: optimal value of the LP-relaxation
dsrs: objective value of the standard rounding solution
maximum error:
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\lfloor d_{relax} \rfloor - d_{{\rm srs}}$$
\end{document}
From the examples in Table 1 it becomes clear that the FSP for a sequence length up to about 1 million can be easily solved with model (3.3) for small values of n and ω. So, the actually observed objective value of the standard rounding solution is usually much better than the lower limit given by Proposition 3.1. This is an encouraging result, since finding an approximate solution for string selection problems is considered a very difficult task; see Boucher et al. (2012, p. 1).
In 22 of 40 cases the standard rounding solution is in fact the exact solution of the FSP. In the remaining cases the maximum possible error is 3. In these cases the verification of optimality by means of a Branch-and-Bound method may require a high computational effort. In particular, the integer solver of LINGO could not solve the ILP for example no. 10 of Table 1 in about 15 million iterations. However, an absolute error of only 3 is surely negligible for practical applications with very long sequences.
3.2. Extended feasible solution set
We now assume that the feasible solution set is Equation (3.2); that is, in determining the farthest string, any position can be occupied by any element of the set V. We assume that V=Ω; that is, all characters appear at least once in the string matrix (2.1). Otherwise, there exists an
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\omega_0 \in \Omega \backslash V$$
\end{document}
such that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$( \omega_0 , \ldots , \omega_0 ) \in \Omega^m$$
\end{document}
is an optimal solution of the FSP. Furthermore, we assume that the first r columns of the matrix (2.1) are complete, while the others are incomplete. In the construction of an optimal sequence
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$t = ( t_1 , \ldots , t_m )$$
\end{document}
, one can set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}t_i = \omega_i \quad {\rm for} \ i = r + 1 ,
\ldots , m \tag{3.9}\end{align*}
\end{document}
where
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\omega_i \in V / V_i$$
\end{document}
is a character not occurring in the ith column of matrix (2.1). Thus, the incomplete columns can be exempted from further consideration. The problem simplifies considerably, when the proportion of incomplete columns is large (see sec. 4). In particular, any FSP with ω > n is trivial, since all columns of matrix (2.1) are incomplete in this case and an optimal solution is given by Equation (3.9). This observation might be interesting for problems with sequences over a large alphabet that may degrade the performance of a string selection algorithm considerably; see Kuksa and Pavlovic (2009).
We now solve the FSP for ω ≤ n. Let S be the string matrix of an FST after deletion of incomplete columns. Then, there exist k=S(n, ω) representative vectors, corresponding to the partitions of the set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\{1 , \ldots , n \} $$
\end{document}
into ω components. We now obtain the model
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}& {\rm max} \ d \\& {\rm s.t.} \\& m - \sum_{j = 1}^ky_{r_{i , j} , j} \geq d \quad {\rm for} \ i = 1 , \ldots , n , \qquad\qquad\qquad (3.10) \\& \quad \sum_{i = 1}^ \omega y _{i , j} = m_j \quad {\rm for} \ j = 1 , \ldots , k ,\end{align*}
\end{document}
d, yi,j nonnegative integers,
which is a special case of problem (3.3). Now the numbers of variables and linear constraints are given by 1 + kω and n + k, respectively. These numbers are calculated in Table 2 for some values of n and ω.
The case ω=n is of particular interest, and then the standard rounding solution is always optimal. Assume that incomplete columns have been removed from the matrix (2.1); that is, all remaining columns are permutations of the column vector (
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$1 , \ldots , n$$
\end{document}
). Thus, all these columns are isomorphic and one obtains the unique representation vector (
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$1 , \ldots , n$$\end{document}
). Problem (3.10) now takes the form
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}& {\rm max} \ d \\& {\rm s.t.} \\& m - y_1 \ge d , \\& \qquad\qquad \vdots \ \vdots \qquad\qquad\qquad(3.11)\\& m - y_n \ge d, \\& y_1 + \ldots + y_n = m ,\end{align*} \end{document}
d, yi nonnegative integers,
where the variables yi,1 have been renamed as yi.
Proposition 3.2
An optimal solution of problem (3.11) is given by
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}y_1 = \ldots = y_n = \frac {m} {n} \tag {3.12} \end{align*}
\end{document}
if
m/n
is an integer. Otherwise, the standard rounding solution is optimal.
Proof. Evidently, the LP-relaxation of problem (3.11) has the optimal solution (3.12) with corresponding optimal value
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$d = m - \frac {m} {n} $$
\end{document}
. If
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$ \frac {m} {n} $$
\end{document}
is not an integer, the standard rounding solution has the objective value
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}$$m - \bigg \lceil \frac {m} {n} \bigg \rceil = \bigg \lfloor m - \frac {m} {n} \bigg \rfloor$$\end{document}
. ■
Example 3.3
Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}S = \left( \begin{matrix}1 & 3 & 1 & 3 & 1 & 2 & 1
& 3 & 4 \\ 2 & 4 & 3 & 4 & 2 & 1 & 2 & 2 & 3 \\ 3 & 1 & 2 & 2 & 3
& 4 & 4 & 1 & 2 \\ 4 & 2 & 4 & 1 & 4 & 3 & 3 & 4 & 1\end{matrix}
\right) \tag{3.13}\end{align*}
\end{document}
be the string matrix of an FSP (after deletion of incomplete columns) with n=4 and m=9. The normalized problem has the matrix
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}T = \left( \begin{matrix}1 & \cdots & 1 \\ 2 &
\cdots & 2 \\ 3 & \cdots & 3 \\ 4 & \cdots & 4\end{matrix} \right)
\tag{3.14}\end{align*}
\end{document}
with 9 identical columns. The LP solution of problem (3.11) is
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$y_1 = \ldots = y_4 = 2.25$$
\end{document}
, yielding the standard rounding solution y1=y2=y3=2 and y4=3. Thus, an optimal solution
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$t = ( t_1 , \ldots , t_9 )$$
\end{document}
of the FSP given by Equation (3.14) contains the numbers 1, 2, 3, and 4 with frequencies 2, 2, 2, and 3, respectively. In particular, t=(1, 1, 2, 2, 3, 3, 4, 4, 4) is an optimal solution that corresponds to the solution (1, 3, 3, 4, 3, 4, 3, 4, 1) of matrix (3.13).
3.3. Maximizing the sum of distances
We now consider another variant of the FST where the objective is now to maximize the sum of distances instead of the minimal distance. This yields the problem
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\max \sum_{i = 1}^n d ( s^i , t ) \tag{3.15}\end{align*}
\end{document}
where t is chosen from X given by Equation (3.1) or (3.2). This problem, studied in Cheng et al. (2004, Sec. 3), is incomparably simpler than the maximin problem considered so far. Because of its simplicity, there is no normalization necessary. The problem (3.15) can be decomposed into subproblems as follows. For given sequences
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s^i = ( s_1^i , \ldots , s_m^i )$$
\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$t = ( t_1 , \ldots , t_m )$$
\end{document}
, we define
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$d_i^{( j )} = 0$$
\end{document}
if
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s_j^i = t_j$$
\end{document}
and
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$d_i^{( j )} =$$
\end{document}
otherwise. The Hamming distance between si and t can then be written as
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}d ( s^i , t ) = \sum_{j = 1}^m d_i^{( j )} \tag{3.16}\end{align*}
\end{document}
and the sum (3.15) takes the form
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\max \sum_{i = 1}^n \sum_{j = 1}^m d_i^{( j )} =
\max \sum_{j = 1}^m \sum_{i = 1}^nd_i^{( j )}.
\tag{3.17}\end{align*}
\end{document}
Now the expressions
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\sum\nolimits_{i = 1}^n d_i^{( j )}$$
\end{document}
can be maximized independently from each other, and the overall solution sequence of Equation (3.17) is composed of these individual solutions. Obviously it holds
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}\sum_{i = 1}^nd_i^{( j )} = n - f ( t_j ) , \tag{3.18}\end{align*}
\end{document}
where f (tj) denotes the frequency of occurrence of the character tj in the jth column of the string matrix S. Thus, Equation (3.18) is maximized by setting tj equal to one of the rarest elements in the jth column of S. In particular, if this column is incomplete, one sets tj equal to a character that does not occur in this column, corresponding to f (tj)=0.
It is now clear that the FSP (3.15) can be solved easily in linear time in the size of the string matrix.
Example 3.4
Let
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}S = \left( \begin{matrix}1 & 1 & 2 & 1 & 3 & 2 \\ 2
& 4 & 3 & 1 & 2 & 4 \\ 3 & 2 & 1 & 3 & 3 & 4 \\ 3 & 1 & 2 & 2 & 4
& 4 \\ 3 & 4 & 4 & 3 & 4 & 3\end{matrix} \right)
\tag{3.19}\end{align*}
\end{document}
be the string matrix of an FSP with ω=4. If the feasible solution set is defined by Equation (3.1), an optimal solution of the FSP is given by (1, 2, 1, 2, 2, 3). In case of the definition (3.2), the sequence (4, 3, 4, 4, 1, 1) represents an optimal solution.
3.4. Far from most strings problem
We still consider a variant of the FSP, called the FFMSP:
Given a set
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$${\Sigma} = \{s^1 , \ldots , s^n \} $$
\end{document}
of n sequences with
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$s^i = ( s^i_1 , \ldots , s^i_m ) \in \Omega^m$$
\end{document}
for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , n$$
\end{document}
and a threshold d0 > 0. The goal is to find a sequence t from a feasible set X, maximizing the number of strings si in Σ such that
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$d ( s^i , t ) \geq d_0.$$
\end{document}
The set X can be defined by Equation (3.1) or (3.2), but we will restrict ourselves to the second case. By modifying problem (3.10) we obtain the following model for the FFMSP. For
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , n$$
\end{document}
we introduce the binary variable zi, which equals 1 if the string si belongs to the “active” strings, that is, the strings that are considered in the distance maximization [otherwise zi=0; see, e.g., Meneses et al. (2005, p. 4) for a similar reasoning]. We get
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}& \max \ z_1 + \ldots + z_n \\& {\rm s.t.} \\& m - z_i \sum_{j = 1}^k y_{r_{i , j} , j} \geq d_0 \quad {\rm for} \ i = 1 , \ldots , n , \qquad\qquad\qquad\qquad\qquad(3.20) \\& \quad \sum_{i = 1}^ \omega y_{i , j} = m_j \quad {\rm for} \ j = 1 , \ldots , k ,\end{align*} \end{document}
yi,j nonnegative integers,
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*} z_i \in \{0 , 1 \} .\end{align*}
\end{document}
The optimal value of problem (3.20) is the maximum number of strings having distance at least d0 from t.
Example 3.5
Consider the case n=4 and ω=3. The model (3.20) takes the form
max z1 + z3 + z3 + z4
s.t.
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
\begin{align*}& m - z_1 ( y_{1 , 1} + y_{1 , 2} + y_{1 , 3} + y_{1 , 4} + y_{1 , 5} + y_{1 , 6} ) \geq d_0 , \\& m - z_2 ( y_{1 , 1} + y_{2 , 2} + y_{2 , 3} + y_{2 , 4} + y_{2 , 5} + y_{2 , 6} ) \geq d_0 , \\& m - z_3 ( y_{2 , 1} + y_{1 , 2} + y_{3 , 3} + y_{2 , 4} + y_{3 , 5} + y_{3 , 6} ) \geq d_0 , \qquad\qquad\qquad\qquad\qquad (3.21) \\& m - z_4 ( y_{3 , 1} + y_{3 , 2} + y_{1 , 3} + y_{3 , 4} + y_{2 , 5} + y_{3 , 6} ) \geq d_0 , \\& \quad y_{1 , j} + y_{2 , j} + y_{3 , j} = m_j \quad {\rm for} \ j = 1 , \ldots , 6 , \\ & y_{i , j} \ \hbox{nonnegative integers} , \\ & z_i \in \{0 , 1 \} \quad {\rm for} \ i = 1 , \ldots , 4.\end{align*}\end{document}
In order to solve the nonlinear integer programming problem (3.20), we proceeded as follows: In a first phase, the relaxation is solved, obtained by omitting the integer conditions. In a second phase, we solve the modification of problem (3.20), obtained as follows: we substitute the condition
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$z_i \in \{0 , 1 \} $$
\end{document}
by zi=0, if zi was smaller than 1 in the first phase solution, and by zi=1 otherwise (
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , n$$
\end{document}
).
Example 3.6
Assume that the multiplicities in problem (3.21) are (
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$m_1 , \ldots , m_6$$
\end{document}
)=(796, 974, 397, 998, 789, 999), implying m=4953. For the arbitrarily chosen threshold d0=4000, the relaxation of problem (3.21) has the solution
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}& ( z_1 , z_2 , z_3 , z_4 ) = ( 0.46 , 1 , 1 , 1 ) , \qquad\qquad\qquad\qquad\qquad (3.22) \\& \left( \begin{matrix}y_{1 , 1} & \cdots & y_{1 , 6} \\ \vdots & & \vdots \\ y_{3 , 1} & \cdots & y_{3 , 6}\end{matrix} \right) = \left( \begin{matrix}0 & 0 & 0 & 692.68 & 575.85 & 825.29 \\ 739.85 & 382.29 & 397 & 0 & 0 & 173.71 \\ 56.15 & 591.71 & 0 & 305.14 & 213.15 & 0\end{matrix} \right)\end{align*} \end{document}
with corresponding optimal value 3.46. We modify problem (3.21) by substituting the conditions
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$z_i \in \{0 , 1 \} $$
\end{document}
for
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$i = 1 , \ldots , 4$$
\end{document}
by z1=0, z2=1, z3=1, and z4=1. The resulting linear program has the solution
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}\begin{align*}& z_1 = 0 , z_2 = 1 , z_3 = 1 , z_4 = 1 , \qquad\qquad\qquad\qquad\qquad(3.23) \\& \left( \begin{matrix}y_{1 , 1} & \cdots & y_{1 , 6} \\ \vdots & & \vdots \\ y_{3 , 1} & \cdots & y_{3 , 6}\end{matrix} \right) = \left( \begin{matrix}240 & 0 & 0 & 998 & 789 & 999 \\ 556 & 21 & 0 & 0 & 0 & 0 \\ 0 & 953 & 397 & 0 & 0 & 0\end{matrix} \right)\end{align*}
\end{document}
with objective value 3.We emphasize that the integer values of the yi,j occurred “automatically” by solving the linear problem (no measures were taken to assure integrality); Equation (3.23) also solves the integer problem (3.21) since the first phase solution was optimal for the relaxation.
In Table 3 the above simple heuristic has been applied to 30 test problems with different numbers n, m, and ω. The multiplicities mj in problem (3.20) have been chosen at random and the threshold d0 has been determined arbitrarily. Since the relaxation of problem (3.20) is not a convex optimization problem, the solution may be only a local optimum, depending on the starting values and solution procedure applied by the used software. In our test problems performed by the software LINGO, global optimality was achieved in 90% of the instances. It is very interesting to observe that in all test runs of Table 3 the second-phase solution has exclusively integer values. As this table shows, the objective value of the second phase was always
\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document}
$$\lfloor z \rfloor$$
\end{document}
, where z denotes the objective value of the first phase. Thus, when the first phase terminated with a global optimum, the second phase provided a global optimal solution for the FFMSP (3.20). This is an encouraging result, since the FFMSP is considered one of the most difficult sequence consensus problems; see, for example, Ferone et al. (2013, p. 2).