# The Algorithm¶

The package Augmented Block Cimmino Distributed Solver (ABCD Solver) is a distributed hybrid (iterative/direct) solver for sparse linear systems $$Ax = b$$ where $$A$$ is a double precision matrix $$m \times n$$, $$x$$ an $$n$$-vector and $$b$$ an $$m$$-vector. ABCD Solver uses two methods to solve the linear system:

• Regular Block Cimmino: A block-projection technique that iterates to approximate the solution of the linear system. During the iterations it solves a set of small problems (augmented systems built using partitions of the original system).
• Augmented Block Cimmino: A pseudo-direct solver that augments the original system, based on the given partitions, and constructs the solution directly from independent solves using the augmented subsystems.

## The regular block Cimmino¶

The block Cimmino method is an iterative method that uses block-row projections. To solve $$Ax = b$$, where $$A$$ is an $$m\times n$$ sparse matrix, $$x$$ is an $$n$$-vector and $$b$$ is an $$m$$-vector, we subdivide the system into strips of rows as follows:

$\begin{eqnarray*} \left( \begin{array}{c} A_1 \\ A_2 \\ \vdots \\ A_p \end{array} \right) ~x &=& \left( \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_p \end{array} \right) \end{eqnarray*}.$

Let $$P_{\mathcal{R}(A_i^T)}$$ be the projector onto the range of $$A_i^T$$ and $${A_i}^+$$ be the Moore-Penrose pseudo-inverse of the partition $$A_i$$. The block Cimmino algorithm then computes a solution iteratively from an initial estimate $$x^{(0)}$$, viz:

$\begin{eqnarray*} \begin{array}{ccl} u_{i} & = & A_i^+ \left ( {b_i - A_i x^{(k)}} \right ) ~~~ i = 1, .... p \\ x^{(k+1)} & = & x^{(k)} + \omega \sum_{i=1}^p{u_i} \end{array} \end{eqnarray*}$

where we can see the independence of the set of $$p$$ equations, which is why the method is so attractive in a parallel environment.

With the above notations, the iteration equation is thus:

$\begin{eqnarray*} x^{(k+1)} & = & x^{(k)} + \omega \sum_{i=1}^p{A_i^+ \left ( {b_i - A_i x^{(k)}} \right )} \\ & = & \left( {I - \omega \sum_{i=1}^p{A_i^+ A_i}} \right) x^{(k)} + \omega \sum_{i=1}^p{A_i^+ b_i} \\ & = & Q x^{(k)} + \omega \sum_{i=1}^p{A_i^+ b_i}. \label{something} \end{eqnarray*}$

The iteration matrix for the block Cimmino method is $$H = I - Q$$, which corresponds to a sum of projectors $$H = \omega \sum_{i=1}^p{\mathcal{P}_{\mathcal{R}(A_i^T)}}$$. It is thus symmetric and positive definite and so we can solve

$H x ~=~ \xi,$

where $$\xi = \omega \sum_{i=1}^p{A_i^+ b_i}$$ using conjugate gradient or block conjugate gradient methods. As the relaxation scalar $$\omega$$ appears on both sides of the equation, we can set it to one.

At each step of the conjugate gradient algorithm we must solve for the $$p$$ projections viz.

$$$A_i u_i ~=~ r_i, ~~~~ (r_i = {b_i - A_i x^{(k)}}),~~~ i = 1, .... p.$$$

In our implementation we choose to solve these equations using the augmented system

$\begin{eqnarray*} \left ( \begin{array}{cc} I & A_i^T \\ A_i & 0 \end{array} \right ) \left ( \begin{array}{l} u_i \\ v_i \end{array} \right ) &=& \left ( \begin{array}{l} 0 \\ r_i \end{array} \right ) \end{eqnarray*}$

that we solve using a direct method, at each iteration to get $$u_i = A_i^+ r_i$$, the projection needed for each partition $$A_i$$. We use the multifrontal parallel solver $$MUMPS$$ to do direct solutions.

Running our solver in the regular mode will go through the following steps:

• Partition the system into strips of rows ($$A_i$$ and $$b_i$$ for $$i = 1, \dots p$$)
• Create the augmented systems
• Analyse and factorize the augmented systems using the direct solver $$MUMPS$$
• Run a block conjugate gradient with an implicit iteration matrix $$H$$, which requires $$p$$ independent augmented system direct solves at each iteration.

## The augmented block Cimmino¶

To understand the augmented block Cimmino algorithm, suppose that we have a matrix $$A$$ with three partitions, described as follows:

$$$A = \left[ \begin{array}{cccccc} A_{1,1} & A_{1,2} &&&& A_{1,3}\\ & A_{2,1} & A_{2,2} & A_{2,3} & \\ &&& A_{3,2} & A_{3,3} & A_{3,1} \end{array} \right],$$$

where $$A_{i,j}$$ is the sub-part of $$A_i$$, the $$i$$-th partition, that is interconnected algebraically to the partition $$A_j$$, and vice versa.

The goal of the augmented block Cimmino algorithm is to make these three partitions mutually orthogonal to each other, meaning that the inner product of each pair of partitions is zero. We consider two different ways to augment the matrix to obtain these zero matrix inner products.

• The first way to augment the matrix to make all the partitions mutually orthogonal to each other is obtained by putting the product $$C_{ij} = A_{ij}A_{ji}^T$$ on the right of the partition $$A_i$$ and adding $$-I$$ on the right of $$A_j$$ viz.
$$$\bar{A} = \left[ \begin{array}{cccccc|ccc} A_{1,1} & A_{1,2} & & & & A_{1,3} & C_{1,2} & C_{1,3} & \\ & A_{2,1} & A_{2,2} & A_{2,3} & & & -I & & C_{2,3}\\ & & & A_{3,2} & A_{3,3} & A_{3,1} & & -I & -I \end{array}\right].$$$
• The second way is to repeat the submatrices $$A_{ij}$$ and

$$A_{ji}$$, reversing the signs of one of them to obtain the augmented matrix $$\bar{A}$$ as in the following

$$$\bar{A} = \left[ \begin{array}{cccccc|ccc} A_{1,1} & A_{1,2} & & & & A_{1,3} & A_{1,2} & A_{1,3} & \\ & A_{2,1} & A_{2,2} & A_{2,3} & & & -A_{2,1} & & A_{2,3}\\ & & & A_{3,2} & A_{3,3} & A_{3,1} & & -A_{3,1}& -A_{3,2} \end{array}\right].$$$

Both ways make $$\bar{A}_i\bar{A}_j^T$$ zero for any pair $$i/j$$, and so the new matrix has mutually orthogonal partitions.

Notice that we augment the matrix from top to bottom and use new columns for the augmentation at each step. This is done so that we do not create any new interconnections between the resulting partitions.

Running our solver in the augmented block Cimmino mode will go through the following steps:

• Partition the system into strips of rows ($$A_i$$ and $$b_i$$ for $$i = 1, \dots p$$)
• Augment the different partitions according to the selected algorithm
• Create the augmented systems
• Analyse and factorize the augmented systems using the direct solver $$MUMPS$$
• Build an auxiliary matrix $$S$$ in parallel and use it to solve a reduced linear system. The result is then used to obtain the solution for the original linear system $$Ax = b$$.

For the last step, please check the presentation http://zenadi.com/thesis_def.pdf (slides 34 to 55) for more details.