Using the solver¶

Instantiating the solver¶

To use the solver, the user has to instantiate the class abcd for C++. In the case of C, the user creates a structure object using the function new_solver(). During the construction of the instance, the control parameters are initialized to their default value, see the controls description for the list of the control parameters and their default value.

For C++, the class is defined in the header file abcd.h

#include "abcd.h"
//...
abcd obj; // instantiating the class


For C, the solver is a type abcd_c that is based on a structure called struct abcd and is defined in the header file abcd_c.h:

#include "abcd_c.h"
//...
abcd_c *obj = new_solver(); // create a new solver


Input matrix and right-hand side¶

The current version of the ABCD Solver accepts only real, centralized, linear systems. The definition of the linear system uses the following information:

class abcd
Public Members
int m

The number of rows in the matrix

int n

The number of columns in the matrix

int nz

The number of entries in the matrix

bool sym

The symmetry of the matrix

int * irn

The row indices of size nz

int * jcn

The column indices of size nz

int start_index

Defines wether it’s Fortran-Style (1) or C-Style (0, default)

double * val

The entries of the matrix of size nz

int nrhs

The number of right-hand sides to solve, default is 1

double * rhs

The right-hand side of size m * nrhs

By default the values of the matrix are zero based, meaning that 0 corresponds to the first row in irn and same thing goes for jcn. If you want to give 1-based arrays (a la Fortran), you have to set start_index to 1.

To initialize the linear system in C++:

// Create an object for each mpi-process
abcd obj;
obj.n = 7;
obj.m = 7;
obj.nz = 15;
obj.sym = false;
if (world.rank() == 0) { // only the master is required to provide the matrix
// allocate the arrays
obj.irn = new int[obj.nz]
// put the data in the arrays
obj.irn[0] = 0;
//..
}


In C:

// Create an object for each mpi-process
abcd_c *obj = new_solver();

obj->n = 7;
obj->m = 7;
obj->nz = 15;
obj->sym = 0;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

if (myrank == 0) { // only the master is required to provide the matrix
// allocate the arrays
obj->irn = (int*) malloc(sizeof(int)*(obj->nz));
// put the data in the arrays
obj->irn[0] = 0;
//..
}


Calling the solver¶

During the construction of the solver object, the default parameters are initialized. The user can then call the object as a function (functor) with the job number as an argument.

abcd obj; // instantiating the class
obj(job_id); // call the solver with a job identifier job_id


For C, the solver is a structure called struct abcd:

abcd_c *obj = new_solver(); // create a new solver
call_solver(obj, job_id);


job_id defines which operation the solver has to run. It can have values -1, and 1 through 6. The order in which these jobs have to be called is described in Job dependencies figure.

class abcd
Public Functions
int operator()(int job_id)

The gateway function that launches all other options

 Run an operation identified by the value of job_id, it can be
either:
- -1, initializes the internal matrix used by the
solver. Prior to this call, the user must provide:
* The information about the matrix. #m, #n, #nz,
#sym, #irn, #jcn and #val have to be initialized before the
call.
* After the call, the arrays #irn, #jcn and #val
are no longer used by the solver and can be freely deallocated.
- 1, performs the preprocessing. During this call, the solver
scales the matrix, partitions it and, if requested by the user,
performs the augmentation of the matrix. Prior to this call, the
user must provide:

* The number of partitions to create (see nbparts )
or ask the solver to guess the appropriate number of
partitions (see part_guess)
* The type of scaling to perform (see scaling)
* The type of augmentation to perform (see aug_type)
- 2, creates the augmented systems, analyses them, creates
the mapping between the different mpi-processes and
factorizes the augmented systems.
- 3, performs the solution step, the right-hand sides and
their number are required prior to this call.
* The right-hand sides have to be given through the array
rhs[] and their number in #nrhs.
* The block-size to be used during the block-CG
acceleration. Its value is used only during the regular
block Cimmino solve, and by default its value is 1.
* The solution is centralized (on the master) in the array
sol[].
- 4, combines the call to the phases 1 and 2.
- 5, combines the call to the phases 2 and 3.
- 6, combines the call to the phases 1, 2 and 3.


The Controls¶

Define the general behaviour of the solver. They are split into two arrays, icntl and dcntl. icntl is an integer array and defines the options that control the specific parts of the solver, such as the scaling, the type of algorithm to run and so on. dcntl is a double precision array and defines some of the options required by the algorithms we use such as the imbalance between the partition sizes and the stopping criteria of the solver.

To access each of the control options we use the enums defined in the header defaults.h for C++ and in abcd_c.h for C. To access them, the user can use the namespace Controls, eg. Controls::scaling handle the scaling of the linear system. In the following we omit the Controls:: part for simplicity. Moreover, due to the similarities between the C++ code and the C one, we provide only the C++ snippets unless there is a major difference.

The integer control array¶

icontrols enum

To be used with the abcd::icntl vector.

Values:

• nbparts -

Number of partitions.

Defines the number of partitions in our linear system, can be from 1 to m (the number of rows in the matrix)

// we have 8 partitions
obj.icntl[nbparts] = 8;


• part_type -

Partitioning strategy.

Defines the partitioning strategy, it can have the values:

• 1, Manual partitioning, the nbparts partitions can be provided in the STL vector nbrows. Example:
// use manual partitioning
obj.icntl[part_type] = 1;
// say that we want 20 rows per partition
obj.nrows.assign(obj.icntl[nbparts], 20);

// or
obj.nrows.resize(obj.icntl[nbparts]);
obj.nrows[0] = 20;
obj.nrows[1] = 20;
//...


For C, the nrows vector is an int array:

// use manual partitioning
obj->icntl[part_type] = 1;

obj->nrows =  (int ) malloc(sizeof(int)*(obj->icntl[nbparts]));

obj->nrows[0] = 20;
obj->nrows[1] = 20;
//...


• 2, (default) Automatic uniform partitioning, creates nbparts partitions of similar size.
// use patoh partitioning
obj.icntl[part_type] = 2;


• 3, Automatic hypergraph partitioning, creates nbparts partitions using the hypergraph partitioner PaToH. The imbalance between the partitions is handled using obj.dcntl[part_imbalance]. Example:
// use patoh partitioning
obj.icntl[part_type] = 3;
// say that we want an imbalance factor up to 30% between the partitions
obj.dcntl[part_imbalance] = 0.3;


• part_guess -

Guess the number of partitions.

Asks the solver to guess the appropriate number of partitions and overrides the defined nbparts.

• 0 (default), The user has to provide the number of partitions by setting icntl[nbparts]
• 1, Guess the number of partitions by trying to create a small number of partitions while keeping them small enough to be handled easily by the direct solver

• scaling -

The scaling type.

Defines the type of scaling to be used.

• 0, no scaling
• 1, scale the input matrix (embeded internal scaling strategy)

• itmax -

The max number of iterations.

Defines the maximum number of iterations in block-CG acceleration, default is 1000 iterations.

• block_size -

Block-CG block-size.

Defines the block-size to be used by the block-CG acceleration, default is 1 for classical CG acceleration. When using a higher value than 1, stabilized Block-CG is used.

• verbose_level -

The verbose level.

Defines the verbosity of the solver

• aug_type -

The augmentation type.

Possible values are:

• 0 (default), no augmentation. This makes the solver run in regular block Cimmino mode.

• 1, makes the solver run in Augmented Block Cimmino mode with an augmentation of the matrix using the $$C_{ij}$$ augmentation strategy. For numerical stability, this augmentation technique has to be used with a scaling.

• 2, makes the solver run in Augmented Block Cimmino mode with an augmentation of the matrix using the $$A_{ij}$$ augmentation starategy.

• aug_blocking -

The blocking factor in ABCD.

Defines the blocking factor used by the solver during the solution phase, its default value is 128. This allows the solver to take advantage of BLAS3 Kernels. The optimal value is hardware and problem size related. The user has also to find a compromise between memory and computing time.

The double precision control array¶

dcontrols enum

Values:

• part_imbalance -

The imbalance factor in PaToH case.

• threshold -

The stoping threshold.

• dcntl[part_imbalance] or obj.dcntl[1] defines the imbalance between the partitions when using PaToH (icntl[part_imbalance] = 3).
• obj.dcntl[threshold] or dcntl[2] defines the stopping threshold for the block-CG acceleration, default is 1e-12.

A usage example (C++)¶

Combining the previous options, we expose a basic example that uses the regular block Cimmino scheme, we comment the interesting parts and explain how they fit together. Refer to [Calling a job], [The linear system], and [The Controls] for more details.

// A simple working example of
// =======================

// We include the ABCD solver header file, it contains the abcd class definition
#include "abcd.h"

// Use boost::mpi for simplicity, the user can use which ever MPI library he wants
#include "mpi.h"
#include <boost/mpi.hpp>

// A simple matrix generator for a regular 2D mesh + 5-point stencil
void init_2d_lap(int m, int n, int nz, int *irn, int *jcn, double *val, int mesh_size);
void init_2d_lap(abcd &o, int mesh_size);

int main(int argc, char* argv[])
{
// Equivalent to MPI_Initialize
mpi::environment env(argc, argv);

// obtain the WORLD communicator, by default the solver uses it
mpi::communicator world;

// create one instance of the abcd solver per mpi-process
abcd obj;

if(obj.comm.rank() == 0) { // the master

// we want that only the master logs data
obj.icntl[Controls::verbose_level] = 2;

// scaling
obj.icntl[Controls::scaling] = 2;

init_2d_lap(obj, 200);

// set the rhs
obj.rhs = new double[obj.m * obj.nrhs];

for (size_t j = 0; j < obj.nrhs; ++j)
for (size_t i = 0; i < obj.m; ++i) {
obj.rhs[i + j * obj.m] = ((double) i * (j + 1))/obj.m;
}
}

try {
// initialize the solver with matrix data
obj(-1);

// equivalent to running 1, 2 and 3 successively
// 1 -> Analyse, Scales, Partitions, etc.
// 2 -> Create augmented systems, distribute them, factorize them
// 3 -> Solve the linear system
obj(6);
// if(obj.comm.rank() == 0)
//     for (size_t i = 0; i < obj.n; ++i) {
//         for (size_t j = 0; j < obj.nrhs; ++j)
//             cout << obj.sol[i + j*obj.n] << '\t';
//         cout << endl;
//     }

} catch (runtime_error err) {
cout << "An error occured: " << err.what() << endl;
}

return 0;
}

void init_2d_lap(abcd &obj, int mesh_size)
{
obj.m = mesh_size*mesh_size; // number of rows
obj.n = obj.m; // number of columns
obj.nz = 3*obj.m - 2*mesh_size; // number of nnz in the lower-triangular part
obj.sym = true;
obj.start_index = 1;

// allocate the arrays
obj.irn = new int[obj.nz];
obj.jcn = new int[obj.nz];
obj.val = new double[obj.nz];

init_2d_lap(obj.m, obj.n, obj.nz, obj.irn, obj.jcn, obj.val, mesh_size);
}

void init_2d_lap(int m, int n, int nz, int *irn, int *jcn, double *val, int mesh_size)
{
int pos;
int i;

pos = 0;
for (i = 1; i <= m; i++) {

// the diagonal
irn[pos] = i;
jcn[pos] = i;
val[pos] = -4.0;

pos++;

if (i % mesh_size != 0) {
// the lower-triangular part
irn[pos] = i + 1;
jcn[pos] = i ;
val[pos] = 1.0;
pos++;
}

if (i + mesh_size > m) continue;
irn[pos] = i + mesh_size ;
jcn[pos] = i ;
val[pos] = 1.0;
pos++;
}
}