Hypre linear solvers scalability on our Beowulf cluster

MATH 7664 Iterative Methods in Numerical Linear Algebra Spring 2004

The main goal of the project is to test the scalability of Hypre PCG linear solvers on the UC Denver Beowulf cluster and to determine the size of the largest linear system our cluster is capable of solving using Hypre.

We want to test all available Hypre PCG linear solvers on all three Hypre installations that we compiled for the previous class project.
In this project we test only linear solvers, not eigensolvers, so we need to use the provided "ij_es" driver without specifying the "-lobpcg" option.
As the linear system matrix we will use the 3-D 7-point Laplacian, which is built into the driver, with the default right-hand side, which is a vector with unit components.
The size of the problem is determined by the "-n <nx> <ny> <nz>" option, where <nx>, <ny>, and <nz> are integers specifying the number of mesh points for the 3-D Laplacian in the x, y, and z directions, respectively.
E.g., the "-n 100 100 100" option generates the 3-D Laplacian with 100 mesh points in each direction, so the matrix in this case has 100*100*100=1,000,000 rows.
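As a quick sanity check on problem sizes, the following minimal Python sketch computes the number of rows and nonzero entries of the 7-point Laplacian for a given "-n" setting. The function name is hypothetical, and the nonzero count assumes the standard stencil that is simply truncated at the boundary of the mesh; it does not say anything about how Hypre stores the matrix internally.

```python
def laplacian_7pt_size(nx, ny, nz):
    """Return (rows, nonzeros) of the 7-point Laplacian on an nx x ny x nz mesh.

    Assumption: standard stencil truncated at the mesh boundary, so every
    mesh point contributes its diagonal entry plus one entry per existing
    neighbour in the x, y, and z directions.
    """
    rows = nx * ny * nz
    # 7 entries per row, minus the neighbours missing across the six faces:
    # two faces of size nx*ny, two of size ny*nz, two of size nx*nz.
    nonzeros = 7 * rows - 2 * (nx * ny + ny * nz + nx * nz)
    return rows, nonzeros


rows, nonzeros = laplacian_7pt_size(100, 100, 100)
print(rows, nonzeros)  # prints: 1000000 6940000
```

For the "-n 100 100 100" case this confirms the 1,000,000 rows quoted above and shows that the matrix has a little under 7 nonzeros per row.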

For the scalability test, you first need to determine the largest problem that the particular combination of solver and installation is capable of solving on one node. This can be done interactively (make sure that you are the only one running jobs on this node). Second, you increase the number of nodes and the total problem size proportionally and record the CPU time needed to solve each problem. Make sure, of course, that you use both processors on every node. The results of the tests can be presented as bar charts; see p.18 as an example from the following set of slides:

Eleventh Copper Mountain Conference on Multigrid Methods, March 30 - April 4, 2003:
Implementation of a Scalable Preconditioned Eigenvalue Solver Using Hypre - Andrew Knyazev and Merico E. Argentati (slides)
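Since the problem size per node is held fixed in this kind of test, ideal scalability means the CPU time stays constant as nodes are added. One possible way to summarize recorded timings is sketched below in Python. Both function names are hypothetical, the input is assumed to be a dictionary mapping the number of nodes to the measured CPU time in seconds, and the text bars are only a stand-in for a proper plot.

```python
def weak_scaling_efficiency(cpu_times):
    """Return {nodes: T(1 node) / T(nodes)} for a fixed problem size per node.

    cpu_times maps the number of nodes to the measured CPU time in seconds
    and must contain an entry for 1 node. A value of 1.0 is ideal scaling.
    """
    base = cpu_times[1]
    return {nodes: base / t for nodes, t in sorted(cpu_times.items())}


def text_bars(cpu_times, width=40):
    """Render the timings as text bars, one line per node count."""
    longest = max(cpu_times.values())
    lines = []
    for nodes, t in sorted(cpu_times.items()):
        # Bar length is proportional to the CPU time; the slowest run fills `width`.
        bar = "#" * max(1, round(width * t / longest))
        lines.append(f"{nodes:3d} nodes | {bar} {t:.1f} s")
    return lines
```

Calling `weak_scaling_efficiency` on your own measurements gives one number per node count that can be compared directly across solvers and installations.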

The preliminary tests show that a 100*100*100=1,000,000 3-D 7-point Laplacian is a good starting point on one node for all preconditioners. Let us use the default topology, i.e., without specifying the -P option, and let us increase the problem size in the y-direction proportionally to the number of nodes in the scalability tests, so that the 3-D 7-point Laplacian always has a 100*100*100=1,000,000 block residing on every node, no matter how many nodes are involved.
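The run plan described above can be tabulated with a short Python sketch. The function name and the dictionary keys are hypothetical; only the "-n" option comes from the driver description, and the MPI launch command and the solver selection are deliberately left out because they depend on the installation being tested.

```python
def weak_scaling_plan(max_nodes, base=100, procs_per_node=2):
    """List the runs for 1..max_nodes nodes, growing the mesh in y only."""
    plan = []
    for nodes in range(1, max_nodes + 1):
        # Keep a base*base*base block per node by scaling the y-direction.
        nx, ny, nz = base, base * nodes, base
        plan.append({
            "nodes": nodes,
            "procs": nodes * procs_per_node,  # both processors on every node
            "n": (nx, ny, nz),
            "unknowns": nx * ny * nz,
            "driver_args": f"-n {nx} {ny} {nz}",
        })
    return plan


for run in weak_scaling_plan(4):
    print(run["nodes"], run["procs"], run["driver_args"], run["unknowns"])
```

For example, the run on 4 nodes uses 8 processes and the option "-n 100 400 100", which gives 4,000,000 unknowns in total.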
The conclusion needs to address the following questions:

Which installation is faster, and how does the answer depend on the number of nodes? We expect that mpich would be faster than scali on a few nodes, but that scali should eventually win over mpich as the number of nodes increases.

Which solver is most efficient, and under which circumstances? Suspected winners might be 1=AMG-PCG and 2=DS-PCG.

Here are preliminary results for the project report: Scalability plots