Hypre eigenvalue solvers scalability on our Beowulf cluster

MATH 7664 Iterative Methods in Numerical Linear Algebra Spring 2004

The main goal of the project is to test the scalability of Hypre PCG eigenvalue solvers on the UC Denver Beowulf cluster.

Do not forget to read our Beowulf cluster Web pages!

Installing the latest alpha Hypre with precompiled LAPACK libraries

The installation is similar, but not identical, to the one used in one of the previous projects, since this time we want to use the precompiled system-wide BLAS and LAPACK libraries. Thus, you need to do a complete install from scratch.

On Beowulf, use the "env" command to check whether you are running the C shell or the TC shell. If not, use the "chsh" command to change your shell to "/bin/tcsh". To make the shell change on beowulf permanent, change your shell on math.

In a terminal window of your beowulf account, get the latest Hypre alpha:

cd
cp ~aknyazev/hypre_03_19_04.tgz .
tar xzvf hypre_03_19_04.tgz

This creates a new directory, linear_solvers.
To configure the gcc-scali install with the precompiled BLAS and LAPACK libraries, run the following in this directory:

./configure --with-blas="-L/usr/local/lib -lblas" --with-lapack="-llapack -lg2c"
make test

Installing the latest LOBPCG eigenvalue solver for Hypre

To test the most recent version of our eigensolver, you additionally need to install it in the linear_solvers directory:

mkdir es
cd es
cp ~aknyazev/lobpcg28apr04.tar.gz .
tar xzvf lobpcg28apr04.tar.gz
./build.sh

Check that there are no errors during the install. You will see the following warning a few times:

gcc: -lmpi: linker input file unused since linking not done

This warning is harmless. The lobpcg test drivers, ij_lobpcg and struct_lobpcg, are compiled in the subdirectory test_lobpcg.

Testing the latest LOBPCG eigenvalue solver for Hypre

The lobpcg test drivers, ij_lobpcg and struct_lobpcg, are in the subdirectory test_lobpcg and support most options of the original ij and struct drivers. Here is a list of the most important options supported by the ij_lobpcg and struct_lobpcg drivers:

To run consistent comparisons, always set -tol 1e-8; see the examples below.

Preconditioning in the lobpcg code is done by calling Hypre PCG linear solvers and preconditioners. In the lobpcg eigensolver, we want to test all available Hypre PCG linear solvers that we have already tested on our Beowulf cluster for linear systems in the previous projects. Solvers to test in the ij_lobpcg driver:

Solvers to test in the struct_lobpcg driver:

Here, the -pcgitr option specifies the number of inner iterations for every preconditioner. The optimal value of pcgitr, the one that minimizes the total CPU time, may differ for different problem sizes and preconditioners; it is chosen here empirically, based on the numerical results from p.17 of a recent talk.

Attention: the input parameters and the defaults of the ij and struct interface drivers are completely different in the present version of Hypre. Namely, in struct, the -n option specifies the local problem size PER processor, so the global problem size is Px*nx by Py*ny by Pz*nz. Also, the default -P option in struct differs from that of ij, as do the default tolerance and the default maximum number of iterations.

To make the default maximum number of iterations and the verbosity level of the struct driver consistent with those of the ij driver, change the following lines of the struct_lobpcg.c file:
HYPRE_PCGSetMaxIter( (HYPRE_Solver)solver, 50 );
HYPRE_PCGSetPrintLevel( (HYPRE_Solver)solver, 1 );
into
HYPRE_PCGSetMaxIter( (HYPRE_Solver)solver, 1000);
HYPRE_PCGSetPrintLevel( (HYPRE_Solver)solver, 2 );
and recompile by running ./build.sh in the es directory.

The original struct driver does not seem to have a -tol command line option, but the present struct_lobpcg driver with the -lobpcg option does accept -tol, and you must specify -tol 1e-8 to keep our tests consistent.

To keep the problem size consistent between the two drivers, follow these examples:

mpimon ./ij_lobpcg -lobpcg -solver 1 -tol 1e-8 -pcgitr 0 -vrand 1 -n 100 100 100 -P 1 2 1 -- node1 2
mpimon ./struct_lobpcg -lobpcg -solver 11 -tol 1e-8 -pcgitr 0 -vrand 1 -n 100 50 100 -P 1 2 1 -iout 0 -- node1 2

both solve the 3D Laplacian problem of the same size, 100x100x100, on one node with 2 CPUs, while

mpimon ./ij_lobpcg -lobpcg -solver 1 -tol 1e-8 -pcgitr 0 -vrand 1 -n 100 200 100 -P 1 4 1 -- node1 2 node2 2
mpimon ./struct_lobpcg -lobpcg -tol 1e-8 -solver 11 -pcgitr 0 -vrand 1 -n 100 50 100 -P 1 4 1 -iout 0 -- node1 2 node2 2

both solve the 3D Laplacian problem of the same size, 100x200x100, on 2 nodes with 2 CPUs each.

The -iout 0 option in struct prevents it from generating output files that we do not need. Even with the -iout 0 option, the struct driver generates a file called zout.A.00000 that you need to remove manually. I could not find out how to tell the struct driver NOT to generate this file.

The scalability test is similar to that of the previous class projects. We increase the problem size in the y-direction in proportion to the number of nodes, so that the 3-D 7-point Laplacian always has a 100*100*100=1,000,000 block residing on every node, no matter how many nodes are involved. The test results can be presented as bar charts; see p.18 of the following set of slides for an example:

Eleventh Copper Mountain Conference on Multigrid Methods, March 30 - April 4, 2003:
Implementation of a Scalable Preconditioned Eigenvalue Solver Using Hypre - Andrew Knyazev and Merico E. Argentati (slides)

In this project, we test the eigensolver for computing only an approximation to one eigenvector, corresponding to the smallest eigenvalue of the 3D 7-point Laplacian, using the "-vrand 1" option. In this Hypre release, the CPU timer is broken for longer runs in both the struct and ij drivers; e.g., tests with the "-vrand 10" option often run long enough to produce negative CPU times.

The conclusion needs to address the following questions: