The QR algorithm computes the Schur form of a matrix and is by far the most
popular approach for solving dense nonsymmetric eigenvalue problems. Multishift
and aggressive early deflation (AED) techniques have led to significantly more
efficient sequential implementations of the QR algorithm over the last
decade. More recently, these techniques have been incorporated into a novel
parallel QR algorithm for hybrid distributed-memory HPC systems. Although this
leads to significant performance improvements, AED may become a computational
bottleneck as the number of processors increases. In this
paper, we discuss a two-level approach for performing AED in a parallel
environment, where the lower level consists of a novel combination of AED with
the pipelined QR algorithm implemented in the ScaLAPACK routine PDLAHQR.
Numerical experiments demonstrate that this new implementation further improves
the performance of the parallel QR algorithm.
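
For readers unfamiliar with the underlying iteration, the following is a
minimal sketch of the textbook unshifted QR iteration on a small nonsymmetric
matrix with real eigenvalues. It is not the multishift, AED, or pipelined
PDLAHQR variant discussed in this paper; it only illustrates how repeated
orthogonal similarity transformations drive a matrix toward (real) Schur form.
The function names and the 2-by-2 test matrix are illustrative assumptions,
and practical codes first reduce the matrix to Hessenberg form and use shifts.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]


def qr_gram_schmidt(a):
    """Return (Q, R) with A = Q R, using modified Gram-Schmidt.

    Assumes A is square and nonsingular (illustrative helper only).
    """
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the previously computed columns of Q.
        for i, q in enumerate(q_cols):
            r[i][j] = sum(q[k] * v[k] for k in range(n))
            v = [v[k] - r[i][j] * q[k] for k in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def unshifted_qr_iteration(a, iterations=200):
    """Repeat T <- R Q, where T = Q R; each step is the similarity Q^T T Q."""
    t = [row[:] for row in a]
    for _ in range(iterations):
        q, r = qr_gram_schmidt(t)
        t = matmul(r, q)
    return t


if __name__ == "__main__":
    # Nonsymmetric example with eigenvalues 5 and 2 (trace 7, determinant 10).
    a = [[4.0, 1.0], [2.0, 3.0]]
    t = unshifted_qr_iteration(a)
    for row in t:
        print(row)
```

For this example the subdiagonal entry of the iterate decays toward zero and
the diagonal approaches the eigenvalues 5 and 2, so the result is numerically
upper triangular. Shifts and deflation techniques such as AED exist precisely
because this unshifted iteration converges too slowly for practical use.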