A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair
on distributed-memory computers is presented. The formulation is based on a sequential
cache-blocked algorithm by Kågström, Kressner, E.S. Quintana-Ortí, and G. Quintana-
Ortí (2008). A static scheduling algorithm is proposed that addresses the problem of
underutilized processes caused by two-sided updates of matrix pairs based on sequences
of rotations. Experiments using up to 961 processes demonstrate that the new formulation
improves on the state of the art and also identify factors that limit its scalability.