void ATL_UGER2K
(ATL_CINT M, ATL_CINT N, const TYPE *X, const TYPE *Y,
const TYPE *W, const TYPE *Z, TYPE *A, ATL_CINT lda)
The rank-2 update kernel ger2_k uses exactly the same testing
and timing methodology as described for ger_k in the previous
section, except that the kernel must be stored in the R2CASES/ subdirectory,
and you substitute ``r2'' for ``r1'' in the testing and timing commands
and ``R2'' for ``R1'' in the compiler and flag macros.
A simple GER2 implementation would be:
#include "atlas_misc.h"  /* define TYPE macros */
void ATL_UGER2K
   (ATL_CINT M, ATL_CINT N, const TYPE *X, const TYPE *Y,
    const TYPE *W, const TYPE *Z, TYPE *A, ATL_CINT lda)
{
   register ATL_INT i, j;
   for (j=0; j < N; j++)
   {
      const register TYPE y0=Y[j], z0=Z[j];
      for (i=0; i < M; i++)
         A[i] += X[i]*y0 + W[i]*z0;
      A += lda;  /* finished with this column */
   }
}
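To make concrete what the loop above computes, here is a minimal standalone sketch (not part of ATLAS) that checks a single rank-2 update, A += X*Y^T + W*Z^T, against two successive rank-1 updates. The names demo_ger2, demo_ger1 and demo_ger2_matches_two_ger1 are hypothetical, and plain float and int stand in for the TYPE, ATL_INT and ATL_CINT definitions that a real build takes from atlas_misc.h. The inputs are small integers, so the float arithmetic is exact and the two results can be compared with ==.

```c
/* Sketch only: float/int are assumed stand-ins for TYPE/ATL_INT/ATL_CINT. */

/* Rank-2 update with the same loop structure as the simple kernel:
 * A += X*Y^T + W*Z^T, with A stored column-major, leading dimension lda. */
void demo_ger2(int M, int N, const float *X, const float *Y,
               const float *W, const float *Z, float *A, int lda)
{
   int i, j;
   for (j=0; j < N; j++)
   {
      const float y0=Y[j], z0=Z[j];
      for (i=0; i < M; i++)
         A[i] += X[i]*y0 + W[i]*z0;
      A += lda;  /* finished with this column */
   }
}

/* Rank-1 update: A += X*Y^T (hypothetical helper used for comparison). */
void demo_ger1(int M, int N, const float *X, const float *Y,
               float *A, int lda)
{
   int i, j;
   for (j=0; j < N; j++)
   {
      const float y0=Y[j];
      for (i=0; i < M; i++)
         A[i] += X[i]*y0;
      A += lda;
   }
}

/* Returns 1 if one rank-2 update equals two rank-1 updates on a small
 * test case with lda > M, and 0 otherwise. */
int demo_ger2_matches_two_ger1(void)
{
   enum { M = 3, N = 2, LDA = 4 };
   const float X[M] = {1, 2, 3}, Y[N] = {4, 5};
   const float W[M] = {6, 7, 8}, Z[N] = {9, 10};
   float A2[LDA*N], A1[LDA*N];
   int k;

   for (k=0; k < LDA*N; k++)      /* identical starting matrices */
      A2[k] = A1[k] = (float)k;

   demo_ger2(M, N, X, Y, W, Z, A2, LDA);   /* one pass over A */
   demo_ger1(M, N, X, Y, A1, LDA);         /* two passes over A */
   demo_ger1(M, N, W, Z, A1, LDA);

   for (k=0; k < LDA*N; k++)      /* padding rows must also be untouched */
      if (A2[k] != A1[k])
         return 0;
   return 1;
}
```

Calling demo_ger2_matches_two_ger1() from a small driver program is enough to run the check. The comparison also shows the difference between the two approaches: the rank-2 form reads and writes each element of A once, where the two rank-1 calls do so twice.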
Assuming I save the above file to R2CASES/r2k.c, I would test:
>make sr2ktest mu=1 nu=1 r2rout=r2k.c
.... bunch of compilation, etc ....
TEST CONJ=0, M=297, N=177, lda=297, incY=1, STARTED
TEST CONJ=0, M=297, N=177, lda=297, incY=1, PASSED
I would then time the single precision real kernel, without cache flushing, with:
>make sr2ktime mu=1 nu=1 r2rout=r2k.c
.... bunch of compilation, etc ....
GER2: M=1000, N=1000, lda=1000, AF=[16,16,16], AM=[0,0,0], alpha=1.000000e+00:
M=1000, N=1000, lda=1000, nreps=1, time=9.489282e-04, mflop=4217.39
M=1000, N=1000, lda=1000, nreps=1, time=9.714776e-04, mflop=4119.50
M=1000, N=1000, lda=1000, nreps=1, time=9.486141e-04, mflop=4218.79
NREPS=3, MAX=4218.79, MIN=4119.50, AVG=4185.22, MED=4217.39