void ATL_UGER2K (ATL_CINT M, ATL_CINT N, const TYPE *X, const TYPE *Y, const TYPE *W, const TYPE *Z, TYPE *A, ATL_CINT lda)

The rank-2 update kernel ger2_k uses exactly the same testing and timing methodology as described for ger_k in the previous section, except that the kernel must be stored in the R2CASES/ subdirectory, and you substitute ``r2'' for ``r1'' in the testing and timing commands, and ``R2'' for ``R1'' in the compiler and flag macros.
A simple GER2 implementation would be:
#include "atlas_misc.h"  /* define TYPE macros */
void ATL_UGER2K
   (ATL_CINT M, ATL_CINT N, const TYPE *X, const TYPE *Y,
    const TYPE *W, const TYPE *Z, TYPE *A, ATL_CINT lda)
{
   register ATL_INT i, j;
   for (j=0; j < N; j++)
   {
      const register TYPE y0=Y[j], z0=Z[j];
      for (i=0; i < M; i++)
         A[i] += X[i]*y0 + W[i]*z0;
      A += lda;  /* finished with this column */
   }
}
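Reading the loop above, the kernel adds X*Y^T + W*Z^T to the column-major matrix A, which should give the same result as two successive rank-1 updates. The following is a minimal standalone sketch of that check, not part of ATLAS: the typedefs are assumed stand-ins for what atlas_misc.h would define, and the names ger2_sketch, ger1_ref and ger2_sketch_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins for the types atlas_misc.h would normally define. */
typedef float TYPE;
typedef int ATL_INT;

/* Same loop as the simple kernel: A += X*Y^T + W*Z^T, A column-major. */
void ger2_sketch(ATL_INT M, ATL_INT N, const TYPE *X, const TYPE *Y,
                 const TYPE *W, const TYPE *Z, TYPE *A, ATL_INT lda)
{
    ATL_INT i, j;
    assert(lda >= M);                 /* a column must fit within lda */
    for (j = 0; j < N; j++) {
        const TYPE y0 = Y[j], z0 = Z[j];
        for (i = 0; i < M; i++)
            A[i] += X[i]*y0 + W[i]*z0;
        A += lda;                     /* advance to the next column */
    }
}

/* Hypothetical reference: a single rank-1 update A += X*Y^T. */
void ger1_ref(ATL_INT M, ATL_INT N, const TYPE *X, const TYPE *Y,
              TYPE *A, ATL_INT lda)
{
    ATL_INT i, j;
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            A[i + (size_t)j*lda] += X[i]*Y[j];
}

/* Returns 0 if one rank-2 update equals two rank-1 updates, else 1. */
int ger2_sketch_check(void)
{
    enum { M = 5, N = 3, LDA = 7 };   /* LDA > M leaves padding rows */
    TYPE X[M], W[M], Y[N], Z[N], A1[LDA*N], A2[LDA*N];
    ATL_INT i, j;

    /* Small integer values keep every float operation exact. */
    for (i = 0; i < M; i++) { X[i] = (TYPE)(i + 1); W[i] = (TYPE)(2*i - 3); }
    for (j = 0; j < N; j++) { Y[j] = (TYPE)(j + 2); Z[j] = (TYPE)(1 - j); }
    for (j = 0; j < N; j++)
        for (i = 0; i < LDA; i++)
            A1[i + j*LDA] = A2[i + j*LDA] = (TYPE)(i < M ? i - j : -99);

    ger2_sketch(M, N, X, Y, W, Z, A1, LDA);
    ger1_ref(M, N, X, Y, A2, LDA);
    ger1_ref(M, N, W, Z, A2, LDA);

    /* Compare every entry, including the padding rows beyond M. */
    for (i = 0; i < LDA*N; i++)
        if (A1[i] != A2[i]) return 1;
    return 0;
}
```

Calling ger2_sketch_check() from a small driver should return 0 when the two results agree. This is only a quick sanity check on the arithmetic; it does not replace the tester invoked through the make targets.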
Assuming I save the above file to R2CASES/r2k.c, I would test:
>make sr2ktest mu=1 nu=1 r2rout=r2k.c
.... bunch of compilation, etc ....
TEST CONJ=0, M=297, N=177, lda=297, incY=1, STARTED
TEST CONJ=0, M=297, N=177, lda=297, incY=1, PASSED
And time the single precision real kernel without cache flushing with:
>make sr2ktime mu=1 nu=1 r2rout=r2k.c
.... bunch of compilation, etc ....
GER2: M=1000, N=1000, lda=1000, AF=[16,16,16], AM=[0,0,0], alpha=1.000000e+00:
   M=1000, N=1000, lda=1000, nreps=1, time=9.489282e-04, mflop=4217.39
   M=1000, N=1000, lda=1000, nreps=1, time=9.714776e-04, mflop=4119.50
   M=1000, N=1000, lda=1000, nreps=1, time=9.486141e-04, mflop=4218.79
NREPS=3, MAX=4218.79, MIN=4119.50, AVG=4185.22, MED=4217.39