Contrasting non-default install performance
If you do not install using the architectural defaults, make time will
only print out the Present columns.  This gives you a good summary of
ATLAS's library performance, but it can be hard to tell what is good and
bad if you are not familiar with ATLAS on this hardware.  Sometimes, ATLAS
has architectural defaults for your platform, but your install doesn't use
them.  This is usually because the installer has specified the use of a
non-default compiler, or has explicitly asked that the architectural defaults
not be used, or has overridden the detection of the architecture, etc.  In
this case, make time does not do the comparison
against the architectural defaults, and so only the Present columns
are printed.
However, if you wish to ensure that your library is as good as one that
uses the architectural defaults, then you can manually tell the program
called by make time (xatlbench) to do the comparison.  The most
common example would
be that you have switched to an unsupported compiler (e.g., the Intel compiler),
and now you want to see if the library you built using it is as fast or faster
than the one using the default compiler.  Another example would
be that you want to compare the performance of two closely related
architectures.  This is what we will do here, where we contrast the performance
of the 32 and 64 bit versions of the library on my Core2Duo.
In order to manually do a comparison between a present install and any of
the results stored in ATLAS's architectural defaults you'll need to
perform the following steps:
- make time issued in the BLDdir of your non-default install.
       This does the timings of the present build, and stores the results
       in BLDdir/bin/INSTALL_LOG.
- cd SRCdir/CONFIG/ARCHS, and find the tarfile containing the
      results you wish to compare against.  In our case, we choose
      Core2Duo32SSE3.tar.bz2 to compare against our own Core2Duo64SSE
      results.
- bunzip2 -c Core2Duo32SSE3.tar.bz2 | tar xvf -
      This untars the selected architectural results (replace
      Core2Duo32SSE3.tar.bz2 with the tarfile you selected in the
      previous step).
- cd BLDdir
- ./xatlbench -dp SRCdir/CONFIG/ARCHS/<ARCH> -dc BLDdir/bin/INSTALL_LOG
 xatlbench is the program that compares two sets of results, with
    the -dp pointing to the previous (Refrenc) install result
    directory and -dc pointing to the current (Present)
    install result directory.
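The steps above can be collected into a small shell sketch. This is only an illustration, not part of ATLAS: the helper names run and compare_install and the DRYRUN switch are hypothetical, the paths are the example values used in this section, and it assumes the tarfile unpacks into a directory named after its basename (the <ARCH> above). By default it only prints the commands; set DRYRUN=0 to actually execute them.

```shell
# Example values from this section; edit them to match your own tree.
SRCdir=/home/whaley/TEST/ATLAS3.7.36.0
BLDdir=$SRCdir/obj64
ARCH=Core2Duo32SSE3   # assumption: the tarfile unpacks into a directory of this name

# run (hypothetical helper): echo a command line, and execute it
# only when DRYRUN is set to something other than 1 (default is 1).
run() {
    echo "+ $1"
    if [ "${DRYRUN:-1}" != 1 ]; then
        eval "$1"
    fi
}

# compare_install (hypothetical helper): the manual comparison steps, in order.
# The && chain stops at the first step that fails.
compare_install() {
    run "cd $BLDdir" &&
    run "make time" &&
    run "cd $SRCdir/CONFIG/ARCHS" &&
    run "bunzip2 -c $ARCH.tar.bz2 | tar xvf -" &&
    run "cd $BLDdir" &&
    run "./xatlbench -dp $SRCdir/CONFIG/ARCHS/$ARCH -dc $BLDdir/bin/INSTALL_LOG"
}

compare_install
```

Printing the commands first makes it easy to check the -dp and -dc directories before spending time on a full make time run.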
The figure below shows me doing this on my Core2Duo, with
SRCdir = /home/whaley/TEST/ATLAS3.7.36.0 and
BLDdir = /home/whaley/TEST/ATLAS3.7.36.0/obj64, where we compare
the present 64-bit install to the stored 32-bit install.
We see that the 64-bit install, which gets to use 16 rather than 8 registers,
is slightly faster for almost all kernels and precisions, as one might expect.
Figure: Comparing 32 and 64 bit libraries on a 2.4 GHz Core2Duo
 
R. Clint Whaley
2016-07-28