
         Matrix multiplication optimization demo

This directory contains a progression of matrix multiplication programs
using OpenCL on a GPU (mult) and C on a CPU (seq_mult).  

This software was written by Tim Mattson of Intel to support the 
Intro to OpenCL Tutorial at SC'10.  It is important to note
that Intel has nothing to do with this software (and hence is not
liable for anything bad that may happen from its use).

The software is unlicensed and can be freely used, redistributed,
modified or ignored.  All I ask is that you acknowledge the original
author (Tim Mattson).

To build the two programs, just type make.  Run them as:

   ./mult
   ./seq_mult

The programs are built from the following source files:

    mult_driver.c      The driver for the OCL/GPU program

    seq_mult.c         The complete sequential (CPU) program

    c_elem.c           Setup and kernel for the naive case where we have 
                       one work item per element of the product matrix
                       and we let the framework select the local dim

    c_row.c            Uses the same dot product algorithm as c_elem.c, but 
                       now we have one work item per row of C and we set
                       the local work dim to 250 (so there is one work
                       group per compute unit).  All matrices are used 
                       from global memory.

    c_row_priv.c       Same as c_row.c, but the row of A we are working 
                       on is copied into private memory
 
    c_row_priv_bloc.c  Same as c_row_priv.c, but the columns of B that 
                       the work group is using are first copied
                       into local memory.
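
As a rough illustration of the first and third variants above, here is a
minimal plain C sketch that runs on the CPU.  It is not taken from the
source files in this directory: the function names (dot_elem,
mat_mul_elem, mat_mul_row_priv), the use of square n x n matrices and the
row-major float storage are all assumptions made for the example.  The
outer loops stand in for the work items that the OpenCL framework would
launch, and the heap buffer a_row only mimics a private-memory copy of a
row of A.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from this directory): the dot product that a
   single work item would compute in the one-work-item-per-element case.
   Matrices are assumed square (n x n) and stored row-major. */
static float dot_elem(int n, int i, int j, const float *A, const float *B)
{
    float sum = 0.0f;
    for (int k = 0; k < n; k++)
        sum += A[i * n + k] * B[k * n + j];
    return sum;
}

/* Element-wise variant: each (i, j) pair plays the role of one work item. */
void mat_mul_elem(int n, const float *A, const float *B, float *C)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            C[i * n + j] = dot_elem(n, i, j, A, B);
}

/* Row-wise variant: each value of i plays the role of one work item that
   first copies its row of A into a scratch buffer (standing in for
   private memory) and then computes the whole row of C. */
void mat_mul_row_priv(int n, const float *A, const float *B, float *C)
{
    assert(n > 0);
    float *a_row = malloc((size_t)n * sizeof *a_row);
    assert(a_row != NULL);

    for (int i = 0; i < n; i++) {
        /* Copy row i of A once, then reuse it for every column of B. */
        memcpy(a_row, &A[i * n], (size_t)n * sizeof *a_row);
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a_row[k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
    free(a_row);
}
```

Both functions compute the same product; only the way the work is divided
and where the row of A is read from differ.  On a CPU the copy brings no
benefit by itself.  It is shown only to make the data movement described
above concrete.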

Support files:

    mult.h             Include file for the driver.  Defines key constants,
                       function prototypes for functions used from 
                       ../common and include files needed by the driver.

    kernels.h          An include file with function prototypes for the
                       functions defined in c_elem.c, c_row.c, 
                       c_row_priv.c, and c_row_priv_bloc.c

    makefile           Makefile with targets seq_mult, mult, clean and
                       a default (i.e. no target on command line) to 
                       build all executables

    matrix_lib.c       A collection of simple matrix manipulation 
                       functions used in mult_driver

    matrix_lib.h       Function prototypes for matrix_lib.c

