
PAPI MATLAB Support Page

MATLAB Support

Two external PAPI functions, flops and PAPI, are provided for users of MATLAB. The first function lets you monitor the number of floating point instructions executed and the instantaneous MegaFLOPS rate between any two points in your MATLAB code. The second provides complete access to the PAPI High Level interface. Four m-file examples for each function are also included to illustrate the use of these functions. These examples can also help you calibrate the performance of your system.

The PAPI mex functions and the supporting m-files are automatically installed in the folder of your choice by either the specialized PAPI MATLAB installer or the more complete Windows PAPI installer. In either case, following the installation you must modify MATLAB's search path to access these resources.

Open MATLAB. Use the File>Set Path... menu command to open the Set Path dialog. Use the Add Folder... button to locate the PAPI MATLAB Support folder in which the necessary files reside; it is typically found under C:\Program Files\ICL\WinPAPI\. When the folder has been successfully added to the top of your search path, close the dialog. The flops and PAPI functions and their supporting m-files are now ready for use.
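If you prefer the command line, the same change can be made with addpath, either at the MATLAB prompt or in your startup.m. This is only a sketch; the folder name shown assumes the support files landed directly in the default installation folder, so substitute the folder actually chosen during installation:

addpath('C:\Program Files\ICL\WinPAPI');   % put the PAPI MATLAB Support folder at the top of the search path
path                                        % optional: confirm the folder now appears in the search path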

The files provided include:

Mex External Functions

  • [ops, mflops]= flops
  •          ctrs= PAPI('num')
  •    [val, ...]= PAPI('start', 'event', ...)
  •    [val, ...]= PAPI('stop')
  •    [val, ...]= PAPI('read')
  •    [val, ...]= PAPI('accum')
  •    [ins, ipc]= PAPI('ipc')
  • [ins, mflips]= PAPI('flips')
  • [ops, mflops]= PAPI('flops')

Example M-Files

Mex Functions

NAME

flops(0) - Initialize PAPI library, reset counters to zero and begin counting.
ops = flops - Return the number of floating point operations since the first call or last reset.
[ops, mflops] = flops - Return both the number of floating point operations since the first call or last reset, and the incremental rate of floating point execution in Mega Floating Point Operations Per Second.

DESCRIPTION

The MATLAB flops function uses the PAPI Performance API to do the heavy lifting. PAPI takes advantage of the fact that most modern microprocessors have built-in hardware support for counting a variety of basic operations or events. PAPI uses these counters to track things like instructions executed, cycles elapsed, floating point instructions performed and a variety of other events.
The first call to flops will initialize PAPI, set up the counters to monitor floating point instructions and total cpu cycles, and start the counters. Subsequent calls will return one or two values. The first value is the number of floating point operations since the first call or last reset. The second optional value, the execution rate in mflops, can also be returned. The mflops rate is computed by dividing the operations since the last call by the cycles since the last call and multiplying by cycles per second:
mflops = ((ops/cycles) * (cycles/second)) / 10^6
The cycles per second value is a derived number determined empirically by counting cycles for a fixed amount of system time during the initialization of the PAPI library. Because of the way it is determined, this value can be a small but consistent source of systematic error, and can introduce differences between rates measured by PAPI and those determined by time measurements, for example, tic and toc. Also note that PAPI on Windows counts events on a system level rather than a process or thread level. This can lead to an over-reporting of cycles, and typically an under-reporting of mflops.
The flops function continues counting after any call. A call with an input of 0 resets the counters and returns 0.

ARGUMENTS

0 -- an optional input argument of 0 will cause the counters to be reset to zero.

RETURNS

ops -- total floating point instructions since the first call to flops or the last call with an input of 0.

mflops -- Mflop/s achieved since the last call to flops.
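
EXAMPLE

A minimal sketch of typical usage; any MATLAB computation can go between the reset and the read (the fft call here is only an illustration):

flops(0);                    % first call initializes PAPI; an argument of 0 resets the counters
y = fft(rand(1024,1));       % the code to be measured
[ops, mflops] = flops;       % floating point operations and MFLOPS since the reset
fprintf('%d floating point operations at %.2f MFLOPS\n', ops, mflops);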


NAME

ctrs = PAPI('num') - Return the number of hardware counters.
PAPI('start', 'event', ...) - Begin counting the specified events.
[val, ...] = PAPI('stop') - Stop counting and return the current values.
[val, ...] = PAPI('read') - Read the current values of the active counters.
[val, ...] = PAPI('accum') - Add the current values of the active counters to the input values.
PAPI('ipc') - Begin counting instructions.
ins = PAPI('ipc') - Return the number of instructions executed since the first call.
[ins, ipc] = PAPI('ipc') - Return both the total number of instructions executed since the first call, and the incremental rate of instruction execution since the last call.
PAPI('flips')
PAPI('flops') - Begin counting floating point instructions or operations.
ins = PAPI('flips')
ops = PAPI('flops') - Return the number of floating point instructions or operations since the first call.
[ins, mflips] = PAPI('flips')
[ops, mflops] = PAPI('flops') - Return both the number of floating point instructions or operations since the first call, and the incremental rate of floating point execution since the last call.

DESCRIPTION

The PAPI function provides access to the PAPI Performance API. PAPI takes advantage of the fact that most modern microprocessors have built-in hardware support for counting a variety of basic operations or events. PAPI uses these counters to track things like instructions executed, cycles elapsed, floating point instructions performed and a variety of other events.
There are 8 subfunctions within the PAPI call, as described below:
'num'   - provides information on the number of hardware counters built into this platform. The result of this call specifies how many events can be counted at once.
'start' - programs the counters with the named events and begins counting. The names of the events can be found in the PAPI documentation. If a named event cannot be found, or cannot be mapped, an error message is displayed.
'stop'  - stops counting and returns the values of the counters in the same order as the events were specified in the start command. 'stop' can also be used to reset the counters for the 'ipc', 'flips', and 'flops' subfunctions described below.
'read'  - returns the values of the counters without stopping them.
'accum' - adds the values of the counters to the input parameters and returns them in the output parameters. Counting is not stopped.
'ipc'   - returns the total instructions executed since the first call to this subfunction, and the rate of execution of instructions (as instructions per cycle) since the last call.
'flips' - returns the total floating point instructions executed since the first call to this subfunction, and the rate of execution of floating point instructions (as mega-floating point instructions per second, or mflips) since the last call. A floating point instruction is defined as whatever this cpu naturally counts as floating point instructions.
'flops' - identical to 'flips', except it measures floating point operations rather than instructions. In many cases these two counts may be identical. In some cases 'flops' will be a derived value that attempts to reproduce that which is traditionally considered a floating point operation. For example, a fused multiply-add would be counted as two operations, even if it was only a single instruction.
In typical usage, the first five subfunctions, 'num', 'start', 'stop', 'read', and 'accum', are used together. 'num' establishes the maximum number of events that can be supplied to 'start'. After a 'start' is issued, 'read' and 'accum' can be intermixed until a 'stop' is issued.
The three rate calls, 'ipc', 'flips', and 'flops' are intended to be used independently. They cannot be mixed, because they use the same counter resources. They can be used serially if they are separated by a 'stop' call, which can also be used to reset the counters.
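
EXAMPLE

A minimal sketch of both usage patterns. PAPI_TOT_INS and PAPI_TOT_CYC are standard PAPI preset event names; whether they can be counted together depends on the platform, and the computations shown are only illustrations:

ctrs = PAPI('num');                             % how many events can be counted at once
PAPI('start', 'PAPI_TOT_INS', 'PAPI_TOT_CYC');  % program the counters and begin counting
x = rand(200) * rand(200);                      % work to be measured
[ins, cyc] = PAPI('read');                      % sample the counters without stopping them
x = x * x;                                      % more work
[ins, cyc] = PAPI('stop');                      % final values, in the order given to 'start'

PAPI('ipc');                                    % begin counting instructions (rate call)
y = x \ ones(200,1);
[ins, ipc] = PAPI('ipc');                       % total instructions, and instructions per cycle since the last call
PAPI('stop');                                   % stop and reset before using a different rate call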


Example M-Files

NAME

FlopsInnerProduct.m

DESCRIPTION

Computes the inner product of two vectors of length n = 50 to 500 in steps of 50. Displays the observed number of floating point operations as compared to the theoretically predicted number. Theory predicts ops = 2*n. The results provide an indication of the overhead incurred by MATLAB, and the Mflops achieved for each computation.

SOURCE

fprintf(1,'\nPAPI Inner Product Test');
fprintf(1,'\nUsing flops');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(1,n);x=rand(n,1);
    flops(0);
    c=a*x;
    [ops, mflops] = flops;
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n',n,ops,2*n,ops - 2*n, (1.0 - ((2*n) / ops)) * 100,mflops)
end

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.
>> FlopsInnerProduct

PAPI Inner Product Test
Using flops
           n          ops           2n   difference      % error       mflops
          50          119          100           19        15.97         2.28
         100          223          200           23        10.31         7.45
         150          327          300           27         8.26        10.52
         200          431          400           31         7.19        13.92
         250          535          500           35         6.54        16.01
         300          639          600           39         6.10        18.92
         350          743          700           43         5.79        20.82
         400          851          800           51         5.99        25.27
         450          955          900           55         5.76        27.90
         500         1059         1000           59         5.57        30.08
>> 


NAME

FlopsMatrixVector.m

DESCRIPTION

Computes the product of a square matrix and a vector of size n = 50 to 500 in steps of 50. Displays the observed number of floating point operations as compared to the theoretically predicted number. Theory predicts ops = 2*n^2. The results provide an indication of the overhead incurred by MATLAB, and the Mflops achieved for each computation.

SOURCE

fprintf(1,'\nPAPI Matrix Vector Multiply Test');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^2', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n);x=rand(n,1);
    flops(0);
    b=a*x;
    [count,mflops]=flops;
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n',n,count,2*n^2,count - 2*n^2, (1.0 - ((2*n^2) / count)) * 100,mflops)
end

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.

>> FlopsMatrixVector

PAPI Matrix Vector Multiply Test
           n          ops         2n^2   difference      % error       mflops
          50         5220         5000          220         4.21        66.15
         100        20625        20000          625         3.03       194.65
         150        45223        45000          223         0.49        31.84
         200        80317        80000          317         0.39        40.88
         250       125423       125000          423         0.34        49.97
         300       180541       180000          541         0.30        53.06
         350       245671       245000          671         0.27        51.94
         400       320467       320000          467         0.15        49.36
         450       405583       405000          583         0.14        43.97
         500       500711       500000          711         0.14        43.00
>> 


NAME

FlopsMatrixMatrix.m

DESCRIPTION

Computes the product of two square matrices of size n = 50 to 500 in steps of 50. Displays the observed number of floating point operations as compared to the theoretically predicted number. Theory predicts ops = 2*n^3. MATLAB uses an ATLAS optimized algorithm for peak performance on matrix-matrix multiplies. The bulk of the error indicated below is due to that algorithm, which increases floating point performance while adding floating point operations to the theoretically predicted number.

SOURCE

fprintf(1,'\nPAPI Matrix Matrix Multiply Test');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^3', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n);b=rand(n);c=rand(n);
    flops(0);
    c=c+a*b;
    [count,mflops]=flops;
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n',n,count,2*n^3,count - 2*n^3, (1.0 - ((2*n^3) / count)) * 100,mflops)
end

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.

>> FlopsMatrixMatrix

PAPI Matrix Matrix Multiply Test
           n          ops         2n^3   difference      % error       mflops
          50       258660       250000         8660         3.35       425.32
         100      2039068      2000000        39068         1.92       412.40
         150      6796006      6750000        46006         0.68       464.47
         200     16082342     16000000        82342         0.51       498.34
         250     31379542     31250000       129542         0.41       514.93
         300     54187928     54000000       187928         0.35       426.37
         350     86007456     85750000       257456         0.30       475.78
         400    128320392    128000000       320392         0.25       446.92
         450    182656368    182250000       406368         0.22       441.02
         500    250503312    250000000       503312         0.20       447.54
>> 


NAME

FlopsSampler.m

DESCRIPTION

Demonstrates the application of the PAPI flops function on a series of increasingly computationally expensive MATLAB operations. You define the size of the computation with an input parameter; MATLAB displays the operation, the number of floating point operations required, and the effective Mflop throughput.

SOURCE

The source for this function can be examined in the file:
FlopsSampler.m
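
Each row of the table below presumably reflects one operation measured between a counter reset and a read, in the same style as the earlier examples. A minimal sketch of what one such measurement might look like (the positive definite construction is an assumption, added only so that chol succeeds on random data):

n = 500;
a = rand(n); a = a'*a + n*eye(n);   % make the matrix symmetric positive definite so chol(a) is valid
flops(0);                           % reset the counters
c = chol(a);                        % one of the sampled operations
[ops, mflops] = flops;              % operations and rate for this operation alone
fprintf('%25s %12d %14d %12.2f\n', 'chol(a)', n, ops, mflops);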

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.

>> FlopsSampler(500)

Counts Using PAPI

              Operations            n      fl pt ops      Mflop/s
       calling PAPI flops         500              2         0.04
              dot product         500           1077         6.96
            matrix vector         500         500711        42.75
            random matrix         500         874905        27.41
                  chol(a)         500       46495920       282.40
                    lu(a)         500       84994928       262.04
                    x=a\y         500       91706152       196.63
               condest(a)         500       93340912        77.32
                    qr(a)         500      188981232       363.99
          matrix multiply         500      250253248       469.55
                   inv(a)         500      263311472       328.68
                   svd(a)         500      377278304       115.15
                  cond(a)         500      377283040       113.68
                  hess(a)         500      450242112       177.70
                   eig(a)         500     1181522304       157.53
           [u,s,v]=svd(a)         500     2032648448       110.25
                  pinv(a)         500  2.533161e+009       123.81
                s=gsvd(a)         500  4.506011e+009       146.69
             [x,e]=eig(a)         500  2.916129e+009       149.32
    [u,v,x,c,s]=gsvd(a,b)         500  4.756261e+009       153.54
>> 



NAME

PAPIInnerProduct.m

DESCRIPTION

Computes the inner product of two vectors of length n = 50 to 500 in steps of 50 using two different methods: the PAPI('flops') call, and PAPI('start') / PAPI('stop'). Displays the observed number of floating point operations as compared to the theoretically predicted number. Theory predicts ops = 2*n. The results provide an indication of the overhead incurred by MATLAB, and the Mflops achieved for each computation.

SOURCE

fprintf(1,'\n\nPAPI Inner Product Test');
fprintf(1,'\nUsing the High Level PAPI("flops") call');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(1,n);x=rand(n,1);
    PAPI('stop'); % reset the counters to zero
    PAPI('flops'); % start counting flops
    c=a*x;
    [ops, mflops] = PAPI('flops'); % read the flops data
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n',n,ops,2*n,ops - 2*n, (1.0 - ((2*n) / ops)) * 100,mflops)
end
PAPI('stop');

fprintf(1,'\n\nPAPI Inner Product Test');
fprintf(1,'\nUsing PAPI start and stop');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n', 'difference', '% error', 'flops/cycle')
for n=50:50:500,
    a=rand(1,n);x=rand(n,1);
    PAPI('start', 'PAPI_TOT_CYC', 'PAPI_FP_OPS');
    c=a*x;
    [cyc, ops] = PAPI('stop');
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.6f\n',n,ops,2*n,ops - 2*n, (1.0 - ((2*n) / ops)) * 100,ops/cyc)
end

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.
>> PAPIInnerProduct


PAPI Inner Product Test
Using the High Level PAPI("flops") call
           n          ops           2n   difference      % error       mflops
          50          119          100           19        15.97         2.22
         100          223          200           23        10.31         6.47
         150          327          300           27         8.26         9.49
         200          431          400           31         7.19        12.17
         250          535          500           35         6.54        15.19
         300          639          600           39         6.10        17.73
         350          743          700           43         5.79        20.56
         400          851          800           51         5.99        22.93
         450          955          900           55         5.76        15.85
         500         1059         1000           59         5.57        27.64


PAPI Inner Product Test
Using PAPI start and stop
           n          ops           2n   difference      % error  flops/cycle
          50          119          100           19        15.97     0.002868
         100          223          200           23        10.31     0.007038
         150          327          300           27         8.26     0.010591
         200          431          400           31         7.19     0.013792
         250          535          500           35         6.54     0.016720
         300          639          600           39         6.10     0.019734
         350          743          700           43         5.79     0.022139
         400          851          800           51         5.99     0.025629
         450          955          900           55         5.76     0.028769
         500         1059         1000           59         5.57     0.032066
>> 


NAME

PAPIMatrixVector.m

DESCRIPTION

Computes the product of a square matrix and a vector of size n = 50 to 500 in steps of 50 using two different methods: the PAPI('flops') call, and PAPI('start') / PAPI('stop'). Displays the observed number of floating point operations as compared to the theoretically predicted number. Theory predicts ops = 2*n^2. The results provide an indication of the overhead incurred by MATLAB, and the Mflops achieved for each computation.

SOURCE

fprintf(1,'\nPAPI Matrix Vector Multiply Test');
fprintf(1,'\nUsing the High Level PAPI("flops") call');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^2', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n);x=rand(n,1);
    PAPI('stop'); % reset the counters to zero
    PAPI('flops'); % start counting flops
    b=a*x;
    [count, mflops] = PAPI('flops'); % read the flops data
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n',n,count,2*n^2,count - 2*n^2, (1.0 - ((2*n^2) / count)) * 100,mflops)
end
PAPI('stop');

fprintf(1,'\nPAPI Matrix Vector Multiply Test');
fprintf(1,'\nUsing PAPI start and stop');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^2', 'difference', '% error', 'flops/cycle')
for n=50:50:500,
    a=rand(n);x=rand(n,1);
    PAPI('start', 'PAPI_TOT_CYC', 'PAPI_FP_OPS');
    c=a*x;
    [cyc, ops] = PAPI('stop');
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.6f\n',n,ops,2*n^2,ops - 2*n^2, (1.0 - ((2*n^2) / ops)) * 100,ops/cyc)
end

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.

>> PAPIMatrixVector

PAPI Matrix Vector Multiply Test
Using the High Level PAPI("flops") call
           n          ops         2n^2   difference      % error       mflops
          50         5220         5000          220         4.21        68.18
         100        20625        20000          625         3.03       183.13
         150        45223        45000          223         0.49        61.86
         200        80317        80000          317         0.39        56.50
         250       125423       125000          423         0.34        57.77
         300       180541       180000          541         0.30        58.14
         350       245671       245000          671         0.27        55.00
         400       320467       320000          467         0.15        51.79
         450       405583       405000          583         0.14        45.92
         500       500711       500000          711         0.14        42.08

PAPI Matrix Vector Multiply Test
Using PAPI start and stop
           n          ops         2n^2   difference      % error  flops/cycle
          50         5220         5000          220         4.21     0.065863
         100        20625        20000          625         3.03     0.202808
         150        45223        45000          223         0.49     0.072082
         200        80317        80000          317         0.39     0.064472
         250       125423       125000          423         0.34     0.068027
         300       180541       180000          541         0.30     0.068057
         350       245671       245000          671         0.27     0.063880
         400       320467       320000          467         0.15     0.059551
         450       405583       405000          583         0.14     0.044170
         500       500711       500000          711         0.14     0.048029
>> 


NAME

PAPIMatrixMatrix.m

DESCRIPTION

Computes the product of two square matrices of size n = 50 to 500 in steps of 50 using two different methods: the PAPI('flops') call, and PAPI('start') / PAPI('stop'). Displays the observed number of floating point operations as compared to the theoretically predicted number. Theory predicts ops = 2*n^3. MATLAB uses an ATLAS optimized algorithm for peak performance on matrix-matrix multiplies. The bulk of the error indicated below is due to that algorithm, which increases floating point performance while adding floating point operations to the theoretically predicted number.

SOURCE

fprintf(1,'\nPAPI Matrix Matrix Multiply Test');
fprintf(1,'\nUsing the High Level PAPI("flops") call');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^3', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n);b=rand(n);c=rand(n);
    PAPI('stop'); % reset the counters to zero
    PAPI('flops'); % start counting flops
    c=c+a*b;
    [count, mflops] = PAPI('flops'); % read the flops data
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n',n,count,2*n^3,count - 2*n^3, (1.0 - ((2*n^3) / count)) * 100,mflops)
end
PAPI('stop');

fprintf(1,'\nPAPI Matrix Matrix Multiply Test');
fprintf(1,'\nUsing PAPI start and stop');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^3', 'difference', '% error', 'flops/cycle')
for n=50:50:500,
    a=rand(n);b=rand(n);c=rand(n);
    PAPI('start', 'PAPI_TOT_CYC', 'PAPI_FP_OPS');
    c=c+a*b;
    [cyc, ops] = PAPI('stop');
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.6f\n',n,ops,2*n^3,ops - 2*n^3, (1.0 - ((2*n^3) / ops)) * 100,ops/cyc)
end

RESULTS

The following were obtained on an 850 MHz Pentium III running Windows 2000 and MATLAB 6.1. Your mileage may vary.

>> PAPIMatrixMatrix

PAPI Matrix Matrix Multiply Test
Using the High Level PAPI("flops") call
           n          ops         2n^3   difference      % error       mflops
          50       258660       250000         8660         3.35       420.75
         100      2039068      2000000        39068         1.92       479.00
         150      6796006      6750000        46006         0.68       466.45
         200     16082342     16000000        82342         0.51       500.56
         250     31379542     31250000       129542         0.41       505.66
         300     54187924     54000000       187924         0.35       437.01
         350     86007456     85750000       257456         0.30       487.16
         400    128320520    128000000       320520         0.25       423.99
         450    182656272    182250000       406272         0.22       459.83
         500    250503312    250000000       503312         0.20       453.29

PAPI Matrix Matrix Multiply Test
Using PAPI start and stop
           n          ops         2n^3   difference      % error  flops/cycle
          50       258660       250000         8660         3.35     0.408925
         100      2039104      2000000        39104         1.92     0.156549
         150      6796006      6750000        46006         0.68     0.555426
         200     16082400     16000000        82400         0.51     0.416697
         250     31379640     31250000       129640         0.41     0.479208
         300     54187826     54000000       187826         0.35     0.580607
         350     86007732     85750000       257732         0.30     0.457513
         400    128320260    128000000       320260         0.25     0.545727
         450    182656419    182250000       406419         0.22     0.509754
         500    250503204    250000000       503204         0.20     0.552343
>>
 




