PAPI MATLAB Support Page
MATLAB Support
Two external PAPI functions, flops and PAPI, are provided for users of
MATLAB. The first function lets you monitor the number of floating point
instructions executed and the instantaneous MegaFLOPS rate between any
two points in your MATLAB code. The second provides complete access to
the PAPI High Level interface. Four m-file examples for each function
are also included to illustrate the use of these functions. These
examples can also help you calibrate the performance of your system.
The PAPI mex functions and the supporting m-files are automatically
installed in the folder of your choice by either the specialized PAPI
MATLAB installer or the more complete Windows PAPI installer. In either
case, after installation you must modify MATLAB's search path to access
these resources.
Open MATLAB. Use the File > Set Path... menu command to open the Set
Path dialog. Use the Add Folder... button to locate the PAPI MATLAB
Support folder in which the necessary files reside; it is usually found
in C:\Program Files\ICL\WinPAPI\. When the folder has been successfully
added to the top of your search path, close the dialog. The flops and
PAPI functions and their supporting m-files are now ready for use.
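To verify the installation, you can exercise flops once from the MATLAB
command window. The fragment below is a minimal check, assuming the
support folder is now on the search path; the exact counts it reports
will vary from machine to machine.

% Minimal installation check; counts and rates vary by machine.
flops(0);                          % initialize PAPI and reset the counters
y = rand(1,100) * rand(100,1);     % an inner product: roughly 200 operations
[ops, mflops] = flops              % should report a small, nonzero count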
The files provided include:
Mex External Functions
- [ops, mflops] = flops
- ctrs = PAPI('num')
- [val, ...] = PAPI('start', 'event', ...)
- [val, ...] = PAPI('stop')
- [val, ...] = PAPI('read')
- [val, ...] = PAPI('accum')
- [ins, ipc] = PAPI('ipc')
- [ins, mflips] = PAPI('flips')
- [ops, mflops] = PAPI('flops')
Example M-Files
- FlopsInnerProduct.m
- FlopsMatrixVector.m
- FlopsMatrixMatrix.m
- FlopsSampler.m
- PAPIInnerProduct.m
- PAPIMatrixVector.m
- PAPIMatrixMatrix.m
Mex Functions
NAME
flops(0) - Initialize the PAPI library, reset the counters to zero, and
begin counting.
ops = flops - Return the number of floating point operations since the
first call or last reset.
[ops, mflops] = flops - Return both the number of floating point
operations since the first call or last reset, and the incremental rate
of floating point execution in Mega Floating Point Operations Per
Second.
DESCRIPTION
The MATLAB flops function uses the PAPI Performance API to do the heavy
lifting. PAPI takes advantage of the fact that most modern
microprocessors have built-in hardware support for counting a variety of
basic operations or events. PAPI uses these counters to track things
like instructions executed, cycles elapsed, floating point instructions
performed, and a variety of other events.
The first call to flops will initialize PAPI, set up the counters to
monitor floating point instructions and total CPU cycles, and start the
counters. Subsequent calls will return one or two values. The first
value is the number of floating point operations since the first call or
last reset. The second optional value, the execution rate in mflops, can
also be returned. The mflops rate is computed by dividing the operations
since the last call by the cycles since the last call and multiplying by
cycles per second:
mflops = ((ops / cycles) * (cycles / second)) / 10^6
The cycles per second value is a derived number determined empirically
by counting cycles for a fixed amount of system time during the
initialization of the PAPI library. Because of the way it is determined,
this value can be a small but consistent source of systematic error, and
can introduce differences between rates measured by PAPI and those
determined by time measurements, for example, tic and toc. Also note
that PAPI on Windows counts events on a system level rather than a
process or thread level. This can lead to an over-reporting of cycles,
and typically an under-reporting of mflops.
The flops function continues counting after any call. A call with an
input of 0 resets the counters and returns 0.
ARGUMENTS
0 -- an optional input argument of 0
will cause the counters to be reset to zero.
RETURNS
ops -- total floating point
instructions since the first call to flops or the last
call with an input of 0.
mflops -- Mflop/s achieved since the last
call to flops.
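The following fragment is a minimal sketch of the calling pattern
described above; the matrix size is arbitrary, and the counts and rates
it reports will differ on your system.

% Sketch of typical flops usage: reset, compute, read.
flops(0);                        % initialize PAPI and reset the counters
A = rand(200); x = rand(200,1);
y = A * x;                       % theory predicts about 2*200^2 operations
[ops, mflops] = flops;           % count and rate since the reset
fprintf('%d ops at %.2f Mflop/s\n', ops, mflops);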
NAME
ctrs = PAPI('num') - Return the number of hardware counters.
PAPI('start', 'event', ...) - Begin counting the specified events.
[val, ...] = PAPI('stop') - Stop counting and return the current values.
[val, ...] = PAPI('read') - Read the current values of the active
counters.
[val, ...] = PAPI('accum') - Add the current values of the active
counters to the input values.
PAPI('ipc') - Begin counting instructions.
ins = PAPI('ipc') - Return the number of instructions executed since the
first call.
[ins, ipc] = PAPI('ipc') - Return both the total number of instructions
executed since the first call, and the incremental rate of instruction
execution since the last call.
PAPI('flips')
PAPI('flops') - Begin counting floating point instructions or operations.
ins = PAPI('flips')
ops = PAPI('flops') - Return the number of floating point instructions
or operations since the first call.
[ins, mflips] = PAPI('flips')
[ops, mflops] = PAPI('flops') - Return both the number of floating point
instructions or operations since the first call, and the incremental
rate of floating point execution since the last call.
DESCRIPTION
The PAPI function provides access to the PAPI Performance API. PAPI
takes advantage of the fact that most modern microprocessors have
built-in hardware support for counting a variety of basic operations or
events. PAPI uses these counters to track things like instructions
executed, cycles elapsed, floating point instructions performed, and a
variety of other events.
There are 8 subfunctions within the PAPI call, as described below:
'num' - provides information on the number of hardware counters built
into this platform. The result of this call specifies how many events
can be counted at once.
'start' - programs the counters with the named events and begins
counting. The names of the events can be found in the PAPI
documentation. If a named event cannot be found, or cannot be mapped, an
error message is displayed.
'stop' - stops counting and returns the values of the counters in the
same order as the events were specified in the start command. 'stop' can
also be used to reset the counters for the ipc, flips, and flops
subfunctions described below.
'read' - returns the values of the counters without stopping them.
'accum' - adds the values of the counters to the input parameters and
returns them in the output parameters. Counting is not stopped.
'ipc' - returns the total instructions executed since the first call to
this subfunction, and the rate of execution of instructions (as
instructions per cycle) since the last call.
'flips' - returns the total floating point instructions executed since
the first call to this subfunction, and the rate of execution of
floating point instructions (as mega-floating point instructions per
second, or mflips) since the last call. A floating point instruction is
defined as whatever this CPU naturally counts as floating point
instructions.
'flops' - identical to 'flips', except that it measures floating point
operations rather than instructions. In many cases these two counts may
be identical. In some cases 'flops' will be a derived value that
attempts to reproduce what is traditionally considered a floating point
operation. For example, a fused multiply-add would be counted as two
operations, even if it were only a single instruction.
In typical usage, the first five subfunctions, 'num', 'start', 'stop',
'read', and 'accum', are used together. 'num' establishes the maximum
number of events that can be supplied to 'start'. After a 'start' is
issued, 'read' and 'accum' can be intermixed until a 'stop' is issued.
The three rate calls, 'ipc', 'flips', and 'flops', are intended to be
used independently. They cannot be mixed, because they use the same
counter resources. They can be used serially if they are separated by a
'stop' call, which can also be used to reset the counters.
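The fragment below sketches both calling patterns. The event names
PAPI_TOT_INS and PAPI_TOT_CYC are standard PAPI preset events used here
only for illustration; whether they can be counted together depends on
your hardware, and the counts reported will vary by machine.

% Sketch: counting named events with 'start', 'read', and 'stop'.
A = rand(300); B = rand(300);
ctrs = PAPI('num');                             % events that can be counted at once
PAPI('start', 'PAPI_TOT_INS', 'PAPI_TOT_CYC');  % program and start the counters
C = A * B;
[ins, cyc] = PAPI('read');                      % peek without stopping
C = C + A * B;
[ins, cyc] = PAPI('stop');                      % final values; counting stops

% Sketch: the rate calls, separated by 'stop' so they do not conflict.
PAPI('flips');                                  % start counting fl. pt. instructions
C = A * B;
[ins, mflips] = PAPI('flips');
PAPI('stop');                                   % reset before switching to 'ipc'
PAPI('ipc');
C = A * B;
[ins, ipc] = PAPI('ipc');
PAPI('stop');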
Example M-Files
NAME
FlopsInnerProduct.m
DESCRIPTION
Computes the inner product of two vectors of size n = 50 to 500 in steps
of 50. Displays the observed number of floating point operations as
compared to the theoretically predicted number. Theory predicts ops =
2*n. The results provide an indication of the overhead incurred by
MATLAB, and the Mflops achieved for each computation.
SOURCE
fprintf(1,'\nPAPI Inner Product Test');
fprintf(1,'\nUsing flops');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(1,n); x=rand(n,1);
    flops(0);
    c=a*x;
    [ops, mflops] = flops;
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n', n, ops, 2*n, ops-2*n, (1.0-((2*n)/ops))*100, mflops)
end
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> FlopsInnerProduct
PAPI Inner Product Test
Using flops
           n          ops           2n   difference      % error       mflops
          50          119          100           19        15.97         2.28
         100          223          200           23        10.31         7.45
         150          327          300           27         8.26        10.52
         200          431          400           31         7.19        13.92
         250          535          500           35         6.54        16.01
         300          639          600           39         6.10        18.92
         350          743          700           43         5.79        20.82
         400          851          800           51         5.99        25.27
         450          955          900           55         5.76        27.90
         500         1059         1000           59         5.57        30.08
>>
NAME
FlopsMatrixVector.m
DESCRIPTION
Computes the product of a square matrix and a vector of size n = 50 to
500 in steps of 50. Displays the observed number of floating point
operations as compared to the theoretically predicted number. Theory
predicts ops = 2*n^2. The results provide an indication of the overhead
incurred by MATLAB, and the Mflops achieved for each computation.
SOURCE
fprintf(1,'\nPAPI Matrix Vector Multiply Test');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^2', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n); x=rand(n,1);
    flops(0);
    b=a*x;
    [count, mflops] = flops;
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n', n, count, 2*n^2, count-2*n^2, (1.0-((2*n^2)/count))*100, mflops)
end
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> FlopsMatrixVector
PAPI Matrix Vector Multiply Test
           n          ops         2n^2   difference      % error       mflops
          50         5220         5000          220         4.21        66.15
         100        20625        20000          625         3.03       194.65
         150        45223        45000          223         0.49        31.84
         200        80317        80000          317         0.39        40.88
         250       125423       125000          423         0.34        49.97
         300       180541       180000          541         0.30        53.06
         350       245671       245000          671         0.27        51.94
         400       320467       320000          467         0.15        49.36
         450       405583       405000          583         0.14        43.97
         500       500711       500000          711         0.14        43.00
>>
NAME
FlopsMatrixMatrix.m
DESCRIPTION
Computes the product of two square matrices of size n = 50 to 500 in
steps of 50. Displays the observed number of floating point operations
as compared to the theoretically predicted number. Theory predicts ops =
2*n^3. MATLAB uses an ATLAS optimized algorithm for peak performance on
matrix-matrix multiplies. The bulk of the error indicated below is due
to that algorithm, which increases floating point performance while
adding floating point operations to the theoretically predicted number.
SOURCE
fprintf(1,'\nPAPI Matrix Matrix Multiply Test');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^3', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n); b=rand(n); c=rand(n);
    flops(0);
    c=c+a*b;
    [count, mflops] = flops;
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n', n, count, 2*n^3, count-2*n^3, (1.0-((2*n^3)/count))*100, mflops)
end
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> FlopsMatrixMatrix
PAPI Matrix Matrix Multiply Test
           n          ops         2n^3   difference      % error       mflops
          50       258660       250000         8660         3.35       425.32
         100      2039068      2000000        39068         1.92       412.40
         150      6796006      6750000        46006         0.68       464.47
         200     16082342     16000000        82342         0.51       498.34
         250     31379542     31250000       129542         0.41       514.93
         300     54187928     54000000       187928         0.35       426.37
         350     86007456     85750000       257456         0.30       475.78
         400    128320392    128000000       320392         0.25       446.92
         450    182656368    182250000       406368         0.22       441.02
         500    250503312    250000000       503312         0.20       447.54
>>
NAME
FlopsSampler.m
DESCRIPTION
Demonstrates the application of the PAPI flops function on a series of
increasingly expensive MATLAB operations. You define the size of the
computation with an input parameter; MATLAB displays the operation, the
number of floating point operations required, and the effective Mflop/s
throughput.
SOURCE
The source for this function can be examined
in the file:
PAPISampler.m
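The shipped file is not reproduced here. As a rough, hypothetical sketch
(not the actual PAPISampler.m source), a sampler of this kind can be
built around repeated flops(0) / flops calls, for example:

% Hypothetical sketch of a flops-based sampler, not the shipped file.
function FlopsSamplerSketch(n)
a = rand(n); b = rand(n); y = rand(n,1);
fprintf('%-18s %8s %14s %10s\n', 'Operations', 'n', 'fl pt ops', 'Mflop/s');
flops(0); c = a*y;       [count, mflops] = flops;   % matrix-vector product
fprintf('%-18s %8d %14d %10.2f\n', 'matrix vector', n, count, mflops);
flops(0); [l,u] = lu(a); [count, mflops] = flops;   % LU factorization
fprintf('%-18s %8d %14d %10.2f\n', 'lu(a)', n, count, mflops);
flops(0); c = a*b;       [count, mflops] = flops;   % matrix-matrix product
fprintf('%-18s %8d %14d %10.2f\n', 'matrix multiply', n, count, mflops);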
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> FlopsSampler(500)
Counts Using PAPI
Operations                      n        fl pt ops    Mflop/s
calling PAPI flops            500                2       0.04
dot product                   500             1077       6.96
matrix vector                 500           500711      42.75
random matrix                 500           874905      27.41
chol(a)                       500         46495920     282.40
lu(a)                         500         84994928     262.04
x=a\y                         500         91706152     196.63
condest(a)                    500         93340912      77.32
qr(a)                         500        188981232     363.99
matrix multiply               500        250253248     469.55
inv(a)                        500        263311472     328.68
svd(a)                        500        377278304     115.15
cond(a)                       500        377283040     113.68
hess(a)                       500        450242112     177.70
eig(a)                        500       1181522304     157.53
[u,s,v]=svd(a)                500       2032648448     110.25
pinv(a)                       500    2.533161e+009     123.81
s=gsvd(a)                     500    4.506011e+009     146.69
[x,e]=eig(a)                  500    2.916129e+009     149.32
[u,v,x,c,s]=gsvd(a,b)         500    4.756261e+009     153.54
>>
NAME
PAPIInnerProduct.m
DESCRIPTION
Computes the inner product of two vectors of size n = 50 to 500 in steps
of 50 using two different methods: the PAPI('flops') call, and
PAPI('start') / PAPI('stop'). Displays the observed number of floating
point operations as compared to the theoretically predicted number.
Theory predicts ops = 2*n. The results provide an indication of the
overhead incurred by MATLAB, and the Mflops achieved for each
computation.
SOURCE
fprintf(1,'\n\nPAPI Inner Product Test');
fprintf(1,'\nUsing the High Level PAPI("flops") call');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(1,n); x=rand(n,1);
    PAPI('stop');   % reset the counters to zero
    PAPI('flops');  % start counting flops
    c=a*x;
    [ops, mflops] = PAPI('flops');  % read the flops data
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n', n, ops, 2*n, ops-2*n, (1.0-((2*n)/ops))*100, mflops)
end
PAPI('stop');
fprintf(1,'\n\nPAPI Inner Product Test');
fprintf(1,'\nUsing PAPI start and stop');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n', 'difference', '% error', 'flops/cycle')
for n=50:50:500,
    a=rand(1,n); x=rand(n,1);
    PAPI('start', 'PAPI_TOT_CYC', 'PAPI_FP_OPS');
    c=a*x;
    [cyc, ops] = PAPI('stop');
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.6f\n', n, ops, 2*n, ops-2*n, (1.0-((2*n)/ops))*100, ops/cyc)
end
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> PAPIInnerProduct
PAPI Inner Product Test
Using the High Level PAPI("flops") call
           n          ops           2n   difference      % error       mflops
          50          119          100           19        15.97         2.22
         100          223          200           23        10.31         6.47
         150          327          300           27         8.26         9.49
         200          431          400           31         7.19        12.17
         250          535          500           35         6.54        15.19
         300          639          600           39         6.10        17.73
         350          743          700           43         5.79        20.56
         400          851          800           51         5.99        22.93
         450          955          900           55         5.76        15.85
         500         1059         1000           59         5.57        27.64
PAPI Inner Product Test
Using PAPI start and stop
           n          ops           2n   difference      % error  flops/cycle
          50          119          100           19        15.97     0.002868
         100          223          200           23        10.31     0.007038
         150          327          300           27         8.26     0.010591
         200          431          400           31         7.19     0.013792
         250          535          500           35         6.54     0.016720
         300          639          600           39         6.10     0.019734
         350          743          700           43         5.79     0.022139
         400          851          800           51         5.99     0.025629
         450          955          900           55         5.76     0.028769
         500         1059         1000           59         5.57     0.032066
>>
NAME
PAPIMatrixVector.m
DESCRIPTION
Computes the product of a square matrix and a vector of size n = 50 to
500 in steps of 50 using two different methods: the PAPI('flops') call,
and PAPI('start') / PAPI('stop'). Displays the observed number of
floating point operations as compared to the theoretically predicted
number. Theory predicts ops = 2*n^2. The results provide an indication
of the overhead incurred by MATLAB, and the Mflops achieved for each
computation.
SOURCE
fprintf(1,'\nPAPI Matrix Vector Multiply Test');
fprintf(1,'\nUsing the High Level PAPI("flops") call');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^2', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n); x=rand(n,1);
    PAPI('stop');   % reset the counters to zero
    PAPI('flops');  % start counting flops
    b=a*x;
    [count, mflops] = PAPI('flops');  % read the flops data
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n', n, count, 2*n^2, count-2*n^2, (1.0-((2*n^2)/count))*100, mflops)
end
PAPI('stop');
fprintf(1,'\nPAPI Matrix Vector Multiply Test');
fprintf(1,'\nUsing PAPI start and stop');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^2', 'difference', '% error', 'flops/cycle')
for n=50:50:500,
    a=rand(n); x=rand(n,1);
    PAPI('start', 'PAPI_TOT_CYC', 'PAPI_FP_OPS');
    c=a*x;
    [cyc, ops] = PAPI('stop');
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.6f\n', n, ops, 2*n^2, ops-2*n^2, (1.0-((2*n^2)/ops))*100, ops/cyc)
end
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> PAPIMatrixVector
PAPI Matrix Vector Multiply Test
Using the High Level PAPI("flops") call
           n          ops         2n^2   difference      % error       mflops
          50         5220         5000          220         4.21        68.18
         100        20625        20000          625         3.03       183.13
         150        45223        45000          223         0.49        61.86
         200        80317        80000          317         0.39        56.50
         250       125423       125000          423         0.34        57.77
         300       180541       180000          541         0.30        58.14
         350       245671       245000          671         0.27        55.00
         400       320467       320000          467         0.15        51.79
         450       405583       405000          583         0.14        45.92
         500       500711       500000          711         0.14        42.08
PAPI Matrix Vector Multiply Test
Using PAPI start and stop
           n          ops         2n^2   difference      % error  flops/cycle
          50         5220         5000          220         4.21     0.065863
         100        20625        20000          625         3.03     0.202808
         150        45223        45000          223         0.49     0.072082
         200        80317        80000          317         0.39     0.064472
         250       125423       125000          423         0.34     0.068027
         300       180541       180000          541         0.30     0.068057
         350       245671       245000          671         0.27     0.063880
         400       320467       320000          467         0.15     0.059551
         450       405583       405000          583         0.14     0.044170
         500       500711       500000          711         0.14     0.048029
>>
NAME
PAPIMatrixMatrix.m
DESCRIPTION
Computes the product of two square matrices of size n = 50 to 500 in
steps of 50 using two different methods: the PAPI('flops') call, and
PAPI('start') / PAPI('stop'). Displays the observed number of floating
point operations as compared to the theoretically predicted number.
Theory predicts ops = 2*n^3. MATLAB uses an ATLAS optimized algorithm
for peak performance on matrix-matrix multiplies. The bulk of the error
indicated below is due to that algorithm, which increases floating point
performance while adding floating point operations to the theoretically
predicted number.
SOURCE
fprintf(1,'\nPAPI Matrix Matrix Multiply Test');
fprintf(1,'\nUsing the High Level PAPI("flops") call');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^3', 'difference', '% error', 'mflops')
for n=50:50:500,
    a=rand(n); b=rand(n); c=rand(n);
    PAPI('stop');   % reset the counters to zero
    PAPI('flops');  % start counting flops
    c=c+a*b;
    [count, mflops] = PAPI('flops');  % read the flops data
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.2f\n', n, count, 2*n^3, count-2*n^3, (1.0-((2*n^3)/count))*100, mflops)
end
PAPI('stop');
fprintf(1,'\nPAPI Matrix Matrix Multiply Test');
fprintf(1,'\nUsing PAPI start and stop');
fprintf(1,'\n%12s %12s %12s %12s %12s %12s\n', 'n', 'ops', '2n^3', 'difference', '% error', 'flops/cycle')
for n=50:50:500,
    a=rand(n); b=rand(n); c=rand(n);
    PAPI('start', 'PAPI_TOT_CYC', 'PAPI_FP_OPS');
    c=c+a*b;
    [cyc, ops] = PAPI('stop');
    fprintf(1,'%12d %12d %12d %12d %12.2f %12.6f\n', n, ops, 2*n^3, ops-2*n^3, (1.0-((2*n^3)/ops))*100, ops/cyc)
end
RESULTS
The following were obtained on an 850 MHz Pentium III running Windows
2000 and MATLAB 6.1. Your mileage may vary.
>> PAPIMatrixMatrix
PAPI Matrix Matrix Multiply Test
Using the High Level PAPI("flops") call
           n          ops         2n^3   difference      % error       mflops
          50       258660       250000         8660         3.35       420.75
         100      2039068      2000000        39068         1.92       479.00
         150      6796006      6750000        46006         0.68       466.45
         200     16082342     16000000        82342         0.51       500.56
         250     31379542     31250000       129542         0.41       505.66
         300     54187924     54000000       187924         0.35       437.01
         350     86007456     85750000       257456         0.30       487.16
         400    128320520    128000000       320520         0.25       423.99
         450    182656272    182250000       406272         0.22       459.83
         500    250503312    250000000       503312         0.20       453.29
PAPI Matrix Matrix Multiply Test
Using PAPI start and stop
           n          ops         2n^3   difference      % error  flops/cycle
          50       258660       250000         8660         3.35     0.408925
         100      2039104      2000000        39104         1.92     0.156549
         150      6796006      6750000        46006         0.68     0.555426
         200     16082400     16000000        82400         0.51     0.416697
         250     31379640     31250000       129640         0.41     0.479208
         300     54187826     54000000       187826         0.35     0.580607
         350     86007732     85750000       257732         0.30     0.457513
         400    128320260    128000000       320260         0.25     0.545727
         450    182656419    182250000       406419         0.22     0.509754
         500    250503204    250000000       503204         0.20     0.552343
>>