This document describes how to activate and use the new NetLogger-based performance tracing built into the pyGlobus toolkit.
Of course, you first need to download and install pyGlobus, and you will also need to download and install the NetLogger Toolkit. When building pyGlobus, pass the flag -DNETLOGGER=1 to enable the fine-grained C logging. You can do this by running setup.py with options like the following, assuming you have defined the environment variable NLHOME to be the location where you installed the NetLogger library.
% CFLAGS="-I$NLHOME/include -L$NLHOME/lib -lnllite -DNETLOGGER=1" python setup.py build --run-swig --with-modules=xio
% python setup.py install --with-modules=xio
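Once the install completes, a quick sanity check is to import the wrapped module. The module path pyGlobus.xio below is an assumption based on the --with-modules=xio build option; adjust it if your layout differs.

    # Post-install smoke test (module path assumed from --with-modules=xio):
    from pyGlobus import xio
    print("xio module loaded OK")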
Two new levels for GLOBUS_XIO_DEBUG have been defined (see the Globus XIO developer's guide). Note that the symbolic names cannot be accessed from within C code.
PERF1 turns on tracing of the open/close/accept/etc. events that occur once per connection.
PERF2 turns on tracing of every read and write event.
These events are logged via NetLogger, which uses the environment variable NL_DEST
to specify the destination of logging events. You can set NL_DEST so that NetLogger sends events to a file,
or across the network via TCP or UDP to the NetLogger daemon, netlogd. See Appendix A
for details. For starters, you can just use a file like /tmp/foo.log.
% export GLOBUS_XIO_DEBUG=384  # turn on both PERF1 and PERF2 (128 + 256)
% export NL_DEST=/tmp/foo.log
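If you drive transfers from a Python script, you can set the same variables programmatically; a minimal sketch (set them before importing pyGlobus so they are in place when the C library reads them):

    import os

    # Enable both PERF1 and PERF2 (128 + 256 = 384) and log to a local file.
    os.environ["GLOBUS_XIO_DEBUG"] = "384"
    os.environ["NL_DEST"] = "/tmp/foo.log"

    # ... now import and use pyGlobus as usual ...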
To turn off logging, just set GLOBUS_XIO_DEBUG
to a value that doesn't include the 128 and 256 bits.
If NL_DEST
is not set, but GLOBUS_XIO_DEBUG
includes a filename,
then NetLogger output will be sent to that file. Of course, if other debug output is also being sent to that file by the Globus library, it will be mixed with the NetLogger output. This is probably not what you want, though it may be fine for some debugging scenarios.
Because the NetLogger output is timestamped and well-structured, it can easily be parsed and graphed. The NetLogger Toolkit provides a lot of tools, mostly in Python, for this purpose.
Coming soon: Examples of how to do this!!
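In the meantime, here is a minimal parsing sketch. It assumes the standard NetLogger ASCII format of one event per line as whitespace-separated name=value fields; the NL.EVNT field name is an assumption about what your particular log contains, so adjust it to match your actual output.

    from collections import defaultdict

    def parse_line(line):
        """Split one log line into a {name: value} dict (name=value fields)."""
        return dict(field.split("=", 1) for field in line.split() if "=" in field)

    # Tally events by type as a quick summary of the trace in /tmp/foo.log.
    counts = defaultdict(int)
    for line in open("/tmp/foo.log"):
        line = line.strip()
        if line:
            event = parse_line(line)
            counts[event.get("NL.EVNT", "?")] += 1

    for name, total in sorted(counts.items()):
        print("%s: %d" % (name, total))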
The overhead of the PERF1
tracing is negligible for any long session, since events are only logged
at the start and end of the connection. Since XIO is, to our knowledge, not used for very short-lived sessions, this overhead
is not worth worrying about.
The overhead of PERF2, on the other hand, is well worth optimizing. The previous section on visualizing NetLogger results showed (hopefully) how useful the read/write traces are, but of course nobody wants to pay a high price in performance. There's good news and bad news here. The good news is that careful use of the NetLogger C API keeps the XIO performance penalty around 5 percent (see graph below). The bad news is that for long transfers there is still a data management problem with the log files, which, due to the simple ASCII format, run somewhere around 150 bytes per read or write. Assuming a reasonable read/write size like 64K, that is roughly 16,000 logged events, and therefore about 2.5MB of logs, per GB of data.
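As a back-of-the-envelope check on that estimate:

    # Log volume for a 1 GB transfer with 64K reads (or writes):
    bytes_per_event = 150                      # approximate ASCII log line size
    events_per_gb = (1 << 30) // (64 * 1024)   # 16384 events per GB
    print(events_per_gb * bytes_per_event)     # 2457600 bytes, i.e. ~2.5MB of logs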
But the good news about the bad news is that NetLogger can deal with this problem in a couple of ways. First, the "net" in NetLogger allows you to send the traces over the network and then use CPU resources somewhere other than the main data host to process (reduce, filter, etc.) the data. Second, you can dynamically turn the detailed tracing on and off using the environment variable NLCONFIGLVL.
Possible values for NL_DEST include:

/tmp/logfile                      (log to a local file)
x-netlog://localhost              (TCP to netlogd on the local host, default port)
x-netlog://remote.host:1143       (TCP to netlogd on remote.host, port 1143)
x-netlog-udp://localhost          (UDP to the local host, default port)
x-netlog-udp://remote.host:1143   (UDP to remote.host, port 1143)
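For example, to stream the traces off the transfer host to a netlogd collector listening on remote.host (using the port from the list above):

% export NL_DEST=x-netlog://remote.host:1143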
These numbers were measured on a Gigabit Ethernet connection on a LAN, which is more or less the worst-case scenario for trying to minimize logging overhead; a nice, slow 100Mb/s wide-area connection would probably let us wave our hands about how much variation there is anyway. But even in this case, the numbers aren't bad. The graph below shows the results of 19 runs (20, with the first thrown out to remove startup artifacts) in one of four modes; "+NetLogger" refers, of course, to both connection and read/write events (on both server and client). Each dot represents a transfer. This is not to be taken as a good benchmark of XIO per se, but rather as a rough upper bound on the overhead added by detailed tracing.
Here are the mean/stddev values associated with the data above:
Implementation         | Mean (Mb/s) | Std. dev.
-----------------------|-------------|----------
C XIO                  | 848         | 16.94
Python sockets         | 774         | 42.96
Python XIO             | 771         | 62.30
Python XIO + NetLogger | 721         | 45.23
Without attempting a detailed analysis, it is obvious that the amount of overhead added by NetLogger is at worst not too much larger than the variation between runs. This agrees with the visual impression of the dotplot above.