pyGlobus: Collecting XIO Performance Numbers

Introduction

This document describes how to activate and use the new NetLogger-based performance tracing built into the pyGlobus toolkit.

Prerequisites

Of course, you need to download and install pyGlobus.

You will also need to download and install the NetLogger Toolkit.

Build

When building pyGlobus, you need to pass the flag -DNETLOGGER=1 to enable the fine-grained C logging. You can do this by running setup.py with options like the following, assuming you have set the environment variable NLHOME to the location where you installed the NetLogger library.

%  CFLAGS=" -I${NLHOME}/include -L${NLHOME}/lib -lnllite -DNETLOGGER=1" python setup.py build --run-swig --with-modules=xio
%  python setup.py install --with-modules=xio

Activate and de-activate tracing

Two new levels for GLOBUS_XIO_DEBUG (see the Globus XIO developer's guide) have been defined:

PERF1 turns on tracing of the open/close/accept/etc. events that occur once per connection.

PERF2 turns on tracing of every read and write event.

Note that the symbolic names cannot be accessed from within C code; use the corresponding numeric bit values (128 and 256) there.

These events are logged via NetLogger, which uses the environment variable NL_DEST to specify the destination of logging events. You can set NL_DEST so that NetLogger sends events to a file, or across the network via TCP or UDP to the NetLogger daemon, netlogd. See Appendix A for details. For starters, you can just use a file like /tmp/foo.log.

%  export GLOBUS_XIO_DEBUG=384 ; # turn on both PERF1 and PERF2
%  export NL_DEST=/tmp/foo.log

To turn off logging, just set GLOBUS_XIO_DEBUG to a value that doesn't include the 128 and 256 bits.
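A complete session of enabling, running, and disabling the tracing might look like the following (this assumes you want no other debug bits set afterwards):

```shell
# Turn on both PERF1 and PERF2: 128 + 256 = 384
export GLOBUS_XIO_DEBUG=384
# Send the NetLogger events to a local file
export NL_DEST=/tmp/foo.log

# ... run the XIO-based application here ...

# Turn the performance tracing back off
export GLOBUS_XIO_DEBUG=0
```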

If NL_DEST is not set but GLOBUS_XIO_DEBUG includes a filename, then the NetLogger output will be sent to that file. Of course, if other debug output is also being sent to that file by the Globus library, it will be mixed in with the NetLogger output. This is probably not what you want, though it may be fine for some debugging scenarios.

Log analysis

Because the NetLogger output is timestamped and well-structured, it can easily be parsed and graphed. The NetLogger Toolkit provides a number of tools, mostly written in Python, for this purpose.

Coming soon: examples of how to do this!
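In the meantime, here is a minimal sketch of the kind of parsing involved, assuming the usual NetLogger name=value line format. The event name (xio.read) and field names (ts, event, size) below are illustrative assumptions, not necessarily the exact names emitted by the XIO tracing; adjust them to match your logs.

```python
# Minimal sketch: parse NetLogger-style "key=value key=value ..." log
# lines and sum up bytes transferred. Field and event names here are
# assumptions for illustration -- check them against your own log file.

def parse_line(line):
    """Split one 'key=value key=value ...' log line into a dict."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields

def total_bytes(lines, event="xio.read", size_field="size"):
    """Sum the size field over all records matching the given event."""
    total = 0
    for line in lines:
        rec = parse_line(line)
        if rec.get("event") == event:
            total += int(rec.get(size_field, 0))
    return total

# Hypothetical sample records in the assumed format
sample = [
    "ts=2006-02-15T01:02:03.456789Z event=xio.read size=65536",
    "ts=2006-02-15T01:02:03.556789Z event=xio.read size=65536",
    "ts=2006-02-15T01:02:03.656789Z event=xio.write size=32768",
]
print(total_bytes(sample))  # -> 131072 (only the xio.read records)
```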

Monitoring overhead

The overhead of the PERF1 tracing is negligible for any long session, since events are only logged at the start and end of the connection. Since XIO is, to our knowledge, not used for very short-lived sessions, this overhead is not worth worrying about.

The overhead of PERF2, on the other hand, is well worth optimizing. The previous section described how useful the read/write traces can be, but of course nobody wants to pay a high price in performance. There's good news and bad news here. The good news is that careful use of the NetLogger C API keeps the XIO performance penalty to around 5 percent (see Appendix B). The bad news is that for long transfers there is still a data-management problem with respect to the log files, which, due to the simple ASCII format, run somewhere around 150 bytes per read or write. Assuming a reasonable read/write size, like 64K, this means you get about 2.5MB of logs per GB of data.
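The arithmetic behind that estimate, as a quick sanity check:

```python
# Back-of-the-envelope log volume estimate, using the figures above:
# ~150 bytes of ASCII log per read/write record, 64 KiB per operation.
bytes_per_record = 150
io_size = 64 * 1024                   # 64 KiB per read or write
ops_per_gb = (1024 ** 3) // io_size   # 16384 operations per GiB
log_bytes_per_gb = ops_per_gb * bytes_per_record
print(log_bytes_per_gb / 1e6)         # -> 2.4576, i.e. ~2.5 MB per GiB
```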

But the good news about the bad news is that NetLogger can deal with this problem in a couple of ways. First, the "net" in NetLogger allows you to send the traces over the network and then use CPU resources somewhere other than the main data host to process (reduce, filter, etc.) the data. Second, you can dynamically turn the detailed tracing on and off using the environment variable NLCONFIGLVL.

Appendix A: NetLogger URL format

File URL
Syntax: [file:][/][path]

Examples:
  /tmp/foo.log
  file:/tmp/foo.log

TCP URL
Syntax: x-netlog://host[:port]

Examples:
  x-netlog://loghost.example.com
  x-netlog://loghost.example.com:14830

UDP URL (Python, Perl, C)
Syntax: x-netlog-udp://host[:port]

Examples:
  x-netlog-udp://loghost.example.com
  x-netlog-udp://loghost.example.com:14830

Appendix B: NetLogger Overhead

These numbers were measured on a Gigabit Ethernet connection on a LAN, which is more or less the worst-case scenario for trying to minimize logging overhead; on a nice, slow 100Mb/s wide-area connection, the run-to-run variation would dwarf the logging cost anyway. But even in this case, the numbers aren't bad. The graph below shows the results of 19 runs (20 were made, but the first was thrown out to remove startup artifacts) in each of four modes; "+NetLogger" refers, of course, to logging both connection and read/write events (on both server and client). Each dot represents a transfer. This should not be taken as a rigorous benchmark of XIO per se, but rather as a rough upper bound on the overhead added by detailed tracing.

Here are the mean/stddev values associated with the data above:

Implementation              Mean (Mb/s)   Std. dev.
C XIO                       848           16.94
Python sockets              774           42.96
Python XIO                  771           62.30
Python XIO + NetLogger      721           45.23

Without attempting a detailed analysis, it is obvious that the amount of overhead added by NetLogger is at worst not too much larger than the variation between runs. This agrees with the visual impression of the dotplot above.