SiLK Analysis Suite: Installation Handbook

CERT® Network Situational Awareness


Document Overview

This document summarizes how to configure and install the SiLK Analysis Suite. It also describes setting up the rwflowpack daemon and its control script, rwfpd. This document is intended for individuals comfortable with the following tasks:


Outline

This document is broken into the following sections:

  1. Document Overview (this section)

  2. Software Inventory: A summary of the software covered and installed using this document

  3. Packing And Filesystem Overview

  4. Configuration and Installation


General Disclaimer

The SiLK analysis suite is experimental software, not production software. While we have made efforts to ensure functionality and ease of use, the CERT makes no promises as regards functionality and reliability.


Software Inventory

The software can be logically divided into two parts:

  1. The packing system--the tools that collect, pack, and store the IP-header data contained in NetFlow records. These tools are run by the system administrator and/or by system startup.

  2. The analysis tools--applications and utilities used by the analysts that read the packed data and produce some output.


Packing Tools List

The packing system tools are listed here, and they are discussed in detail below.

rwflowpack. Reads NetFlow data and writes packed binary files.

rwfpd. Shell script wrapper to run rwflowpack.


Analysis Tools List

Most of the analysis tools are detailed in the Analysts' Handbook and in manual pages; giving the --help option to a tool will give you usage information. For completeness, we give a one-line description of each tool here.

rwfilter. Chooses in which packed binary files to look for flow records, reads those files, and chooses which flow records to process.

rwsort. Sorts records based on a given set of fields.

rwcut. Prints selected fields in delimited human-readable form.

rwcount. Prints traffic summaries across time.

rwuniq. Counts flows, packets, and/or bytes based on a user-defined set of fields.

rwaddrcount. Counts bytes, packets, and flows by source or destination IP address.

rwtotal. Counts bytes, packets, and flows based on one of a set of pre-defined key fields.

rwgroup. Groups multiple records together with a common tag based on fields chosen by the user.

rwmatch. Matches records from two files together into a common stream based on fields chosen by the user.

rwpmatch. Uses a SiLK Flow file to filter the contents of a tcpdump file.

rwstats. Computes a variety of different summary statistics.

rwset. Generates a binary IP-set file containing the unique IP addresses.

rwbag. Generates a binary bag file containing unique IP addresses and a count of bytes, packets, or flows.

readset. Prints the IP addresses in a binary IP-set file.

buildset. Creates a binary IP-set file from a text file.

setintersect. Performs a set intersection of IP-set files.

rwset-union. Performs a union of IP-set files.

rwbagbuild. Creates a bag file from an IP-set file or a text file.

rwbagcat. Prints the contents of a bag file.

rwbagtool. Performs high-level operations on a bag file.

num2dot. Converts integer IP addresses in rwcut's textual output to dotted decimal format.

rwappend. Appends new SiLK Flow records in one or more packed binary files to an existing packed binary file.

rwcat. Concatenates packed binary files into a single stream.

rwfileinfo. Prints information about a SiLK binary file.

rwptoflow. Creates a single-packet SiLK flow record for every record in a tcpdump file.

rwfglob. Prints the filenames that rwfilter's file selection options ("fglob" options) will access.

rwswapbytes. Changes the endianness of a packed binary file.

mapsid. Maps between sensor name and sensor number.


Packing And Filesystem Overview

NetFlow records contain information taken from the IP packet headers for the traffic the flow represents, including source and destination IP addresses, IP protocol, and source and destination ports for TCP and UDP. Each record also includes fields that describe the starting and ending times of the flow. All of this information is encoded in fields in the records that make up a data file in the SiLK file system, and is also used to determine how the data should be stored.

In addition to the familiar IP header fields, every NetFlow record also includes information about how the flow was routed, including the next-hop IP, and the SNMP indexes for the incoming and outgoing interfaces on the router. This routing information is used as the basis for categorizing every flow according to whether it represents traffic that is "incoming" or "outgoing" with respect to the network, or set of networks, being monitored by an organization. Data in the SiLK file system is segregated according to this categorization.

These instructions assume that you are enabling NetFlow on a border router: a router connected to an ISP or other peering point for your network. Every flow reported by a border router either represents traffic entering the network or leaving the network. It should be emphasized that a flow record represents traffic in a single direction. This means, for example, that every TCP connection across the border will result in at least two distinct flows, one for each side of the conversation.

It's important to keep in mind that the frame of reference for "outgoing" or "incoming" is always the monitored network. The fact that every flow is also both incoming and outgoing with respect to the router can be confusing. When we label traffic as incoming or outgoing it is always with respect to the network being monitored, not a router. Every flow reported by a router is marked with an input SNMP index (the interface through which the flow entered the router) and the output index (how the flow left the router), and we can determine whether a flow is entering the network based on these values.

Based on the flow itself and on configuration information provided by the user, rwflowpack examines each NetFlow record to determine how and where to store it.

First, rwflowpack designates a NetFlow record as "incoming" (entering the organization) or "outgoing" (leaving the organization). rwflowpack requires the user to specify the incoming indexes; these are the SNMP interfaces on the router that connect to an organization's ISP(s) and face the Internet---traffic coming into the organization enters the router on these interfaces. rwflowpack labels a record as "incoming" when the incoming (entering the router) SNMP interface listed on the record matches the "incoming SNMP index" list. This list of numbers is provided by the person or script invoking rwflowpack and generally corresponds to the interfaces connected to an organization's ISP. All other traffic is labeled "outgoing".

Next, rwflowpack will determine if the packets represented by the NetFlow record left the router by looking at the outgoing (leaving the router) SNMP interface. When the interface number matches the "null interface" value given by the user, the packets are considered "not-routed"; otherwise the packets are "routed". The default value for the null interface is 0, which is the SNMP interface Cisco uses for packets that do not leave the router---either because they were for the router itself (e.g., part of a routing protocol like BGP) or because they were blocked by the access control lists (ACLs) on the router.

Finally, for a routed flow record, the IP protocol and source and destination ports are examined. When the protocol is 6 (TCP) and either side of the conversation used port 80 (http), 443 (https), or 8080 (http-alt), the flow is considered a "routed-web" record. Because of the known IP protocol and limited number of web-server-side ports, "routed-web" traffic can be packed in a smaller record. Although the savings are only a couple of bytes per record, on most networks the volume of flow records and the high percentage of those flows that are web traffic (or that masquerade as web traffic) allow those couple of bytes to add up to substantial savings over the course of a day.

Routed traffic that is not TCP or does not use one of the well-established HTTP ports is considered "routed-non-web" traffic. There is no guarantee that "routed-web" traffic is entirely web-based, nor that there is no web traffic in "routed-non-web"; the web/non-web split is a heuristic that gets it right most of the time.

These tests allow rwflowpack to support six traffic types; each type is stored in a different subdirectory, as described in the next section.
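The three tests above can be sketched in a few lines of Python. This is only an illustration of the decision order described in this section, not rwflowpack's actual code; the function name classify_flow and its parameters are our own invention, and the return values are the six directory names used in the next section.

```python
# Illustrative sketch only: mirrors the categorization described in the text.
# classify_flow and its argument names are hypothetical, not part of SiLK.

TCP = 6
WEB_PORTS = {80, 443, 8080}  # http, https, http-alt


def classify_flow(input_index, output_index, protocol, sport, dport,
                  incoming_indexes, null_interface=0):
    """Return the traffic-type directory name for one NetFlow record."""
    # Test 1: incoming if the record entered the router on a listed interface.
    direction = "in" if input_index in incoming_indexes else "out"

    # Test 2: not-routed if the record left on the null interface.
    if output_index == null_interface:
        return direction + "null"

    # Test 3: routed-web if TCP and either side used a web port.
    if protocol == TCP and (sport in WEB_PORTS or dport in WEB_PORTS):
        return direction + "web"

    # Everything else is routed-non-web.
    return direction


if __name__ == "__main__":
    isp_facing = {1}  # assumed incoming SNMP index for this example
    print(classify_flow(1, 2, TCP, 51000, 80, isp_facing))   # inweb
    print(classify_flow(2, 0, 17, 53, 4000, isp_facing))     # outnull
```

Note that the null-interface test is applied before the web test, so a blocked web request is stored as not-routed rather than as web traffic.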


Filesystem

The packing system records all data in flat files written by rwflowpack. While users can access the flat files directly, the files written by rwflowpack are intended to be read using the "Data Selection" switches of rwfilter as described in the Analysts' Handbook. When a user specifies these selection switches, the tool converts the parameters into a set of filenames, which are then read sequentially by rwfilter.

A primary configuration issue when installing the Analysis Suite is having the packing system write the data to the location where rwfilter expects to find it. The directory rwflowpack writes to is determined at run time, but rwfilter uses a directory root set at compile time. This directory's value, dataRootdir, is given as the --with-data-rootdir switch to the configure command, as described in the configuration section below.


Directory hierarchy

The directory root for the packed data files is dataRootdir. Underneath dataRootdir are six directories, each corresponding to a type of traffic as described above.

dataRootdir/innull. Flows for incoming not-routed traffic

dataRootdir/inweb. Flows for incoming routed-web traffic

dataRootdir/in. Flows for incoming routed-non-web traffic

dataRootdir/outnull. Flows for outgoing not-routed traffic

dataRootdir/outweb. Flows for outgoing routed-web traffic

dataRootdir/out. Flows for outgoing routed-non-web traffic

Under each of these directories are date directories, in the form YYYY/MM/DD. For example, outgoing web files for October 4th, 2003 are recorded in dataRootdir/outweb/2003/10/04/.

Under each date directory are the packed binary files, one per hour per rwflowpack instance. Note that the date and hour are based on UTC time, not on local time. Filenames include the date and type information, and are written in the form: flowType-sensorName_YYYYMMDD.HH

The flowType in the file name will correspond to the directory name, and will be one of

innull. incoming not-routed

iw. incoming routed-web

in. incoming routed-non-web

outnull. outgoing not-routed

ow. outgoing routed-web

out. outgoing routed-non-web
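As a worked example of the naming scheme, the following Python sketch builds the path of the hourly file that would hold a flow seen at a given moment. The helper packed_file_path is hypothetical (it is not part of the SiLK tools); the directory names and flowType prefixes are the ones listed above, and the conversion to UTC follows the note about dates and hours.

```python
# Illustrative sketch only: packed_file_path is a hypothetical helper that
# reproduces the directory and file naming convention described above.
from datetime import datetime, timedelta, timezone

# Directory name -> flowType prefix used in the file name.
FLOWTYPE_PREFIX = {
    "innull": "innull", "inweb": "iw", "in": "in",
    "outnull": "outnull", "outweb": "ow", "out": "out",
}


def packed_file_path(data_rootdir, traffic_dir, sensor_name, when):
    """Return dataRootdir/type/YYYY/MM/DD/flowType-sensorName_YYYYMMDD.HH."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = when.astimezone(timezone.utc)  # file names use UTC, not local time
    file_name = "%s-%s_%s" % (FLOWTYPE_PREFIX[traffic_dir], sensor_name,
                              utc.strftime("%Y%m%d.%H"))
    return "/".join([data_rootdir, traffic_dir,
                     utc.strftime("%Y/%m/%d"), file_name])


if __name__ == "__main__":
    edt = timezone(timedelta(hours=-4))  # US Eastern Daylight Time
    seen = datetime(2003, 10, 4, 14, 14, tzinfo=edt)
    print(packed_file_path("dataRootdir", "outweb", "Primary", seen))
    # dataRootdir/outweb/2003/10/04/ow-Primary_20031004.18
```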

The sensorName is a command line option to rwflowpack. When multiple instances of rwflowpack write to a single shared filesystem, the sensorName is used to avoid file name collisions. The name should be a 1- to 24-character string that begins with an uppercase letter; it must not contain an underscore (_), a slash (/), or whitespace. The sensorName can be a mnemonic for the sensor location, the router, the ISP, etc. If you are not feeling creative, you can use the form Sn, i.e., S0, S1, ..., S254. (The tools as distributed allow for 15 sensors with names S0, S1, ..., S14.) Currently, the sensorNames and their sensorIDs are compiled into the tools; to enable the analysis tools to filter by sensorID or sensorName, you need to coordinate these sensorNames with the values in the sensorInfo[] array in the silktools/src/include/silk_site_generic.h source file. The software will work without this addition, but you will not be able to filter by sensorID.
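Before editing the source file, it can be handy to check candidate names against the rules just stated. The Python sketch below encodes only those rules (1 to 24 characters, leading uppercase letter, no underscore, slash, or whitespace); the function is_valid_sensor_name is our own and the tools may apply additional checks that are not described here.

```python
# Illustrative sketch only: checks a candidate sensorName against the rules
# stated in the text. is_valid_sensor_name is a hypothetical helper.
import re

# One uppercase letter, then up to 23 characters that are not '_', '/',
# or whitespace.
_SENSOR_NAME = re.compile(r"[A-Z][^_/\s]{0,23}")


def is_valid_sensor_name(name):
    """Return True when name satisfies the documented sensorName rules."""
    return _SENSOR_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("S0", "Primary", "primary", "Edge_1", "My Sensor"):
        print(candidate, is_valid_sensor_name(candidate))
```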


Packing Tools

rwflowpack

The standard packing tool is rwflowpack, which listens on a UDP port for NetFlow PDUs from a router, categorizes the data into one of six types, and stores the data in packed flat files. For details on the arguments to rwflowpack, see its man page, which is an appendix to this document.

In typical use, rwflowpack is a daemon, so once invoked it becomes a background process; a process ID file will be written to the log directory, but after that, rwflowpack is largely untouchable except through an explicit kill signal.


rwfpd

To provide easier use in UNIX-like environments, the rwflowpack package includes a daemon script, rwfpd. rwfpd can be added to a machine's boot sequence to invoke rwflowpack automatically. It is invoked as:

rwfpd {start | stop | restart | status}

start. starts a daemon

stop. stops the instance indicated by the pid-file

restart. refreshes a daemon

status. provides a status message as to whether the daemon is running


Configuration and Installation

Installation is a four-step process. These steps are:

  1. Configure the router.

  2. Configure the machine.

  3. Configure the software.

  4. Compile and install the tools.


Configure the Router

You will need to do these steps for each router that you wish to instrument.

The timestamps on the NetFlow records will be based on the timestamps received from the router. We suggest using ntp to minimize drift in the router's clock; you can do this with the Cisco IOS command


          ntp server [ip-address]
        
The host and port number to which the router will send the NetFlow PDUs is given by the command

          ip flow-export [ip-address] [port]
        
If you are configuring multiple routers, you'll need to use a unique port number for each router. The command

          ip flow-export version 5
        
will make certain the router exports NetFlow version 5 records, which our tools require. Our software assumes no flow records are longer than 34 minutes (30 minutes is the default for Cisco; this means a long TCP session will be broken across multiple flow records). To set the active timeout on your Cisco router, use the IOS command:

          ip flow-cache active-timeout 30
        

When the router is rebooted, it can reassign the SNMP interface numbers. This creates a problem: the SNMP interface that was facing the Internet could now be facing your organization, resulting in the incoming and outgoing flows being reversed. To get around this problem, you can tell the router to use persistent settings for the interface numbers. The easiest solution is to enable global persistence, through the IOS command


          snmp-server ifindex persist
        
and then save the configuration with the EXEC mode command

          copy running-config startup-config
        
See the following link for more information on IfIndex persistence, including instructions on setting persistence on an interface-by-interface basis: http://www.cisco.com/univercd/cc/td/doc/product/software/ios121/121newft/121t/121t5/dt5ifidx.htm.

To enable NetFlow, issue the IOS command


          ip route-cache flow
        


Configure the Machine

Adjust Maximum Socket Buffer Size

NetFlow traffic tends to be "bursty": when you make an HTTP request, several servers may respond, feeding you pages, images, and ads. To handle this, it is important for the machine that receives the NetFlow packets to have a large socket buffer. rwflowpack will attempt to increase its socket buffer to 8MB; if that fails, it will back off until it finds a socket buffer size that the kernel will allow.

To increase the allowable maximum socket buffer size on a running Linux system:


            echo 8388608 > /proc/sys/net/core/rmem_max
          
On a running Solaris box, issue

            ndd -set /dev/udp udp_max_buf 8388608
          

Those lines may be added to the system's start-up sequence (e.g., /etc/rc2.d/S99ndd) to make the change persistent across reboots.
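To check what receive buffer the kernel will actually grant after this change, a short Python sketch such as the following can be used. It imitates the request-then-back-off behavior described above, but it is not rwflowpack's code: the function grow_receive_buffer, the halving strategy, and the 64KB floor are all assumptions made for illustration. On Linux an oversized request is typically capped silently rather than rejected, and the value reported back may be larger than the value requested, so treat the printed number as a rough indication only.

```python
# Illustrative sketch only: request a large UDP receive buffer and back off
# when the kernel refuses. grow_receive_buffer is a hypothetical helper; the
# halving strategy and the floor value are assumptions, not SiLK behavior.
import socket


def grow_receive_buffer(sock, wanted=8 * 1024 * 1024, floor=64 * 1024):
    """Ask for `wanted` bytes of SO_RCVBUF, halving the request on failure.

    Returns the buffer size the kernel reports once a request is accepted
    (or the unchanged size if every request down to `floor` was refused).
    """
    size = wanted
    while size >= floor:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
            break  # the kernel accepted (or silently capped) this request
        except OSError:
            size //= 2  # refused: try a smaller buffer
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        print("receive buffer granted:", grow_receive_buffer(udp), "bytes")
    finally:
        udp.close()
```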


Directory Hierarchy

There are three directory roots to be considered:

DATA_ROOTDIR. The directory where files are written by rwflowpack and where the analysis applications expect to find the data.

PACKER_HOME. The directory for maintaining the packer itself, that is the packer application and its logfiles.

SILK_PATH. The root directory for the SiLK suite; there should be two directories under SILK_PATH:

  • ${SILK_PATH}/bin

  • ${SILK_PATH}/lib

This can be a well-known location, e.g. /usr/local.

Once these locations are decided, create the directories if required.


Configuring the Source

Determine the names you want to use for the sensors. These could reflect the ISP the router connects to or the location of the router, or they could be completely arbitrary. The names can be 1 to 24 characters in length; they must begin with a capital letter, and may not contain an underscore, a slash, or whitespace.

Open the file silktools/src/include/silk_site_generic.h in an editor, and modify the section that reads:


          #define SENSOR_COUNT    15
          #ifdef DECLARE_SENSORID_VARIABLES
          uint32_t numSensors=SENSOR_COUNT;

          /*  sensorinfo_t is defined in silk_site.h
           *
           *      typedef struct {
           *        char *sensorName;
           *        uint8_t numClasses;
           *        uint8_t classNumbers[MAX_CLASSES];
           *      } sensorinfo_t;
           */
          sensorinfo_t sensorInfo[SENSOR_COUNT+1] = {
            /*   0 */ { "S0",   1, {DEFAULT_CLASS, 0} },
            /*   1 */ { "S1",   1, {DEFAULT_CLASS, 0} },
            /*   2 */ { "S2",   1, {DEFAULT_CLASS, 0} },
            /*   3 */ { "S3",   1, {DEFAULT_CLASS, 0} },
            /*   4 */ { "S4",   1, {DEFAULT_CLASS, 0} },
            /*   5 */ { "S5",   1, {DEFAULT_CLASS, 0} },
            /*   6 */ { "S6",   1, {DEFAULT_CLASS, 0} },
            /*   7 */ { "S7",   1, {DEFAULT_CLASS, 0} },
            /*   8 */ { "S8",   1, {DEFAULT_CLASS, 0} },
            /*   9 */ { "S9",   1, {DEFAULT_CLASS, 0} },
            /*  10 */ { "S10",  1, {DEFAULT_CLASS, 0} },
            /*  11 */ { "S11",  1, {DEFAULT_CLASS, 0} },
            /*  12 */ { "S12",  1, {DEFAULT_CLASS, 0} },
            /*  13 */ { "S13",  1, {DEFAULT_CLASS, 0} },
            /*  14 */ { "S14",  1, {DEFAULT_CLASS, 0} },

          #if 0
        
replacing each Sn with the names you have chosen. Remove the lines you don't need, and adjust the line that defines SENSOR_COUNT so it matches the number of sensors you have. For example, if I had two routers "Primary" and "Secondary", I would edit the code to read:

          #define SENSOR_COUNT    2
          #ifdef DECLARE_SENSORID_VARIABLES
          uint32_t numSensors=SENSOR_COUNT;

          /*  sensorinfo_t is defined in silk_site.h
           *
           *      typedef struct {
           *        char *sensorName;
           *        uint8_t numClasses;
           *        uint8_t classNumbers[MAX_CLASSES];
           *      } sensorinfo_t;
           */
          sensorinfo_t sensorInfo[SENSOR_COUNT+1] = {
            /*   0 */ { "Primary",     1, {DEFAULT_CLASS, 0} },
            /*   1 */ { "Secondary",   1, {DEFAULT_CLASS, 0} },

          #if 0
        

At the top of the source tree in the silktools directory, issue the command


        ./configure --with-data-rootdir=DATA_ROOTDIR
      

where DATA_ROOTDIR is the directory where the data files will reside. configure will run some tests on your platform and create the files silktools/src/include/silk_config.h and silktools/scripts/build/skconfig.mk containing the results of these tests.


Compiling and Installing The Tools

To compile the tools, you will need the GNU version of the make program. Type make (or gmake on most *BSD machines) in the silktools directory. This will make the analysis tools, utilities, and the rwflowpack binary.

If that succeeds, install the analysis tools in their final location:


          cp ./bin/* $SILK_PATH/bin
          rm $SILK_PATH/bin/rwflowpack
          cp ./lib/*.so $SILK_PATH/lib
        

Install rwflowpack in its final location:


          cp ./bin/rwflowpack       $PACKER_HOME/bin
        


Building and Installing the Country Code Library

The SiLK analysis suite provides a shared library that uses a data file to map IP addresses to countries. With the shared library and data file in place, rwfilter allows the user to filter by country, and rwcut can display the country.

The shared library is built as part of the standard installation sequence. The data file is based on the GeoIP Country database distributed by MaxMind (http://www.maxmind.com/). We do not distribute the data file, but we provide Perl scripts that will convert the GeoIP data files to the format that the shared library requires.

MaxMind distributes multiple versions of their GeoIP Country database; one is a free evaluation copy that is "97% accurate". In addition, they sell versions with higher accuracy, and they offer various subscription services.

Obtain your copy of the MaxMind GeoIP Country database, either the binary version or the comma separated value version. To create the SiLK-specific datafile, use either:

  • For the binary data format:

    
    mkdir share
    src/util/ccfilter_new/geoip-to-silk.pl < GeoIP.dat > share/country_codes.pmap
                

  • For the CSV version:

    
    mkdir share
    src/util/ccfilter_new/geoip-csv-to-silk.pl < GeoIPCountryWhois.csv > share/country_codes.pmap
                

Copy the file country_codes.pmap into the share directory.


Configure rwfpd

rwfpd is an sh script that controls rwflowpack. Since you must run one instance of rwflowpack per sensor, you will need to create one rwfpd per sensor. If I have routers "Primary" and "Secondary," I may want to create scripts rwfpd-primary and rwfpd-secondary to know which rwfpd* script controls which sensor.

Near the top of the rwfpd script are the values you need to set to configure it for your site:


          SENSOR=S0
  
          SENSOR_CONFIG=
  
          NETFLOW_PORT=
          INPUT_INDEX=
          NULL_INTERFACE=0
  
          USER=`whoami`
          CONTACT=
  
          PACKER_HOME=`echo ~silk`
          PACKER_BIN=${PACKER_HOME}/bin
          DATA_ROOTDIR=${PACKER_HOME}/data
          LOG_DIR=${PACKER_HOME}/log
  
          PATH=/bin:/usr/bin
        

Some configuration of rwflowpack must happen via command line switches: these are values such as the data directory and logging directory. Other configuration can come either from command line switches or from a configuration file. For a single sensor, the command-line options are easier; for a site with many sensors, the configuration file may be easier.

The first five values in the rwfpd script depend on whether the configuration is by command line switches or by configuration file. The remaining values will exist for all installations.


Configuring rwflowpack via command line switches

For the command line configuration of rwflowpack, set the following values in the rwfpd script(s):

SENSOR. Name of sensor; used to generate unique file names under DATA_ROOTDIR. Corresponds to the --sensor-name argument to rwflowpack. Each rwfpd* script must be given a unique value to avoid file name collisions. These names should match the names specified in the sensorInfo[] array in the silktools/src/include/silk_site_generic.h source file. If the name given here does not exist in the sensorInfo[] array, rwflowpack will warn you that you are packing data for an unknown sensor. rwflowpack will continue to collect and pack data for this unknown sensor; however, rwfilter will not find the data files for this sensor until the sensorInfo[] array is updated and the tools are recompiled.

NETFLOW_PORT. UDP port on which to listen for NetFlow packets. Corresponds to the --netflow-port argument to rwflowpack. This needs to match the port you specified when configuring the router with the ip flow-export command. This port must be unique for each rwfpd* script; make certain that you keep track of which router is talking on which port.

INPUT_INDEX. SNMP indexes of the router interface(s) that face the Internet. Corresponds to the --in-index argument to rwflowpack. Should be a comma-separated list of numbers containing no spaces, for example 1,2. To determine this information, you can use the snmpwalk command, or you can use the SiLK tools to capture data, and analyze the results. We discuss this latter approach below.

NULL_INTERFACE. The router will set the output interface SNMP index to this value to denote a non-routed flow. The default is 0, which is the value Cisco uses to denote a null flow. Corresponds to the --null-interface argument to rwflowpack.


Configuring rwflowpack via configuration file

Instead of getting the values of the netflow-port, in-index, and null-interface via the command line switches, rwflowpack can get these values from a configuration file. To do this, set the value of the SENSOR_CONFIG variable in the rwfpd script to the path to the configuration file. The full syntax of the configuration file is given as an appendix to this document.

The simplest configuration file is:


            sensor-probe S0
              listen-on-port p
              input-index n1,n2,...
          

This block specifies that NetFlow data collected on port p belongs to the sensor "S0". The syntax is simply key-value pairs on each line, where the key and value are separated by whitespace. Multiple values are separated by commas. Blank lines are ignored; comments begin with '#' and continue to the end of the line.
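To make the syntax concrete, here is a small Python sketch that reads a file in this form. It handles only the simple subset described in this section (the full syntax is in the appendix and is not reproduced here), and it assumes that a block runs from one sensor-probe line to the next. The function parse_sensor_config and the port numbers in the sample are hypothetical.

```python
# Illustrative sketch only: parses the simple key/value subset described in
# the text. parse_sensor_config is a hypothetical helper, not a SiLK API.

INTEGER_KEYS = ("listen-on-port", "null-interface")


def parse_sensor_config(text):
    """Return a list of dicts, one per sensor-probe block."""
    probes = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                              # skip blank lines
        parts = line.split(None, 1)               # key, then the rest
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "sensor-probe":
            probes.append({"sensor-probe": value})  # start a new block
        elif not probes:
            raise ValueError("%r appears before any sensor-probe" % key)
        elif key == "input-index":
            # Multiple values are separated by commas.
            probes[-1][key] = [int(v) for v in value.split(",")]
        elif key in INTEGER_KEYS:
            probes[-1][key] = int(value)
        else:
            probes[-1][key] = value
    return probes


if __name__ == "__main__":
    sample = """
    # two routers, one block each (port numbers are made up)
    sensor-probe Primary
      listen-on-port 9901
      input-index 1,2      # ISP-facing interfaces
    sensor-probe Secondary
      listen-on-port 9902
      input-index 8
    """
    for probe in parse_sensor_config(sample):
        print(probe)
```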

The sensor-probe value is the name of the sensor for which data is being collected and packed; it should be an alphanumeric string of 1 to 24 characters; it must start with an uppercase letter, and it may not contain an underscore, a slash, or whitespace.

The listen-on-port value tells rwflowpack which port to bind() to in order to collect NetFlow data. This is the same as the NETFLOW_PORT value in the previous subsection. This is required.

The input-index value corresponds to the --in-index switch to rwflowpack and to the INPUT_INDEX value above. It tells rwflowpack which SNMP interfaces on the router face the Internet, i.e., are connected to your organization's ISP(s). This is required.

Two other parameters that may be of use are:


              null-interface 0
              accept-from-host addr
          

The null-interface value is 0 by default. It should be set to the value your router uses for a flow record that did not leave the router.

The accept-from-host parameter expects a host address as its argument. The address, when present, is the IP of the host from which rwflowpack will accept incoming NetFlow packets. When not present, rwflowpack will accept packets from any host. This option has no command line equivalent.

Multiple sensor-probe blocks may be present in a single configuration file; however, there must be one rwflowpack invocation per sensor-probe since rwflowpack currently is not able to listen on multiple ports simultaneously. If multiple sensor-probes are given, the --sensor-name switch (rwfpd's SENSOR value) must be specified so that rwflowpack knows for which sensor-probe it is to collect data.

The --sensor-name switch is optional when a sensor configuration is specified that contains a single sensor-probe block. rwflowpack uses the value specified in the --sensor-name when generating the names for its log and PID files; when running multiple rwflowpack instances, the --sensor-name switch is recommended to avoid collisions in the names of these files.


Directory specifications

Regardless of whether command line switches or a configuration file is used to configure rwflowpack, the following values need to be passed to rwflowpack's command line; they must be specified in the rwfpd script.

USER. User running this script. The script will attempt to su to this user when starting rwflowpack.

CONTACT. Currently ignored; person(s) to e-mail in case of problems. Should be a comma-separated list of email addresses, containing no spaces.

PACKER_HOME. Setting this is not required; it is simply a convenience variable for setting PACKER_BIN, DATA_ROOTDIR, and LOG_DIR variables.

PACKER_BIN. Directory containing the rwflowpack binary.

DATA_ROOTDIR. Root directory for packed data files; does not have to be a subdirectory of PACKER_HOME. Corresponds to the --root-directory argument to rwflowpack. The files generated by each rwflowpack instance will have unique names, so this directory can be common among all rwfpd* scripts.

LOG_DIR. Directory in which to write logging files. Corresponds to the --log-directory argument to rwflowpack. Each rwflowpack invocation uses the value passed to the --sensor-name switch as part of its log file name; if you give that option to each rwflowpack, this directory can be common among all rwfpd* scripts.

PATH. Standard shell executable search path. In general, a more restricted PATH is better.

Once the rwfpd* script(s) are configured, copy it/them into the $PACKER_HOME/bin directory and start them:


            cp rwfpd    $PACKER_HOME/bin
            rwfpd start
          


Determining Incoming Interfaces

rwflowpack needs to know which of the router's interfaces are the incoming interfaces, i.e., which interface(s) face the Internet. One way to determine this information is to use the SiLK tools to collect data that includes the SNMP interface numbers and compare the source and destination IP addresses with the IP addresses of your organization.

First, create an IP-set of your organization's address space. To do this, list the CIDR blocks that belong to your organization in a text file, one CIDR address per line, and save this file as myips.txt. If my address space is 192.168.0.0/16, I could do


          echo "192.168.0.0/16" > myips.txt
        
To convert the text listing to a binary file, issue the command

          buildset myips.txt myips.set
        
The file myips.set is a binary representation of your address space. You can use the readset command to list the contents of the file, though beware that the output is one address (i.e., /32) per line, so there can be a lot of output. Supplying the --print-statistics switch should produce some useful output for sanity checking the IP-set file. If my network is 192.168.0.0/16, I will see:

          $ bin/readset --print-stat myips.set
          myips.set: 65536 IPs
                  minimumIP = 192.168.0.0
                  maximumIP = 192.168.255.255
                  count of /8's  = 1 (0.390625 %)
                  count of /16's = 1 (0.001526 %)
                  count of /24's = 256 (0.001526 %)
                  count of /27's = 2048 (100.000000 %)
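If you want to know what address count and range to expect before running readset, the numbers can be computed independently from the CIDR list. The Python sketch below uses the standard ipaddress module; summarize_cidrs is a hypothetical helper of ours, and it reproduces only the total, minimum, and maximum, not the per-prefix counts that readset prints.

```python
# Illustrative sketch only: computes the address count and range covered by
# a list of CIDR blocks, for comparison with readset's statistics.
# summarize_cidrs is a hypothetical helper, not part of the SiLK tools.
import ipaddress


def summarize_cidrs(lines):
    """Return (address_count, minimum_ip, maximum_ip) for the CIDR lines.

    Blank lines are skipped. At least one block is required.
    """
    nets = [ipaddress.ip_network(line.strip())
            for line in lines if line.strip()]
    # Merge overlapping or adjacent blocks so no address is counted twice.
    merged = list(ipaddress.collapse_addresses(nets))
    total = sum(net.num_addresses for net in merged)
    lowest = min(net.network_address for net in merged)
    highest = max(net.broadcast_address for net in merged)
    return total, str(lowest), str(highest)


if __name__ == "__main__":
    print(summarize_cidrs(["192.168.0.0/16"]))
    # (65536, '192.168.0.0', '192.168.255.255')
```

To check your own file, pass it the lines of myips.txt, for example summarize_cidrs(open("myips.txt")).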
        

In order to identify which interfaces are connected to your ISP, you can configure rwflowpack to classify all data as outgoing then determine what subset of records actually represent incoming traffic from ISP-facing interfaces by looking at source and destination IP addresses. You'll do this by configuring the rwfpd script(s) to use the null interface as its incoming SNMP interface; assuming a Cisco router, this value is 0. You may need to use a different value on a router from another vendor.


          INPUT_INDEX=0
        
Since no packets should enter the router on that interface, no traffic will be considered "incoming", and all traffic will appear in the out, outweb, and outnull directories.

You need to tell rwflowpack that it should include the SNMP interface numbers in the files it creates. Add the option --pack-interfaces to the invocation of rwflowpack by modifying the rwfpd script(s). Below the main configuration section is the line:


          EXTRA_OPTIONS=
        
Modify it to read:

          EXTRA_OPTIONS=--pack-interfaces
        

Start the rwfpd script(s) and allow it to collect data.


          rwfpd start
        
You should see data appearing in the files dataRootdir/out*/*/*/*/*. For example, outgoing routed-web traffic at 2:14pm EDT on October 4, 2003, from sensor "Primary" will be in dataRootdir/outweb/2003/10/04/ow-Primary_20031004.18. If data does not appear, try browsing the web. If you still do not see data, make certain you've correctly configured your router(s) to generate NetFlow records.

If you see data files but they are empty, just be patient. rwflowpack uses buffered input/output, which may hold records in memory. Data is flushed to disk every five minutes and on shutdown.

To find incoming traffic, you want to select all records where the source IP is outside your organization's address space and the destination IP is inside the address space. To select records, use the rwfilter command: the --not-sipset and --dipset switches do the IP address filtering; the --type switch selects the outgoing-traffic files (remember that all traffic is being written to those files, and by default rwfilter looks at the files for incoming routed data); and the --pass-output switch directs the records that pass these IP filters to the standard output. For the records that pass the filter, you want to know which SNMP interfaces they passed through in the border router(s). To get this information, run the rwuniq command, and select the fields containing the sensor and input and output SNMP indexes as the key.


          rwfilter --not-sipset=myips.set --dipset=myips.set \
            --type=out,outweb,outnull --pass=stdout \
            | rwuniq --fields=12-14
        

This will produce something similar to:


                      sr| in|out|     count
                 Primary|  1|  2|     25139
                 Primary|  1|  4|         8
                 Primary|  1|  3|        80
               Secondary|  8|  3|      4309
        
where "Primary" and "Secondary" are the names you assigned to the sensors (routers). From this output, you can see that SNMP interface "1" on the router named "Primary" is the incoming interface, and interface "8" on "Secondary" is incoming. Note that a router connected to multiple ISPs will have multiple input interfaces.
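
The selection and tally performed by this pipeline can be pictured with a short Python sketch. It is an illustration only: the records are invented sample data, the tuple layout and the function name tally_incoming are hypothetical, and the code does not read SiLK files.

```python
import ipaddress
from collections import Counter

def tally_incoming(records, my_networks):
    """Count (sensor, in, out) for flows whose source is outside and
    whose destination is inside my_networks."""
    nets = [ipaddress.ip_network(n) for n in my_networks]

    def inside(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in n for n in nets)

    counts = Counter()
    for sip, dip, sensor, snmp_in, snmp_out in records:
        # Same test as --not-sipset plus --dipset: outside to inside.
        if not inside(sip) and inside(dip):
            counts[(sensor, snmp_in, snmp_out)] += 1
    return counts

# Invented sample flows: (source IP, destination IP, sensor, in, out)
sample = [
    ("198.51.100.7", "192.168.1.5", "Primary", 1, 2),
    ("198.51.100.9", "192.168.3.8", "Primary", 1, 2),
    ("192.168.1.5", "198.51.100.7", "Primary", 2, 1),   # outgoing, skipped
    ("203.0.113.4", "192.168.9.9", "Secondary", 8, 3),
]
result = tally_incoming(sample, ["192.168.0.0/16"])
for key, n in sorted(result.items()):
    print(key, n)
```

The input index that dominates the tally for each sensor is the candidate ISP-facing interface.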

Stop the current rwfpd script(s), and remove the existing data files.


          rwfpd stop
          find dataRootdir/out* -type f -print | xargs rm
        
In each rwfpd script, set the INPUT_INDEX value to the appropriate value. You can remove the --pack-interfaces switch from the EXTRA_OPTIONS line, or you can leave it there (at the expense of 6 to 8 extra bytes per NetFlow record).


rwflowpack man page


NAME
      rwflowpack - Stores NetFlow v5 PDUs in packed binary files

SYNOPSIS
      To use a configuration file to configure rwflowpack to read
      NetFlow records, use:

      rwflowpack --log-directory=<logDir>
            [--log-pathname=<logPathname>] [--pack-interfaces]
            [--no-daemon] [--byte-order=<endian>]
            --root-directory=<dataRootDir>
            --sensor-configuration=<config-file>
            [--sensor-name=<sensorname>]

      To use command line arguments to configure rwflowpack to read
      NetFlow records, use:

      rwflowpack --log-directory=<logDir>
            [--log-pathname=<logPathname>] [--pack-interfaces]
            [--no-daemon] [--byte-order=<endian>]
            --root-directory=<dataRootDir>
            { --netflow-port=<port>
              | --netflow-file=<pathname> }
            --sensor-name=<sensorname>
            [--class=<class-name>] --in-index=<list>
            [--null-interface=<index>]


DESCRIPTION
      rwflowpack is a daemon which collects NetFlow V5 packets and
      packs the data into hourly input/output records organized in a
      time-based directory structure as described in FILES.

      NetFlow records contain information found in the IP header
      (source and destination IP addresses, IP protocol, source and
      destination ports for TCP and UDP, etc), the start and ending
      times of the flow, as well as routing information (the next-hop
      IP and the incoming and outgoing SNMP interfaces on the router
      where the flow entered and left).

      When speaking of "incoming" and "outgoing", there are two frames
      of reference that can cause confusion.  Assuming a router on the
      border of an organization, there is incoming and outgoing
      traffic---traffic that enters and leaves the organization.
      Additionally, there is traffic that enters and leaves the
      router: incoming and outgoing traffic from the router's
      perspective.

      NetFlow data is strictly one-way: a TCP conversation across the
      organization's border results in two sets of flow records---one
      for each side of the conversation.  A NetFlow record is
      generated for incoming packets---packets that enter the router
      on an SNMP interface index; each record also has an outgoing
      SNMP interface.  NetFlow records that describe traffic that is
      entering the organization (incoming traffic) will have an
      outgoing SNMP index saying on which interface the packets left
      the router; likewise, traffic leaving the organization (outgoing
      traffic) enters the router and has an incoming SNMP interface.

      Based on the flow itself and on switches provided by the user,
      rwflowpack examines each NetFlow record to determine how and
      where to store it.

      First, rwflowpack designates a NetFlow record as "incoming"
      (entering the organization) or "outgoing" (leaving the
      organization).  rwflowpack requires the user to use the
      --in-index switch to specify the incoming indexes; these are the
      SNMP interfaces on the router that connect to an organization's
      ISP(s) and face the Internet---traffic coming into the
      organization enters the router on these interfaces.  rwflowpack
      labels a record as "incoming" when the incoming (entering the
      router) SNMP interface listed on the record matches one of the
      numbers listed in the --in-index switch; all other traffic is
      labeled "outgoing".

      Next, rwflowpack will determine if the packets represented by
      the NetFlow record left the router by looking at the outgoing
      (leaving the router) SNMP interface.  When the interface number
      matches the value given by the --null-interface switch, the
      packets are considered "not-routed"; otherwise the packets are
      "routed".  The default value for --null-interface is 0, which is
      the SNMP interface Cisco uses for packets that do not leave the
      router---either because they were for the router itself (e.g.,
      part of a routing protocol like BGP) or because they were
      blocked by the access control lists (ACLs) on the router.

      Finally, for a routed flow record, the IP protocol and source
      and destination ports are examined.  When the protocol is 6
      (TCP) and either side of the conversation used port 80 (http),
      443 (https), or 8080 (http-alt), the flow is considered a
      "routed-web" record.  Because of the known IP protocol and
      limited number of web-server-side ports, "routed-web" traffic
      can be packed in a smaller record.

      Routed traffic that is not TCP or does not use one of the
      well-established HTTP ports is considered "routed-non-web"
      traffic.  There is certainly no guarantee that "routed-web"
      traffic is entirely web-based nor that there is no web-traffic
      in "routed-non-web"; the web/non-web split is a heuristic that
      gets it right most of the time.

      These tests allow rwflowpack to support six traffic types; each
      type is stored in a different subdirectory:

            in      -flows for incoming routed-non-web traffic
            inweb   -flows for incoming routed-web traffic
            innull  -flows for incoming not-routed traffic

            out     -flows for outgoing routed-non-web traffic
            outweb  -flows for outgoing routed-web traffic
            outnull -flows for outgoing not-routed traffic
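
The three tests described above can be summarized in a short Python sketch. This is a restatement of the prose under stated assumptions, not rwflowpack's actual source code; the function name traffic_type and its parameter names are hypothetical.

```python
WEB_PORTS = {80, 443, 8080}   # http, https, http-alt
TCP = 6

def traffic_type(snmp_in, snmp_out, protocol, sport, dport,
                 in_index, null_interface=0):
    """Return one of: in, inweb, innull, out, outweb, outnull."""
    # Test 1: incoming if the record entered on an --in-index interface.
    direction = "in" if snmp_in in in_index else "out"
    # Test 2: not-routed if the record left on the null interface.
    if snmp_out == null_interface:
        return direction + "null"
    # Test 3: routed-web if TCP and either port is a web port.
    if protocol == TCP and (sport in WEB_PORTS or dport in WEB_PORTS):
        return direction + "web"
    return direction

print(traffic_type(1, 2, 6, 51515, 80, in_index={1}))
print(traffic_type(2, 1, 17, 53, 53, in_index={1}))
print(traffic_type(1, 0, 6, 1234, 22, in_index={1}))
```

Note the order of the tests: a not-routed record is classified as innull or outnull before its ports are examined, so web ports only matter for routed flows.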

OPTIONS
      Option names may be abbreviated if the abbreviation is unique or
      is an exact match for an option.

      One of the following switches is required:

      --log-directory=<logDir>
            The directory under which log messages will be stored
            unless the --log-pathname option is given.  The process id
            file will also be stored in this directory unless the
            --no-daemon option is given, in which case no process id
            file is written.

      --log-pathname=<logPathname>
            The complete path to the log file.  Using this option will
            override the log-filename that would be generated based on
            the --log-directory argument.  This option also disables
            automatic log rotation.

      The following switches are optional and are primarily used for
      debugging:

      --pack-interfaces
            This switch allows you to override the default file output
            format of the packed-files that rwflowpack writes.  When
            this switch is present, rwflowpack writes additional
            information into the packed files: the router's SNMP input
            and output interfaces and the next-hop IP address.  The
            extra data produced by this switch is useful for
            determining why traffic is being stored in certain files.

            Note that this switch will only affect newly created
            files.  New records will always be appended to an existing
            file in the file's current output format to maintain file
            integrity.

      --no-daemon
            Forces rwflowpack to stay in the foreground---it does not
            become a daemon.  Useful for debugging.

      --byte-order=<endian>
            Sets the byte order for newly created files.  When
            appending records to an existing file, the existing byte
            order of the file is maintained.  The argument is one of:
                  'native' byte order (Default)
                  'little' endian
                  'big' endian

      --reader-function=<N>
            Currently unused.  Future plans are to allow rwflowpack to
            read flow-like data from sources other than NetFlow.

      SWITCHES TO CONFIGURE PACKING

      rwflowpack must be instructed how to determine whether a record
      represents an incoming or outgoing flow.  This can be done via a
      configuration file, or via command line arguments (though not by
      a combination of both).  The complete syntax of the
      configuration file is described in the Installation Handbook.
      In what follows, we provide the command line switch and the
      corresponding configuration file keyword.

      --sensor-configuration=<config-file>
            The path to the configuration file.

      --sensor-name=<sensorName>
            (Keywords: sensor, sensor-probe) This should be an
            alpha-numeric string from 1 to 24 characters in length.
            The sensorName must start with a letter and it must not
            contain an underscore.  For compatibility with the
            analysis applications, the sensorName should match a value
            given in the sensorInfo[] array in the source file
            src/include/silk_site_generic.h.  This value is required.

            The --sensor-name switch may be present on the command
            line when the --sensor-configuration switch is present,
            and it is required if the configuration file contains
            multiple sensor-probe definitions.  When --sensor-name is
            given, it should be the name of one of the sensor-probes
            listed in the configuration file, and it determines which
            of the sensors listed in the configuration file are being
            packed by this instance of rwflowpack.

      --class=<class-name>
            (Keyword: class) When the --sensor-name gives the name of
            an unknown sensor (i.e., one that is not listed in the
            sensorInfo[] array), the 'class' of the sensor must be
            provided (unless the tools have been compiled with support
            for a single class only).

      The following determine in which files (the 'type') the records
      are stored:

      --in-index=<integer-list>
            (Keyword: input-index) The SNMP interface index(es) upon
            which data enters the router, given as a comma separated
            list of positive integers.  Any record which has one of
            these indexes as its input interface is considered
            "incoming"---entering the organization.  This value is
            required.

      --null-interface=<index>
            (Keyword: null-interface) The SNMP index of the output
            interface the router uses to denote a not-routed packet.
            A packet may be not-routed because the packet is meant for
            the router itself (e.g., a BGP message) or because it
            matches an ACL violation.  Flows matching this output
            interface are stored in the "innull" or "outnull"
            directories.  If this option is not provided, 0 is used as
            the default, which is the value used by Cisco routers to
            denote a null flow.

      NETFLOW COLLECTION SWITCHES

      When collecting NetFlow data, the following switch is required:

      --root-directory=<dataRootDir>
            The directory under which the packed data will be stored.
            The tool will create subdirectories below dataRootDir
            based on the data received.

      One and only one of the following must be provided as the source
      for the NetFlow records, either from the configuration file or
      from the command line argument:

      --netflow-port=<port>
            (Keyword: listen-on-port) The UDP port on which rwflowpack
            listens for NetFlow V5 packets.  This is the port given to
            the Cisco IOS command "ip flow-export <ip-address> <port>".
            When this switch is present, rwflowpack runs as a daemon
            (unless the --no-daemon switch is present) until it is
            killed.

      --netflow-file=<pathname>
            (Keyword: read-from-file) Instead of using the
            --netflow-port switch, you can have rwflowpack read
            NetFlow v5 PDUs directly from the named file.  When this
            switch is present, rwflowpack will process the file and
            exit.  The file's length should be an integer multiple of
            1464 bytes, where 1464 is the maximum length of the
            NetFlow v5 PDU.  Each 1464-byte block should contain the
            24-byte NetFlow v5 header and space for 30 48-byte flow
            records, even if data for only 1 NetFlow record is valid.
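
The fixed block layout described above (24 + 30 x 48 = 1464 bytes) makes it easy to sanity-check a PDU file before handing it to rwflowpack. The following Python sketch is an illustration only and is not part of SiLK; the function name scan_pdu_blocks is hypothetical. It relies on the NetFlow v5 header beginning with two 16-bit fields in network byte order, the version and the record count.

```python
import struct

PDU_BLOCK = 1464      # fixed block size described above
HEADER_LEN = 24       # NetFlow v5 header
RECORD_LEN = 48       # one flow record
MAX_RECORDS = 30

def scan_pdu_blocks(data):
    """Yield (version, record_count) for each 1464-byte block in data."""
    if len(data) % PDU_BLOCK != 0:
        raise ValueError("length is not a multiple of %d bytes" % PDU_BLOCK)
    for offset in range(0, len(data), PDU_BLOCK):
        # The header starts with two big-endian 16-bit fields.
        version, count = struct.unpack_from("!HH", data, offset)
        if version != 5 or not 1 <= count <= MAX_RECORDS:
            raise ValueError("unexpected header at offset %d" % offset)
        yield version, count

# Build one synthetic block whose header claims a single valid record.
block = struct.pack("!HH", 5, 1).ljust(PDU_BLOCK, b"\0")
summary = list(scan_pdu_blocks(block * 2))
print(summary)
```

To check a real file, read it in binary mode and pass its contents to scan_pdu_blocks.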



FILES
      The directory root is <dataRootDir>.  Immediately underneath
      <dataRootDir> are six subdirectories corresponding to the six
      traffic types discussed above.  Under these are directories
      representing the year, month, and day in YYYY/MM/DD format.
      That is:

            <dataRootDir>/in/{$YEAR}/{$MONTH}/{$DAY}/*
            <dataRootDir>/inweb/{$YEAR}/{$MONTH}/{$DAY}/*
            <dataRootDir>/innull/{$YEAR}/{$MONTH}/{$DAY}/*
            <dataRootDir>/out/{$YEAR}/{$MONTH}/{$DAY}/*
            <dataRootDir>/outweb/{$YEAR}/{$MONTH}/{$DAY}/*
            <dataRootDir>/outnull/{$YEAR}/{$MONTH}/{$DAY}/*

      For example, outgoing web files for October 4th, 2003 are
      recorded in <dataRootDir>/outweb/2003/10/04/

      The names of the files in these directories include all of this
      information, and are written in the form:

            <type>-<sensorName>_YYYYMMDD.HH

      Where <sensorName> is the name given in the --sensor-name
      switch to rwflowpack.  The <type> will be one of in, iw
      (for inweb), innull, out, ow (for outweb), or outnull.
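
The naming scheme can be sketched in Python as follows. This is an illustration only, not SiLK code; the function name packed_file_path is hypothetical. It assumes the date and hour in the path are expressed in UTC, which is what the handbook's earlier example (traffic at 2:14pm EDT stored in a file ending in .18) suggests.

```python
from datetime import datetime, timezone, timedelta

# File-name prefixes for each traffic type, per the list above.
FILE_PREFIX = {"in": "in", "inweb": "iw", "innull": "innull",
               "out": "out", "outweb": "ow", "outnull": "outnull"}

def packed_file_path(root, traffic_type, sensor, when):
    """Build <root>/<type>/YYYY/MM/DD/<prefix>-<sensor>_YYYYMMDD.HH."""
    utc = when.astimezone(timezone.utc)   # assumption: hours are in UTC
    return "%s/%s/%s/%s-%s_%s" % (
        root, traffic_type, utc.strftime("%Y/%m/%d"),
        FILE_PREFIX[traffic_type], sensor, utc.strftime("%Y%m%d.%H"))

edt = timezone(timedelta(hours=-4))
when = datetime(2003, 10, 4, 14, 14, tzinfo=edt)   # 2:14pm EDT
path = packed_file_path("dataRootDir", "outweb", "Primary", when)
print(path)
```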

SEE ALSO
      The rwfpd script; The SiLK Installation Handbook

HELP FILE VERSION
      $SiLK: INSTALL.html,v 1.5 2005/09/26 22:33:10 thomasm Exp $


Probe Configuration File

To allow for easier configuration, rwflowpack supports reading its configuration information from a text file. To have the program read the configuration file, invoke it with the --sensor-config=config-file switch, where config-file is the location of the configuration file.

The configuration file tries to satisfy two goals: (1) to tell the program how to collect the flow data (from a network socket, a UNIX domain socket, or a file), and (2) to tell it how to pack the data once it has been collected.

A probe represents an object that generates a stream of flow-data; the object could be a Cisco router that is producing NetFlow or software that creates flow information from tcpdump-like data.

Probes can be grouped into a logical sensor; a sensor is one or more probes that, for the purpose of analysis, all have the same name. Currently, rwflowpack does not make use of this functionality.

While probes are defined only in the configuration file, sensors should be declared in the sensorInfo[] array that is defined in the disa/src/include/silk_site_*.h C header file. While it is possible to define sensors in the configuration file, the analysis tools (e.g., rwfilter) will not be aware of these sensors.

The syntax of the configuration file is simple. Blank lines are ignored; comments, which begin with the '#' character and continue to the end of the line, are also ignored. Every other line should contain a key-value pair, where the key and value are separated by whitespace. For a key that accepts multiple values, the values must be separated by whitespace and/or a comma. Repeating a key will overwrite the key's previous value. Lines are grouped together into blocks; there are two types of blocks: sensor and sensor-probe. The keywords 'sensor' and 'sensor-probe' introduce a new block; the block continues until the next 'sensor' or 'sensor-probe' keyword. Each key-value pair within the block sets an attribute on the sensor or probe.
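
These syntax rules are simple enough to restate as a small Python sketch. It is an illustration of the rules as written here, not rwflowpack's actual parser; the function name parse_config and the sample text (including the port numbers) are invented for the example.

```python
import re

def parse_config(text):
    """Parse the key-value syntax into a list of (block_type, name, attrs)."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue                          # skip blank lines
        parts = line.split(None, 1)           # key, then the rest
        key = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        # Values are separated by whitespace and/or commas.
        values = [v for v in re.split(r"[,\s]+", rest) if v]
        if key in ("sensor", "sensor-probe"):
            # These keywords close the previous block and open a new one.
            blocks.append((key, values[0] if values else None, {}))
        elif blocks:
            blocks[-1][2][key] = values       # a repeated key overwrites
        else:
            raise ValueError("attribute %r outside any block" % key)
    return blocks

sample = """
# sensor used for testing
sensor S01
    class all

sensor-probe S01
    input-index 1, 2,3
    listen-on-port 9999
    listen-on-port 9998   # repeated key overwrites
"""
blocks = parse_config(sample)
for block in blocks:
    print(block)
```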

An example sensor block is:


    sensor S01
        class all
  
The 'sensor' keyword defines a new sensor. It is followed by the name of the sensor being defined; this name will be used to generate the names of the data files.

There is a single attribute on a sensor that must be present: the 'class' keyword sets the class that the sensor belongs to; in the above example, sensor 'S01' belongs to the class 'all'. Saying a sensor "belongs to" a class means data collected by that sensor will be considered whenever the name of the class is passed to rwfilter. The name of the class must be one of the strings listed in the classInfo[] array in the disa/src/include/silk_site_*.h file. As stated previously, the analysis tools are not aware of sensors that are created in the configuration file. A sensor block is most useful during testing or initial deployment.

A sensor-probe block, which defines a probe, is more complicated. A sensor-probe block that shows all the attributes (but that is illegal because of conflicting attributes) is:


    sensor-probe S01
        probe-name probe01
        priority 8
        probe-type netflow
        null-interface 0
        isp-ip 10.10.10.10,10.10.10.11
        input-index 1,2,3,4
        output-index 8,9
        listen-as-host 192.168.1.1
        listen-on-port 9999
        listen-on-unix-domain-socket /tmp/sock
        read-from-file /var/tmp/flow-file
        accept-from-host 172.16.22.22
  

sensor-probe. The sensor-probe keyword closes the previous sensor or sensor-probe block and begins a new sensor-probe block. The value is the name of the sensor for which this probe is a flow collector. The sensor name must be known, either by being defined in the sensorInfo[] array or from a sensor block previously defined in this configuration file.

probe-name. If this attribute is not present, the name of the probe will be the same as the name of the sensor, i.e., the value given on the sensor-probe line. Note that the probes that belong to a single sensor must each have a unique name. This attribute is only used in rwflowpack error messages.

probe-type. There is a single type of probe, 'netflow'.

priority. The priority is an integer value between 1 and 100, inclusive; values of 1-50 are considered low priority, 51-100 are high. rwflowpack does not use this attribute.

listen-on-port. This attribute instructs rwflowpack to bind() to the specified network port in order to collect NetFlow data; i.e., this is the port given by the Cisco IOS command ip flow-export [ip-address] [port].

listen-as-host. The value is a network IP address in dotted-decimal notation; it is the address to which NetFlow data is being sent; i.e., it is the ip-address given by the Cisco IOS command ip flow-export [ip-address] [port]. This attribute is only useful on a multi-homed machine; if not present, the program will listen on all the machine's addresses.

listen-on-unix-domain-socket. This attribute is unused by rwflowpack.

read-from-file. When this attribute is given, rwflowpack will read PDU records from the specified path on the filesystem. This attribute is incompatible with the listen-on-port and listen-on-unix-domain-socket attributes. The file's length should be an integer multiple of 1464 bytes, where 1464 is the maximum length of the NetFlow v5 PDU. Each 1464-byte block should contain the 24-byte NetFlow v5 header and space for 30 48-byte flow records, even if data for only 1 NetFlow record is valid.

accept-from-host. When the program has opened a network socket on which to listen for flows, this attribute is the IP or name of the host that is allowed to connect to the socket. When this attribute is not present, any host may connect.

null-interface. The SNMP index of the output interface the router uses to denote a not-routed packet. A packet may be not-routed because the packet is meant for the router itself (e.g., a BGP message) or because it matches an ACL violation. Flows matching this output interface are stored as one of the not-routed types (e.g., "innull", "outnull"). If this attribute is not provided, 0 is used as the default, which is the value used by Cisco routers to denote a null flow. This attribute is only used by rwflowpack, and only for NetFlow probes.

isp-ip. This attribute is currently unused.

input-index. The value for this attribute is a list of positive integers that represent the SNMP interface index(es) upon which "incoming" data enters the router. Any flow record which has one of these indexes as its input interface is considered "incoming"---entering the organization. This attribute is required.

output-index. This attribute is ignored.
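
The constraints stated in the attribute descriptions above can be collected into a small checker. The Python sketch below is an illustration only: check_probe is a hypothetical helper, it covers just the rules spelled out in this section (input-index is required, read-from-file conflicts with the listen-on attributes, priority lies between 1 and 100), and rwflowpack itself may apply further checks.

```python
def check_probe(attrs):
    """Return a list of problems found in one sensor-probe block.

    attrs maps attribute names to their string values.
    """
    problems = []
    if "input-index" not in attrs:
        problems.append("input-index is required")
    listeners = [k for k in ("listen-on-port", "listen-on-unix-domain-socket")
                 if k in attrs]
    if "read-from-file" in attrs and listeners:
        problems.append("read-from-file conflicts with " + ", ".join(listeners))
    if "priority" in attrs and not 1 <= int(attrs["priority"]) <= 100:
        problems.append("priority must be between 1 and 100")
    return problems

# The "all attributes" example from this section, reduced to the keys checked.
full_example = {
    "probe-name": "probe01", "priority": "8", "null-interface": "0",
    "input-index": "1,2,3,4", "listen-on-port": "9999",
    "listen-on-unix-domain-socket": "/tmp/sock",
    "read-from-file": "/var/tmp/flow-file",
}
legal = {"input-index": "1,2,3,4", "listen-on-port": "9999"}

print(check_probe(full_example))
print(check_probe(legal))
```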