Advanced features for Cisco CSS Controller and Nortel Alteon Controller

This chapter includes the following sections:

Note:
In this chapter xxxcontrol denotes ccocontrol for Cisco CSS Controller and nalcontrol for Nortel Alteon Controller.

Collocation

Cisco CSS Controller or Nortel Alteon Controller can reside on the same machine as a server for which you are load balancing requests. This is commonly referred to as collocating a server. No additional configuration steps are required.

Note:
A collocated server competes for resources with Load Balancer during times of high traffic. However, in the absence of overloaded machines, using a collocated server offers a reduction in the total number of machines necessary to set up a load-balanced site.

High availability

The high availability feature is now available for Cisco CSS Controller and Nortel Alteon Controller.

To improve controller fault tolerance, the high availability function contains these features:

Configuration

See ccocontrol highavailability -- control high availability and nalcontrol highavailability -- control high availability for the complete syntax for xxxcontrol highavailability.

To configure controller high availability:

  1. Start the controller server on both controller machines.
  2. Configure each controller with identical configurations.
  3. Configure the local high availability role, address, and partner address as follows:
    xxxcontrol highavailability add address 10.10.10.10 
    partneraddress 10.10.10.20 port 143 role primary
  4. Configure the partner high availability role, address, and partner address as follows:
    xxxcontrol highavailability add address 10.10.10.20 
    partneraddress 10.10.10.10 port 143 role secondary
    The address and partneraddress parameters are reversed on the primary and secondary machines.
  5. Optionally, configure high availability parameters on the local and partner controllers; for example:
    xxxcontrol highavailability set beatinterval 1000
  6. Optionally, configure reach targets on local and partner controllers as follows:
    xxxcontrol highavailability usereach 10.20.20.20
    The same number of reach targets must be configured on the local and partner controllers.
  7. Start the high availability component and define recovery strategy on local and partner controllers as follows:
    xxxcontrol highavailability start auto
  8. Optionally, display high availability information on local and partner controllers as follows:
    xxxcontrol highavailability report
  9. Optionally, force the standby controller to take over from the active controller as follows:
    xxxcontrol highavailability takeover
    This is necessary only for maintenance.
Notes:
  1. To configure a single controller without high availability, do not issue any high availability commands.
  2. To convert two controllers in a high availability configuration to a single controller, stop high availability on the standby controller first; optionally, then stop high availability on the active controller.
  3. When you run two controllers in a high availability configuration, unexpected results can occur if any of the controller properties differ between the switches; for example, switchconsultantid, switch address, and so forth. You can also get unexpected results if the controller high availability properties do not match; for example, port, role, reach targets, beatinterval, takeoverinterval, and recovery strategy.
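The consistency rules in these notes can be checked before high availability is started. The following Java sketch is illustrative only. HaConfig and HaConfigCheck are hypothetical names, not part of Load Balancer, and the sketch assumes that the settings of each controller have already been collected by some other means. It also assumes that "matching" roles means one primary and one secondary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one controller's high availability settings.
// Field names mirror the xxxcontrol highavailability parameters.
final class HaConfig {
    final String address, partnerAddress, role, recoveryStrategy;
    final int port, beatInterval, reachTargetCount;

    HaConfig(String address, String partnerAddress, int port, String role,
             int beatInterval, int reachTargetCount, String recoveryStrategy) {
        this.address = address;
        this.partnerAddress = partnerAddress;
        this.port = port;
        this.role = role;
        this.beatInterval = beatInterval;
        this.reachTargetCount = reachTargetCount;
        this.recoveryStrategy = recoveryStrategy;
    }
}

class HaConfigCheck {
    // Returns a list of problems; an empty list means the pair looks consistent.
    static List<String> problems(HaConfig local, HaConfig partner) {
        List<String> out = new ArrayList<>();
        if (!local.address.equals(partner.partnerAddress)
                || !local.partnerAddress.equals(partner.address)) {
            out.add("address and partneraddress are not reversed");
        }
        if (local.port != partner.port) out.add("port differs");
        // Assumption: one controller is primary and the other is secondary.
        if (local.role.equals(partner.role)) out.add("both controllers have the same role");
        if (local.beatInterval != partner.beatInterval) out.add("beatinterval differs");
        if (local.reachTargetCount != partner.reachTargetCount) out.add("number of reach targets differs");
        if (!local.recoveryStrategy.equals(partner.recoveryStrategy)) out.add("recovery strategy differs");
        return out;
    }

    public static void main(String[] args) {
        HaConfig primary = new HaConfig("10.10.10.10", "10.10.10.20", 143, "primary", 1000, 1, "auto");
        HaConfig secondary = new HaConfig("10.10.10.20", "10.10.10.10", 143, "secondary", 1000, 1, "auto");
        System.out.println(problems(primary, secondary));
    }
}
```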

Failure detection

In addition to the loss of connectivity between active and standby controllers, which is detected through the heartbeat messages, reachability is another failure detection mechanism.

When you configure controller high availability, you can provide a list of hosts that each of the controllers must reach to work correctly. There must be at least one host for each subnet that your controller machine uses. These hosts can be routers, IP servers, or other host types.

Host reachability is obtained by the reach advisor, which pings the host. Switchover takes place if the heartbeat messages cannot go through, or if the reachability criteria are better met by the standby controller than by the active controller. To make this decision based on all available information, the active controller regularly sends the standby controller its reachability capabilities and vice versa. The controllers then compare their reachability information with their partner's information and decide who should be active.
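The switchover rule described above can be sketched in Java as follows. This is a simplified illustration, not the controller's actual algorithm: the class and method names are hypothetical, and "better met" is assumed to mean that the standby controller can reach strictly more of the configured reach targets than the active controller.

```java
class SwitchoverDecision {
    // Returns true when the standby controller should become active.
    // heartbeatOk:  heartbeat messages from the active controller are arriving
    // activeReach:  number of reach targets the active controller can ping
    // standbyReach: number of reach targets the standby controller can ping
    static boolean standbyShouldTakeOver(boolean heartbeatOk, int activeReach, int standbyReach) {
        if (!heartbeatOk) {
            return true;                    // loss of connectivity to the active controller
        }
        return standbyReach > activeReach;  // assumption: more reachable targets is "better"
    }

    public static void main(String[] args) {
        System.out.println(standbyShouldTakeOver(true, 1, 2));
    }
}
```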

Recovery strategy

The roles of the two controller machines are configured as primary and secondary. At startup the controllers exchange information until each machine is synchronized. At this point, the primary controller moves to the active state and begins calculating weights and updating the switch, while the secondary machine moves to standby state and monitors the availability of the primary machine.

If at any point the standby machine detects that the active machine has failed, the standby machine takes over the load-balancing functions of the failed machine and becomes the active machine. When the primary machine is again operational, the two machines determine which controller will be active according to how the recovery strategy is configured.

There are two kinds of recovery strategy:

Automatic recovery

The primary controller moves to the active state, calculating and updating weights, as soon as it becomes operational again. The secondary machine moves to standby after the primary is active.

Manual recovery

The active secondary controller remains in active state, even after the primary controller is operational.

The primary controller moves to standby state and requires manual intervention to move to the active state.

The strategy parameter must be set the same for both machines.
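The difference between the two recovery strategies can be modeled as a small state machine. The following Java sketch is an illustration under simplifying assumptions: HaPair and its methods are hypothetical names, and failure detection and synchronization are reduced to single method calls.

```java
class HaPair {
    enum Role { PRIMARY, SECONDARY }
    enum Strategy { AUTO, MANUAL }

    private final Strategy strategy;
    private Role active = Role.PRIMARY;   // after synchronization the primary is active

    HaPair(Strategy strategy) { this.strategy = strategy; }

    Role active() { return active; }

    // The standby secondary detects that the active primary has failed and takes over.
    void primaryFails() { active = Role.SECONDARY; }

    // The primary is operational again; the recovery strategy decides who is active.
    void primaryRecovers() {
        if (strategy == Strategy.AUTO) {
            active = Role.PRIMARY;
        }
        // MANUAL: the secondary stays active until an operator intervenes.
    }

    // Operator-initiated takeover by whichever controller is currently standby.
    void takeover() {
        active = (active == Role.PRIMARY) ? Role.SECONDARY : Role.PRIMARY;
    }
}
```

With the manual strategy, the primary controller returns to the active state only after an explicit takeover, which corresponds to the takeover step in the configuration procedure.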

Examples

For Cisco CSS Controller high availability configuration examples, see Examples.

For Nortel Alteon Controller high availability configuration examples, see Examples.

Optimizing the load balancing provided by Load Balancer

The controller function of Load Balancer performs load balancing based on the following settings:

You can change these settings to optimize load balancing for your network.

Importance given to metric information

The controller can use some or all of the following metric collectors in its weighting decisions:

The default metrics are activeconn and connrate.

You can change the relative proportion of importance of the metric values. Think of the proportions as percentages; the sum of the relative proportions must equal 100%. By default, the active connections and new connections metrics are used and their proportions are set to 50/50. In your environment, you might need to try different metric proportion combinations to find the combination that gives the best performance.

To set the proportion values:

For Cisco CSS Controller
ccocontrol ownercontent metrics metricName1 proportion1 metricName2 proportion2
For Nortel Alteon Controller
nalcontrol service metrics metricName1 proportion1 metricName2 proportion2
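To see how proportions act as percentages, consider the following Java sketch. It is an illustration only: this chapter does not describe how the consultant combines metric values internally, so a plain weighted average is assumed here, and the class and method names are hypothetical.

```java
import java.util.Map;

class MetricProportions {
    // Combines per-metric values into one score, using proportions that must total 100.
    static double combine(Map<String, Double> values, Map<String, Integer> proportions) {
        int total = 0;
        for (int p : proportions.values()) {
            total += p;
        }
        if (total != 100) {
            throw new IllegalArgumentException("proportions must sum to 100, got " + total);
        }
        double score = 0.0;
        for (Map.Entry<String, Integer> e : proportions.entrySet()) {
            Double v = values.get(e.getKey());
            if (v == null) {
                throw new IllegalArgumentException("no value for metric " + e.getKey());
            }
            score += v * e.getValue() / 100.0;   // each metric contributes its percentage share
        }
        return score;
    }

    public static void main(String[] args) {
        // Default metrics with the default 50/50 split; the metric values are made up.
        System.out.println(combine(
                Map.of("activeconn", 40.0, "connrate", 20.0),
                Map.of("activeconn", 50, "connrate", 50)));
    }
}
```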

Weights

Weights are set based upon application response time and availability, feedback from the advisors, and feedback from a system-monitoring program, such as Metric Server. If you want to set weights manually, specify the fixedweight option for the server. For a description of the fixedweight option, see Controller fixed weights.

Weights are applied to all servers providing a service. For any particular service, the requests are distributed between servers based on their weights relative to each other. For example, if one server is set to a weight of 10, and the other to 5, the server set to 10 should get twice as many requests as the server set to 5.
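The relationship between weights and request distribution can be worked through in a few lines of Java. This sketch is illustrative: the class name is hypothetical, and it assumes that a server with a negative weight (a server marked down) receives no requests.

```java
class WeightShare {
    // Expected fraction of requests for each server, proportional to its weight.
    static double[] shares(int[] weights) {
        int total = 0;
        for (int w : weights) {
            total += Math.max(w, 0);          // assumption: negative weight means no traffic
        }
        double[] out = new double[weights.length];
        if (total == 0) {
            return out;                       // no server is eligible
        }
        for (int i = 0; i < weights.length; i++) {
            out[i] = Math.max(weights[i], 0) / (double) total;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = shares(new int[] {10, 5});
        System.out.println(s[0] + " " + s[1]);
    }
}
```

For weights of 10 and 5, the first server's share is twice the second server's share, which matches the example above.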

If an advisor finds that a server has gone down, the weight for the server is set to -1. For Cisco CSS Controller and Nortel Alteon Controller the switch is informed that the server is not available and the switch stops assigning connections to the server.

Controller fixed weights

Without the controller, advisors cannot run and cannot detect if a server is down. If you choose to run the advisors, but do not want the controller to update the weight you have set for a particular server, use the fixedweight option on the ccocontrol service command for Cisco CSS Controller or the nalcontrol server command for Nortel Alteon Controller.

Use the fixedweight option to set the weight to the value you desire. The server weight value remains fixed while the controller is running until you issue another command with fixedweight set to no.

Weight calculation sleeptimes

To optimize overall performance, you can restrict how often metrics are collected.

The consultant sleeptime specifies how often the consultant updates the server weights. If the consultant sleeptime is too low, it can mean poor performance as a result of the consultant constantly interrupting the switch. If the consultant sleeptime is too high, it can mean that the switch's load balancing is not based on accurate, up-to-date information.

For example, to set the consultant sleeptime to 1 second, specify 1 as the interval value in the following command:

xxxcontrol consultant set consultantID sleeptime interval

Sensitivity threshold

Other methods are available for you to optimize load balancing for your servers. To work at top speed, updates to the weights for the servers are only made if the weights have changed significantly. Constantly updating the weights when there is little or no change in the server status would create an unnecessary overhead.

When the percentage weight change for the total weight for all servers providing a service is greater than the sensitivity threshold, the weights used by the load balancer to distribute connections are updated. Consider, for example, that the total weight changes from 100 to 105. The change is 5%. With the default sensitivity threshold of 5, the weights used by the load balancer are not updated, because the percentage change is not above the threshold. If, however, the total weight changes from 100 to 106, the weights are updated.

To set the consultant's sensitivity threshold to a value other than the default, enter the following command:

  xxxcontrol consultant set consultantID sensitivity percentageChange

In most cases, you will not need to change this value.
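The threshold comparison described in this section can be sketched in Java as follows. The class and method names are hypothetical, and the handling of a zero total weight is an assumption that the text does not cover.

```java
class SensitivityCheck {
    // Returns true when the change in total weight exceeds the sensitivity threshold.
    static boolean shouldUpdate(int oldTotal, int newTotal, int sensitivityPercent) {
        if (oldTotal == 0) {
            return newTotal != 0;             // assumption: any change from zero counts
        }
        double changePercent = Math.abs(newTotal - oldTotal) * 100.0 / oldTotal;
        return changePercent > sensitivityPercent;   // strictly greater than the threshold
    }

    public static void main(String[] args) {
        System.out.println(shouldUpdate(100, 105, 5));  // change of exactly 5%
        System.out.println(shouldUpdate(100, 106, 5));  // change of 6%
    }
}
```

With the default threshold of 5, a change from 100 to 105 does not trigger an update, and a change from 100 to 106 does.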

Advisors

Advisors are agents within Load Balancer. Their purpose is to assess the health and load of server machines. They do this with a proactive client-like exchange with the servers. Consider advisors as lightweight clients of the application servers.

Note:
For a detailed list of advisors, see List of advisors.

How advisors work

Advisors periodically open a TCP connection with each server and send a request message to the server. The content of the message is specific to the protocol running on the server. For example, the HTTP advisor sends an HTTP "HEAD" request to the server.

Advisors then listen for a response from the server. After getting the response, the advisor makes an assessment of the server. To calculate this load value, most advisors measure the time for the server to respond, then use this value (in milliseconds) as the load.

Advisors then report the load value to the consultant function, where it appears in the consultant report. The consultant then calculates aggregate weight values from all its sources, per its proportions, and sends these weight values to the switch. The switch uses these weights for load balancing new incoming client connections.

If the advisor determines that a server is alive and well, it reports a positive, non-zero load number to the consultant. If the advisor determines that a server is not active, it returns a special load value of negative one (-1) to inform the switch that the server is down. Subsequently, the switch does not forward any further connections to that server until the server has come back up.
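The load convention described above (a positive response time in milliseconds for a healthy server, -1 for a server that is down) can be illustrated with a minimal Java probe. This is a simplified sketch, not the advisor implementation: it times only the TCP connect rather than a full protocol exchange, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class TcpProbe {
    // Returns the connect time in milliseconds (at least 1), or -1 if the
    // connection cannot be established within timeoutMillis.
    static int load(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            return (int) Math.max(1, elapsedMillis);   // keep the load positive and non-zero
        } catch (IOException e) {
            return -1;                                  // server is treated as down
        }
    }
}
```

A real advisor would also send a protocol-specific request, such as an HTTP HEAD request, and include the time to receive the response in the measurement.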

Advisor sleeptimes

Note:
The advisor defaults work efficiently for the great majority of possible scenarios. Use caution when entering values other than the defaults.

The advisor sleeptime sets how often an advisor asks for status from the servers on the port it is monitoring and then reports the results to the consultant. If the advisor sleeptime is too low, it can result in poor performance because the advisor constantly interrupts the servers. If the advisor sleeptime is too high, it can mean that the consultant's weighting decisions are not based on accurate, up-to-date information.

For example, to set the interval to 3 seconds for the HTTP advisor, type the following command:

xxxcontrol metriccollector set consultantID:HTTP sleeptime 3

Advisor connect timeout and receive timeout for servers

You can set the amount of time an advisor takes to detect that a particular port on the server or service has failed. The failed-server timeout values, timeoutconnect and timeoutreceive, determine how long an advisor waits before reporting that either a connect or receive has failed.

To obtain the fastest failed-server detection, set the advisor connect and receive timeouts to the smallest value (one second), and set the advisor and consultant sleeptime to the smallest value (one second).

Note:
If your environment experiences a moderate-to-high volume of traffic and server response time increases, do not set the timeoutconnect and timeoutreceive values too small. If these values are too small, the advisor might prematurely mark a busy server as failed.

To set the timeoutconnect to 9 seconds for the HTTP advisor, type the following command:

xxxcontrol metriccollector set consultantID:HTTP timeoutconnect 9

The default for connect and receive timeout is 3 times the value specified for the advisor sleeptime.

Advisor retry

Advisors can retry a connection before marking a server down. The advisor does not mark a server down until the server query has failed the number of retries plus 1. If not set, the retry value defaults to zero.
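The "number of retries plus 1" rule can be sketched in Java as follows. RetryTracker is a hypothetical name, and the sketch assumes that the failures must be consecutive, so that a successful query resets the count.

```java
class RetryTracker {
    private final int retries;
    private int consecutiveFailures = 0;

    RetryTracker(int retries) { this.retries = retries; }

    // Records the result of one server query.
    // Returns true if the server should now be marked down.
    boolean record(boolean querySucceeded) {
        if (querySucceeded) {
            consecutiveFailures = 0;          // assumption: a success resets the count
            return false;
        }
        consecutiveFailures++;
        return consecutiveFailures >= retries + 1;
    }
}
```

With the default retry value of zero, the first failed query marks the server down. With a retry value of 2, the server is marked down on the third failed query in a row.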

For the Cisco CSS Controller, set the retry value using the ccocontrol ownercontent set command. For more information, see ccocontrol ownercontent -- control the owner name and content rule.

For the Nortel Alteon Controller, set the retry value using the nalcontrol service set command. For more information, see nalcontrol service -- configure a service.

Create custom (customizable) advisors

Note:
In this section server is used as a generic term to refer to a service for Cisco CSS Controller or to a server for Nortel Alteon Controller.

The custom (customizable) advisor is a small piece of Java code that you provide as a class file, and is called by the base code. The base code provides all administrative services, such as:

It also reports results to the consultant. Periodically the base code performs an advisor cycle, where it individually evaluates all servers in its configuration. It starts by opening a connection with a server machine. If the socket opens, the base code calls the getLoad method (function) in the custom advisor. The custom advisor then performs the necessary steps to evaluate the health of the server. Typically, it sends a user-defined message to the server and then waits for a response. (Access to the open socket is provided to the custom advisor.) The base code then closes the socket with the server and reports the load information to the consultant.

The base code and custom advisor can operate in either normal or replace mode. Choice of the mode of operation is specified in the custom advisor file as a parameter in the constructor method.

In normal mode, the custom advisor exchanges data with the server, and the base advisor code times the exchange and calculates the load value. The base code then reports this load value to the consultant. The custom advisor need only return a zero (on success) or negative one (on error). To specify normal mode, the replace flag in the constructor is set to false.

In replace mode, the base code does not perform any timing measurements. The custom advisor code performs whatever operations are desired for its unique requirements, and then returns an actual load number. The base code will accept the number and report it to the consultant. For best results, normalize your load number between 10 and 1000, with 10 representing a fast server, and 1000 representing a slow server. To specify replace mode, the replace flag in the constructor is set to true.
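The difference between normal mode and replace mode can be illustrated with the following self-contained Java sketch. It deliberately does not use the real advisor base classes: AdvisorModes and reportedLoad are hypothetical stand-ins for the base code, and the custom advisor's getLoad method is represented by an IntSupplier.

```java
import java.util.function.IntSupplier;

class AdvisorModes {
    // Stand-in for what the base code reports to the consultant for one server.
    // replace:       the replace flag from the custom advisor's constructor
    // customGetLoad: stand-in for the custom advisor's getLoad method
    static int reportedLoad(boolean replace, IntSupplier customGetLoad) {
        long start = System.nanoTime();
        int result = customGetLoad.getAsInt();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        if (result < 0) {
            return -1;                            // error reported by the advisor
        }
        if (replace) {
            return result;                        // replace mode: use the advisor's own number
        }
        return (int) Math.max(1, elapsedMillis);  // normal mode: use the base code's timing
    }

    public static void main(String[] args) {
        // Replace mode: the advisor returns a load it computed itself.
        System.out.println(reportedLoad(true, () -> 250));
    }
}
```

In normal mode the value returned by the advisor only signals success or failure, so the reported load depends on how long the exchange took.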

With this feature, you can write your own advisors to provide the precise information about servers that you need. A sample custom advisor, ADV_ctlrsample.java, is provided for the controllers. After installing Load Balancer, you can find the sample code in the ...ibm/edge/lb/servers/samples/CustomAdvisors installation directory.

The default install directories are:

Note:
If you add a custom advisor to Cisco CSS Controller or Nortel Alteon Controller, you must stop and then restart ccoserver or nalserver (for Windows systems, use Services) to enable the Java process to read the new custom advisor class files. The custom advisor class files are loaded only at startup.

Naming Convention

Your custom advisor file name must be in the form ADV_myadvisor.java. It must start with the prefix ADV_ in uppercase. All subsequent characters must be lowercase letters.
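The naming rule can be checked with a regular expression. The following Java sketch is an illustration; the class and method names are hypothetical and are not part of Load Balancer.

```java
import java.util.regex.Pattern;

class AdvisorFileName {
    // Uppercase prefix ADV_, then one or more lowercase letters, then .java
    private static final Pattern VALID = Pattern.compile("ADV_[a-z]+\\.java");

    static boolean isValid(String fileName) {
        return VALID.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ADV_myadvisor.java"));
        System.out.println(isValid("ADV_MyAdvisor.java"));
    }
}
```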

As per Java conventions, the name of the class defined within the file must match the name of the file. If you copy the sample code, be sure to change all instances of ADV_ctlrsample inside the file to your new class name.

Compilation

Custom advisors are written in the Java language. Use the Java compiler that is installed with Load Balancer. The following files are referenced during compilation:

Your classpath must point to both the custom advisor file and the base classes file during the compile.

On Windows platforms, a compile command might look like this:

install_dir\java\bin\javac -classpath
    install_dir\lb\servers\lib\ibmlb.jar ADV_pam.java

where:

The output for the compilation is a class file; for example:

ADV_pam.class

Before starting the advisor, copy the class file to the ...ibm/edge/lb/servers/lib/CustomAdvisors installation directory.

Note:
If you want, custom advisors can be compiled on one operating system and run on another. For example, you can compile your advisor on Windows systems, copy the class file (in binary) to an AIX machine, and run the custom advisor there.

For AIX, HP-UX, Linux, and Solaris systems, the syntax is similar.

Run

To run the custom advisor, you must first copy the class file to the proper installation directory:

...ibm/edge/lb/servers/lib/CustomAdvisors/ADV_pam.class

Start the consultant, then issue this command to start your custom advisor:

For Cisco CSS Controller
ccocontrol ownercontent metrics consultantID:ownerContentID pam 100
For Nortel Alteon Controller
nalcontrol service metrics consultantID:serviceID pam 100

where:

Required routines

Like all advisors, a custom advisor extends the function of the advisor base, called ADV_Base. It is the advisor base that actually performs most of the advisor's functions, such as reporting loads back to the consultant for use in the consultant's weight algorithm. The advisor base also performs socket connect and close operations and provides send and receive methods for use by the advisor. The advisor itself is used only for sending and receiving data to and from the port on the server being advised. The TCP methods within the advisor base are timed to calculate the load. If desired, a flag within the constructor in ADV_Base overwrites the existing load with the new load returned from the advisor.

Note:
Based on a value set in the constructor, the advisor base supplies the load to the weight algorithm at specified intervals. If the actual advisor has not completed so that it can return a valid load, the advisor base uses the previous load.

These are base class methods:

Search order

The controllers first look at the provided list of native advisors; if they do not find a given advisor there, they look at the list of custom advisors.
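One consequence of this search order is that a custom advisor with the same name as a native advisor is never selected. The following Java sketch illustrates the lookup; the class name, method name, and the use of maps to represent the two advisor lists are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Optional;

class AdvisorLookup {
    // Looks up an advisor by name: native advisors first, then custom advisors.
    static Optional<String> resolve(String name,
                                    Map<String, String> nativeAdvisors,
                                    Map<String, String> customAdvisors) {
        if (nativeAdvisors.containsKey(name)) {
            return Optional.of(nativeAdvisors.get(name));
        }
        return Optional.ofNullable(customAdvisors.get(name));
    }

    public static void main(String[] args) {
        Map<String, String> nativeAdvisors = Map.of("http", "native http advisor");
        Map<String, String> customAdvisors = Map.of("http", "custom http advisor",
                                                    "pam", "custom pam advisor");
        System.out.println(resolve("http", nativeAdvisors, customAdvisors));
        System.out.println(resolve("pam", nativeAdvisors, customAdvisors));
    }
}
```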

Naming and path

Sample advisor

The program listing for a controller sample advisor is included in Sample advisor. After installation, this sample advisor can be found in the ...ibm/edge/lb/servers/samples/CustomAdvisors directory.

Metric Server

Metric Server provides server load information to the Load Balancer in the form of system-specific metrics, reporting on the health of the servers. The Load Balancer consultant queries the Metric Server agent residing on each of the servers, assigning weights to the load balancing process using the metrics gathered from the agents. The results are also placed into the service report for Cisco CSS Controller or the server report for Nortel Alteon Controller.

Prerequisites

The Metric Server agent must be installed and running on all servers that are being load balanced.

How to Use Metric Server

Below are the steps to configure Metric Server for the controllers.

To have Metric Server run on an address other than the local host, edit the metricserver file on the load-balanced server machine. After java in the metricserver file, insert the following:

-Djava.rmi.server.hostname=OTHER_ADDRESS

In addition, before the "if" statements in the metricserver file, add this: hostname OTHER_ADDRESS.

For Windows systems: Alias the OTHER_ADDRESS on the Microsoft stack. To alias an address on the Microsoft stack, see the section on aliasing an address on the Microsoft stack for a metric server.

Workload manager advisor

WLM is code that runs on MVS™ mainframes. It can be queried to ask about the load on the MVS machine.

When MVS Workload Management has been configured on your OS/390® system, the controllers can accept capacity information from WLM and use it in the load balancing process. Using the WLM advisor, the controllers periodically open connections through the WLM port on each server in the consultant host table and accept the capacity integers returned. Because these integers represent the amount of capacity that is still available and the consultant expects values representing the loads on each machine, the capacity integers are inverted by the advisor and normalized into load values (for example, a large capacity integer and a small load value both represent a healthier server). There are several important differences between the WLM advisor and other controller advisors:

  1. Other advisors open connections to the servers using the same port on which normal client traffic flows. The WLM advisor opens connections to the servers using a port different from normal traffic. The WLM agent on each server machine must be configured to listen on the same port on which the controller WLM Advisor is started. The default WLM port is 10007.
  2. It is possible to use protocol-specific advisors together with the WLM advisor. The protocol-specific advisors poll the servers on their normal traffic ports, and the WLM advisor polls the system load using the WLM port.
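The inversion of capacity integers into load values, described earlier in this section, can be pictured with the following Java sketch. The actual normalization used by the WLM advisor is not documented in this chapter, so a simple linear inversion is assumed; the class name, method name, and the maxCapacity parameter are hypothetical.

```java
class WlmLoad {
    // Converts a capacity value (higher means more spare capacity) into a
    // load value (lower means healthier).
    // maxCapacity is a hypothetical upper bound of the capacity scale.
    static int toLoad(int capacity, int maxCapacity) {
        int clamped = Math.max(0, Math.min(capacity, maxCapacity));
        return maxCapacity - clamped + 1;   // +1 keeps the load positive and non-zero
    }

    public static void main(String[] args) {
        System.out.println(toLoad(90, 100));  // plenty of spare capacity
        System.out.println(toLoad(20, 100));  // little spare capacity
    }
}
```

A server that reports a larger capacity integer ends up with a smaller load value, so it is treated as the healthier server.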

Using binary logging to analyze server statistics

The binary logging feature allows server information to be stored in binary files. These files can then be processed to analyze the server information that has been gathered over time.

The following information is stored in the binary log for each server defined in the configuration.

The consultant must be running to log information in the binary logs.

Use the xxxcontrol consultant binarylog command set to configure binary logging.

The start option starts logging server information to binary logs in the logs directory. One log is created at the start of every hour with the date and time as the name of the file.

The stop option stops logging server information to the binary logs. The log service is stopped by default.

The set interval option controls how often information is written to the logs. The consultant sends server information to the log server every consultant interval. The information is written to the logs only if the specified log interval seconds have elapsed since the last record was written to the log. By default, the log interval is set to 60 seconds.

There is some interaction between the settings of the consultant interval and the log interval. Because the log server is provided with information no faster than the consultant interval seconds, setting the log interval less than the consultant interval effectively sets it to the same as the consultant interval.
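The interaction between the two intervals can be simulated in Java. This sketch is illustrative: the class and method names are hypothetical, times are in seconds since logging started, and the first record is assumed to be written at the first consultant interval.

```java
import java.util.ArrayList;
import java.util.List;

class BinaryLogTiming {
    // Returns the times at which records are written to the log.
    // The consultant offers data every consultantInterval seconds; a record is
    // written only if at least logInterval seconds have passed since the last one.
    static List<Integer> writeTimes(int consultantInterval, int logInterval, int duration) {
        if (consultantInterval <= 0) {
            throw new IllegalArgumentException("consultantInterval must be positive");
        }
        List<Integer> times = new ArrayList<>();
        Integer lastWrite = null;             // nothing written yet
        for (int t = consultantInterval; t <= duration; t += consultantInterval) {
            if (lastWrite == null || t - lastWrite >= logInterval) {
                times.add(t);
                lastWrite = t;
            }
        }
        return times;
    }

    public static void main(String[] args) {
        // Log interval (5) lower than consultant interval (10).
        System.out.println(writeTimes(10, 5, 30));
        // Log interval (60) higher than consultant interval (10).
        System.out.println(writeTimes(10, 60, 120));
    }
}
```

When the log interval is lower than the consultant interval, a record is written on every consultant cycle, which is why the lower setting has no additional effect.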

This logging technique allows you to capture server information at any granularity. You can capture all changes to server information that are seen by the consultant for calculating server weights; however, this amount of information is probably not required to analyze server usage and trends. Logging server information every 60 seconds gives you snapshots of server information over time. Setting the log interval very low can generate huge amounts of data.

The set retention option controls how long log files are kept. Log files older than the retention hours specified are deleted by the log server. This occurs only if the log server is being called by the consultant, so if you stop the consultant, old log files are not deleted.

A sample Java program and command file are provided in the ...ibm/edge/lb/servers/samples/BinaryLog directory. This sample shows how to retrieve all the information from the log files and print it to the screen. It can be customized to do any type of analysis you want with the data.

Following is an example using the supplied script and program:

xxxlogreport 2002/05/01 8:00 2002/05/01 17:00

This produces a report of the controller's server information from 8:00 AM to 5:00 PM on May 1, 2002.

Using scripts to generate an alert or record server failure

Load Balancer provides user exits that trigger scripts that you can customize. You can create the scripts to perform automated actions, such as alerting an administrator when servers are marked down or simply recording the event of the failure. Sample scripts, which you can customize, are in the ...ibm/edge/lb/servers/samples installation directory. To run the files, copy them to the ...ibm/edge/lb/servers/bin directory, then rename each file according to the directions contained in the script.

The following sample scripts are provided, where xxx is cco for Cisco CSS Controller, and nal for Nortel Alteon Controller: