Troubleshooting Dispatcher

Use this information to help you solve problems that can occur with the Dispatcher component.

Use the table to see a full description and possible solution for the problem that you are experiencing.
Table 1. Troubleshooting table for Load Balancer
Symptom | Possible cause
Dispatcher not running correctly | Conflicting port numbers
Connections from client machines not being served or connections timing out | Wrong routing configuration; server does not have loopback device aliased to the cluster address; extra route not deleted; port not defined for each cluster
Dispatcher, Microsoft IIS, and SSL are not working or will not continue | Unable to send encrypted data across protocols
The dscontrol or lbadmin command fails with "Server not responding" or "unable to access RMI server" message | Commands fail due to a socksified stack, or because dsserver was not started; RMI ports are not set correctly; host file has incorrect local host
Advisors not working correctly | Advisors are not running
"Cannot find the file..." error message when running Netscape as default browser to view online help (Windows platform) | Incorrect setting for HTML file association
Graphical user interface does not start correctly | Insufficient paging space
Graphical user interface does not display correctly | Resolution is incorrect
Help panels sometimes disappear behind other windows | Java™ limitation
GUI hangs (or unexpected behavior) when trying to load a large configuration file | Java does not have access to enough memory to handle such a large change to the GUI
Korean Load Balancer interface displays overlapping or undesirable fonts on AIX® and Linux systems | Default fonts must be changed
[Windows] Unexpected GUI behavior when using Windows platform paired with Matrox AGP video card | Problem occurs when using Matrox AGP video cards while running the Load Balancer GUI
Slow response time when running commands on the Dispatcher machine | Slow response time can be due to machine overloading from a high volume of client traffic
SSL or HTTPS advisor not registering server loads | Problem occurs because the SSL server application is not configured with the cluster IP address
[Windows] On Windows platform, corrupted Latin-1 national characters appear in command prompt | Change font properties of command prompt window
[Windows] On Windows platform, advisors and reach targets mark all servers down | Task offloading is not disabled, or ICMP might need to be enabled
[Windows] On Windows platform, advisors not working in a high availability setup after a network outage | When the system detects a network outage, it clears its Address Resolution Protocol (ARP) cache
[Linux] On Linux systems, "ip address add" command and multiple cluster loopback aliases are incompatible | When aliasing more than one address on the loopback device, use the ifconfig command, not ip address add
Slowdown occurs when loading Load Balancer configurations | The delay might be due to Domain Name System (DNS) calls that are made to resolve and verify the server address
[Windows] On Windows systems, the following error message appears: There is an IP address conflict with another system on the network | If high availability is configured, cluster addresses might be configured on both machines for a brief period, which causes this error message to appear
[Windows] On Windows systems, "Server not responding" error occurs when issuing a dscontrol or lbadmin command | More than one IP address exists on the Windows system and the host file does not specify the address to associate with the hostname
Dispatcher MAC forwarding configuration limitations with zSeries and S/390® platforms | On Linux, there are limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards. Possible workarounds are provided.
[Linux] On Linux systems, iptables can interfere with the routing of packets | Linux iptables can interfere with load balancing of traffic and must be disabled on the Load Balancer machine
Upgrading the Java file set provided with the Load Balancer installation | If a problem is found with the Java file set, report the problem to IBM® Service so that you can receive an upgrade for the Java file set that was provided with the Load Balancer installation
Load Balancer for IPv4 and IPv6 conflicts with IP security (IPsec) | If you are using the Load Balancer for IPv4 and IPv6 with IP security (IPsec) enabled, output packets might be incorrect and dispatcher configuration information might display incorrectly in the command line interface and administrative console for WebSphere® Application Server. Load Balancer reports that it is forwarding connections, but clients do not receive responses.
The serverUp script might run when you issue commands for Load Balancer that affect the status of servers | You might experience problems if you run a command that affects the status of a server, such as the dscontrol manager unquiesce and dscontrol manager quiesce commands, after a manager cycle has already retrieved the weights of the servers. If you run these commands, they might overwrite the values that are saved during the manager cycle and cause the serverUp script to run unexpectedly.

Dispatcher not running correctly

This problem can occur when another application is using one of the ports used by the Load Balancer. For more information, read the Configuring the Load Balancer machine topic.
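
As a quick check for conflicting ports, the following sketch scans netstat output for listeners on the default dsserver ports that this topic lists (10099, 10004, and 10199). The function name find_port_conflicts is illustrative and is not part of Load Balancer. The sketch assumes that the local address is in the fourth column of the netstat -an output and ends in ":port" or ".port"; column layout differs between platforms, so adjust it as needed.

```shell
#!/bin/sh
# Default dsserver ports as listed in this topic.
LB_PORTS="10099 10004 10199"

# Reads `netstat -an` style output on stdin and prints each Load Balancer
# port that appears as a local listening port.
# Assumption: the local address is column 4 and ends in ":port" or ".port".
find_port_conflicts() {
    awk -v ports="$LB_PORTS" '
        BEGIN { n = split(ports, want, " ") }
        /LISTEN/ {
            for (i = 1; i <= n; i++)
                if ($4 ~ ("[.:]" want[i] "$")) print want[i]
        }'
}
```

Run it as netstat -an | find_port_conflicts while dsserver is stopped. Any port that is printed is already held by another application.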

Load Balancer requests are not being balanced

This problem has symptoms such as connections from client machines not being served or connections timing out. Check the following to diagnose this problem:
  1. Have you configured the nonforwarding address, clusters, ports, and servers for routing? Check the configuration file.
  2. Does the loopback device on each server have the alias set to the cluster address?

    [AIX][HP-UX][Linux][Solaris]Use netstat -ni to check.

  3. Is the extra route deleted?

    [AIX][HP-UX][Linux][Solaris]Use netstat -nr to check.

  4. Use the dscontrol cluster status command to check the information for each cluster you have defined. Make sure you have a port defined for each cluster.
  5. Use the dscontrol server report :: command to make sure that your servers are neither down nor set to a weight of zero.
[Windows]

Dispatcher, Microsoft IIS, and SSL do not work (Windows platform)

When using Dispatcher, Microsoft IIS, and SSL, if they do not work together, there may be a problem with enabling SSL security. For more information about generating a key pair, acquiring a certificate, installing a certificate with a key pair, and configuring a directory to require SSL, see the Microsoft Information and Peer Web Services documentation.

[Solaris]

dscontrol or lbadmin command fails

  1. The dscontrol command returns: Error: Server not responding. Or, the lbadmin command returns: Error: unable to access RMI server. These errors can result when your machine has a socksified stack. To correct this problem, edit the socks.cnf file to contain the following lines:
    EXCLUDE-MODULE java
    EXCLUDE-MODULE javaw
  2. The administration consoles for Load Balancer interfaces (command line and graphical user interface) communicate with dsserver using remote method invocation (RMI). The default communication uses three ports; each port is set in the dsserver start script:
    • 10099 to receive commands from dscontrol
    • 10004 to send metric queries to Metric Server
    • 10199 for the RMI server port

    This can cause problems when one of the administration consoles runs on the same machine as a firewall or through a firewall. For example, when Load Balancer runs on the same machine as a firewall, and you issue dscontrol commands, you might see errors such as Error: Server not responding.

    To avoid this problem, edit the dsserver script file to set the port used by RMI for the firewall (or other application). Change the line LB_RMISERVERPORT=10199 to LB_RMISERVERPORT=yourPort, where yourPort is a different port.

    When complete, restart dsserver and open traffic for ports 10099, 10004, 10199, and 10100, or for the chosen port for the host address from which the administration console will be run.

  3. These errors can also occur if you have not already started dsserver.
  4. If there are multiple adapters on the machine, you must designate which adapter dsserver is to use by adding the following property in the dsserver script: java.rmi.server.hostname=<host_name or IPaddress>

    For example: java -Djava.rmi.server.hostname="10.1.1.1"
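
The RMI port change described in item 2 can be scripted. The following is a minimal sketch, assuming that the dsserver script contains the assignment at the start of a line, exactly as shown above (LB_RMISERVERPORT=10199). The function name set_rmi_port is illustrative. The sketch keeps a .bak copy of the script before rewriting it.

```shell
#!/bin/sh
# Rewrites the LB_RMISERVERPORT assignment in a dsserver script.
# $1 = path to the script, $2 = new port number.
# A backup copy is kept as <script>.bak.
set_rmi_port() {
    script="$1"
    port="$2"
    # Reject empty or non-numeric port values.
    case "$port" in
        ''|*[!0-9]*) echo "port must be numeric" >&2; return 1 ;;
    esac
    cp "$script" "$script.bak" &&
    sed "s/^\([[:space:]]*LB_RMISERVERPORT=\).*/\1$port/" "$script.bak" > "$script"
}
```

After the change, restart dsserver and open the chosen port in the firewall, as described above.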

Advisors not working correctly

An ICMP ping is issued to the servers before the advisor request. If a firewall exists between Load Balancer and the servers, ensure that pings are supported across the firewall. If this setup poses a security risk to your network, modify the java statement in dsserver to turn off all pings to the servers by adding the java property:
LB_ADV_NO_PING="true"      
java  -DLB_ADV_NO_PING="true"
[Windows]

"Cannot find the file..." error message when trying to view online Help (Windows platform)

For Windows platforms, when using Netscape as your default browser, the following error message may result: "Cannot find the file '<filename>.html' (or one of its components). Make sure the path and filename are correct and that all required libraries are available."

The problem is due to an incorrect setting for HTML file association. The solution is the following:
  1. Click My Computer, click Tools, select Folder Options, and click File Types tab
  2. Select "Netscape Hypertext Document"
  3. Click Advanced button, select open, click Edit button
  4. Enter NSShell in the Application: field (not the Application Used to Perform Action: field), and click OK

Graphical user interface (GUI) does not start correctly

The graphical user interface (GUI), which is lbadmin, requires a sufficient amount of paging space to function correctly. If insufficient paging space is available, the GUI might not start up completely. If this occurs, check your paging space and increase it if necessary.

Graphical user interface (GUI) does not display correctly

If you experience a problem with the appearance of the Load Balancer GUI, check the setting for the operating system's desktop resolution. The GUI is best viewed at a resolution of 1024x768 pixels.

[Windows]

On Windows platform, help windows sometimes disappear behind other open windows

On Windows platform, when you first open help windows, they sometimes disappear into the background behind existing windows. If this occurs, click on the window to bring it forward again.

GUI hangs (or unexpected behavior) when trying to load a large configuration file

When using lbadmin to load a large configuration file (roughly 200 or more add commands), the GUI may hang or display unexpected behavior, such as responding to screen changes extremely slowly.

This occurs because Java does not have access to enough memory to handle such a large configuration.

There is an option on the runtime environment that can be specified to increase the memory allocation pool available to Java.

The option is -Xmxn where n is the maximum size, in bytes, for the memory allocation pool. n must be a multiple of 1024 and must be greater than 2MB. The value n may be followed by k or K to indicate kilobytes, or m or M to indicate megabytes. For example, -Xmx128M and -Xmx81920k are both valid.

For example, to add this option, edit the lbadmin script file, modifying "javaw" to "javaw -Xmxn" as follows. For AIX systems, modify "java" to "java -Xmxn".
  • [AIX]AIX systems
    java -Xmx256m -cp $LB_CLASSPATH $LB_INSTALL_PATH $LB_CLIENT_KEYS 
    com.ibm.internet.nd.framework.FWK_Main 1>/dev/null 2>&1 &
    
  • [Linux]Linux systems
    javaw -Xmx256m -cp $LB_CLASSPATH $LB_INSTALL_PATH $LB_CLIENT_KEYS 
    com.ibm.internet.nd.framework.FWK_Main 1>/dev/null 2>&1 &
    
  • [Windows]Windows systems
    START javaw -Xmx256m -cp %LB_CLASSPATH% %LB_INSTALL_PATH%
     %LB_CLIENT_KEYS% com.ibm.internet.nd.framework.FWK_Main

There is no recommended value for n, but it should be greater than the default value. A good starting point is twice the default value.
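
Before editing the lbadmin script, you can check a candidate value against the rules stated above (a multiple of 1024 bytes and greater than 2 MB). The following sketch is one way to do that. The function name valid_xmx is illustrative, and the check covers only the rules given in this topic, not everything that a particular Java runtime might accept.

```shell
#!/bin/sh
# Succeeds if $1 (for example 128M, 81920k, or 268435456) is a valid
# -Xmx size according to this topic: optional k/K or m/M suffix,
# a multiple of 1024 bytes, and greater than 2 MB.
valid_xmx() {
    value="$1"
    case "$value" in
        *[kK]) num=${value%?}; mult=1024 ;;
        *[mM]) num=${value%?}; mult=1048576 ;;
        *)     num=$value;     mult=1 ;;
    esac
    # The numeric part must be a non-empty string of digits.
    case "$num" in
        ''|*[!0-9]*) return 1 ;;
    esac
    bytes=$((num * mult))
    [ $((bytes % 1024)) -eq 0 ] && [ "$bytes" -gt 2097152 ]
}
```

For example, valid_xmx 256m succeeds, while valid_xmx 1M fails because 1 MB is not greater than 2 MB.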

Korean Load Balancer interface displays overlapping or undesirable fonts on AIX and Linux systems

To correct overlapping or undesirable fonts in the Korean Load Balancer interface:
  • [AIX]On AIX systems
    1. Stop all Java processes on the AIX system.
    2. Open the font.properties.ko file in an editor. This file is located in home/jre/lib where home is the Java home.
    3. Search for the following string:
      -Monotype-TimesNewRomanWT-medium-r-normal
      --*-%d-75-75-*-*-ksc5601.1987-0
    4. Replace all instances of the string with:
      -Monotype-SansMonoWT-medium-r-normal
      --*-%d-75-75-*-*-ksc5601.1987-0
    5. Save the file.
  • [Linux]On Linux systems
    1. Stop all Java processes on the system.
    2. Open the font.properties.ko file in an editor. This file is located in home/jre/lib where home is the Java home.
    3. Search for the following string (with no spaces):
      -monotype-
      timesnewromanwt-medium-r-normal--*-%d-75-75-p-*-microsoft-symbol
    4. Replace all instances of the string with:
      -monotype-sansmonowt-medium-r-normal--*-%d-75-75-p-*-microsoft-symbol
    5. Save the file.
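
The search-and-replace steps above can be sketched with sed. The function name fix_korean_fonts is illustrative. The sketch assumes that each font string appears in font.properties.ko on a single line with no spaces, and it keeps a .bak copy of the file. Stop all Java processes first, as the steps require.

```shell
#!/bin/sh
# Replaces the TimesNewRomanWT font strings named in this topic with the
# SansMonoWT equivalents. $1 = path to font.properties.ko.
# The first expression covers the AIX string, the second the Linux string.
fix_korean_fonts() {
    file="$1"
    cp "$file" "$file.bak" &&
    sed -e 's/-Monotype-TimesNewRomanWT-medium-r-normal--\*-%d-75-75-\*-\*-ksc5601\.1987-0/-Monotype-SansMonoWT-medium-r-normal--*-%d-75-75-*-*-ksc5601.1987-0/g' \
        -e 's/-monotype-timesnewromanwt-medium-r-normal--\*-%d-75-75-p-\*-microsoft-symbol/-monotype-sansmonowt-medium-r-normal--*-%d-75-75-p-*-microsoft-symbol/g' \
        "$file.bak" > "$file"
}
```

A possible invocation is fix_korean_fonts home/jre/lib/font.properties.ko, where home is the Java home.
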
[Windows]

On Windows platform, unexpected GUI behavior when using Matrox AGP video cards

On Windows platform when using a Matrox AGP card, unexpected behavior can occur in the Load Balancer GUI. When clicking the mouse, a block of space slightly larger than the mouse pointer can become corrupted causing possible highlighting reversal or images to shift out of place on the screen. Older Matrox cards have not shown this behavior. There is no known fix when using Matrox AGP cards.

Slow response time running commands on Dispatcher machine

If you are running the Dispatcher component for load balancing, it is possible to overload the computer with client traffic. The Load Balancer kernel module has the highest priority, and if it is constantly handling client packets, the rest of the system may become unresponsive. Running commands in user space may take a very long time to complete, or may never complete.

If this happens, you should begin to restructure your setup to avoid overloading the Load Balancer machine with traffic. Alternatives include spreading the load across several Load Balancer machines, or replacing the machine with a stronger and faster computer.

When trying to decide if the slow response time on the machine is due to high client traffic, consider whether this occurs during client peak traffic times. Misconfigured systems that cause routing loops can also cause the same symptoms. But before changing the Load Balancer setup, determine whether the symptoms may be due to high client load.

SSL or HTTPS advisor not registering server loads

Load Balancer sends packets to the servers using the cluster address that is aliased on the loopback. Some server applications (such as SSL) require that configuration information, such as certificates, is based on the IP address. The IP address must be the cluster address that is configured on the loopback in order to match the contents of the incoming packets. If the IP address of the cluster is not used when configuring the server application, the client request is not properly forwarded to the server.

[Windows]

On Windows systems, corrupted Latin-1 national characters appear in command prompt window

In a command prompt window on the Windows operating system, some national characters of the Latin-1 family might appear corrupted. For example, the letter "a" with a tilde may display as a pi symbol. To fix this, you must change the font properties of the command prompt window. Change the font, as follows:
  1. Click the icon in the upper left corner of the command prompt window
  2. Select Properties, then click the Font tab
  3. The default font is Raster fonts; change this to Lucida Console and click OK

On Windows systems, advisors and reach targets mark all servers down

When configuring your adapter on a Load Balancer machine, you must ensure that the following two settings are correct for the advisor to work:
  • Disable Task Offloading.
    • To disable Task offloading: Go to Start > Settings > Control Panel > Network and Dial-up Connections, then select the adapter.
    • In the pop-up window, click Properties.
    • Click Configure, then select the Advanced tab.
    • In the property pane, select the Task Offload property, then select disable in the value field.
  • Enable Protocol 1 (ICMP) for IP protocols if you are enabling TCP/IP filtering. If ICMP is not enabled, the ping test to the back-end server will not succeed. To check whether ICMP is enabled:
    • Go to Start > Settings > Control Panel > Network and Dial-up Connections, then select the adapter.
    • In the pop-up window, click Properties.
    • From the components pane, select Internet Protocol (TCP/IP), then click Properties.
    • Click Advanced, then select the Options tab.
    • Select TCP/IP filtering in the options pane, then click Properties.
    • If you have selected Enable TCP/IP Filtering and permit only for IP protocols, you must add IP Protocol 1. This must be added in addition to the existing TCP and UDP ports that you enabled.
[Windows]

On Windows systems, after network outage, advisors not working in a high availability setup

By default, when the Windows operating system detects a network outage, it clears its address resolution protocol (ARP) cache, including all static entries. After the network is available, the ARP cache is repopulated by ARP requests sent on the network.

With a high availability configuration, both servers take over primary operations when a loss of network connectivity affects one or both. When the ARP request is sent to repopulate the ARP cache, both servers respond, which causes the ARP cache to mark the entry as not valid. Therefore, the advisors are not able to create a socket to the backup servers.

Preventing the Windows operating system from clearing the ARP cache when there is a loss of connectivity solves this problem. Microsoft has published an article that explains how to accomplish this task. This article is on the Microsoft Web site, located in the Microsoft Knowledge Base, article number 239924: http://support.microsoft.com/default.aspx?scid=kb;en-us;239924.

The following is a summary of the steps, described in the Microsoft article, to prevent the system from clearing the ARP cache:
  1. Use the Registry editor (regedit or regedt32) to open the registry.
  2. View the following key in the registry:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
  3. Add the following registry value:
    Value Name: DisableDHCPMediaSense
    Value Type: REG_DWORD
  4. After the value is added, edit it and set it to 1.
  5. Reboot the machine for the change to take effect.
Attention: This affects the ARP cache regardless of the DHCP setting.
[Linux]

On Linux systems, do not use "ip address add" command when aliasing multiple clusters on the loopback device

Certain considerations must be taken when using Linux kernel 2.4.x servers. If the server has a cluster address configured on the loopback device using the ip address add command, only one cluster address can be aliased.

When aliasing multiple clusters to the loopback device use the ifconfig command, for example:
ifconfig lo:num clusterAddress netmask 255.255.255.255 up 

Additionally, there are incompatibilities between the ifconfig method of configuring interfaces and the ip method of configuring interfaces. Best practice suggests that a site choose one method and use that method exclusively.
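
When several cluster addresses must be aliased, the ifconfig pattern above can be generated in a loop. The following sketch only prints the commands so that you can review them before running them as root. The function name loopback_alias_commands is illustrative, and the alias numbers simply start at 1, which assumes that no lo:num aliases are already in use.

```shell
#!/bin/sh
# Prints (does not run) one ifconfig command per cluster address,
# using lo:1, lo:2, ... as the alias devices.
loopback_alias_commands() {
    num=1
    for addr in "$@"; do
        echo "ifconfig lo:$num $addr netmask 255.255.255.255 up"
        num=$((num + 1))
    done
}
```

For example, loopback_alias_commands 10.0.0.1 10.0.0.2 prints two ifconfig commands, one for lo:1 and one for lo:2. The addresses are placeholders.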

Delay occurs while loading a Load Balancer configuration

Loading a Load Balancer configuration might take a long time due to Domain Name System (DNS) calls that are made to resolve and verify the server address.

If the DNS of the Load Balancer machine is configured incorrectly, or if DNS lookups in general take a long time, loading the configuration slows down because the Java processes send DNS requests on the network.

A workaround for this is to add your server addresses and hostnames to your local /etc/hosts file.
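
To see which servers still need an entry, the following sketch reports each given host name that does not appear in a hosts file. The function name missing_from_hosts and the host names in the usage note are illustrative. The sketch ignores comments and matches names exactly, including case.

```shell
#!/bin/sh
# Prints each host name that has no entry in the hosts file.
# $1 = path to the hosts file, remaining arguments = host names to check.
missing_from_hosts() {
    hosts_file="$1"
    shift
    for name in "$@"; do
        awk -v n="$name" '
            { sub(/#.*/, "") }                           # drop comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" || echo "$name"
    done
}
```

For example, missing_from_hosts /etc/hosts server1 server2 prints the names that are not yet listed, so that you can add them with their addresses.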

[Windows]

On Windows systems, an IP address conflict error message appears

If high availability is configured, the cluster addresses may be configured on both machines for a brief period and cause the following error message to occur: There is an IP address conflict with another system on the network. In this case, you can safely ignore the message. It is possible for a cluster address to be briefly configured on both high availability machines at the same time, especially during startup of either machine, or when a takeover has been initiated.

[Windows]

On Windows systems, "Server not responding" error occurs when issuing dscontrol or lbadmin

To resolve this problem, update the c:\Windows\system32\drivers\etc\hosts file with your machine host name and the IP address that you want to associate with the host name.

If you are using dscontrol, you can specify the connection address using the following command:
dscontrol host@@<ip_address or host_name> <command>
Avoid trouble: The IP address cannot be a cluster address.
[Linux]

On Linux, Dispatcher configuration limitations when using zSeries or S/390 servers that have Open System Adapter (OSA) cards

In general, servers in the Load Balancer configuration must all be on the same network segment regardless of the platform. Active network devices such as routers, bridges, and firewalls interfere with Load Balancer. This is because Load Balancer functions as a specialized router, modifying only the link-layer headers to its next and final hop. Any network topology in which the next hop is not the final hop is not valid for Load Balancer.
Note: Tunnels, such as channel-to-channel (CTC) or inter-user communication vehicle (IUCV), are often supported. However, Load Balancer must forward across the tunnel directly to the final destination; it cannot be a network-to-network tunnel.

There is a limitation for zSeries and S/390 servers that share the OSA card, because this adapter operates differently than most network cards. The OSA card has its own virtual link layer implementation, which has nothing to do with ethernet, that is presented to the Linux and z/OS® hosts behind it. Effectively, each OSA card looks just like ethernet-to-ethernet hosts (and not to the OSA hosts), and hosts that use it will respond to it as if it is ethernet.

The OSA card also performs some functions that relate to the IP layer directly. Responding to ARP (address resolution protocol) requests is one example. Another is that a shared OSA routes IP packets based on destination IP address, instead of on ethernet address as a layer 2 switch does. Effectively, the OSA card is a bridged network segment unto itself.

Load Balancer that runs on an S/390 Linux or zSeries Linux host can forward to hosts on the same OSA or to hosts on the ethernet. All the hosts on the same shared OSA are effectively on the same segment.

Load Balancer can forward out of a shared OSA because of the nature of the OSA bridge. The bridge knows the OSA port that owns the cluster IP. The bridge knows the MAC address of hosts directly connected to the ethernet segment. Therefore, Load Balancer can MAC-forward across one OSA bridge.

However, Load Balancer cannot forward into a shared OSA. This includes the Load Balancer on an S/390 Linux when the back-end server is on a different OSA card than the Load Balancer. The OSA for the back-end server advertises the OSA MAC address for the server IP, but when a packet arrives with the ethernet destination address of the server's OSA and the IP of the cluster, the server's OSA card does not know which of its hosts, if any, should receive that packet. The same principles that permit OSA-to-ethernet MAC-forwarding to work out of one shared OSA do not hold when trying to forward into a shared OSA.

Workaround:

In Load Balancer configurations that use zSeries or S/390 servers that have OSA cards, there are two approaches you can take to work around the problem that has been described.
  1. Using platform features

    If the servers in the Load Balancer configuration are on the same zSeries or S/390 platform type, you can define point-to-point (CTC or IUCV) connections between Load Balancer and each server. Set up the endpoints with private IP addresses. The point-to-point connection is used for Load Balancer-to-server traffic only. Then add the servers with the IP address of the server endpoint of the tunnel. With this configuration, the cluster traffic comes through the Load Balancer OSA card and is forwarded across the point-to-point connection where the server responds through its own default route. The response uses the server's OSA card to leave, which might or might not be the same card.

  2. Using Load Balancer's encapsulation feature.

    If the servers in the Load Balancer configuration are not on the same zSeries or S/390 platform type, or if it is not possible to define a point-to-point connection between Load Balancer and each server, it is recommended that you use Load Balancer's encapsulation feature, which is a protocol that permits Load Balancer to forward across routers.

    When using encapsulation, the client->cluster IP packet is received by Load Balancer, encapsulated, and sent to the server. At the server, the original client->cluster IP packet is decapsulated, and the server responds directly to the client. The advantage of using encapsulation is that Load Balancer sees only the client-to-server traffic, not the server-to-client traffic. The disadvantage is that it lowers the maximum segment size (MSS) of the TCP connection due to encapsulation overhead.

    Refer to the topic Use encapsulation forwarding to forward traffic across network segments for more information on how to configure Load Balancer to forward with encapsulation.

[Linux]

Linux iptables can interfere with the routing of packets

Linux iptables can interfere with load balancing of traffic and must be disabled on the Dispatcher machine.

Issue the following command to determine if iptables are loaded:
lsmod | grep ip_tables
The output from the preceding command might be similar to this:
ip_tables         22400   3 iptable_mangle,iptable_nat,iptable_filter
Issue the following command for each iptable listed in the output to display the rules for the tables:
iptables -t <short_name> -L
For example:
iptables -t mangle -L 
iptables -t nat    -L
iptables -t filter -L    
If iptable_nat is loaded, it must be unloaded. Because iptable_nat has a dependency on iptable_conntrack, iptable_conntrack also must be removed. Issue the following command to unload these two iptables:
rmmod iptable_nat iptable_conntrack
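
The inspection steps above can be combined into a small helper that derives the short table names from the lsmod output. This is a sketch; the function names loaded_iptables and list_rules_commands are illustrative. It assumes that each table module appears on its own lsmod line with a name of the form iptable_<short_name>, and it only prints the iptables commands instead of running them.

```shell
#!/bin/sh
# Reads lsmod output on stdin and prints the short name (mangle, nat,
# filter, ...) of each loaded iptable_* module.
loaded_iptables() {
    awk '$1 ~ /^iptable_/ { sub(/^iptable_/, "", $1); print $1 }'
}

# Reads lsmod output on stdin and prints the command that lists the
# rules of each loaded table.
list_rules_commands() {
    loaded_iptables | while read -r table; do
        echo "iptables -t $table -L"
    done
}
```

Run it as lsmod | list_rules_commands, review the printed commands, and then run them as root.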

Upgrading the Java file set provided with the Load Balancer installation

During the Load Balancer installation process, a Java file set is also installed. Load Balancer is the only application that uses the Java version that installs with the product. Do not upgrade this version of the Java file set on your own. If a problem requires an upgrade of the Java file set, report the problem to IBM Service so that the Java file set that is shipped with Load Balancer can be upgraded with an official fix level.

[HP-UX]

On AIX systems, Load Balancer conflicts with IP security (IPsec)

[AIX]

If you are using Load Balancer with IP security (IPsec) enabled, output packets might be incorrect and dispatcher configuration information might display incorrectly in the command line interface and administrative console for WebSphere Application Server. Load Balancer reports that it is forwarding connections, but clients do not receive responses.

If you are using Load Balancer function and IP security on the same host, there might be communication problems between Load Balancer and the application server. The Load Balancer component is not fully compatible with IPsec features and it transmits data from both sides of the security layer. Load Balancer receives packets below IPsec and, as a result, receives encrypted packets that it does not decrypt. When sending data, Load Balancer transmits them above IPsec, so it sends unencrypted packets to the application server that are encrypted on the other end by IPsec. The application server, therefore, receives encrypted data that cannot be used.

The serverUp script might run when you issue commands for Load Balancer that affect the status of servers

Weights are set by the manager during a manager cycle. At the start of the manager cycle, the manager retrieves the current weights from the executor function. The manager uses these values as the last known weight to determine if the status of a server has changed:
  • If you issue a quiesce command for a server, the executor function saves the current weight of the server and associates a new weight to the server with a value of -1. This is an example of the quiesce command:
    dscontrol manager quiesce server
  • If you issue the unquiesce command, a call is made to the executor function to revert the weight of the server to the saved value. The system sets a flag to indicate that the server is no longer marked down by the user. This is an example of the unquiesce command:
    dscontrol manager unquiesce server
    If the unquiesce command occurs after the manager has retrieved the weights, the executor function overwrites the weight that is used to determine if the server state has changed. This process does not cause any side effects unless the server is also quiesced.

The chances of experiencing this problem increase with larger configurations because the manager cycle takes longer to run. Also, there is a higher probability that the manager cycle will be in progress when the unquiesce command is issued.





Last updated: March 23, 2018 0:18
File name: ctrb_dispatcher.html