Health monitoring overview

[Version 2.5 and later] The message center aggregates health status events from all container and catalog servers in a collective, in real time. When the message center is configured, you can see an overview of the critical events that are currently occurring across servers without collecting the logs from each server.

Message center implementation

The message center is enabled by default. You can disable the message center in the user interface.

Data grid deployments can involve dozens or hundreds of distributed server processes, so reading every log to find a problem is impractical. The message center indicates which server reported an event; you can then open the log file for the affected container server to analyze the problem further.

The message center consists of the following components:
Event aggregation
When you configure health monitoring on a catalog server, you receive aggregated events that affect the health of the entire catalog service domain. The framework indicates the source and severity of the following types of events:
  • All FFDC events
  • All WARNING or SEVERE log entries
  • A filtered list of all log entries, including INFO, WARNING, and SEVERE log entries
  • Server start and server stop operations
  • Data grid is nearing capacity
  • Loss or regaining of quorum
  • Enabled SNMP traps are triggered
  • Replication is falling behind over a 15-minute time period
Message center in the web console
The message center in the web console displays the aggregated event records. These events include both recent events and real-time update notifications for events that occurred after the console was opened.
Events in the xscmd utility
You can also display a list of recent events with the xscmd utility. As events occur, you can redirect the event records as input to your own automated scripts.
MBeans for integration with other monitoring software
You can also use the available management MBeans to plug the message center into your other Java Management Extensions (JMX) monitoring software. The documentation for these MBeans is included in the API documentation.
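
The redirection of xscmd event records described above can feed a small script. The following is a minimal sketch in Python, assuming each event record arrives as one line of text that contains a severity keyword such as WARNING or SEVERE. The record layout that xscmd actually prints is not described here, so the line format, the `filter_events` helper, and the sample records are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Severities to act on; INFO records are ignored in this sketch.
ALERT_SEVERITIES: Tuple[str, ...] = ("SEVERE", "WARNING")


def filter_events(lines: Iterable[str],
                  severities: Tuple[str, ...] = ALERT_SEVERITIES) -> Iterator[str]:
    """Yield only the records that mention one of the given severities.

    Assumption: one event record per line, with the severity appearing
    as a separate word somewhere in the line (hypothetical format).
    """
    for line in lines:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        words = record.replace(",", " ").split()
        if any(severity in words for severity in severities):
            yield record


# Hypothetical sample records, standing in for redirected event output.
sample = [
    "INFO container1 server started",
    "WARNING container2 replication is falling behind",
    "SEVERE catalog1 quorum lost",
]
alerts = list(filter_events(sample))
for alert in alerts:
    print(alert)
```

To process live output instead of the sample list, pass `sys.stdin` to `filter_events` and pipe or redirect the event records into the script.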

Health summary

With the health logs and the GetHealthStatus command in the HTTP command interface, you can get a summary of software and hardware health status for the appliance. This information includes data grid placement and replication status, hardware warnings, system and network health status, and other status messages.

Message center versus log analyzer

The log analyzer is another tool to analyze a set of log messages. This tool requires that you manually collect the logs from the various servers in your environment. Then, you can run the tool to create reports of problem conditions. Use the log analyzer for post-mortem analysis of your logs when you need to analyze a set of messages that is larger than the subset of 1000 messages that you can display in the message center. Use the message center for real-time monitoring of the health of the data grid to quickly identify issues that are occurring. Then, you can either review the log files for the related container server, or use the log analyzer to further research the problem.

Health monitoring configuration and architecture

The message center is enabled by default. You can disable the message center in the user interface. When the message center is enabled, the catalog servers in your collective are activated as hubs for health monitoring. Generally, the messages in the message center come from the catalog server on the appliance from which you originally created the collective. Each hub has its own subscriptions and its own event history, and each event in a history has a sequence number. The event histories on separate catalog servers are not kept synchronized, so they differ from one hub to another. Catalog servers can also subscribe to log and FFDC events from other catalog servers.
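
To illustrate why event histories differ between hubs, the following Python sketch models two hubs that keep separate subscriptions and number their events independently. It is a toy model, not the product implementation: the `Hub` class, the source names, and the assumption that sequence numbers start at 1 and increase by one per recorded event are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Set, Tuple


@dataclass
class Hub:
    """Toy model of one catalog server acting as a health-monitoring hub."""
    name: str
    subscriptions: Set[str] = field(default_factory=set)
    # Each history entry is (sequence number, source, message).
    history: List[Tuple[int, str, str]] = field(default_factory=list)
    # Assumption: every hub numbers its own events, starting at 1.
    _next_seq: count = field(default_factory=lambda: count(1), repr=False)

    def subscribe(self, source: str) -> None:
        self.subscriptions.add(source)

    def receive(self, source: str, message: str) -> None:
        # Record the event only if this hub subscribes to its source.
        if source in self.subscriptions:
            self.history.append((next(self._next_seq), source, message))


hub_a = Hub("catalogA")
hub_b = Hub("catalogB")
hub_a.subscribe("container1")
hub_a.subscribe("container2")
hub_b.subscribe("container2")

events = [
    ("container1", "server started"),
    ("container2", "replication is falling behind"),
]
for source, message in events:
    for hub in (hub_a, hub_b):
        hub.receive(source, message)

print(hub_a.history)
print(hub_b.history)
```

In this model the same replication event is recorded with sequence number 2 on one hub and 1 on the other, which is why a sequence number is meaningful only within the history of the hub that assigned it.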