Tips for troubleshooting your WS-Notification-based publish and subscribe messaging for Web services.
To help you identify and resolve WS-Notification-related problems, use the WebSphere® Application Server trace and logging facilities as described in Setting up component trace (CTRACE).
To enable trace for WS-Notification, set the application server trace string to SIBWsn=all=enabled:com.ibm.ws.sib.webservices.*=all=enabled. If you encounter a problem that you think might be related to WS-Notification, you can check for error messages in the WebSphere Application Server administrative console, and in the application server SystemOut.log file. You can also enable the application server debug trace to provide a detailed exception dump.
A list of the main known restrictions that apply when using WS-Notification is provided in WS-Notification - known restrictions.
WebSphere Application Server system messages are logged from a variety of sources, including application server components and applications. Messages logged by application server components and associated IBM® products start with a unique message identifier that indicates the component or application that issued the message. The prefix for the WS-Notification component is CWSJN.
The Troubleshooter reference: Messages topic contains information about all WebSphere Application Server messages, indexed by message prefix. For each message there is an explanation of the problem, and details of any action that you can take to resolve the problem.
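As a rough illustration of using the CWSJN prefix to narrow down a log, the following Python sketch picks out lines that carry a WS-Notification message identifier. It assumes the identifier is the CWSJN prefix followed by four digits and a severity letter (I, W or E). The function name and the sample lines are hypothetical and are not real server output.

```python
import re

# Assumption: a message identifier is the CWSJN prefix, four digits, and a
# severity letter (I = information, W = warning, E = error).
WSN_MESSAGE = re.compile(r"\b(CWSJN\d{4}([IWE]))\b")


def find_wsn_messages(lines):
    """Return (message_id, severity, line) for each line containing a CWSJN ID."""
    hits = []
    for line in lines:
        match = WSN_MESSAGE.search(line)
        if match:
            hits.append((match.group(1), match.group(2), line.rstrip("\n")))
    return hits


# Hypothetical log lines, invented purely to exercise the filter.
sample_lines = [
    "[timestamp] 0000002a ExampleClass W CWSJN1234W: placeholder warning text",
    "[timestamp] 0000002b ExampleClass I unrelated informational line",
    "[timestamp] 0000002c ExampleClass E CWSWS5010E: message from another component",
]

for message_id, severity, line in find_wsn_messages(sample_lines):
    print(message_id, severity)
```

To scan a real log, pass an open file object for SystemOut.log to find_wsn_messages in place of sample_lines, then look up each reported identifier in the messages reference.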
If you try to create a WS-Notification service, and you get the following stack trace, then SDO repository is not configured correctly. To resolve this problem, see Installing and configuring the SDO repository.
java.lang.Exception: com.ibm.ws.sib.webservices.admin.config.SIBConfigException: CWSWS5010E: Failed to store WSDL located at http://www.ibm.com/websphere/wsn/notification-broker due to the following exception: com.ibm.ws.sib.webservices.exception.SIBWSUnloggedException: CWSWS1007E: The following exception occurred: com.ibm.ws.sdo.config.repository.impl.RepositoryRuntimeException: javax.transaction.TransactionRolledbackException: CORBA TRANSACTION_ROLLEDBACK 0x0 No; nested exception is: org.omg.CORBA.TRANSACTION_ROLLEDBACK: javax.transaction.TransactionRolledbackException: ; nested exception is: javax.ejb.TransactionRolledbackLocalException: ; nested exception is: com.ibm.ws.ejbpersistence.utilpm.PersistenceManagerException: PMGR1014E: Exception occured when getting connection factory: com.ibm.websphere.naming.CannotInstantiateObjectException: threw NameNotFoundException while the JNDI NamingManager was processing a javax.naming.Reference object. [Root exception is javax.naming.NameNotFoundException: Context: smeagolNode03Cell/nodes/smeagolNode03/servers/server1, name: jdbc/com.ibm.ws.sdo.config/SdoRepository: First component in name com.ibm.ws.sdo.config/SdoRepository not found. [Root exception is org.omg.CosNaming.NamingContextPackage.NotFound: IDL:omg.org/CosNaming/NamingContext/NotFound:1.0]] vmcid: 0x0 minor code: 0 completed: No.
In some situations a notification consumer might receive more notifications than the number of event notifications that a publisher has inserted into the notification broker. For example, you might publish 4 messages and receive 8, 12, 16 (or some other multiple of four) messages at the notification consumer.
This is normally caused by two or more active subscriptions that target the same notification consumer, a situation that can occur if the subscriber application is run more than once. Each time the Subscribe operation is called, the notification broker must create a new subscription (see section 4.2 of the Web Services Base Notification specification), so duplicate messages are delivered if a previous subscription still exists.
To check whether this is what is happening, examine the SubscriptionReference property of the notifications received by the notification consumer. This endpoint reference contains the identifier of the subscription that caused the notification to be sent. If you find several different subscription identifiers, then there is more than one subscription active.
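The check described above can be sketched in Python as follows. This is a minimal illustration, not part of the product. It assumes that the consumer application has already extracted a subscription identifier string from the SubscriptionReference of each notification; the function names and the identifiers "sub-A" and "sub-B" are hypothetical.

```python
from collections import Counter


def count_by_subscription(notifications):
    """Count received notifications per subscription identifier.

    notifications is an iterable of (subscription_id, payload) pairs, where
    subscription_id is assumed to have been read from the SubscriptionReference
    of the received notification.
    """
    return Counter(subscription_id for subscription_id, _payload in notifications)


def has_duplicate_subscriptions(notifications):
    """True if the notifications were produced by more than one subscription."""
    return len(count_by_subscription(notifications)) > 1


# Four published messages delivered through two active subscriptions
# arrive at the consumer as eight notifications.
published = ["m1", "m2", "m3", "m4"]
received = [("sub-A", m) for m in published] + [("sub-B", m) for m in published]

counts = count_by_subscription(received)
print(len(received), "received from", len(counts), "subscriptions")
```

If the count per identifier equals the number of published messages and there are several identifiers, the extra notifications come from additional subscriptions rather than from repeated delivery on one subscription.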
Subscriber applications should tidy up subscriptions when they are no longer required (or register them with a timeout). However, you can also tidy them up administratively using the run-time panels as described in Listing or deleting active WS-Notification subscriptions.
You should be wary of deleting and re-creating messaging engines on bus members for which WS-Notification-administered subscribers have been configured, because in some cases this can leave the remote Web service subscription active (and passing notification messages to the local server) even though there is no longer any record of it.
To avoid this situation you should delete the WS-Notification configuration, or just the administered subscribers, in a separate step to deleting the messaging engine. When the dynamic configuration update is then processed, or the server restarted, the remote Web service subscription is tidied up cleanly.
When you remove messaging engines from a cluster you should remove them in numerical order from highest to lowest, so as to avoid a situation where (for example) there are messaging engines numbered 001 and 002 and not 000. The reason for this is to provide additional surety when using WS-Notification, because WS-Notification attaches special significance to the "first" messaging engine in the cluster.
In a clustered use pattern there can be more than one messaging engine running on the "bus member" (cluster). Administered subscribers are defined against a service point (bus member) and so there are several alternatives when choosing the messaging engine that is responsible for creating the subscription to the remote Web service. In this situation, the "first" messaging engine in the cluster is responsible for making the subscription. For example in a cluster containing three messaging engines the messaging engines will have names following the pattern xxx-000-yyy, xxx-001-yyy, xxx-002-yyy, and the administered subscriber subscriptions will be managed by the "000" messaging engine.
If you delete the "000" messaging engine from the cluster and then restart the servers, the administered subscriptions are managed by the "001" messaging engine, because it is now the lowest-numbered engine in the cluster. However, as previously mentioned, deleting and re-creating messaging engines on bus members for which administered subscribers have been configured can leave the remote Web service subscription active (and passing notification messages to the local server) even though there is no longer any record of it. If another messaging engine is later added to the cluster while no xxx-000-yyy messaging engine is defined, the new engine takes the name xxx-000-yyy. In this instance, two messaging engines can concurrently believe that they manage the administered subscription, which results in multiple subscriptions being made to the remote Web service.
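The numbering rules in this section can be illustrated with a short Python sketch. It assumes that engine names follow the xxx-000-yyy pattern used above, with a three-digit number between two hyphens. The function names are hypothetical, and the code only models the rules described here; it does not call any WebSphere API.

```python
import re

# Assumption: engine names look like xxx-000-yyy, as in the text above.
ENGINE_NAME = re.compile(r"^(?P<prefix>.+?)-(?P<index>\d{3})-(?P<suffix>.+)$")


def engine_index(name):
    """Return the numeric part of a messaging engine name."""
    match = ENGINE_NAME.match(name)
    if match is None:
        raise ValueError("unexpected messaging engine name: " + name)
    return int(match.group("index"))


def managing_engine(names):
    """The lowest-numbered engine, which manages administered subscriptions."""
    return min(names, key=engine_index)


def removal_order(names):
    """Order in which to remove engines: highest number first."""
    return sorted(names, key=engine_index, reverse=True)


def missing_indices(names):
    """Numbers below the highest engine number that have no engine defined."""
    present = {engine_index(name) for name in names}
    return sorted(set(range(max(present) + 1)) - present)


engines = ["xxx-001-yyy", "xxx-002-yyy", "xxx-000-yyy"]
print(managing_engine(engines))
print(removal_order(engines))
print(missing_indices(["xxx-001-yyy", "xxx-002-yyy"]))
```

A non-empty result from missing_indices corresponds to the situation that this section warns about: a newly added engine would take the lowest missing number and could also claim the administered subscriptions.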
In the unlikely event that you need to re-create messaging engine xxx-000-yyy, you can avoid duplicate messages from an administered subscription by completing the following steps:
Applications that publish event notifications into the broker use the Notify operation. This is defined as a one-way (Web services) operation, which means that no fault (exception) can be returned if the operation cannot be completed. If an inbound notification fails, the publishing application therefore assumes that the notification was successful, but subscribing applications do not receive the notification message.
The cause of this type of failure might be an application error (invalid topic syntax), or a mismatch between the application code and the server configuration (using an undefined topic namespace). Specific reasons for which an inbound notification might fail include the following:
You need to monitor this type of failure closely, because it might indicate a denial of service attack and certainly indicates that the application is not functioning correctly. The first time an inbound notification fails from a particular producing application, a warning message is sent to the SystemOut log of the server. If there are further notification failures for that producer, subsequent timed warning messages are logged at 30 minute intervals. Additional information is provided with each timed message to indicate how many failed notifications were received for that producer during the 30 minute interval.
The failure of an outbound Web service invocation (broker to application) is caused when a remote application is unavailable for invocation, and might be the result of an application failure, a network error, or a firewall configuration issue. Failure to pass event notifications to subscribed applications causes messages to build up on the subscriptions held on the server. The messages held on a given subscription can be observed using the run-time panels as described in Listing or deleting active WS-Notification subscriptions. Subscriptions for which the most recent event notification attempt has failed in this way are marked as being in ERROR state when viewed in the WS-Notification subscription runtime administration panel.
If the WS-Notification service point fails to successfully notify a NotificationConsumer application, a warning message is sent to the application server SystemOut log and the subscription is told to wait for 2 minutes. Reasons for a failure of this type might be that the remote Web service is not currently available, or that network conditions prevent contact between the local server and the service.
After 2 minutes, the notification is retried. If delivery is still not possible then the subscription is put back into a wait state for another 2 minutes. If the failure is caused by a transient I/O error, this pattern is repeated indefinitely, until the notification is either successfully delivered or you delete the subscription. If the error is caused by an application failure on the remote side then the notification will be retried up to the number of times defined in the 'Maximum failed deliveries' setting of the service integration bus topic space destination from which the message is being received. After the first warning message is output to the SystemOut log, subsequent timed warning messages are logged at 30 minute intervals.
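The retry pattern described above can be modelled with a small Python sketch, which might help when estimating how long a subscription stays in a wait state. This is a simplified model, not the product's implementation. The outcome labels, the function name and the interpretation of 'Maximum failed deliveries' as a cap on the total number of failed attempts caused by application errors are all assumptions. What happens to the message once the retries are exhausted is not modelled.

```python
RETRY_WAIT_SECONDS = 2 * 60  # the 2 minute wait described above


def simulate_delivery(outcomes, max_failed_deliveries):
    """Model the retry pattern for one notification.

    outcomes lists the result of each delivery attempt in order:
    "ok", "transient" (for example an I/O error) or "application"
    (a failure in the remote application). Returns (state, seconds_waited).
    """
    waited = 0
    application_failures = 0
    for outcome in outcomes:
        if outcome == "ok":
            return ("delivered", waited)
        if outcome == "application":
            # Assumption: only application failures count towards the limit.
            application_failures += 1
            if application_failures >= max_failed_deliveries:
                return ("retries exhausted", waited)
        elif outcome != "transient":
            raise ValueError("unknown outcome: " + outcome)
        waited += RETRY_WAIT_SECONDS  # wait before the next attempt
    # Attempts so far have all failed; the subscription is waiting again.
    return ("still waiting", waited)


print(simulate_delivery(["transient", "transient", "ok"], 5))
print(simulate_delivery(["application"] * 10, 3))
```

In this model, transient errors never end the retry loop, which mirrors the statement that the pattern repeats until the notification is delivered or the subscription is deleted.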
The act of subscribing to the broker or registering a publisher creates a stateful resource on the server that consumes system resources while it is active. Normally an application specifies a termination time when it creates these resources, so they are automatically deleted when the termination time is reached. However, an application can also request an infinite lifetime for the resource. In that case the resource can remain on the server indefinitely, even though the application might never return to use (or destroy) it.
You can view the stateful resources (subscriptions and publisher registrations) using the run-time panels described in Interacting at run time with WS-Notification. These panels also let you delete the items administratively if required. Only do this if you are sure that the application is no longer using the resource, because the application fails if it references the resource after it has been deleted.
When you create a subscription using a WS-Notification application, in other words by using the Subscribe operation, one or more durable subscriptions are created in the relevant service integration bus topic space destination. You can view these durable subscriptions in the service integration bus runtime panels for the publication point.
To delete a subscription that was created by a WS-Notification application, use the runtime panels provided by the WS-Notification implementation, as described in Interacting at run time with WS-Notification. This approach closes the active consumer and automatically deletes the related service integration bus durable subscriptions.
WebSphere Application Server depends on being able to access a running service integration bus messaging engine to send and receive messages, and to create and retrieve state for the various Web service resources that are created.
You can stop a messaging engine using the MBean interface or run-time panels. This prevents WS-Notification from successfully servicing any requests from applications that might come in during the time that the messaging engine is stopped. In this situation, error messages are logged as described in Failure of an inbound (application to broker) notification and Failure of an outbound (broker to application) notification. When you stop a messaging engine, all WS-Notification processing stops and all messaging applications cease to function. When you restart the messaging engine, WS-Notification processing resumes.
The WS-Notification configuration artefacts often depend on objects defined in other areas of the server configuration. For example, they depend on the endpoint listeners through which application requests are received, and on the service integration bus topic spaces to and from which messages are sent.
The following items describe the action that is taken by the WS-Notification runtime code when it meets relevant changes in the objects upon which it depends.
The service integration bus topic space is the primary messaging object upon which WS-Notification depends at run time. Notification messages from an application are published to the topic space identified by the (permanent) topic namespace mapping that the administrator has configured.
Deleting a service integration bus topic space has the following effects upon new and existing WS-Notification applications:
Deleting the topic namespace mapping that was used to establish a (currently active) subscription has the same effect as deleting the underlying service integration bus topic space as defined previously, and subscriptions that were created using this namespace mapping are deleted.
Publisher registrations and pull points associated with the deleted topic namespace mapping are also deleted.
The fields of a permanent topic namespace mapping are read-only fields, so the only way to "change" the fields is to delete the namespace mapping and recreate it with new values. The effect of deleting a permanent topic namespace mapping is described in the previous item.