Programming strategies

This section describes common strategies for parallel programming using the Compute Grid Parallel Job Manager.

Parameterization Strategies

Determining the number of subordinate jobs is a key decision for the top-level-job. The top-level-job calls the Parameterizer SPI to get this number. The Parameterizer receives all xJCL properties that you specify on the top-level-job's PJM job step. The Parameterizer can then use these properties to decide on the number of subordinate jobs needed for that parallel job.
Subordinate job xJCL Property Substitution
Besides determining the number of subordinate jobs, the Parameterizer is also responsible for supplying substitution properties for each subordinate job instance. This is optional, but typically you want to give each subordinate job its own processing instructions, such as the range of data it should process. The Parameterizer can return an array of properties objects that contains these substitutions. When it does, the PJM applies those substitutions to the subordinate jobs it creates. The zeroth array element supplies substitutions to the first subordinate job, the next element to the second subordinate job, and so on.
Built-in Parameterizer
The simplest approach is to use the Parameterizer implementation that comes with Compute Grid. The class name is 'com.ibm.ws.batch.parallel.BuiltInParameterizer'. This SPI implementation supports two top-level-job xJCL properties:
  1. com.ibm.wsspi.batch.parallel.jobs

    Use this property to specify the number of subordinate job instances you want. For example, <prop name='com.ibm.wsspi.batch.parallel.jobs' value='2' />

  2. com.ibm.wsspi.batch.parallel.prop.<sub-job-number>.<substitution-property-name>

    Use this property to specify substitution properties for each subordinate job instance. <sub-job-number> specifies the logical subordinate job instance to which the property belongs. <substitution-property-name> specifies the name of the substitution property in the xJCL for which a value is specified.

For example, if the substitution properties are defined for the subordinate job as follows:
<prop name='com.ibm.wsspi.batch.parallel.prop.1.starting.key' value='A' />
<prop name='com.ibm.wsspi.batch.parallel.prop.1.ending.key' value='M' />
<prop name='com.ibm.wsspi.batch.parallel.prop.2.starting.key' value='N' />
<prop name='com.ibm.wsspi.batch.parallel.prop.2.ending.key' value='Z' />
then, the xJCL substitution for subordinate job 1 will be:
  • starting.key='A'
  • ending.key='M'
and the xJCL substitution for subordinate job 2 will be:
  • starting.key='N'
  • ending.key='Z'
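
The mapping from the prefixed top-level-job properties to per-job substitutions can be sketched in plain Java. This is only an illustration of the naming convention described above, not the BuiltInParameterizer source: the class SubJobPropertySketch and its split method are hypothetical, and the sketch assumes that <sub-job-number> starts at 1, as in the example.

```java
import java.util.Properties;

// Hypothetical helper: illustrates the property naming convention only.
final class SubJobPropertySketch {
    static final String JOBS_KEY = "com.ibm.wsspi.batch.parallel.jobs";
    static final String PROP_PREFIX = "com.ibm.wsspi.batch.parallel.prop.";

    /** Returns one Properties object per subordinate job; index 0 belongs to job 1. */
    static Properties[] split(Properties stepProps) {
        String jobsValue = stepProps.getProperty(JOBS_KEY);
        if (jobsValue == null) {
            throw new IllegalArgumentException("Missing property " + JOBS_KEY);
        }
        int jobs = Integer.parseInt(jobsValue.trim());
        Properties[] result = new Properties[jobs];
        for (int i = 0; i < jobs; i++) {
            result[i] = new Properties();
        }
        for (String name : stepProps.stringPropertyNames()) {
            if (!name.startsWith(PROP_PREFIX)) {
                continue; // not a substitution property
            }
            // rest looks like "1.starting.key"
            String rest = name.substring(PROP_PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot <= 0) {
                continue; // no sub-job number present
            }
            int jobNumber;
            try {
                jobNumber = Integer.parseInt(rest.substring(0, dot));
            } catch (NumberFormatException e) {
                continue; // sub-job number is not numeric
            }
            if (jobNumber < 1 || jobNumber > jobs) {
                continue; // assumption: numbers outside 1..jobs are ignored
            }
            result[jobNumber - 1].setProperty(rest.substring(dot + 1),
                                              stepProps.getProperty(name));
        }
        return result;
    }
}
```

Given the four example properties and a jobs value of 2, element 0 of the returned array holds the starting.key and ending.key values for the first subordinate job and element 1 holds those for the second.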
Custom Parameterizer
The 'BuiltInParameterizer' allows simple parallel job execution without custom code. However, because it is static, you must know the number of subordinate jobs required at the time you submit the job. If you need a more dynamic or more complex job partitioning algorithm, you can write a custom Parameterizer.
Many custom Parameterizers are data-driven. A common approach is for a Parameterizer to read from a file or database to determine how many subordinate jobs to create for a parallel job. The top-level-job properties can specify the file location or database query information that the Parameterizer needs to access the data on which it bases its decision.
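
A minimal sketch of the data-driven approach follows. It does not implement the actual Parameterizer SPI interface, whose signature is not shown in this section; the class RangeFileParameterizerSketch, the "start,end" line format, and the reuse of the starting.key and ending.key property names are all assumptions made for illustration. Each non-blank line of the input becomes one subordinate job, so the number of jobs is decided by the data rather than by the submitter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: one subordinate job per "start,end" line.
final class RangeFileParameterizerSketch {

    /** Builds one substitution Properties object per non-blank line. */
    static Properties[] fromLines(List<String> lines) {
        List<Properties> perJob = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected start,end but got: " + line);
            }
            Properties p = new Properties();
            p.setProperty("starting.key", parts[0].trim());
            p.setProperty("ending.key", parts[1].trim());
            perJob.add(p);
        }
        // The array length is the number of subordinate jobs to create.
        return perJob.toArray(new Properties[0]);
    }

    /** Reads the ranges from a file whose path would come from a top-level-job property. */
    static Properties[] fromFile(Path rangeFile) throws IOException {
        return fromLines(Files.readAllLines(rangeFile));
    }
}
```

A real Parameterizer would wrap logic like this in the SPI method that the top-level-job calls and would obtain the file path from the xJCL properties passed to it.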

Collector/Analyzer Strategies

The SubJobCollector/SubJobAnalyzer pair provides an optional means for a top-level-job to receive information from its subordinate jobs. This provides a way for the top-level-job to establish a composite view of application-level state data from among its set of subordinate jobs.

The Batch Container calls the SubJobCollector for a subordinate job at the end of each checkpoint. Subordinate jobs have one or more checkpoints. The SubJobCollector allows a subordinate job to send a Java Externalizable object to its owning top-level-job.

The PJM calls the SubJobAnalyzer in two cases:
  1. to deliver a SubJobCollector Externalizable object
  2. to deliver the return code from a completed subordinate job

A common use of the SubJobCollector/SubJobAnalyzer pair is to track an error threshold across subordinate jobs. For example, if your batch processing strategy is to end the parallel job when more than N% of records are in error, the collector can send each subordinate job's local error count and the analyzer can tally the total. If the total exceeds the threshold you allow, the analyzer can throw a rollback exception to end the top-level-job in a restartable state.
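
The error-threshold tally can be sketched as follows. The sketch is self-contained and does not implement the real SubJobCollector or SubJobAnalyzer interfaces; ErrorCount, ErrorThresholdAnalyzerSketch, onCollectorData, and ThresholdExceededException are hypothetical names, with the exception standing in for whatever rollback exception the SPI defines. It also assumes that each collector payload carries cumulative counts for its subordinate job, so the analyzer keeps only the latest payload per job.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.HashMap;
import java.util.Map;

// Hypothetical payload a collector could send at each checkpoint.
final class ErrorCount implements Externalizable {
    long processed; // records processed so far by this subordinate job
    long errors;    // records in error so far

    public ErrorCount() { } // public no-arg constructor required by Externalizable

    ErrorCount(long processed, long errors) {
        this.processed = processed;
        this.errors = errors;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(processed);
        out.writeLong(errors);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        processed = in.readLong();
        errors = in.readLong();
    }
}

// Stand-in for the rollback exception an analyzer would throw.
final class ThresholdExceededException extends RuntimeException {
    ThresholdExceededException(String message) {
        super(message);
    }
}

// Hypothetical analyzer-side tally across all subordinate jobs.
final class ErrorThresholdAnalyzerSketch {
    private final double maxErrorPercent;
    private final Map<String, ErrorCount> latestPerJob = new HashMap<>();

    ErrorThresholdAnalyzerSketch(double maxErrorPercent) {
        this.maxErrorPercent = maxErrorPercent;
    }

    /** Records the latest cumulative counts for one job and checks the overall rate. */
    void onCollectorData(String subJobId, ErrorCount count) {
        latestPerJob.put(subJobId, count);
        long processed = 0;
        long errors = 0;
        for (ErrorCount c : latestPerJob.values()) {
            processed += c.processed;
            errors += c.errors;
        }
        if (processed > 0 && errors * 100.0 / processed > maxErrorPercent) {
            throw new ThresholdExceededException(
                errors + " of " + processed + " records in error exceeds "
                + maxErrorPercent + "%");
        }
    }
}
```

If the collector sent per-checkpoint deltas instead of cumulative counts, the analyzer would add each payload to running totals rather than replacing the previous entry.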

Commit/Rollback Strategies

The Synchronization SPI provides a way to coordinate a logical transaction across all the subordinate jobs of a given parallel job. Each subordinate job runs in its own transactional scope and has its own checkpoints. For some parallel jobs, allowing commit/rollback autonomy at the subordinate job level is acceptable. For other parallel jobs, there may be a business requirement to coordinate commit/rollback across all the subordinate jobs. This can only be done by the top-level-job. Through the Synchronization SPI, the top-level-job can orchestrate a compensation-based commit/rollback model. This may take different forms, including the hidden record and undo record patterns.
Hidden Record
In the hidden record approach, you have a flag in your database record that indicates whether the record is hidden or visible. This approach can work well for parallel jobs that create new records. The record is created in the hidden state and then, during the Synchronization commit or rollback call, it is either updated to the visible state (commit) or deleted (rollback). This technique requires that other applications respect the hidden flag. In some cases, this can be built into the database views or queries that applications use to access the data.
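
The hidden record pattern can be illustrated with an in-memory stand-in for the database table. Everything here is hypothetical (the class HiddenRecordSketch, its method names, and the jobId column used to find the rows a parallel job created); in a real application the commit and rollback methods would be SQL updates and deletes issued from the Synchronization calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a table with a hidden flag.
final class HiddenRecordSketch {
    private static final class Row {
        final String value;
        final String jobId; // parallel job that created the row
        boolean hidden;

        Row(String value, String jobId, boolean hidden) {
            this.value = value;
            this.jobId = jobId;
            this.hidden = hidden;
        }
    }

    private final Map<String, Row> table = new HashMap<>();

    /** Subordinate jobs create new records in the hidden state. */
    void insertHidden(String key, String value, String jobId) {
        table.put(key, new Row(value, jobId, true));
    }

    /** What other applications see: only rows that are not hidden. */
    List<String> visibleKeys() {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Row> e : table.entrySet()) {
            if (!e.getValue().hidden) {
                keys.add(e.getKey());
            }
        }
        Collections.sort(keys);
        return keys;
    }

    /** Commit: make every row created by the job visible. */
    void commit(String jobId) {
        for (Row r : table.values()) {
            if (r.jobId.equals(jobId)) {
                r.hidden = false;
            }
        }
    }

    /** Rollback: delete every still-hidden row created by the job. */
    void rollback(String jobId) {
        table.values().removeIf(r -> r.hidden && r.jobId.equals(jobId));
    }
}
```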
Undo Record
In the undo record approach, subordinate jobs store a 'before' image of each record they update. In the Synchronization commit call you discard the before images; in the rollback call you use them to restore the target records to their original state.
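
A comparable in-memory sketch of the undo record pattern is shown here. The class UndoRecordSketch and its methods are hypothetical, and the maps stand in for the target table and a before-image table that subordinate jobs would write to in a real database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a target table plus its before images.
final class UndoRecordSketch {
    private final Map<String, String> table = new HashMap<>();
    // key -> value before the first update; null means the row did not exist
    private final Map<String, String> beforeImages = new HashMap<>();

    String get(String key) {
        return table.get(key);
    }

    /** Updates a record, saving its before image on the first update only. */
    void update(String key, String newValue) {
        if (!beforeImages.containsKey(key)) {
            beforeImages.put(key, table.get(key));
        }
        table.put(key, newValue);
    }

    /** Commit: the updates stand, so the before images are discarded. */
    void commit() {
        beforeImages.clear();
    }

    /** Rollback: restore each updated record from its before image. */
    void rollback() {
        for (Map.Entry<String, String> e : beforeImages.entrySet()) {
            if (e.getValue() == null) {
                table.remove(e.getKey()); // the row was created by the job
            } else {
                table.put(e.getKey(), e.getValue());
            }
        }
        beforeImages.clear();
    }
}
```

Saving the image only on the first update matters when a record is updated more than once within the logical transaction; otherwise a rollback would restore an intermediate value rather than the original.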