This section describes common strategies for parallel programming
using the Compute Grid Parallel
Job Manager.
Parameterization Strategies
Determining
the number of subordinate jobs is a key decision for the top-level-job.
The top-level-job calls the Parameterizer SPI to get this number.
The Parameterizer receives all xJCL properties that you specify
on the top-level-job's PJM job step. The Parameterizer can then
use these properties to decide on the number of subordinate jobs needed
for that parallel job.
- Subordinate job xJCL Property Substitution
- Besides determining the number of subordinate jobs, the Parameterizer
is also responsible for supplying substitution properties for each subordinate
job instance. This step is optional, but typically you want to give each
subordinate job unique processing instructions, such as the range of data
it should process. The Parameterizer can return an array of properties
objects that contains these substitutions. When the Parameterizer returns
such an array, the PJM applies those substitutions to the subordinate
jobs it creates. The zeroth array element supplies substitutions to
the first subordinate job, and so on.
- Built-in Parameterizer
- The simplest approach is to use the Parameterizer implementation
that comes with Compute Grid. The class name is com.ibm.ws.batch.parallel.BuiltInParameterizer.
This SPI implementation supports two top-level-job xJCL properties:
- com.ibm.wsspi.batch.parallel.jobs
Specify the number of subordinate
job instances you want on this property. For example, <prop
name='com.ibm.wsspi.batch.parallel.jobs' value='2'/>
- com.ibm.wsspi.batch.parallel.prop.<sub-job-number>.<substitution-property-name>
Use
this property to specify substitution properties for each subordinate
job instance. <sub-job-number> specifies the
logical subordinate job instance to which the property belongs. <substitution-property-name>
specifies the name of the substitution property in the xJCL for which
a value is specified.
For example, if the substitution properties are defined for
the subordinate job as follows:
<prop name='com.ibm.wsspi.batch.parallel.prop.1.starting.key' value='A' />
<prop name='com.ibm.wsspi.batch.parallel.prop.1.ending.key' value='M' />
<prop name='com.ibm.wsspi.batch.parallel.prop.2.starting.key' value='N' />
<prop name='com.ibm.wsspi.batch.parallel.prop.2.ending.key' value='Z' />
then, the xJCL substitution for subordinate job 1 will
be:
- starting.key='A'
- ending.key='M'
and the xJCL substitution for subordinate job 2 will be:
- starting.key='N'
- ending.key='Z'
- Custom Parameterizer
- The BuiltInParameterizer allows simple parallel job execution
without custom code. However, its static nature means you must know
the number of subordinate jobs required at the time you submit
the job. If you need a more dynamic or more complex job partitioning
algorithm, you can write a custom Parameterizer.
- Many custom Parameterizers are data-driven. A common approach is
for a Parameterizer to read from a file or database to determine how
many subordinate jobs to create for a parallel job. The top-level-job
properties can specify the file location or database query information
necessary for a Parameterizer to access the data it requires to make
its decision.
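The following is a minimal sketch of this data-driven style, in plain Java. The real Parameterizer SPI defines its own interface and return types, which are not reproduced here; the parameterize method, the PartitionPlan holder, and the property names total.records and records.per.subjob are assumptions used only to illustrate how a subordinate job count and per-job substitution properties might be derived from top-level-job properties.

import java.util.Properties;

// Illustrative sketch only: the real Parameterizer SPI types are not shown here.
public class RangeParameterizerSketch {

    // Hypothetical stand-in for whatever the SPI returns: a subordinate job count
    // plus one properties object of substitutions per subordinate job.
    static final class PartitionPlan {
        final int subJobCount;
        final Properties[] substitutions;   // index 0 -> subordinate job 1, and so on
        PartitionPlan(int count, Properties[] subs) {
            this.subJobCount = count;
            this.substitutions = subs;
        }
    }

    // Decide the number of subordinate jobs from top-level-job xJCL properties,
    // for example a total record count and a target chunk size per subordinate job.
    static PartitionPlan parameterize(Properties topLevelJobProps) {
        long totalRecords = Long.parseLong(topLevelJobProps.getProperty("total.records", "0"));
        long chunkSize = Long.parseLong(topLevelJobProps.getProperty("records.per.subjob", "100000"));

        int subJobs = (int) Math.max(1L, (totalRecords + chunkSize - 1) / chunkSize);

        Properties[] subs = new Properties[subJobs];
        for (int i = 0; i < subJobs; i++) {
            long start = (long) i * chunkSize;
            long end = Math.min(totalRecords - 1, start + chunkSize - 1);
            Properties p = new Properties();
            p.setProperty("starting.key", Long.toString(start));
            p.setProperty("ending.key", Long.toString(end));
            subs[i] = p;
        }
        return new PartitionPlan(subJobs, subs);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("total.records", "250000");
        PartitionPlan plan = parameterize(props);
        System.out.println("Subordinate jobs: " + plan.subJobCount);
        for (int i = 0; i < plan.substitutions.length; i++) {
            System.out.println("Job " + (i + 1) + " substitutions: " + plan.substitutions[i]);
        }
    }
}

In practice, a custom Parameterizer would read the record count from the file or database identified by the top-level-job properties rather than from a property value itself.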
Collector/Analyzer Strategies
The SubJobCollector/SubJobAnalyzer
pair provides an optional means for a top-level-job to receive information
from its subordinate jobs, allowing the top-level-job
to establish a composite view of application-level state data across
its set of subordinate jobs.
The Batch Container calls
the SubJobCollector for a subordinate job at the end of each checkpoint.
Subordinate jobs have one or more checkpoints. The SubJobCollector
allows a subordinate job to send a Java Externalizable object to its
owning top-level-job.
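As a sketch of the kind of payload a collector might send, the class below is a plain java.io.Externalizable carrying per-checkpoint record and error counts. The class name and fields are illustrative assumptions, and the SubJobCollector wiring that would return this object is not shown.

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Illustrative payload a SubJobCollector might send to the owning top-level-job.
public class ErrorCountPayload implements Externalizable {

    private long recordsProcessed;
    private long recordsInError;

    // A public no-argument constructor is required for Externalizable deserialization.
    public ErrorCountPayload() {
    }

    public ErrorCountPayload(long recordsProcessed, long recordsInError) {
        this.recordsProcessed = recordsProcessed;
        this.recordsInError = recordsInError;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(recordsProcessed);
        out.writeLong(recordsInError);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        recordsProcessed = in.readLong();
        recordsInError = in.readLong();
    }

    public long getRecordsProcessed() {
        return recordsProcessed;
    }

    public long getRecordsInError() {
        return recordsInError;
    }
}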
The PJM calls the SubJobAnalyzer in two
cases:
- to deliver an Externalizable object sent by a subordinate job's SubJobCollector
- to deliver the return code from a completed subordinate job
A common use of the SubJobCollector/SubJobAnalyzer pair is
to track an error threshold across subordinate jobs. For example, if
your batch processing strategy is to end the parallel job when more
than N% of records are in error, the collector can send each
subordinate job's local error count and the analyzer can tally up
the total. If the total exceeds the threshold you allow, the
analyzer can throw a rollback exception
to end the top-level-job in a restartable state.
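A sketch of the analyzer-side tally for this error-threshold example follows. The SubJobAnalyzer SPI's actual method signatures and rollback exception type are not reproduced here; the class below only shows the accumulation and threshold check an analyzer implementation might delegate to, using the illustrative ErrorCountPayload from the collector sketch above.

// Illustrative accumulation logic a SubJobAnalyzer implementation might delegate to.
// The SPI callback would call add(...) for each payload it is handed and throw its
// rollback exception once thresholdExceeded() returns true.
public class ErrorThresholdTally {

    private long totalProcessed;
    private long totalInError;
    private final double maxErrorRatio;   // for example 0.05 for a 5% threshold

    public ErrorThresholdTally(double maxErrorRatio) {
        this.maxErrorRatio = maxErrorRatio;
    }

    // Fold in the counts reported by one subordinate job at one checkpoint.
    public void add(ErrorCountPayload payload) {
        totalProcessed += payload.getRecordsProcessed();
        totalInError += payload.getRecordsInError();
    }

    // True once the composite error rate across all subordinate jobs exceeds the threshold.
    public boolean thresholdExceeded() {
        return totalProcessed > 0
                && ((double) totalInError / totalProcessed) > maxErrorRatio;
    }
}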
Commit/Rollback Strategies
The Synchronization
SPI provides a way to coordinate a logical transaction across all
the subordinate jobs of a given parallel job. Each subordinate job
runs in its own transactional scope and has its own checkpoints.
For some parallel jobs, allowing commit/rollback autonomy at the
subordinate job level is acceptable. For other parallel jobs, there
may be a business requirement to coordinate commit/rollback across
all the subordinate jobs. This can only be done by the top-level-job.
Through the Synchronization SPI, the top-level-job can orchestrate
a compensation-based commit/rollback model. This may take different
forms, including the hidden record and undo record patterns.
- Hidden Record
- In the hidden record approach, you have a flag in your database
record that indicates whether the record is hidden or visible. This
approach can work well for parallel jobs that create new records.
The record is created in the hidden state; then, during the Synchronization
commit or rollback call, the record is either updated to the visible
state or deleted. Naturally, this technique requires that other applications
respect the hidden flag. In some cases, this can be built into the
database views or queries that applications use to access the data. A sketch
of this pattern, together with the undo record pattern, follows the Undo
Record description below.
- Undo Record
- In the undo record approach, subordinate jobs store a before
image of the records they update. In the Synchronization commit or rollback
calls, you either discard the before image or use it to restore the
target record to its original state.
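The sketch below illustrates the database work behind both compensation patterns, using plain JDBC. The ACCOUNT_TXN table with its HIDDEN flag and BATCH_ID correlation column, and the ACCOUNT/ACCOUNT_UNDO tables, are assumptions chosen only for illustration; the Synchronization SPI callbacks that would invoke these methods on commit or rollback are not shown.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative compensation helpers a Synchronization implementation might call.
// Table and column names are assumptions made for this sketch.
public class CompensationSketch {

    // Hidden record pattern, logical commit: make the records created by this
    // parallel job visible to other applications.
    static void makeHiddenRecordsVisible(Connection conn, String batchId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE ACCOUNT_TXN SET HIDDEN = 'N' WHERE BATCH_ID = ? AND HIDDEN = 'Y'")) {
            ps.setString(1, batchId);
            ps.executeUpdate();
        }
    }

    // Hidden record pattern, logical rollback: delete the still-hidden records
    // so they never become visible.
    static void discardHiddenRecords(Connection conn, String batchId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM ACCOUNT_TXN WHERE BATCH_ID = ? AND HIDDEN = 'Y'")) {
            ps.setString(1, batchId);
            ps.executeUpdate();
        }
    }

    // Undo record pattern, logical rollback: copy the saved before images back
    // over the live rows that this parallel job updated.
    static void restoreBeforeImages(Connection conn, String batchId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE ACCOUNT A SET BALANCE = "
                        + "(SELECT U.BALANCE FROM ACCOUNT_UNDO U "
                        + " WHERE U.ACCT_ID = A.ACCT_ID AND U.BATCH_ID = ?) "
                        + "WHERE EXISTS (SELECT 1 FROM ACCOUNT_UNDO U "
                        + " WHERE U.ACCT_ID = A.ACCT_ID AND U.BATCH_ID = ?)")) {
            ps.setString(1, batchId);
            ps.setString(2, batchId);
            ps.executeUpdate();
        }
    }

    // Undo record pattern, logical commit (or after a restore): the before images
    // are no longer needed and can be discarded.
    static void discardBeforeImages(Connection conn, String batchId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM ACCOUNT_UNDO WHERE BATCH_ID = ?")) {
            ps.setString(1, batchId);
            ps.executeUpdate();
        }
    }
}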