The XLSMPOPTS environment variable sets runtime options for programs that use loop parallelization. Its suboptions are discussed in detail in Suboptions of the XLSMPOPTS environment variable for parallel processing.
If you are using OpenMP constructs for parallelization, you can also specify runtime options using OMP environment variables, as discussed in OpenMP environment variables for parallel processing.
When runtime options specified by OMP and XLSMPOPTS environment variables conflict, the OMP options prevail.
Runtime options affecting parallel processing can be specified with the XLSMPOPTS environment variable. This environment variable must be set before you run an application, and uses basic syntax of the form:
                 .-:-------------------.
                 V                     |
>>-XLSMPOPTS--=------option_and_args---+-----------------------><
For example, to have the program run time create 4 threads and use dynamic scheduling with a chunk size of 5, set the XLSMPOPTS environment variable as shown below:
XLSMPOPTS=PARTHDS=4:SCHEDULE=DYNAMIC=5
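As a rough sketch of how such a setting might be assembled and handed to a program, the Python snippet below joins option settings with colons and launches a child process with XLSMPOPTS already in its environment. The helper name `build_xlsmpopts` is hypothetical, and the child is only a Python one-liner standing in for a real parallel application.

```python
import os
import subprocess
import sys


def build_xlsmpopts(options):
    """Join option settings into the colon-separated XLSMPOPTS form.

    `options` maps an option name to its argument string,
    for example {"PARTHDS": "4", "SCHEDULE": "DYNAMIC=5"}.
    """
    return ":".join(f"{name}={args}" for name, args in options.items())


value = build_xlsmpopts({"PARTHDS": "4", "SCHEDULE": "DYNAMIC=5"})

# The variable must be set before the application starts, so it is
# placed in the child's environment at launch time.
child_env = dict(os.environ, XLSMPOPTS=value)

# Stand-in for a real parallel program: a child that echoes the variable.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['XLSMPOPTS'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Setting the variable inside an application that is already running is not shown, because the text above requires it to be set beforehand.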
Runtime option settings for the XLSMPOPTS environment variable are shown below, grouped by category:
XLSMPOPTS environment variable option | Description |
---|---|
schedule=algorithm=[n] | This option specifies the scheduling algorithm used for loops not explicitly assigned a scheduling algorithm.
Valid options for algorithm are:
If specified, the chunk size n must be an integer value of 1 or greater. The default scheduling algorithm is static. |
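To illustrate what the chunk size n means, the sketch below splits a loop's iteration range into contiguous chunks of at most n iterations. It assumes chunks are formed in iteration order; the function name `split_into_chunks` is hypothetical, and how chunks are handed to threads depends on the chosen algorithm and is not modeled here.

```python
def split_into_chunks(num_iterations, n):
    """Return half-open (start, end) ranges of at most n iterations each."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("chunk size n must be an integer of 1 or greater")
    return [
        (start, min(start + n, num_iterations))
        for start in range(0, num_iterations, n)
    ]


# 12 iterations with a chunk size of 5: two full chunks and one partial chunk.
print(split_into_chunks(12, 5))
```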
XLSMPOPTS environment variable option | Description |
---|---|
parthds=num | num represents the number of parallel threads requested, which is usually equivalent to the number of processors available on the system. Some applications cannot use more threads than the maximum number of processors available. Other applications can experience significant performance improvements if they use more threads than there are processors. This option gives you full control over the number of user threads used to run your program. The default value for num is the number of processors available on the system. |
usrthds=num | num represents the number of user threads expected. This option should be used if the program code explicitly creates threads, in which case num should be set to the number of threads created. The default value for num is 0. |
stack=num | num specifies the largest amount of space required for a thread's stack. The default value for num is 2097152. The glibc library is compiled by default to allow a stack size of 2 MB. Setting num to a value greater than this will cause the default stack size to be used. If larger stack sizes are required, you should link the program to a glibc library compiled with the FLOATING_STACKS parameter turned on. |
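The stack size rule can be sketched as follows. This only restates the documented behavior for a default glibc build and assumes num is given in bytes, which the default of 2097152 suggests; the function name `effective_stack_size` is hypothetical and nothing here queries the actual library.

```python
DEFAULT_STACK_BYTES = 2 * 1024 * 1024  # 2097152 bytes, the documented default


def effective_stack_size(requested):
    """Return the stack size used for a requested stack=num value."""
    if requested > DEFAULT_STACK_BYTES:
        # Requests above the default glibc limit fall back to the default size.
        return DEFAULT_STACK_BYTES
    return requested


print(effective_stack_size(1048576))  # a 1 MB request is kept as requested
print(effective_stack_size(8388608))  # an 8 MB request falls back to the default
```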
XLSMPOPTS environment variable option | Description |
---|---|
spins=num | num represents the number of loop spins, or iterations, before a yield occurs. When a thread completes its work, the thread continues executing in a tight loop looking for new work. One complete scan of the work queue is done during each busy-wait state. An extended busy-wait state can make a particular application highly responsive, but can also harm the overall responsiveness of the system unless the thread is given instructions to periodically scan for and yield to requests from other applications. A complete busy-wait state for benchmarking purposes can be forced by setting both spins and yields to 0. The default value for num is 100. |
startproc=CPU ID | Enables thread binding and specifies the CPU ID to which the first thread binds. If the value provided is outside the range of available processors, the SMP run time issues a warning message and no threads are bound. |
stride=Number | Specifies the increment used to determine the CPU ID to which subsequent threads bind. Number must be greater than or equal to 1. If the value provided would cause a thread to be bound to a CPU outside the range of available processors, a warning message is issued and no threads are bound. |
yields=num | num represents the number of yields before a sleep occurs. When a thread sleeps, it completely suspends execution until another thread signals that there is work to do. This provides better system utilization, but also adds extra system overhead for the application. The default value for num is 100. |
delays=num | num represents a period of do-nothing delay time between each scan of the work queue. Each unit of delay is achieved by running a single no-memory-access delay loop. The default value for num is 500. |
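The thread binding rules for startproc and stride in the table above can be sketched as below. This is a minimal model, assuming thread i binds to CPU startproc + i * stride and that CPU IDs run from 0 to the number of processors minus one; the function name `bind_threads` and its parameters are hypothetical.

```python
import warnings


def bind_threads(num_threads, num_cpus, startproc, stride):
    """Return the CPU ID for each thread, or None if no threads are bound."""
    if stride < 1:
        raise ValueError("stride must be greater than or equal to 1")
    cpu_ids = [startproc + i * stride for i in range(num_threads)]
    if any(cpu < 0 or cpu >= num_cpus for cpu in cpu_ids):
        # Mirrors the documented behavior: warn and bind no threads at all.
        warnings.warn("binding falls outside the range of available processors")
        return None
    return cpu_ids


# Four threads on an 8-processor system, starting at CPU 0 with a stride of 2.
print(bind_threads(4, 8, startproc=0, stride=2))
```

With startproc=4 and the same stride, the third thread would need CPU 8, so the sketch warns and returns None rather than binding only some of the threads.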
XLSMPOPTS environment variable option | Description |
---|---|
profilefreq=num | num represents the sampling rate at which each loop is revisited to determine appropriateness for parallel processing. The runtime library uses dynamic profiling to tune the performance of automatically parallelized loops. Dynamic profiling gathers information about loop running times to determine if the loop should be run sequentially or in parallel the next time through. Threshold running times are set by the parthreshold and seqthreshold dynamic profiling options, described below. If num is 0, all profiling is turned off, and the overhead of profiling is avoided. If num is greater than 0, running time of the loop is monitored once every num times through the loop. The default for num is 16. The maximum sampling rate is 32. Values of num exceeding 32 are changed to 32. |
parthreshold=mSec | mSec specifies the expected running time in milliseconds below which a loop must be run sequentially. mSec can be specified using decimal places. If parthreshold is set to 0, a parallelized loop will never be serialized by the dynamic profiler. The default value for mSec is 0.2 milliseconds. |
seqthreshold=mSec | mSec specifies the expected running time in milliseconds beyond which a loop that has been serialized by the dynamic profiler must revert to being run in parallel mode again. mSec can be specified using decimal places. The default value for mSec is 5 milliseconds. |
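A minimal sketch of the decision the dynamic profiler is described as making, assuming a loop's expected running time is simply its most recent sampled time in milliseconds. The names `clamp_profilefreq` and `next_mode` are hypothetical, and the real runtime library may estimate running times differently.

```python
MAX_PROFILEFREQ = 32  # documented maximum sampling rate


def clamp_profilefreq(num):
    """Values of num exceeding 32 are changed to 32."""
    return min(num, MAX_PROFILEFREQ)


def next_mode(mode, time_ms, parthreshold=0.2, seqthreshold=5.0):
    """Choose "parallel" or "sequential" for the next run of a loop."""
    if mode == "parallel":
        # parthreshold=0 means a parallelized loop is never serialized.
        if parthreshold > 0 and time_ms < parthreshold:
            return "sequential"
        return "parallel"
    # A serialized loop reverts to parallel once it runs long enough.
    if time_ms > seqthreshold:
        return "parallel"
    return "sequential"


mode = "parallel"
for sampled_ms in (0.1, 1.0, 6.0):
    mode = next_mode(mode, sampled_ms)
    print(sampled_ms, "->", mode)
```

With the default thresholds, the loop in this run is serialized after the 0.1 ms sample, stays sequential at 1.0 ms, and reverts to parallel at 6.0 ms.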
Related information
OpenMP runtime options affecting parallel processing are set by specifying OMP environment variables. These environment variables use syntax of the form:
>>-env_variable--=--option_and_args----------------------------><
If an OMP environment variable is not explicitly set, its default setting is used.
OpenMP runtime options fall into different categories as described below:
OMP_SCHEDULE=algorithm | This option specifies the scheduling algorithm used for loops not explicitly assigned a scheduling algorithm with the omp schedule directive. For example:
OMP_SCHEDULE="guided, 4"
Valid options for algorithm are:
If specifying a chunk size with n, the value of n must be an integer value of 1 or greater. The default scheduling algorithm is static. |
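As an illustration of the value format, the sketch below splits an OMP_SCHEDULE string such as "guided, 4" into an algorithm name and an optional chunk size. The function name `parse_omp_schedule` is hypothetical, and the sketch does not check the algorithm name against the list of valid algorithms.

```python
def parse_omp_schedule(value):
    """Split 'algorithm[, n]' into (algorithm, chunk size or None)."""
    algorithm, _, chunk = value.partition(",")
    algorithm = algorithm.strip().lower()
    chunk = chunk.strip()
    if not chunk:
        return algorithm, None
    n = int(chunk)
    if n < 1:
        raise ValueError("chunk size n must be an integer of 1 or greater")
    return algorithm, n


print(parse_omp_schedule("guided, 4"))
print(parse_omp_schedule("static"))
```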
OMP_NUM_THREADS=num | num represents the number of parallel threads requested, which is usually equivalent to the number of processors available on the system. This number can be overridden during program execution by calling the omp_set_num_threads( ) runtime library function. Some applications cannot use more threads than the maximum number of processors available. Other applications can experience significant performance improvements if they use more threads than there are processors. This option gives you full control over the number of user threads used to run your program. The default value for num is the number of processors available on the system. You can override the setting of OMP_NUM_THREADS for a given parallel section by using the num_threads clause available in several #pragma omp directives. |
OMP_NESTED=TRUE|FALSE | This environment variable enables or disables nested parallelism. The setting of this environment variable can be overridden by calling the omp_set_nested( ) runtime library function. If nested parallelism is disabled, nested parallel regions are serialized and run in the current thread. In the current implementation, nested parallel regions are always serialized. As a result, OMP_SET_NESTED does not have any effect, and omp_get_nested() always returns 0. If the -qsmp=nested_par option is on (only in non-strict OMP mode), nested parallel regions may employ additional threads as available. However, no new team will be created to run nested parallel regions. The default value for OMP_NESTED is FALSE. |
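The precedence among the thread-count settings described for OMP_NUM_THREADS can be sketched as a simple lookup, combined with the rule that OMP settings prevail over XLSMPOPTS. The function `resolve_num_threads` and its parameters are hypothetical stand-ins for the num_threads clause, the omp_set_num_threads( ) call, and the two environment settings; matching the parthds name without regard to case is an assumption.

```python
import os


def resolve_num_threads(clause=None, runtime_call=None, environ=None):
    """Pick the thread count for a parallel region from the highest-priority source."""
    environ = os.environ if environ is None else environ
    if clause is not None:            # num_threads clause on the directive
        return clause
    if runtime_call is not None:      # value passed to omp_set_num_threads()
        return runtime_call
    if "OMP_NUM_THREADS" in environ:  # OMP setting prevails over XLSMPOPTS
        return int(environ["OMP_NUM_THREADS"])
    for setting in environ.get("XLSMPOPTS", "").split(":"):
        name, _, args = setting.partition("=")
        # Assumption: option names are matched case-insensitively.
        if name.strip().lower() == "parthds" and args:
            return int(args)
    # Default: the number of processors available on the system.
    return os.cpu_count() or 1


env = {"OMP_NUM_THREADS": "8", "XLSMPOPTS": "parthds=4"}
print(resolve_num_threads(environ=env))            # environment only
print(resolve_num_threads(clause=2, environ=env))  # clause overrides everything
```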
OMP_DYNAMIC=TRUE|FALSE | This environment variable enables or disables dynamic adjustment of the number of threads available for running parallel regions. If set to TRUE, the number of threads available for executing parallel regions may be adjusted at run time to make the best use of system resources. See the description for profilefreq=num in Dynamic profiling options for more information. If set to FALSE, dynamic adjustment is disabled. The default setting is TRUE. |