Tips for hash join
Hash join has several advantages over nested loop join:
- The structure of a temporary hash table is simpler
than that of an index, so less CPU processing is required to build and probe
a hash table.
- The rows in the hash result table contain all of
the data required by the query so there is no need to access the table data
of the table with random I/O when probing the hash table.
- Like join values are clustered, so all matching
rows for a given join value can usually be accessed with a single I/O
request.
- The hash temporary hash table can be built using
Symmetric Multiprocessing (SMP) parallelism.
- Unlike indexes, entries in hash tables are not
updated to reflect changes of column values in the underlying table. The
existence of a hash table does not affect the processing cost of other
updating jobs in the system.
Hash join cannot be used for queries that:
- Perform subqueries unless all subqueries in the
query can be transformed to inner joins.
- Perform a UNION or UNION ALL.
- Perform left outer or exception join.
- Use a DDS created join logical file.
- Have scrollable cursors.
- The size of the pool that the query is running in
is less than or equal to 20 megabytes.
The following discourages a hash join from being used:
- OPTIMIZE for nn Rows when nn is less than the
actual result set size.
- Running in a pool that has a small pool size.
The following encourages a hash join to be used:
- The Symmetric Multiprocessing (SMP) product is
installed on the system (server).
- The system degree value (QQRYDEGREE) is set to
*MAX.
For more information, see Database Performance and Query Optimization in the Information Center
.