Hash Lookup Function

The Hash Lookup Function obtains the value for a destination column from a lookup table, according to a hashed value derived from a source column. The Hash Lookup Function allows you to consistently mask data when you use the same source and lookup tables in any environment.

The source column that is hashed does not need to be a column that will be replaced by lookup table values.

The Hash Lookup Function is case-sensitive. For example, the source values John and JOHN will be hashed to different values. You can use the TRIM parameter to convert the source value to uppercase before it is hashed.

There are two forms of the Hash Lookup Function, single column and multiple column. The single column form inserts a value into a single destination column. The multiple column form inserts values from multiple lookup table columns into corresponding destination columns, based on a single hash value from a source column.

You can enter the multiple column Hash Lookup Function for any source column that will be replaced by lookup table values, but you must edit the Column Map to remove the names of remaining source columns that will also be replaced.

The lookup table must include a key column that contains sequential number values without any gaps, and the remaining columns contain replacement values. The key column must be a numeric data type. The lookup table is typically indexed. The function hashes a source column to derive sequential numbers from 1 to the maximum value in the key column of the lookup table. The hashed value from the source table is matched with the sequential numbers in the lookup table, and values from the corresponding lookup table row are inserted at the destination.

If the source column used to derive the hashed value contains certain values (NULL, spaces (for CHAR columns), zero-length VARCHAR), the value is not hashed and the following reserved values are used as keys to the lookup table:

Source Value Lookup Table Key
NULL -1
spaces (CHAR or VARCHAR) -2
zero-length VARCHAR -3

The lookup table should include a row for each of these numbers, allowing you to insert a lookup value for each of these source values. If one of these source values is found and a corresponding number is not in the lookup table, a conversion error is reported.

The IGNORE parameter allows you to ignore the lookup table and use a source value when a row in a specified source column contains a specified value (NULL, SPACES (for CHAR columns), or zero-length VARCHAR).

You can use the PRESERVE parameter to ignore the lookup table and use a source value when a row in a specified source column contains a specified value (NULL, SPACES (for CHAR columns), or zero-length VARCHAR). PRESERVE can also be used to ignore the lookup table if a source column does not contain a value.

The TRIM parameter allows you to specify characters that will be trimmed from the source value before it is hashed. For example, if you choose to trim commas from a source value, the values Smith, John, and Smith John will each be hashed to the same value. You can also use this parameter to convert the source value to uppercase before it is hashed.

If the source value is converted to uppercase, the trim characters are also converted to uppercase.

You can use the SEED parameter to vary the calculation performed by the hashing algorithm. The hashed value from the source column and the SEED value are matched with a sequential number from the lookup table to obtain the replacement value for the destination column.

The syntax is:

HASH_LOOKUP( [sourcecol,] [trim=([char1char2 ] [\u]),]
dest=(col1, coln), lktablename (search,
{ value | values=(col1, coln) } ) [ ,cache | ,nocache ]
[,ignore=(col (spaces, null, zero_len ), )
| PRESERVE=( [ NOT_FOUND, ] colname (spaces, null, zero_len), ) ] )[,seed=n])
sourcecol
Name of the source table column from which hashed values are derived (optional). If not specified, the name of the destination column is used.
trim=
List of characters to be trimmed from the source value before it is hashed as well as an option to convert the source value to uppercase before it is hashed. If the resulting source value is NULL or all spaces after characters have been trimmed, the source value will not be hashed and will be assigned the appropriate reserved value (-1 or -2).
char1char2...
Characters to be trimmed from the source value before it is hashed. The list is case-sensitive. You can specify a space or comma as a character. After the initial occurrence of a character, any additional occurrences in the list are ignored.

To specify a backslash “\” or a right parentheses “)”, you must precede the character with a backslash escape character. For example, to specify a right parentheses, enter: trim=(\)).

You can only use the escape character with a backslash, a right parentheses, or as part of the uppercase indicator.

\u
Indicates the source value is to be converted to uppercase before it is hashed. The characters to be trimmed are also converted to uppercase.
dest=
Names of the destination table columns in which values from the lookup table are inserted. (Required for multiple column lookup.)
col1,coln
Destination table column names. The order of the column names must correspond to the lookup table columns in the values= parameter.
lktablename
Name of the lookup table. You may specify the lookup table name as dbalias.creatorid.tablename, creatorid.tablename, or tablename. If you do not fully qualify the table name, the qualifiers for the destination table are used.
search
Name of the column in the lookup table that contains sequential values to match against the hash values from the source column.
value
Name of the column in the lookup table that contains the translated search value to be inserted at the destination. (Required for single column lookup.)
values=
Names of the columns in the lookup table that contain values to be inserted at the destination. (Required for multiple column lookup.)
col1,coln
Lookup table column names. The order of the column names must correspond to the destination table columns in the dest= parameter.
cache | nocache
Specify CACHE (default) to maintain a table of found lookup values in memory or NOCACHE to discard found values. Using CACHE is faster when retrieving a value many times, but requires extra memory.
ignore=
List of source columns with values that are inserted at the destination instead of the lookup value when the column has a row with a stated value (NULL, SPACES, ZERO, or zero-length VARCHAR).
col
The source column name.

For single column lookup, enter one column name only.

For multiple column lookup, the order of the column names must correspond to the destination table columns in the DEST= parameter. The number of columns must equal the columns in the DEST= parameter, and at least one column must include values. To not specify values for a column, do not enter a value. For example, coln().

NULL
Ignore the lookup table if the source column row has a NULL value.
SPACES
Ignore the lookup table if the source column row has a SPACES value. For CHAR columns only.
ZERO_LEN
Ignore the lookup table if the source column row has a zero-length VARCHAR value.
PRESERVE=
List of source columns with values that are inserted at the destination instead of the lookup value when the column contains a stated value (NOT_FOUND, null, spaces, or zero-length varchar).
NOT_FOUND
Ignore the lookup table if no match is found for the source column row.
Note:

Preserve= and ignore= are mutually exclusive. Ignore= will be deprecated in a future release.

The col, null, spaces, and zero-length varchar operands have the same effect when used with either preserve= or ignore=. Details on these operands are shown above with ignore=.

seed=
Use SEED= to vary the hashing algorithm calculation. Values from 1 to 2,000,000,000 can be used. If you use a value of 0, the SEED parameter is ignored.

Single Column Example

Use the Hash Lookup Function to insert values from a column in a lookup table into a destination table column, based on a value hashed from a source column.

For example, assume the source column, FIRST_NAME, contains first names and the destination column will include replacement first names from the lookup table. A lookup table, NAME_LOOKUP, contains a column (FIRST) with first names and a column (SEQ) containing sequential values.

To obtain values for the destination column using the NAME_LOOKUP table, specify:

HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST))

The Hash Lookup Function matches the hash values from the source column with values in the SEQ column of the NAME_LOOKUP table. When a match is found, the function inserts the corresponding value from the FIRST column into the destination column.

Multiple Column Example

Use the Hash Lookup Function to insert values from columns in a lookup table row into columns in a destination table row, based on a value hashed from a source column.

For example, based on values hashed from a source column (FIRST_NAME) that contains first names, you can replace values in destination columns (FIRST and LAST) with first and last names from a lookup table. A lookup table named NAME_LOOKUP contains a column (SEQ) with sequential values as well as columns (FIRST_MASK and LAST_MASK) to mask values in the destination.

To replace names in the destination table based on values hashed from a source column, specify:

HASH_LOOKUP(FIRST_NAME,DEST=(FIRST,LAST), NAME_LOOKUP(SEQ,VALUES=(FIRST_MASK, LAST_MASK)))

The Hash Lookup Function matches the hash values from the source FIRST_NAME column with values in the SEQ column of the NAME_LOOKUP table. When a match is found, the function inserts the corresponding values from the lookup table FIRST_MASK and LAST_MASK columns into the corresponding destination columns.

Ignore Example

Use the following statement to extend the Single Column Example, where you want to use the source NULL and SPACES values instead of lookup table values:

HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST),IGNORE=(FIRST_NAME(NULL,SPACES)))

NoCache Example

Use the following statement to extend the Single Column Example, where you do not want to maintain a table of found lookup values in memory:

HASH_LOOKUP(FIRST_NAME,NAME_LOOKUP(SEQ, FIRST),NOCACHE)

Trim Example

Use the following statement to extend the Single Column Example, where you want to trim spaces and commas from the source value as well as convert the source value to uppercase before it is hashed:

HASH_LOOKUP(FIRST_NAME, TRIM=( ,\u),NAME_LOOKUP(SEQ,FIRST))