Parsing is separating data and assigning parts of it into one or more variables. Parsing can assign each word in the data into a variable or can divide the data into smaller parts. Parsing is also useful to format data into columns.
The variables to receive data are named in a template. A template is a model telling how to split the data. It can be as simple as a list of variables to receive data. More complex templates can contain patterns; section Parsing with Patterns explains patterns.
The REXX parsing instructions are PULL, ARG, and PARSE. (PARSE has several variants.)
Other chapters show PULL as an instruction that reads input and assigns it to one or more variables. If the program stack contains information, the PULL instruction takes information from the program stack. When the program stack is empty, PULL takes information from the current terminal input device. See section Getting Information from the Program Stack or Terminal Input Device for information about the data stack.
/* This REXX program parses the string "Knowledge is power." */
PULL word1 word2 word3
/* word1 contains 'KNOWLEDGE' */
/* word2 contains 'IS' */
/* word3 contains 'POWER.' */
PULL uppercases character information before assigning it into variables. If you do not want uppercase translation, use the PARSE PULL instruction.
/* This REXX program parses the string: "Knowledge is power." */
PARSE PULL word1 word2 word3
/* word1 contains 'Knowledge' */
/* word2 contains 'is' */
/* word3 contains 'power.' */
You can include the optional keyword UPPER on any variant of the PARSE instruction. This causes the language processor to uppercase character information before assigning it into variables. For example, using PARSE UPPER PULL... gives the same result as using PULL.
The ARG instruction takes information passed as arguments to a program, function, or subroutine, and puts it into one or more variables. To pass the three arguments Knowledge is power. to a REXX program named sample:
REXX sample Knowledge is power.
/* SAMPLE -- A REXX program using ARG */
ARG word1 word2 word3
/* word1 contains 'KNOWLEDGE' */
/* word2 contains 'IS' */
/* word3 contains 'POWER.' */
ARG uppercases the character information before assigning the arguments into variables.
If you do not want uppercase translation, use the PARSE ARG instruction instead of ARG.
/* REXX program using PARSE ARG */
PARSE ARG word1 word2 word3
/* word1 contains 'Knowledge' */
/* word2 contains 'is' */
/* word3 contains 'power.' */
PARSE UPPER ARG has the same result as ARG. It uppercases character information before assigning it into variables.
The PARSE VALUE...WITH instruction parses a specified expression, such as a literal string, into one or more variables whose names follow the WITH subkeyword.
PARSE VALUE 'Knowledge is power.' WITH word1 word2 word3
/* word1 contains 'Knowledge' */
/* word2 contains 'is' */
/* word3 contains 'power.' */
PARSE VALUE does not uppercase character information before assigning it into variables. If you want uppercase translation, use PARSE UPPER VALUE. You could use a variable instead of a string in PARSE VALUE (you would first assign the variable the value):
string='Knowledge is power.'
PARSE VALUE string WITH word1 word2 word3
/* word1 contains 'Knowledge' */
/* word2 contains 'is' */
/* word3 contains 'power.' */
Or you can use PARSE VAR to parse a variable.
The PARSE VAR instruction parses a specified variable into one or more variables.
quote = 'Knowledge is power.'
PARSE VAR quote word1 word2 word3
/* word1 contains 'Knowledge' */
/* word2 contains 'is' */
/* word3 contains 'power.' */
PARSE VAR does not uppercase character information before assigning it into variables. If you want uppercase translation, use PARSE UPPER VAR.
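As a minimal sketch of the UPPER variant, the following lines parse the same quote with PARSE UPPER VAR. The SAY instructions are added here only to display the results; note that the source variable itself keeps its original case.

```rexx
/* Sketch: PARSE UPPER VAR uppercases only the parsed copies */
quote = 'Knowledge is power.'
PARSE UPPER VAR quote word1 word2 word3
SAY word1     /* KNOWLEDGE */
SAY word2     /* IS */
SAY word3     /* POWER. */
SAY quote     /* Knowledge is power. (quote itself is unchanged) */
```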
In the preceding examples, the number of words in the data to parse is always the same as the number of variables in the template. Parsing always assigns new values to all variables named in the template. If there are more variable names than words in the data to parse, the leftover variables receive null (empty) values. If there are more words in the data to parse than variable names in the template, each variable gets one word of data in sequence except the last variable, which gets the remainder of the data.
In the next example, there are more variable names in the template than words of data; the leftover variable receives a null value.
PARSE VALUE 'Extra variables' WITH word1 word2 word3
/* word1 contains 'Extra' */
/* word2 contains 'variables' */
/* word3 contains '' */
In the next example, there are more words in the data than variable names in the template; the last variable gets the remainder of the data. The last variable can therefore contain several words, possibly with leading and trailing blanks.
PARSE VALUE 'More  words    in data' WITH var1 var2 var3
/* var1 contains 'More' */
/* var2 contains 'words' */
/* var3 contains '   in data' */
Parsing into words generally removes leading and trailing blanks from each word before putting it into a variable. However, when putting data into the last variable, parsing removes one word-separator blank but retains any extra leading or trailing blanks. In this example, there are two blanks before words. Parsing removes both the word-separator blank and the extra leading blank before putting 'words' into var2. There are four blanks before in. Because var3 is the last variable, parsing removes the word-separator blank but keeps the three extra leading blanks. Thus, var3 receives '   in data' (with three leading blanks).
A period in a template acts as a placeholder. It receives no data. You can use a period as a "dummy variable" within a group of variables or at the end of a template to collect unwanted information.
string='Example of using placeholders to discard junk'
PARSE VAR string var1 . var2 var3 .
/* var1 contains 'Example' */
/* var2 contains 'using' */
/* var3 contains 'placeholders' */
/* The periods collect the words 'of' and 'to discard junk' */
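A trailing period is also a common way to keep extra blanks and leftover words out of the last named variable. The sketch below uses a sample string chosen for illustration; the angle brackets in the SAY output are added only to make leading blanks visible.

```rexx
/* Sketch: a trailing period makes var3 an ordinary one-word variable */
data = 'More  words    in data'
PARSE VAR data var1 var2 var3      /* var3 is last: keeps extra blanks  */
SAY '<'var3'>'                     /* <   in data> */
PARSE VAR data var1 var2 var3 .    /* period is last: var3 is one word  */
SAY '<'var3'>'                     /* <in> */
```

In the second instruction the period, not var3, receives the remainder of the data, so var3 is parsed as a single blank-delimited word.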
For more information about parsing instructions, see section PARSE.
The simplest template is a group of blank-separated variable names. This parses data into blank-delimited words. The preceding examples all use this kind of template. Templates can also contain patterns. A pattern can be a string, a number, or a variable representing either of these.
If you use a string in a template, parsing checks the input data for a matching string. When assigning data into variables, parsing generally skips over the part of the input string that matches the string in the template.
phrase = 'To be, or not to be?' /* phrase containing comma */
PARSE VAR phrase part1 ',' part2 /* template containing comma */
/* as string separator */
/* part1 contains 'To be' */
/* part2 contains ' or not to be?' */
In this example, notice that the comma is not included with 'To be' because the comma is the string separator. (Notice also that part2 contains a value that begins with a blank.) Parsing splits the input string at the matching text. It puts data up to the start of the match in one variable and data starting after the match in the next variable.
When you do not know in advance what string to specify as separator in a template, you can use a variable enclosed in parentheses.
separator = ','
phrase = 'To be, or not to be?'
PARSE VAR phrase part1 (separator) part2
/* part1 contains 'To be' */
/* part2 contains ' or not to be?' */
Again, in this example, notice that the comma is not included with 'To be' because the comma is the string separator.
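A string pattern is often used in a loop to take one item at a time from a delimited list. The following sketch is illustrative only (the variable names list, item, and count are not from the preceding examples). It relies on the behavior that, when the pattern is not found, the variable before the pattern receives the rest of the data and the variable after it receives a null value.

```rexx
/* Sketch: split a comma-separated list, one item per iteration */
list = 'red,green,blue'
count = 0
DO WHILE list \== ''
  PARSE VAR list item ',' list     /* item = text before the first comma */
  count = count + 1
  SAY count':' item
END
/* Says: 1: red   then   2: green   then   3: blue */
```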
You can use numbers in a template to indicate the column at which to separate data. An unsigned integer indicates an absolute column position. A signed integer indicates a relative column position.
An unsigned integer or an integer with the prefix of an equal sign (=) separates the data according to absolute column position. The first segment starts at column 1 and goes up to, but does not include, the information in the column number specified. Subsequent segments start at the column numbers specified.
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote part1 5 part2
/* part1 contains 'Igno' */
/* part2 contains 'rance is bliss.' */
The following code has the same result:
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote 1 part1 =5 part2
/* part1 contains 'Igno' */
/* part2 contains 'rance is bliss.' */
Specifying the numeric pattern 1 is optional. If you do not use a numeric pattern to indicate a starting point for parsing, this defaults to 1. The example also shows that the numeric pattern 5 is the same as =5.
If a template has several numeric patterns and a later one is lower than a preceding one, parsing loops back to the column the lower number specifies.
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote part1 5 part2 10 part3 1 part4
/* part1 contains 'Igno' */
/* part2 contains 'rance' */
/* part3 contains ' is bliss.' */
/* part4 contains 'Ignorance is bliss.' */
When each variable in a template has column numbers both before and after it, the two numbers indicate the beginning and the end of the data for the variable.
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote 1 part1 10 11 part2 13 14 part3 19 1 part4 20
/* part1 contains 'Ignorance' */
/* part2 contains 'is' */
/* part3 contains 'bliss' */
/* part4 contains 'Ignorance is bliss.' */
Thus, you could use numeric patterns to skip over part of the data:
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote 2 var1 3 5 var2 7 8 var3 var4 var5
SAY var1||var2||var3 var4 var5 /* || means concatenate */
/* Says: grace is bliss. */
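Absolute column positions are a natural fit for fixed-format records. The sketch below assumes a hypothetical record layout (surname in columns 1-10, forename in columns 11-20, id in columns 21-24); the LEFT built-in function is used only to build the sample record with exact padding, and STRIP removes the padding that positional parsing keeps.

```rexx
/* Sketch: split a fixed-column record (hypothetical layout) */
record = LEFT('SMITH', 10) || LEFT('JOHN', 10) || '0042'
PARSE VAR record 1 surname 11 forename 21 id 25
surname  = STRIP(surname)          /* positional parsing keeps the blanks */
forename = STRIP(forename)
SAY forename surname 'has id' id   /* Says: JOHN SMITH has id 0042 */
```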
A signed integer in a template separates the data according to relative column position. The plus or minus sign indicates movement right or left, respectively, from the starting position. In the next example, remember that part1 starts at column 1 (by default because there is no number to indicate a starting point).
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote part1 +5 part2 +5 part3 +5 part4
/* part1 contains 'Ignor' */
/* part2 contains 'ance ' */
/* part3 contains 'is bl' */
/* part4 contains 'iss.' */
+5 part2 means parsing puts into part2 data starting in column 6 (1+5=6). +5 part3 means data put into part3 starts with column 11 (6+5=11), and so on. The use of the minus sign is similar to the use of the plus sign. It identifies a relative position in the data string. The minus sign "backs up" (moves to the left) in the data string.
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
PARSE VAR quote part1 +10 part2 +3 part3 -3 part4
/* part1 contains 'Ignorance ' */
/* part2 contains 'is ' */
/* part3 contains 'bliss.' */
/* part4 contains 'is bliss.' */
In this example, part1 receives characters starting at column 1 (by default). +10 part2 receives characters starting in column 11 (1+10=11). +3 part3 receives characters starting in column 14 (11+3=14). -3 part4 receives characters starting in column 11 (14-3=11).
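Relative positions are convenient when each field has a known length. As an illustrative sketch, assuming a date held as an eight-character string in year, month, day order (the variable stamp and its value are made up for this example):

```rexx
/* Sketch: split a yyyymmdd string by field length */
stamp = '20240315'
PARSE VAR stamp year +4 month +2 day
SAY year'-'month'-'day             /* Says: 2024-03-15 */
```

Here year receives columns 1 through 4, month starts at column 5 (1+4=5), and day starts at column 7 (5+2=7) and receives the rest of the data.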
To provide more flexibility, you can define and use variable numeric patterns in a parsing instruction. To do this, first define the variable as an unsigned integer before the parsing instruction. Then, in the parsing instruction, enclose the variable in parentheses and specify one of the following before the left parenthesis: a plus sign (+) or a minus sign (-) to indicate a relative position, or an equal sign (=) to indicate an absolute position.
(Without +, -, or = before the left parenthesis, the language processor would consider the variable to be a string pattern.) The following example uses the variable numeric pattern movex.
quote = 'Ignorance is bliss.'
/*       ....+....1....+....2 */
movex = 3 /* variable position */
PARSE VAR quote part5 +10 part6 +3 part7 -(movex) part8
/* part5 contains 'Ignorance ' */
/* part6 contains 'is ' */
/* part7 contains 'bliss.' */
/* part8 contains 'is bliss.' */
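A variable numeric pattern is useful when a field length is known only at run time. The sketch below assumes a hypothetical record whose first two characters give the length of the field that follows. The length is extracted and normalized to an unsigned integer first, and then used as a relative pattern in a second parsing instruction.

```rexx
/* Sketch: length-prefixed field (hypothetical record format) */
record = '05HelloWorld'
PARSE VAR record len +2 rest       /* len = '05', rest = 'HelloWorld' */
len = len + 0                      /* normalize '05' to 5 */
PARSE VAR rest field +(len) tail
SAY field                          /* Says: Hello */
SAY tail                           /* Says: World */
```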
For more information about parsing, see Parsing.
When passing arguments to a function or a subroutine, you can specify multiple strings to be parsed. The ARG, PARSE ARG, and PARSE UPPER ARG instructions parse arguments. These are the only parsing instructions that work on multiple strings.
To pass multiple strings, use commas to separate adjacent strings.
The next example passes three arguments to an internal subroutine.
CALL sub2 'String One', 'String Two', 'String Three'
:
:
EXIT
sub2:
PARSE ARG word1 word2 word3, string2, string3
/* word1 contains 'String' */
/* word2 contains 'One' */
/* word3 contains '' */
/* string2 contains 'String Two' */
/* string3 contains 'String Three' */
The first argument, "String One", contains two words, which are parsed into the three variables word1, word2, and word3. The third variable, word3, is set to null because there is no third word. The second and third arguments are each assigned in full to the variables string2 and string3.
For more information about parsing multiple arguments that have been passed to a program or subroutine, see section Parsing Multiple Strings.
What are the results of the following parsing examples?
quote = 'Experience is the best teacher.'
PARSE VAR quote word1 word2 word3
quote = 'Experience is the best teacher.'
PARSE VAR quote word1 word2 word3 word4 word5 word6
PARSE VALUE 'Experience is the best teacher.' WITH word1 word2 . . word3
PARSE VALUE 'Experience is the best teacher.' WITH v1 5 v2
/*           ....+....1....+....2....+....3. */
quote = 'Experience is the best teacher.'
/*       ....+....1....+....2....+....3. */
PARSE VAR quote v1 v2 15 v3 3 v4
quote = 'Experience is the best teacher.'
/*       ....+....1....+....2....+....3. */
PARSE UPPER VAR quote 15 v1 +16 =12 v2 +2 1 v3 +10
quote = 'Experience is the best teacher.'
/*       ....+....1....+....2....+....3. */
PARSE VAR quote 1 v1 +11 v2 +6 v3 -4 v4
first = 7
quote = 'Experience is the best teacher.'
/*       ....+....1....+....2....+....3. */
PARSE VAR quote 1 v1 =(first) v2 +6 v3
quote1 = 'Knowledge is power.'
quote2 = 'Ignorance is bliss.'
quote3 = 'Experience is the best teacher.'
CALL sub1 quote1, quote2, quote3
EXIT
sub1:
PARSE ARG word1 . . , word2 . . , word3 .
ANSWERS