Copyright 1984 by ABComputing                          July 15, 1984

╔════════════════════════════════════════════════════════════════╗
║              CLEANUP Your WordStar Files With Ada              ║
║                                                                ║
║                               by                               ║
║                                                                ║
║                       George Gordon Noel                       ║
╚════════════════════════════════════════════════════════════════╝

EDITOR'S NOTE: This article assumes knowledge of subjects discussed in this issue's Ada tutorial.

┌──────────────────────────────────┐
│           Introduction           │
└──────────────────────────────────┘

I admit it. I love WordStar. (EDITOR'S NOTE: I wouldn't admit it!) But every WordStar user has a list of improvements to be included in MicroPro's next release. My own pet peeve (well, one of them at least) is that WordStar does not produce standard ASCII text files. If you display a WordStar-created text file with the DOS TYPE command, or with another editor, Greek letters and mathematical symbols are splattered all over the screen instead of the text you expected.

This article describes the Ada program CLEANUP, which transforms WordStar data files into standard ASCII text files. (An executable version of this program is provided on Diskette B as the file CLEANUP.COM.)

┌──────────────────────────────────┐
│    Extended ASCII Characters     │
└──────────────────────────────────┘

The IBM PC uses 8 bits to represent each character. If the eighth bit of a character (counting the rightmost bit as bit 1) is 0, the result is a standard ASCII character. If the eighth bit is 1, the result is an extended ASCII character, which displays as a mathematical or graphic symbol. WordStar, for inscrutable reasons of its own, sometimes sets the eighth bit of a character to 1, ensuring the most unpredictable-looking screens.

┌──────────────────────────────────┐
│       Character Processing       │
└──────────────────────────────────┘

The problem, then, is to clear the eighth bit, that is, to set it to 0. How can a particular digit in a number be cleared?
To understand this, it is helpful to think in decimal for a moment. Dividing a decimal number by 10 to the power n-1 and taking the remainder discards the nth digit (counting from the right) and everything to its left. When the nth digit is the leftmost one, that is the same as setting it to zero. To clear the third digit in 345 (the 3), we divide 345 by ten squared, or 100. The remainder after division is 045, which is the desired answer.

To clear the eighth bit (binary digit) of an 8-bit binary number, divide the number by 2 to the 7th power (10000000 binary, or 128 decimal) and take the remainder.

To be more specific, the lowercase "d" has the ASCII code 100 decimal, or 01100100 binary. When WordStar sets the eighth bit, the result is 11100100 binary, or 228 decimal, which is an uppercase Greek sigma in the extended graphic set. To reverse the process, divide 228 by 128 (2 to the 7th power) and take the remainder: 100. Greek sigma to English d.

Taking the remainder after division is such a common operation that many programming languages, such as Pascal and PL/1, have a built-in operator or function, usually called "mod", for this purpose. CLEANUP uses the Ada mod operator, described in the previous issue of PCFL/PCUG. Ada also has the POS and VAL attributes, which are useful for going back and forth between the numeric and character representations of an ASCII code. (POS and VAL are discussed in this month's Ada tutorial.)

The first step is to convert an object of the character data type to its numeric equivalent. The object in question might be a variable of type character, declared like this:

     CHAR: character;

The expression

     character'POS(CHAR)

returns an integer value corresponding to the position of the value of CHAR in the ASCII character set. If CHAR held the character literal 'd', the value of the expression would be 100; if CHAR held the Greek letter sigma, the expression would return 228.
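The arithmetic above can be checked with a few lines of Ada. This is only a sketch, not part of the CLEANUP listing: the procedure name Mod_Demo and the constant Sigma are invented for illustration, and it assumes a later Ada compiler (Ada 95 or newer, such as GNAT) in which the predefined type Character has 256 positions, so that code 228 is a legal value.

```ada
with Ada.Text_IO;

procedure Mod_Demo is
   use Ada.Text_IO;

   --  Hypothetical name: the character whose code is 228,
   --  that is, 'd' (code 100) with the eighth bit set.
   Sigma : constant Character := Character'Val (228);
begin
   --  Decimal analogy: clear the third digit of 345.
   Put_Line (Integer'Image (345 mod 10**2));          --  prints " 45"

   --  Binary case: clear the eighth bit of 228.
   Put_Line (Integer'Image (228 mod 2**7));           --  prints " 100"

   --  POS maps a character to its numeric code.
   Put_Line (Integer'Image (Character'Pos ('d')));    --  prints " 100"
   Put_Line (Integer'Image (Character'Pos (Sigma)));  --  prints " 228"
end Mod_Demo;
```

The leading blank in each printed value comes from Integer'Image, which reserves a position for the sign.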
To clear the eighth bit of the value in CHAR, divide its position number by 128 and take the remainder:

     character'POS(CHAR) mod 128

If CHAR held the Greek sigma, the result of this evaluation would be the number 100, the ASCII value of the character literal 'd'.

Next, the integer value returned by the previous expression is converted into a character by using the VAL attribute:

     character'VAL(character'POS(CHAR) mod 128)

If the value of CHAR is the Greek sigma, this expression returns the character literal 'd'. That's all there is to it!

┌──────────────────────────────────┐
│         File Processing          │
└──────────────────────────────────┘

Having accomplished the job at hand for a single character, we turn to processing the entire file. Since a text file is a sequence of characters, our program is quite simple. It reads one character, clears the eighth bit of that character, outputs the character, and repeats the process until the end of the file is reached.

The following program reads text from the "standard input device," clears the eighth bit of each character, and writes the resulting standard ASCII characters to the "standard output device." It uses Ada's END_OF_FILE function to determine when there is no more text to be converted.

     with TEXT_IO;
     procedure CLEANUP is
        use TEXT_IO;
        CHAR: character;
     begin
        while not END_OF_FILE (STANDARD_INPUT) loop
           get (CHAR);
           -- clear the eighth bit, then convert back to a character
           put (character'VAL(character'POS(CHAR) mod 128));
        end loop;
     end CLEANUP;

CLEANUP is an example of a "filter": a program that ordinarily reads from the keyboard and writes to the display screen. (Filters can access disk files using the I/O redirection features of DOS 2.0, as described on page A-4 of the IBM DOS 2.0 manual.)

Unfortunately, this nice, concise program won't run on the PC. As I have complained more than once, there is no standard Ada compiler for the PC. Therefore, the program listing for CLEANUP accompanying this article is written for Janus/Ada, the excellent subset compiler offered by R.R.
Software of Madison, WI (and reviewed in PCFL/PCUG, issue 1). I won't discuss the Janus version of CLEANUP in detail, since it involves some non-standard features. It works, though.

┌──────────────────────────────────┐
│            Conclusion            │
└──────────────────────────────────┘

The Janus version of CLEANUP is not a filter. It is invoked from the DOS command level by typing CLEANUP. The program asks for an input file (a WordStar text file) and an output file (to contain the cleaned-up text). A sample run might look like this:

     Input file? DIRTY
     Output file? CLEAN

After completion, the program returns to DOS.

In addition to converting extended characters to standard ASCII ones, the Janus version of CLEANUP also eliminates the control characters that WordStar uses to format output for the printer. A comment in the source code marks the statement that removes the control characters.

┌──────────────────────────────────┐
│    File Name: ██ ada2.txt ██     │
└──────────────────────────────────┘
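For readers with a full Ada compiler, the behavior described in the Conclusion (prompting for two file names, clearing the eighth bit, and dropping control characters) might be sketched as follows. This is not the Janus/Ada listing from the diskette. It assumes an Ada 95 or later compiler such as GNAT, the names Cleanup_Files and Char_IO are invented for illustration, and the rule for which control characters to keep (tab, line feed, carriage return) is a guess, since the article does not say exactly which ones the Janus version removes.

```ada
with Ada.Sequential_IO;
with Ada.Text_IO;

procedure Cleanup_Files is
   --  Read and write the files one raw character at a time.
   package Char_IO is new Ada.Sequential_IO (Character);

   Input, Output : Char_IO.File_Type;
   Name          : String (1 .. 255);
   Last          : Natural;
   Char          : Character;
   Code          : Natural;
begin
   Ada.Text_IO.Put ("Input file? ");
   Ada.Text_IO.Get_Line (Name, Last);
   Char_IO.Open (Input, Char_IO.In_File, Name (1 .. Last));

   Ada.Text_IO.Put ("Output file? ");
   Ada.Text_IO.Get_Line (Name, Last);
   Char_IO.Create (Output, Char_IO.Out_File, Name (1 .. Last));

   while not Char_IO.End_Of_File (Input) loop
      Char_IO.Read (Input, Char);

      --  Clear the eighth bit, as in the filter version.
      Code := Character'Pos (Char) mod 128;

      --  Assumption: keep printable characters plus tab (9),
      --  line feed (10), and carriage return (13); drop the rest.
      if Code >= 32 or else Code = 9 or else Code = 10 or else Code = 13 then
         Char_IO.Write (Output, Character'Val (Code));
      end if;
   end loop;

   Char_IO.Close (Input);
   Char_IO.Close (Output);
end Cleanup_Files;
```

Sequential_IO is used here instead of TEXT_IO so that carriage returns and line feeds pass through as ordinary characters rather than being interpreted as line terminators.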