Designing DOS Filters (PC Magazine Vol 3 No 20 Oct 16, 1984 M. Abrash/D. Illowsky)
The most useful utility programs are not necessarily the most complex or powerful. A simple utility can be very handy if it saves a few minutes a day, or if it lets you perform a needed function with a minimum of effort. DOS versions 2.0 and higher provide three filter programs, MORE, FIND and SORT, that make it easy to manipulate data files and to pass information between programs. Only a few filters are provided with PC-DOS, but new features, such as enhanced batch file processing and the redirection of I/O, make it a snap for you to design your own filter programs for various uses.
We present two "home-made" filters: one guarantees that all carriage returns in a file are paired with linefeeds, while the other ensures that a file has an end-of-file marker. These filters are simple, fully functional and easy to use, and they run with the speed of assembly language.
DOS 2.0 lets the user send input to a program from any file, just as if that input had been typed at the keyboard. This is known as redirection of the standard input. The standard input defaults to reading from the keyboard, but a less-than sign (<) on the command line is all that's required to redirect the standard input away from the keyboard. For example, the command line:
    LINK < LINKFILE.DAT
runs the LINK program, taking the instructions for the linker from the file LINKFILE.DAT.
The standard output from any program -- that is, the interactive output that normally goes to the screen -- can likewise be redirected to any file by using a greater-than sign (>). For instance, the command line:
    TREE > SUBDIR.LST
sends the list of all the subdirectories on the default disk to the file SUBDIR.LST. For both input and output, the default standard device is the console device, CON:. On input, the console is the keyboard, and on output, it is the video display.
A new feature introduced with DOS 2.0 is the filter. A filter is a program that accepts information from the standard input, modifies that data in some way, and then sends the transformed information on to the standard output. For example, the FIND filter, provided with DOS, accepts input from any text file and passes on to the standard output only those lines of text that contain the string of characters you specify. This allows you to pick out certain lines of interest. Either or both of a filter's input and output may be redirected away from the console to any file. You can visualize a filter as sitting between the standard input and standard output; it modifies the information passed from the input to the output according to its own set of rules.
As an example, consider a filter that converts all bare carriage returns into carriage return/linefeed (CR/LF) pairs. Many users have been frustrated trying to use a file in which only a bare carriage return marks the end of each line, rather than the CR/LF pair that most DOS programs require. This problem is particularly common when working with files transferred from other computers via a modem or direct connection. For example, files transferred from an Apple II typically contain no linefeeds and cannot be properly listed or used with most IBM software without being modified. In fact, both EDLIN and WordStar treat such a file as if it consisted of one long line.
In the past, programs to fix files that contained bare carriage returns could be written in BASIC, but these were agonizingly slow. Alternatively, such programs could be written in assembly language, but that was no small undertaking. The redirection features and new functions provided by DOS 2.0 make it simple to design a compact, easy-to-use filter program that changes all bare carriage returns to CR/LF pairs with the speed of assembly language.
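To make the idea of a filter concrete, here is a minimal sketch in Python of a FIND-like filter. It only illustrates the concept and is not how DOS implements FIND; the function name find_filter and the sample text are assumptions made for this example.

```python
import io


def find_filter(src, dst, needle):
    """Copy to dst only those lines of src that contain needle.

    src and dst are binary streams; needle is a bytes string.
    (Hypothetical helper, written for illustration only.)
    """
    for line in src:            # a binary stream yields one line at a time
        if needle in line:
            dst.write(line)     # pass the matching line through unchanged


# Demonstration on in-memory streams. Passing sys.stdin.buffer and
# sys.stdout.buffer instead would make this act on the redirected
# standard input and standard output, as a real filter does.
src = io.BytesIO(b"alpha\r\nbeta\r\ngamma\r\n")
dst = io.BytesIO()
find_filter(src, dst, b"am")
result = dst.getvalue()
```

The function never needs to know where its input comes from or where its output goes, which is exactly what lets redirection change both from the command line.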
The great advantage of filters is that they make it easy to massage information as it passes between programs and to perform a whole series of file manipulations with a single command line.
A BASIC program is provided to create the filter program CRLF.COM. Once that program has been run, the file CRLF.COM will be present on the default disk and ready for use. To use the CR/LF filter, you redirect the input from the file with bare carriage returns and redirect the output to the file in which you want to store the corrected text. If you do not redirect the output, the corrected text is displayed on the screen. We strongly suggest that you don't filter a file back onto itself, because this action simply destroys the original file.
For example, if you try to TYPE the file BARECR.TXT, which holds a program listing with each line terminated by a bare carriage return, then each line will overwrite the previous line because there are no linefeeds to advance the cursor to the next row of the screen. This is easily set right with the command line:
    CRLF < BARECR.TXT
When executed, this command reads all the characters from the file BARECR.TXT, changes all bare carriage returns to carriage return/linefeed pairs, and sends the corrected text to the screen, which is the default standard output. Because all carriage returns have been paired with linefeeds, the text will display legibly on separate lines. Similarly, the command line:
    CRLF < BARECR.TXT > CRLFPAIR.TXT
takes input from the file BARECR.TXT, passes it through the CR/LF filter to correct all bare carriage returns, and sends the corrected text on to the file CRLFPAIR.TXT. You can then use the file CRLFPAIR.TXT as you would any normal DOS file.
That's really all there is to using the CR/LF filter. A single command line, with redirection of the standard input and output, ensures that every carriage return in any file is properly paired with a linefeed. CR/LF works well with the piping features of DOS 2.0 as well.
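The effect of the CR/LF filter on a stream of bytes can be sketched in Python. This is not the filter itself, which is an assembly language program; the function name pair_carriage_returns is hypothetical, and the sketch deliberately leaves out the end-of-file marker that the real filter also adds.

```python
CR = 0x0D  # carriage return
LF = 0x0A  # linefeed


def pair_carriage_returns(data: bytes) -> bytes:
    """Return data with every carriage return followed by one linefeed.

    Hypothetical helper for illustration; it handles only the CR/LF
    pairing and does not append an end-of-file marker.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        out.append(byte)
        if byte == CR:
            out.append(LF)  # always supply a linefeed after a CR
            # If the input already had a linefeed here, skip it so the
            # carriage return ends up with exactly one linefeed.
            if i + 1 < len(data) and data[i + 1] == LF:
                i += 1
        i += 1
    return bytes(out)


# A mix of bare and already-paired carriage returns.
fixed = pair_carriage_returns(b"line 1\rline 2\r\nline 3\r")
```

Carriage returns that are already paired come through unchanged, so running the sketch twice over the same data gives the same result as running it once.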
One nice feature of the CR/LF filter is that any carriage return that is properly paired with a linefeed is left alone. You can filter either a normal file or one that has both bare and paired carriage returns, and no harm will be done to the carriage returns already paired. However, some programs set the high bit of characters, which may make linefeeds unrecognizable to CR/LF. Files created by such programs should first be passed through another filter to strip the high bits. Alternatively, you could modify CR/LF to ignore high bits.
Let's create a small file with only bare carriage returns so that we can see why the CR/LF filter is needed and how it works. Use the DEBUG session shown in the figure "Creating TESTCRLF.DAT with DEBUG" to create the file TESTCRLF.DAT on the default disk, containing four lines of text -- each terminated with a bare carriage return. To verify that there are no linefeeds in this file, enter the command line:
    TYPE TESTCRLF.DAT
You will see that the text lines display one atop the other, so only the last line is visible. If you edit this file, you may find it does not display properly; EDLIN, for example, does not treat the lines as separate. Now enter the command line:
    CRLF < TESTCRLF.DAT
to pass this file through the CR/LF filter and send it to the screen. The file will display correctly because a linefeed is inserted at the end of each line. To create a corrected version of the file TESTCRLF.DAT, enter the command line:
    CRLF < TESTCRLF.DAT > CORRECTED.DAT
The filtered output, with all carriage returns properly paired with linefeeds, is stored in the file CORRECTED.DAT. You can edit or display this file as you would any normal text file.
The procedure is just as simple for any file of any size. Just redirect the input from the file that contains bare carriage returns and redirect the output to the file in which you want the corrected text to be placed.
A handy feature of the CR/LF filter is that it inserts an end-of-file (EOF) marker at the end of any file that lacks one.
Ctrl-Z (value 26, or hexadecimal 1A) is generally used to mark the end of text files. Most text editors and word processors look for this EOF marker when they load a file, but EDLIN is an exception to this rule. However, not all files contain an end-of-file marker; for instance, files created with the COPY CON: command and those created with the DEBUG program lack the EOF marker. If the marker is not present, most programs assume that all of the last sector of information read from the disk is a valid part of the file, even though it is not. If the character Ctrl-Z (hexadecimal 1A) is not the last byte of a file filtered with CR/LF, then a Ctrl-Z is added to the end of that file so that it can be edited properly.
For example, put a disk with space for a file in the default drive and enter the command lines:
    COPY CON: NOEOF.DAT
    THIS FILE IS NOT TERMINATED WITH AN EOF MARKER
and strike the F6 key. The file NOEOF.DAT is now created, with no EOF marker. To verify that the EOF marker is missing, edit NOEOF.DAT with your favorite word processor or editor (e.g., WordStar), and you'll probably see a row of "@" characters at the bottom of the file. These characters are garbage and do not properly belong in the file, but they are loaded because no EOF marker was present to tell the software where the file ended. Now pass the file through the CR/LF filter with the command:
    CRLF < NOEOF.DAT > ISEOF.DAT
This creates the file ISEOF.DAT, which is identical to the file NOEOF.DAT except for a Ctrl-Z to mark the end of the file. If you then edit the file ISEOF.DAT, you will see that the garbage at the end of the file has been eliminated.
Because CR/LF can improperly modify files created by programs that set high bits, it is not an ideal tool for simply ensuring that the EOF marker is present. A second BASIC program creates the filter program MARKEOF.COM, which does nothing but add an EOF marker to the end of any file lacking one.
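The rule that MARKEOF applies is small enough to sketch in a few lines of Python. The function name mark_eof is an assumption made for this illustration, and the sketch works on a whole byte string at once rather than character by character as the assembly language filter does.

```python
EOF_MARK = b"\x1a"  # Ctrl-Z, value 26


def mark_eof(data: bytes) -> bytes:
    """Return data ending in Ctrl-Z, adding one only if it is missing.

    Hypothetical helper for illustration; nothing else in the data
    is changed.
    """
    if data.endswith(EOF_MARK):
        return data             # already marked: leave it alone
    return data + EOF_MARK      # unmarked: append the marker


unmarked = b"THIS FILE IS NOT TERMINATED WITH AN EOF MARKER\r\n"
marked = mark_eof(unmarked)
```

Because only the final byte is examined, text that already ends in Ctrl-Z passes through untouched.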
As an example of the use of the MARKEOF filter, you can place an EOF marker at the end of the NOEOF.DAT file that we created above with the command line:
    MARKEOF < NOEOF.DAT > ISEOF.DAT
Apart from the possible addition of a Ctrl-Z as an EOF marker, no change is made to the text of the filtered file.
Figure: Creating TESTCRLF.DAT with DEBUG. If any response differs from that shown (other than the segment address 6BF8), exit with the "Q" command and start over.
A>DEBUG
-F 100 L1C "line 1" 0D "line 2" 0D "line 3" 0D "line 4" 0D
-D100 11B
6BF8:0100  6C 69 6E 65 20 31 0D 6C-69 6E 65 20 32 0D 6C 69
6BF8:0110  6E 65 20 33 0D 6C 69 6E-65 20 34 0D
-RCX
CX 0000
:1C
-RBX
BX 0000
:0
-N TESTCRLF.DAT
-W
Writing 001C bytes
-Q
A>
-----------------------------------------------------------------
Custom-Made DOS Filters (PC Magazine Vol 3 No 21 Oct 30, 1984 M. Abrash/D. Illowsky)
The design of the MARKEOF filter is simpler than the CR/LF filter's, so we'll examine its assembly language source code first (Figure 1). The key to the MARKEOF filter is the use of the DOS functions 3F (hex) and 40 (hex). Function 3F reads one or more characters, and function 40 writes one or more characters. Each function is called by placing the number of bytes to be read or written in register CX, the address of the buffer that receives or supplies those bytes in register DX, and the function number in register AH. Register BX holds a file handle, which identifies the file or device to be read from or written to.
DOS lets you set up a file handle to refer to any file in any subdirectory, but we'll use only a small part of the file-handle feature. DOS provides several built-in file handles that are always automatically available. Two of these built-in file handles refer to the standard I/O, and that's all we need. If register BX contains 0 (for file-handle number zero), the standard input is used; if register BX contains 1, the standard output is used.
When registers AH, BX, CX and DX are set, DOS is invoked to execute the function with software interrupt 21 (hex).
MARKEOF is interested in only the last character of the input file, so it just loops continually, passing characters from the standard input to the standard output, until the last character is reached; then an end-of-file marker is inserted if none is present. This loop extends across lines 29 to 51 (Figure 1). (Line numbers are used only for explanatory purposes and should not be included when entering the program.) Lines 30 through 36 read a character from the standard input with function 3F. Lines 43 through 49 immediately write the character to the standard output with function 40. This loop continues until function 3F returns a 0 in register AX, indicating that the standard input has no more text.
When the standard input runs out of text, lines 57 through 60 check the last character that was read from the standard input. If the character is Ctrl-Z (hex 1A), then the end of the file is properly marked, and MARKEOF is done. If the character is not Ctrl-Z, then the end is unmarked, and a Ctrl-Z character is appended to the standard output by lines 65 through 71 so other programs will correctly detect the end of the file. Finally, DOS function 4C (hex) ends MARKEOF and returns control to DOS.
The command sequence shown in Figure 2 assembles and links the file MARKEOF.ASM, as shown in Figure 1, into the executable filter program MARKEOF.COM. You must have the IBM Macro Assembler (the program MASM.EXE) in order to assemble this program, and you will need to use the LINK and EXE2BIN programs provided with DOS. LINK will produce the error message "Warning: No STACK segment." This is of no concern, since the program uses the STACK segment set by DOS.
The assembler source listing for the CR/LF filter is shown in Figure 3.
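For readers who find the register-level description of MARKEOF hard to follow, the same read-one, write-one loop can be sketched in Python. This is an analogy and not a translation of Figure 1: os.read and os.write on descriptors 0 and 1 stand in for DOS functions 3F and 40 on file handles 0 and 1, and the names markeof and run_on_bytes are hypothetical helpers introduced for the example.

```python
import os

EOF_MARK = b"\x1a"  # Ctrl-Z


def markeof(in_fd: int = 0, out_fd: int = 1) -> None:
    """Copy in_fd to out_fd one byte at a time, then add Ctrl-Z if the
    last byte copied was not already Ctrl-Z. (Illustrative sketch.)"""
    last = b""
    while True:
        ch = os.read(in_fd, 1)   # plays the role of function 3F, CX = 1
        if not ch:               # nothing read: the input is exhausted
            break
        os.write(out_fd, ch)     # plays the role of function 40, CX = 1
        last = ch
    if last != EOF_MARK:
        os.write(out_fd, EOF_MARK)


def run_on_bytes(data: bytes) -> bytes:
    """Demonstration helper: feed data through markeof using pipes.
    Intended for small inputs that fit in a pipe buffer."""
    in_r, in_w = os.pipe()
    out_r, out_w = os.pipe()
    os.write(in_w, data)
    os.close(in_w)               # closing the writer signals end of input
    markeof(in_r, out_w)
    os.close(in_r)
    os.close(out_w)
    result = b""
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:
            break
        result += chunk
    os.close(out_r)
    return result
```

Called with its default arguments, markeof() would act on the standard input and standard output, so redirection on the command line decides which files it touches. Reading one byte per call keeps the sketch close to the structure of the listing; it is not meant as an efficient way to copy a file.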
The bulk of CR/LF is simply the loop from MARKEOF, in which each character is read from the standard input and sent to the standard output. However, carriage returns are handled specially on lines 50 through 64. After a carriage return has been sent to the standard output, a linefeed is automatically sent to the standard output as well. This ensures that all carriage returns are paired with linefeeds. Of course, if the next character from the standard input is a linefeed, which means that the carriage return was already paired, then there would be two linefeeds. To avoid this, the character following the carriage return is read on line 54. If the next character is a linefeed, it is discarded; thus, the carriage return remains paired with a single linefeed. If the next character is not a linefeed, then the carriage return was bare and has now been corrected, so the next character is written to the standard output normally.
The pairing of every carriage return with only one linefeed is the sole difference between CR/LF and MARKEOF and is the only modification made to the text from the standard input as it flows through the filter. When all the text has been filtered, lines 91 through 102 of Figure 3 guarantee that a Ctrl-Z is present to mark the end of the file. Finally, DOS function 4C ends the program.
The command sequence shown in Figure 4 assembles and links the file CRLF.ASM, as shown in Figure 3, into the executable filter program named CRLF.COM.
After looking at the MARKEOF and CR/LF filters, you can see how easily a file can be modified with filters running under DOS 2.0 and how simple it is to make these filters. With the new redirection features of DOS 2.0, filter programs can be written in assembly language without file control blocks, open and close functions, and complex function calls. With DOS 2.0, even the neophyte assembly language programmer can easily design his own custom filter programs.
- - - - - - - - - -
Figure 1: The assembly language listing for MARKEOF.
[1]   ;*
[2]   ;* Assembly-language source code listing for MARKEOF,
[3]   ;* a filter to copy the standard input
[4]   ;* to the standard output, making sure that the
[5]   ;* text is terminated with Ctrl-Z (hex 1A) to
[6]   ;* mark the end of the file.

[10]  cseg    segment
[11]          assume  cs:cseg,ds:cseg
[12]          org     100h            ;COM files start at offset 100h
[13]  markeof proc    far
[14]          jmp     short read_char

[16]  ; Equates and storage area

[18]  eof     equ     1ah             ;Ctrl-Z character that marks
[19]                                  ; the end of a text file
[20]  tchar   db      ?               ;temporary storage for
[21]                                  ; character read from standard input
[22]  end_of_file db  eof             ;storage for end-of-file marker

[24]  ; Top of loop to read a character from the standard input
[25]  ; and write it to the standard output.
[26]  ; Read the next character from the standard input.

[29]  read_char:
[30]          sub     bx,bx           ;file handle for the standard input
[31]          mov     cx,1            ;one character is to be read
[32]          mov     dx,offset tchar ;character read is to be
[33]                                  ; stored in tchar
[34]          mov     ah,3fh          ;we want DOS function 3F (hex),
[35]                                  ; which reads a character
[36]          int     21h             ;invoke DOS to read a character
[37]                                  ; from the standard input
[38]          and     ax,ax           ;is the standard input out of text?
[39]          jz      done            ;if so, then finish up

[41]  ; Write the character to the standard output.

[43]          mov     bx,1            ;file handle for the standard output
[44]          mov     cx,bx           ;one character is to be written
[45]          mov     dx,offset tchar ;character to be written is
[46]                                  ; stored in tchar
[47]          mov     ah,40h          ;we want DOS function 40 (hex),
[48]                                  ; which writes a character
[49]          int     21h             ;invoke DOS to write a character
[50]                                  ; to the standard output
[51]          jmp     short read_char ;read the next character

[53]  ; All text transferred - add an end-of-file marker if none
[54]  ; exists.

[56]  done:
[57]          cmp     [tchar],eof     ;was the last character read
[58]                                  ; the end-of-file marker?

[60]          jz      eof_set         ;if so, then we're done

[62]  ; The last character was not an end-of-file marker, so add
[63]  ; the marker to the standard output.

[65]          mov     bx,1            ;file handle for standard output
[66]          mov     cx,bx           ;one character is to be written
[67]          mov     dx,offset end_of_file ;end-of-file marker
[68]                                  ; to be written is
[69]                                  ; stored here
[70]          mov     ah,40h          ;DOS function 40 (hex) to write
[71]          int     21h             ;invoke DOS to write the end-of-file
[72]                                  ; marker to the standard output

[74]  ; The end-of-file marker is all set, so we're done.

[76]  eof_set:
[77]          mov     ah,4ch          ;DOS function 4C (hex) to terminate
[78]          int     21h             ;invoke DOS to end the program
[79]  markeof endp
[80]  cseg    ends
[81]          end     markeof
- - - - - - -
Figure 2: Assembly, linking and conversion steps that turn the source code of the filter MARKEOF, stored in the file MARKEOF.ASM, into the runnable filter program MARKEOF.COM.
A>MASM MARKEOF;
The IBM Personal Computer MACRO Assembler
Version 1.00 (C)Copyright IBM Corp 1981

Warning Severe
Errors  Errors
0       0

A>LINK MARKEOF;

IBM Personal Computer Linker
Version 2.00 (C)Copyright IBM Corp 1981, 1982, 1983

 Warning: No STACK segment

There was 1 error detected

A>EXE2BIN MARKEOF.EXE MARKEOF.COM

A>ERASE MARKEOF.EXE
- - - - - -
Figure 3: The assembly language listing for the filter program CR/LF.
[1]   ;* Assembly-language source code for CRLF, a filter to copy
[2]   ;* the standard input to the standard output, making sure
[3]   ;* that every carriage return is paired with a linefeed,
[4]   ;* as with normal DOS files. Also, Ctrl-Z (hex 1A) is
[5]   ;* added to mark the end of the file if no end-of-file
[6]   ;* marker is present.

[12]  cseg    segment
[13]          assume  cs:cseg,ds:cseg
[14]          org     100h            ;COM files start at offset 100h
[15]  crlf    proc    far
[16]          jmp     short read_char

[18]  ; Equates and storage area.

[20]  cr      equ     0dh             ;carriage return character
[21]  lf      equ     0ah             ;linefeed character
[22]  eof     equ     1ah             ;Ctrl-Z character

[24]  tchar   db      ?               ;temporary storage for character
[25]                                  ; read from standard input
[26]  linefeed db     lf              ;storage for linefeed character
[27]  end_of_file db  eof             ;storage for end-of-file marker

[29]  ; Top of loop to read a character from standard input and
[30]  ; write it to the standard output, making sure that all
[31]  ; carriage returns are paired with a linefeed.

[33]  read_char:
[34]          call    read1           ;get the next character from
[35]                                  ; the standard input
[36]  save_char:
[37]          mov     dx,offset tchar ;point to character read
[38]          call    write1          ;write it to the standard
[39]                                  ; output
[40]          cmp     [tchar],cr      ;was the character a CR?

[42]          jz      handle_cr       ;if so, make sure it is
[43]                                  ; paired with a linefeed
[44]          jmp     short read_char ;read the next character

[46]  ; Make sure that a CR is followed by a LF.

[49]  handle_cr:
[50]          mov     dx,offset linefeed ;point to the LF character

[52]          call    write1          ;write a LF for the CR

[54]          call    read1           ;get the next character, to
[55]                                  ; make sure the CR was
[56]                                  ; not already paired with a
[57]                                  ; linefeed

[59]          cmp     [tchar],lf      ;is the next character from
[60]                                  ; the standard input a LF?

[62]          jnz     save_char       ;if it is not a LF, then
[63]                                  ; save it normally
[64]          jmp     short read_char ;if it is a LF, then read
[65]                                  ; the next character; the
[66]                                  ; LF just read is discarded

[68]  crlf    endp

[70]  ; Read the next character from the standard input, checking
[71]  ; whether the standard input has run out of text.

[74]  read1   proc    near
[75]          sub     bx,bx           ;file handle for the standard input
[76]          mov     cx,1            ;one character is to be read
[77]          mov     dx,offset tchar ;character read is to be
[78]                                  ; stored in tchar
[79]          mov     ah,3fh          ;we want DOS function 3F (hex),
[80]                                  ; which reads a character
[81]          int     21h             ;invoke DOS to read a character from
[82]                                  ; the standard input
[83]          and     ax,ax           ;is the standard input out of text?
[84]          jz      done            ;if so, then finish up
[85]          ret

[87]  ; All text transferred - add an EOF marker if none exists

[90]  done:
[91]          cmp     [tchar],eof     ;was the last character read
[92]                                  ; the EOF marker?

[94]          jz      eof_set         ;if so, then we're done

[96]  ; The last character was not an EOF marker, so add the
[97]  ; marker to the standard output.

[99]          mov     dx,offset end_of_file ;EOF marker to be
[100]                                 ; written is stored
[101]                                 ; here
[102]         call    write1          ;write the EOF marker

[104] ; The EOF marker is all set, so we're done.

[106] eof_set:
[107]         mov     ah,4ch          ;DOS function 4C to terminate
[108]         int     21h             ;invoke DOS to end the program
[109] read1   endp

[111] ; Write the character pointed to by register DX to the
[112] ; standard output.

[114] write1  proc    near
[115]         mov     bx,1            ;file handle for the standard output
[116]         mov     cx,bx           ;one character is to be written
[117]         mov     ah,40h          ;we want DOS function 40 which
[118]                                 ; writes a character
[119]         int     21h             ;invoke DOS to write a character to
[120]                                 ; the standard output
[121]         ret
[122] write1  endp
[123] cseg    ends
[124]         end     crlf
- - - - - - -
Figure 4: Assembly, linking and conversion steps that turn the source code of the filter CR/LF, stored in the file CRLF.ASM, into the runnable filter program CRLF.COM.
A>MASM CRLF;
The IBM Personal Computer MACRO Assembler
Version 1.00 (C)Copyright IBM Corp 1981

Warning Severe
Errors  Errors
0       0

A>LINK CRLF;

IBM Personal Computer Linker
Version 2.00 (C)Copyright IBM Corp 1981, 1982, 1983

There was 1 error detected

A>EXE2BIN CRLF.EXE CRLF.COM

A>ERASE CRLF.EXE
-----------------------------------------------------------------