DEBUG As An Assembler (PC World January 1985 Star-Dot-Star)

...take advantage of the DOS 2.x I/O redirection facility to mimic the operations of a conventional assembler. Use your word processing program to create a file that contains the commands and instructions shown in CTTYFIX.ASM (making sure that the file ends with DEBUG's Quit command). Then type:

   DEBUG < CTTYFIX.ASM > CTTYFIX.LST

and DEBUG assembles the file into a .COM file and creates an .LST file similar to that generated by an assembler. Files can contain assembly language pseudo-ops such as DB for defining bytes, and semicolons for introducing comments. Note that, thanks to this use of the redirection facility, you can edit a file and reassemble it if there are any errors.

Another use of I/O redirection with DEBUG is to create an assembly language file (of sorts) from .COM and .EXE files. The key is to use the Unassemble command while the output is redirected to a disk file, then edit the file to remove extraneous data such as the segment address and offset that precede each line of code. The resulting file can be commented using semicolons, edited, and then reassembled using DEBUG's assembler.

-----------------------------------------------------------------

New DEBUG Command (PC World January 1985 Star-Dot-Star)

Adding the routine below in DEBUGMOD to the DOS 2.x version of DEBUG.COM allows the program to list cross-references for Jump and Call instructions. The format for the new command is

   X range value

where range is the usual "segment:offset offset" or "segment:offset L value" style used under DEBUG, and value is a number between zero and 0FFFFh that represents an address. When the command is executed, all the Jumps and Calls referencing that address are displayed.

The modified program does not work properly with .EXE files. For this reason, save it under a name other than DEBUG.COM so you can have both versions available.
- - - - -
DEBUGMOD

A>DEBUG DEBUG.COM
-A 2E80
xxxx:2E80 CALL 03AD
xxxx:2E83 PUSH CX
xxxx:2E84 PUSH AX
xxxx:2E85 PUSH DX
xxxx:2E86 MOV CX,0004
xxxx:2E89 CALL 051C
xxxx:2E8C POP DI
xxxx:2E8D POP ES
xxxx:2E8E POP CX
xxxx:2E8F MOV SI,2D3F
xxxx:2E92 MOV AX,E9E8
xxxx:2E95 MOV [SI],AX
xxxx:2E97 PUSH CX
xxxx:2E98 PUSH DI
xxxx:2E99 LODSB
xxxx:2E9A SCASB
xxxx:2E9B LOOPNZ 2E9A
xxxx:2E9D JNZ 2EBB
xxxx:2E9F PUSH DI
xxxx:2EA0 MOV BX,[DI]
xxxx:2EA2 DEC DI
xxxx:2EA3 ADD WORD PTR BX,03
xxxx:2EA6 ADD BX,DI
xxxx:2EA8 CMP BX,DX
xxxx:2EAA JNZ 2EB6
xxxx:2EAC PUSH DX
xxxx:2EAD PUSH AX
xxxx:2EAE CALL 0318
xxxx:2EB1 CALL 02B3
xxxx:2EB4 POP AX
xxxx:2EB5 POP DX
xxxx:2EB6 POP DI
xxxx:2EB7 JCXZ 2EBB
xxxx:2EB9 JMP 2E9A
xxxx:2EBB CMP SI,2D41
xxxx:2EBF POP DI
xxxx:2EC0 POP CX
xxxx:2EC1 JNZ 2E97
xxxx:2EC3 RET
xxxx:2EC4
-E 0396 80 2E
-N D.COM
-W
Writing 2E80 bytes
-Q
A>

-----------------------------------------------------------------

Simple Assembling with IBM DEBUG (COMPUTE! Magazine November 1985 by T. Victor)

DEBUG includes a miniassembler, which converts assembly language instructions into machine language (ML) directly in memory, and a disassembler, which allows you to reverse this process and examine ML programs already in memory. DEBUG also has trace and breakpoint functions for testing ML programs, utilities for loading and saving programs on disk, and several other valuable features. You can write small ML programs with DEBUG.

Load DEBUG to get its hyphen prompt. You can return to DOS at any time by putting a DOS disk back in the drive, typing Q for Quit, and pressing Enter.

Let's ask DEBUG to copy itself onto another disk. You could use the DOS COPY command, but using DEBUG is a good way to learn how to load and save ML program files. DEBUG has three commands for disk operations: L (Load), W (Write), and N (Name). N creates a data structure called a file control block (FCB) that DOS uses for all disk operations, including DEBUG's Load and Write.
The FCB contains the name of a file, along with information such as size and file organization.

The first step in backing up DEBUG is to load another copy of it into memory. Type N DEBUG.COM and press Enter. DEBUG responds with another hyphen. Next, type L and press Enter. This loads a second copy of DEBUG. Remove the DEBUG disk and replace it with a formatted disk that you'll be using for ML programs. Type W and press Enter. DEBUG displays the message "Writing 2E80 bytes". You now have a copy of DEBUG.COM on your ML disk.

Let's try assembling a program with DEBUG. Start by typing A 100 to start assembling at address 100h. (All input and output with DEBUG is expressed in hexadecimal.) DEBUG responds with xxxx:0100, where xxxx is a four-digit hexadecimal number. This number is the current value of the code segment register. Now type in the following program. DEBUG displays the memory address of each instruction for you. All you need to enter are the instructions.

   MOV AH,09
   MOV DX,109
   INT 21
   INT 20
   DB "HELLO THERE$"

Press Enter on the empty line that follows to leave the assembler.

This program is the ML equivalent of everyone's first BASIC program: 10 PRINT "HELLO THERE". The ML version looks quite a bit longer, but it would be even more involved if it weren't for the INT 21h instruction, which calls a DOS function routine (Print String) by executing a software INTerrupt. Before calling this routine, the program takes two preparatory actions.

The first instruction loads the AH register with the value 9. In 8088 machine language, instructions with two operands like MOV AH,09 operate from right to left -- just as A = 9 in BASIC moves the value 9 into the variable A. You specify the destination operand first, then the source operand. AH is the high (most significant) byte of AX, the 16-bit (two-byte) accumulator register of the 8088. When a program calls Interrupt 21h, the value in AH indicates the function you're asking DOS to perform.
Function 9, Print String, displays a string on the screen, starting with the character at the address contained in the DX register and stopping at the first $ character (which is not itself displayed). The second instruction moves the address 109h into the DX register. The last instruction, INT 20h, ends the program by returning control to the program that called it -- in this case, DEBUG.

Finally, we create the string we want to print using DB, a pseudo-opcode (pseudo-op). When the assembler sees a pseudo-op such as DB, it performs a function instead of generating code. This particular pseudo-op tells the assembler to store bytes of data in memory, beginning at the current location. The data can be either a list of hexadecimal numbers between 00 and FF, separated by spaces or commas, or a quoted string, as shown above. If the data is a string, the ASCII code for each character is entered in memory.

The dollar sign at the end of the string is very important. Without this delimiter, the Print String function will keep printing whatever bytes it happens to find in memory following the message. It might be a long time before it comes across a $ and stops.

Now that the program is in memory, we can use the disassembler to examine it. Type U for Unassemble, and DEBUG displays several rows of text on the screen (the number of rows differs between 40- and 80-column displays). Notice that the disassembled code is aligned in four columns.

The first column shows the address of each instruction as two four-digit hexadecimal numbers separated by a colon, just as was displayed when you entered the program. The first four-digit number is the current value of the code segment register mentioned before, and the second is the value of the instruction pointer.

Understanding why two registers are needed to point to a single memory location requires some knowledge of the 8088's addressing scheme. The 8088 microprocessor can access up to one megabyte (1024K) of memory using 20-bit addresses.
However, for compatibility with older Intel processors, the 8088 has only a 16-bit instruction pointer. Because a 16-bit (four hexadecimal digit) register can only have values between 0 and 65,535, another register, the code segment register, is needed to address the entire 1,048,576 bytes allowed by the 8088.

The code segment register is also a 16-bit register, but instead of addressing individual bytes, it points to blocks of 16 bytes, called paragraphs. Any five-digit hexadecimal address that ends in a zero is the beginning of a paragraph. For example, the byte of memory at 5D320h is at the beginning of the paragraph addressed by a segment register containing 5D32h.

The code segment register points to the first paragraph of a 64K block of memory called the code segment (CS). There are three other segments, the data segment (DS), stack segment (SS), and extra segment (ES), plus a register that points to the beginning of each. In simple programs, however, all the segment registers usually have the same value as CS.

To find the next byte of code to be fetched, the value in the instruction pointer is added to the address of the beginning of the code segment. The physical address of this byte can be found with this formula:

   Physical Address = IP + (CS*16)

The effect of organizing memory this way is that a programmer doesn't have to know where the program will be loaded. When DOS loads a .COM program, it starts the code segment at the beginning of any available paragraph in memory. The program is loaded at an offset of 100h bytes above the start of the segment and the instruction pointer is set to 100h. The four segment registers, CS, DS, SS and ES, all point to the start of the code segment.

The second instruction of the example program moves an address, 109h, into DX. This address is an offset into the current data segment.
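The segment arithmetic described above can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function name physical_address and the load segment 1234h are made up for the example, since the real segment is whatever paragraph DOS happens to pick.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # A segment value counts 16-byte paragraphs, so multiply it by 16
    # (shift left 4 bits), add the offset, and keep the low 20 bits
    # because the 8088 has only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


# Paragraph example from the text: segment 5D32h starts at 5D320h.
print(f"{physical_address(0x5D32, 0x0000):05X}")

# Hypothetical load segment 1234h (an assumption for illustration only):
# the first instruction at offset 100h and the string at offset 109h.
print(f"{physical_address(0x1234, 0x0100):05X}")
print(f"{physical_address(0x1234, 0x0109):05X}")
```

With the assumed segment 1234h, the offsets 100h and 109h map to physical addresses 12440h and 12449h. A different load segment changes the physical addresses but leaves the offsets alone, which is why the program can use 109h without knowing where it will be loaded.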
The string to be printed is located at an offset of 109h only if the data segment register is equal to the code segment register and the program starts at offset 100h. In practice, the CS register is rarely changed except by DOS and needs little or no attention in most programs.

The second column of the disassembled listing on the screen contains four- or six-digit hexadecimal numbers. These are the contents of the memory locations, the binary code which the 8088 can execute. Notice that the first MOV instruction is one byte shorter than the second. The first instruction only loads half of a 16-bit register (AH is the upper half of AX), so the data occupies one byte, but the second MOV loads all of DX, which takes two bytes of data (a word).

The third column shows the mnemonics -- symbolic names for the opcodes. The fourth column displays the operands. This program consists of four opcodes: two MOV instructions followed by two INT instructions.

Notice that the DB pseudo-op doesn't show up in a disassembly. Instead of displaying your characters, DEBUG tries to convert the string into assembly mnemonics, and therefore prints several meaningless instructions. DEBUG is frequently fooled this way because program instructions and data are both stored as binary bytes. DEBUG has no way of knowing where the program ends and the data begins.

If you type another U, DEBUG continues to disassemble and display the next 16 or 32 bytes in memory (depending on your screen width). Since the program is only 21 bytes long, DEBUG starts displaying part of itself, still in memory from when you copied it. Type U 100 to disassemble from the beginning of your program again. DEBUG's U command also accepts both starting and ending addresses if you separate them by a space.

Save your program on disk before running it. If the program causes something unexpected, like an infinite loop or a complete system crash, it's nice to have a copy saved.
Then you can load it and search for the error without typing the program again from scratch. As before, you need to tell DEBUG the name of your file. Type N HELLO.COM.

Now there's one more thing to consider: How many bytes of memory should DEBUG write to disk? When we used the W command to copy DEBUG, it wrote the same number of bytes that it had loaded, but now we're saving a new program which has never been loaded. When DEBUG loads a file, it stores the size of the file in the CX register and the four least significant bits of the BX register. The same registers are used when DEBUG writes a file. So if your program is less than 65,536 bytes long, the BX register should be set to zero.

To examine and change CX, type R CX. DEBUG prints the contents of CX (probably 2E80h, left over from copying DEBUG), then prints a colon at the beginning of the next line. You can press Enter to leave the value unchanged, or type a new value. Since the new program is 21 bytes long, type 15 (the hexadecimal equivalent of 21) and press Enter. Now type W to write the program to disk. DEBUG responds with the message "Writing 0015 bytes," then returns the prompt.

Now that your program is safe on disk, run it by typing G and pressing Enter. The screen should display HELLO THERE. Then DEBUG prints "Program terminated normally" followed by its usual prompt.

If your program completed but didn't print correctly, disassemble starting from 100h and check that all instructions are correct. If your program locked up the computer, reboot, restart DEBUG, and thank yourself for saving the program. Reload the program with N and L, then disassemble it to see what it looks like.

If you don't know what's wrong, one technique is to try setting a breakpoint. This halts the program at a predetermined point so you can check the contents of the registers. For instance, to make the program stop before the INT 20h instruction, you can set one or more breakpoints.
To set a breakpoint, type G followed by the addresses of one or more instructions in your program. If you set more than one breakpoint, separate the addresses with spaces. The program begins executing, but stops when the instruction pointer equals the address of a breakpoint. DEBUG displays the contents of all registers and flags and disassembles the instruction at the breakpoint (the one the instruction pointer references, which is the next instruction to be executed). Type G to restart the program at the instruction that the instruction pointer references.

If you stopped your program with a breakpoint but want to restart it from the beginning, type G=100. DEBUG sets the instruction pointer to 100h (or whatever address you specify) before starting. You can also set both the starting address and one or more breakpoints. Just include the breakpoint addresses on the same command line, separating them from the starting address and each other with spaces.

Keep this in mind: Before DEBUG executes a G command, it saves the values of all the registers, including the instruction pointer. If the program runs normally, and completes by executing INT 20h, DEBUG restores all the registers. This is great if your program runs all the way from beginning to end. You just type G and your program runs again. If, however, your program has just completed after being restarted from a breakpoint, the instruction pointer now points to the location where the breakpoint was set. Typing G starts it from the breakpoint again. To run the program from the beginning, type G=100.

Some other useful DEBUG commands are D (Dump), which displays the contents of a block of memory as hexadecimal numbers and ASCII characters; E (Enter), which lets you examine and change the contents of individual memory locations; and T (Trace), which executes an ML program one instruction at a time, displaying all registers and flags between instructions.

You'll find DEBUG a big help in testing your programs.
Though you might use a separate assembler when your programs get larger, DEBUG remains useful for testing and modifying the assembled programs. If you want to know more, there is a complete description of each DEBUG command in Chapter 12 of the DOS 2.00 Manual and Chapter 8 of the DOS 2.10 Manual. Information on the DOS functions and interrupts can be found in Appendix D of the DOS 2.00 Manual and Chapter 5 of the DOS 2.10 Technical Reference Manual.
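
As a cross-check on the figures used in the walkthrough above (the string at offset 109h, a 21-byte program, and 15 typed into CX), the Python sketch below lays out the bytes of the HELLO program. It is only an illustration. The opcode bytes are the standard 8088 encodings for these instructions, and the variable names are made up for the example. The BX/CX split follows the article's description, with CX holding the low 16 bits of the size.

```python
LOAD_OFFSET = 0x100  # a .COM program starts at offset 100h in its segment

# Machine code for the four instructions of the HELLO program.
code = bytes([
    0xB4, 0x09,        # MOV AH,09   (byte operand: 2 bytes in all)
    0xBA, 0x09, 0x01,  # MOV DX,0109 (word operand, low byte first: 3 bytes)
    0xCD, 0x21,        # INT 21
    0xCD, 0x20,        # INT 20
])
message = b"HELLO THERE$"  # the bytes that DB "HELLO THERE$" stores
program = code + message

# The string begins right after the code, which is why DX is loaded with 109h.
string_offset = LOAD_OFFSET + len(code)

# File size as DEBUG expects it: low 16 bits in CX, anything above that in BX.
size = len(program)
bx, cx = size >> 16, size & 0xFFFF

print(f"string offset: {string_offset:04X}")
print(f"program size:  {size} bytes = {size:X}h")
print(f"BX={bx:04X} CX={cx:04X}")
```

The nine bytes of code put the string at offset 0109h, and the twelve-byte string brings the total to 21 bytes, or 15h, with BX left at zero. The first MOV occupying two bytes and the second three matches the one-byte difference visible in the second column of the disassembly.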