Disk Disassembler
George H. Watson, Jr.
Physics Dept.
University of Delaware
Newark, DE
Editor's Note: This program works on either BASIC 3.0 or 4.0 and any 2040 DOS. It uses a printer. On some systems, the question "SKIP BASIC?" should be answered "NO" even though the program under disassembly is entirely in machine language.
There are several fine disassemblers available (in BASIC and in machine language) which disassemble programs while they reside in PET memory. Problems arise, though, when the program to be disassembled normally resides in the same memory space as the disassembler. Relocating the disassembler (moving it to a different memory space) allows it to be used, although with a bit more difficulty. The problem may be circumvented entirely by using a disassembler which does not require the program to be in PET memory. Instead, the program is disassembled directly from the diskette on which it is stored: the machine code is read byte by byte and translated into mnemonics, without the bytes ever being stored in memory.
A computer program is a set of instructions which are stored in the computer's memory in the form of bytes (8-bit words). A machine language program is a set of bytes which the microprocessor in your computer understands directly. On the other hand, a BASIC program consists of bytes which represent the various BASIC statements. When you RUN a BASIC program, each byte is interpreted and the microprocessor acts according to machine language subroutines which exist in the computer's ROMs. When you LIST a BASIC program, the operating system of your computer translates the bytes into BASIC statements, which are then displayed on the screen. Unfortunately, no such LIST command is available for machine language programs on the PET microcomputer. But something is available which will translate the bytes into a form more understandable to a human. By allowing a disassembler to operate on the machine code, the program will be "LISTED" as mnemonics for the 6502 microprocessor, the heart of every PET.
DISK DISASSEMBLER opens a file to be read (the program to be disassembled) in the disk drive. The first two bytes read contain the address at which the file is normally loaded into PET memory. The remaining bytes make up the program. All bytes are translated into mnemonics until an end-of-file marker is detected (through the status word, ST), at which point the disassembly is finished.
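The file layout just described can be sketched in a few lines. The sketch is in Python rather than PET BASIC, purely for illustration, and it is not the program's own code. The function name split_prg and the sample bytes are hypothetical. It assumes the usual Commodore program-file convention in which the load address is stored low byte first.

```python
def split_prg(data: bytes):
    """Split a program-file byte stream into (load_address, program_bytes)."""
    if len(data) < 2:
        raise ValueError("need at least the two load-address bytes")
    # The 6502 stores addresses low byte first, then high byte.
    load_address = data[0] + 256 * data[1]
    return load_address, data[2:]


# Hypothetical file image: load address $033A, then LDA #$00 and RTS.
image = bytes([0x3A, 0x03, 0xA9, 0x00, 0x60])
address, body = split_prg(image)
print(f"${address:04X}", body.hex())
```

Everything after the first two bytes is then handed to the translation step, one byte at a time.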
Many programs which you may be interested in disassembling will be a combination of BASIC and machine code. DISK DISASSEMBLER handles the case where the machine code follows the BASIC program. All bytes are skipped until three consecutive zeroes are detected, which indicate the end of the BASIC program. All subsequent bytes are disassembled.
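The skipping step might look something like the following sketch (again in Python for illustration only; skip_basic and the sample bytes are hypothetical). The three zeroes arise because each tokenized BASIC line ends with a zero byte and the program itself ends with a zero link pointer of two more bytes.

```python
def skip_basic(body: bytes) -> int:
    """Return the index of the first byte after the first run of three
    consecutive zero bytes, or len(body) if no such run exists."""
    zeros = 0
    for index, value in enumerate(body):
        zeros = zeros + 1 if value == 0 else 0
        if zeros == 3:
            return index + 1
    return len(body)


# Hypothetical program loaded at $0401: the BASIC line "10 SYS1037"
# followed by machine code (LDA #$00, RTS) at $040D.
body = bytes([0x0B, 0x04,              # link to next line ($040B)
              0x0A, 0x00,              # line number 10
              0x9E,                    # SYS token
              0x31, 0x30, 0x33, 0x37,  # "1037"
              0x00,                    # end of line
              0x00, 0x00,              # end of program (zero link)
              0xA9, 0x00, 0x60])       # machine code
start = skip_basic(body)
print(start, body[start:].hex())
```

Only the first run of three zeroes is treated as the boundary; any later zero bytes in the machine code are passed on to the disassembler untouched.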
As much as possible, I have attempted to make the output resemble the source code used by assemblers. (Source code for an assembler consists of the mnemonics for the microprocessor which the assembler converts into machine code.) One major benefit of an assembler is its ability to represent addresses with labels. Thus the machine language programmer is not required to calculate the relative addresses needed for conditional branches, which is a tedious chore. DISK DISASSEMBLER does not provide the option of inputting labels (too time-consuming), but relative branch offsets ARE converted to absolute target addresses, which makes understanding the disassembly easier.
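The branch conversion follows from how the 6502 encodes relative branches: the operand is a signed one-byte offset measured from the address of the instruction following the two-byte branch. A minimal Python sketch of that arithmetic (branch_target is a hypothetical name, not a routine from the program):

```python
def branch_target(opcode_address: int, offset_byte: int) -> int:
    """Absolute target of a 6502 relative branch.

    opcode_address is the address of the branch opcode itself;
    offset_byte is the raw operand byte (0-255).
    """
    # Interpret the operand as a signed two's-complement value.
    offset = offset_byte - 256 if offset_byte >= 128 else offset_byte
    # The offset is relative to the instruction after the 2-byte branch.
    return (opcode_address + 2 + offset) & 0xFFFF


print(f"${branch_target(0x033A, 0x05):04X}")  # forward branch
print(f"${branch_target(0x0340, 0xFB):04X}")  # backward branch
```

A listing that shows the branch target as an address can be read without working out each offset by hand.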
DIS TEST is a compilation of all legal opcodes (instructions) available to the 6502 microprocessor. When disassembled, an alphabetical listing of the mnemonics along with their addressing modes will be printed out. If there are errors in the mnemonics or addressing modes, carefully check the DATA statements in lines 9000–9155. If the relative branches are wrong, check lines 670–675. Check all lines containing the address counter, AD, if the memory locations in the first column are incorrect.
Try DISK DISASSEMBLER on your favorite game or utility. You can learn much about machine language programming by studying the tricks used by others. You may also be able to learn more about the routines available in the PET's ROMs by examining how other programmers use them.
One option available in DISK DISASSEMBLER is the ability to change a legal opcode to an illegal opcode. Why do this? Some programs which you may disassemble use a legal opcode (unused otherwise) as filler between subroutines. I suppose this is to thwart disassembly, since a simple NOP would also do the job. You may overcome this obstacle by making the opcode illegal. How? Find the mnemonic in the DATA statements, making sure you find the one with the correct addressing mode. Now simply replace the number immediately following the mnemonic with a zero.
DISK DISASSEMBLER was written on a 32K PET (3.0) with 2040 disk drive. The program as written is slightly less than 7K in length, while variables, arrays, and strings require slightly less than 8K, so the program will run on a 16K PET; remove the REM statements if there is a problem. DISK DISASSEMBLER will also run on 4.0 PETs and with the new disk drive ROMs. For readers not inclined to type in long programs, contact me at the above address and I will provide tape copies at $3 each. (Include SASE, mailer, and tape.) Happy disassembling!
Speeding up BASIC
Some notes on DISK DISASSEMBLER:
- Most frequently-used subroutines and the working part of the program should be placed at the beginning of the program (lower line numbers). When a GOSUB or GOTO is executed, BASIC begins at the first line of the program and compares each following line number until a match is obtained with the desired line number. Thus fewer line numbers need to be scanned for subroutines which are placed at the beginning. Disadvantage: a program may seem less structured.
- Variables should be dimensioned as in lines 2000–2020 and the most-used variables should be initialized first. As with the line-number search described in the previous note, when a variable is encountered, BASIC begins at the first variable in the table of variables and compares each following variable with the desired variable until a match is made. Dummy variables (constantly changing value and heavily used in subroutines) are good candidates for the first positions in the table. These variables should then be reused as often as possible.
- When possible, use arrays of constants in place of conversions made with time-consuming subroutines. The biggest timesaving in DISK DISASSEMBLER was made by using an array of 256 hex characters, HG$(), in place of a subroutine which converted the decimal value of a byte to the hex value. Disadvantage: more memory consumed.
- Use IF FG THEN ... rather than IF FG <> 0 THEN ... and IF ST-64 THEN ... rather than IF ST <> 64 THEN ... The branch is taken whenever the expression following IF is nonzero, so the explicit comparison is unnecessary.
- Replace numbers with defined variables. In lines 300 and 400, B = 256. Time is saved since the conversion of the number 256 into the representation used by BASIC need not be done over and over; it was done once at initialization. Disadvantage: larger variable table.
- Since any statement following a GOTO or RETURN on the same line is never executed, a remark may be placed there with no time lost and with no REM statement. See lines 10 and 100.
- When DATA statements are read, an empty item (nothing between two commas) is read as zero for a numeric variable and as the null string for a string variable.
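The lookup-table note above (the HG$() array) can be sketched as follows. This is a Python illustration of the idea, not the BASIC code itself; the name HG mirrors the article's array, while hex_address is a hypothetical helper. The table is built once at initialization, and each later conversion is a single array access.

```python
# Build the table once: HG[0] is "00", HG[255] is "FF".
HG = [f"{value:02X}" for value in range(256)]


def hex_address(address: int) -> str:
    """Format a 16-bit address as four hex digits using two table lookups."""
    return HG[(address >> 8) & 0xFF] + HG[address & 0xFF]


print(HG[169], hex_address(0x033A))
```

In interpreted PET BASIC the saving comes from avoiding a subroutine call and several arithmetic steps per byte. In Python the built-in formatting is already fast, so the sketch shows only the structure of the technique, at the cost of the 256 stored strings.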
I would also like to mention two shorthand tricks which are available.