USR Sort
Walter D. Thompson, Jr.
One way to create your own operands (commands) is to write subroutines that can be called from a BASIC program. You can, for example, use the Atari's USR function to create a routine with the speed of machine language. The time-saving routine can be especially important in sorting programs.
How many times have you created a simple file to which you have made several additions? Later, you discover that you want to list the records in the file in some order other than the sequentially entered order. Or perhaps you would even like to rewrite the file in a new order.
A sort utility in DOS could very well handle all possible demands. However, a simple array sort can handle many smaller applications by reading the fields or records into an array in memory and sorting the array. The only limitation is available memory.
The Sort Routine
A sorting routine is a method by which related elements are compared and ranked. In the Atari, this ranking is accomplished by comparing the ATASCII values. Thus, if we had an array with 26 elements, each one byte long, we could load the array with each letter of the alphabet, and after sorting the array, we would have the letters in order from A to Z.
The sort routine that we want would need to compare a string of letters to another string, perhaps of unequal length. The sort order is left justified so that an element beginning with a D would come before an element starting with an N. However, we also want DUSTY to come after DUST and before DUTY.
Our sort routine must compare each element byte to the corresponding byte of the elements before and after.
Some languages have built-in operands which accomplish this in some manner. For example, IBM BASIC operands AIDX and DIDX return a second array which has sorted pointers to the first array.
Atari Restrictions And Solutions
When Atari BASIC was written, several operands were omitted simply because there was not sufficient internal memory for all of the special operands available. However, BASIC does allow the user to build assembler subroutines, which can be called from the BASIC program. The utility of the Atari is greatly enhanced by the user's ability to write many special applications. These applications can then use the far greater speed of machine language.
Speed is particularly important in sort functions. Anyone who has written a BASIC program to do a bubble sort on a string array, and has then waited for many minutes for strings of moderate length to process, can attest to the importance of speed in a sort. A string of file-size proportions could very well take hours. This is because a sort, of any type, requires many repetitive compares to accomplish its final result. These are time-consuming operations in BASIC, but they are extremely fast with machine language.
Locating The Sort
So it seems that we must simply load a machine language sort module and call it from a BASIC program. But first we must consider where to locate the machine language module. It could be in the same place for any program which would load and access it, yet it must not interfere with the operating system or with our BASIC program which has called it. We would not accomplish very much if the sort module were located in the very middle of the array to be sorted.
The first sort utility I wrote worked extremely well and fit entirely into page six. So what more could I want from a sort that handled a large (32,000) element array and executed in mere seconds?
The answer is safety. Upon closer examination, I realized that a simple error in programming could result in the programmer sorting all of his or her program and memory above the array. This could include the display list, screen memory, and even BASIC itself. The routine needed a method of validating the total length of the sort array. We could have in memory an array of 100 elements of 127 bytes each, but through error, pass a value of 1000 elements to the sort routine. And the sort routine would, of course, attempt to sort 124K of memory, being forced to stop only when it attempted to sort ROM.
After developing and adding an error-check to make the program safe, I discovered that my original memory location was no longer able to hold the improved version. After some more juggling, I decided the solution was to make the object code relocatable. This means that the program permitted no jumps to a specified address; it allowed only relative branches.
Although the sort routine is relocatable, it does make use of a number of page zero and page six memory locations when executed. During the course of a sort, the program will alter the contents of locations $CB-$D1 and $6F8-$6FF.
Building The Array
We want to sort a group of unequal fields in a string array of as yet unspecified size. We must work with elements of a specified length, padding with blanks to the right for alphanumeric fields and zeros to the left for numeric fields. For example, a name field to be sorted may have a maximum length of 25 characters. There are 100 names or elements to be sorted. And so:
LET MLENGTH = 25 LET ELEMENTS = 100 LET X = MLENGTH * ELEMENTS DIM SARRAY$(X)
In order to load the array so that the sort routine has equivalent elements to compare, we assign each field the maximum length using the following technique:
DIM FIELD$(MLENGTH) FOR I = 1 TO MLENGTH : LET FIELD$(I) = " ": NEXT I LET L = LEN(NAME$) LET FIELD$(1, L) = NAME$ LET P = ELEMENT * MLENGTH LET S ARRAY$(P, P + MLENGTH - 1) = FIELD$
If numeric fields are to be loaded into a string array for sorting, the following technique will build equal length elements with leading zeros:
DIM FIELD$(MLENGTH), TEST$(MLENGTH) FOR I = 1 TO MLENGTH : LET FIELDS(I) = "0": NEXT I LET TESTS = STRS(ZIPCODE) LETL = LEN(TEST$) LET FIELD$(MLENGTH + 1 -L, MLENGTH) = TESTS
We have now built an array for sorting, retaining the number of elements and the length of each element.
The USR Call
Upon calling the sort routine, we specify the number of elements to be sorted, the length of each element, the sort order, and the relative variable number of our array.
We call the sort routine by executing the following USR call:
LET EFLAG = USR(STARTADDR, ELEMENTS, MLENGTH, ORDER, RELVARNUM)
As long as the program does not change graphics modes, at least prior to all sorting, a very handy place to locate the sort routine object code is just below RAMTOP. The STARTADDR = PEEK(741) + 256 * PEEK(742) -339. Or in BASIC A +, STARTADDR = DPEEK(741) -339. ELEMENTS should be the actual number of elements loaded into the array. The additional routine which validates the length of the array is based on the LEN value of the variable, not its DIMed length. This helps to prevent garbage from being mixed in accidentally with good data. In our case, MLENGTH is 25, although the range of values for length is from 1 to 255 bytes. ORDER determines whether we will sort the array elements in ascending or descending order; 1 represents ascending order, and 255 indicates descending order. RELVARNUM, or the relative variable number, is the relative appearance of our string array in the BASIC program. Looking back to our first example, we see that MLENGTH was the first variable in our example, ELEMENTS was second, X third, and SARRAY$ fourth. The first variable has a relative number of zero, so RELVARNUM = 4 -1 or RELVARNUM = 3.
When a USR routine is called, a value may be returned to the BASIC program. This value is used to return error conditions. Those error conditions are represented by values of EFLAG, which are described as follows:
A Value of | Indicates |
0 | An error-free sort |
1 | ELEMENTS * MLENGTH > ARRAYLENGTH |
2 | Sort ORDER <> 1 or 255 |
3 | MLENGTH = 0 or > 255 |
4 | ELEMENTS = 0 |
5 | Variable not a DIMed string array or relative variable number exceeds 31 |
6 | Invalid number of parameters passed |
The Sort Utility
Our sort utility works by comparisons made in a loop, from the most significant byte to the least significant byte. Comparisons are made until two compared bytes are found out of order or until the element is fully tested. The order of comparison reduces the number of passes through the loop to a minimum.
To further increase efficiency, only one pass is made through the array. When one element is exchanged with a lower one, it is tested against the next lower element, until it is no longer exchanged or until it is placed at the bottom of the array. After placing that element, the routine returns to the element's original position and moves to the next element. In this way, each element is ranked among all previously sorted elements.
Of course this is not a DOS utility sort, and it is limited by available memory. However, the USR function allows us to use machine language speeds in our routine. And with a little programming and thought, most files can be sorted on any field by placing the sort field first in the element string, followed by the entire record.
For convenience, the machine language sort routine is provided as a BASIC loader program.