Basic Atari BASIC Sorts
E.P.McMahon
Choosing a sort routine that eliminates unnecessary searches can save you time. Four sorting kethods are examined in terms of their speed, and there are some hints on making sors work faster.
Sorts - many programmers ignore them, many don't understand them, and most misuse them.
Let's look at the insertion sort, the selection sort, and the bubble sort. (the widely used bubble sort is about the most inefficient sort routine around.)
Why is it so widely used? Maybe because itt's so simploe: go through the list to be sorted and examine items, an adjacent pair at a time. If any pair is not in the correct order, swap the pair. Continue to the end of the list.n If a swap wap performed, repeat the above steps; if not, the sort is finished. This sounds more simple and direct than it may be.
Some Terms Defined
Afilecontains records (or items) which are to be sorted according to the keys which are a part, or all of, each record. (The last name in a file of names and addresses is a key for alphabetizing the list.) We will assume sorted means "placed in the order of ascending or descending value of the keys." Another way to sort is to build an auxiliary file of pointers which identify tthe records in the desired order - a good approach for large disk files.
One more definition: a stable sort does not disturb the results of a previous sort when the sort keys are equal. For, example, you sort a file of records consisting of names and addresses alphatically by first name (key = first name). You then sort the file by last name. If the sort is stable, when you have finished the second sort John Doe will follow Jane Doe and precede Joseph Doe; if not, the order of the Does will be arbitrary.
Multiple passes through a stable sort (in reverse order of importance of the keys) will accomplish the same thing as a sort on multiple keys. Simply said, a sort on multiple keys checks the second key any time the first keys of two records being compared are equal. This is how to convert any of the following single keys sorts into a multiple key sort.
Let's discuss the program listings now so you can refer to them as you read the rest of this article.
Bubble Sort
The first program is a bubble sort written in Atari BASIC. I'll review this listing since some of the REMark lines will apply to the other programs, and sections of the code will be identical in the other programs.
The file to be sorted is in string S$ and consists of N records each of length LREC. We will sort this file in place according to the key which is part of the record. The key starts at KB and ends at KF characters offset from the beginning of each record.
Lines in the 100's initialize; line 200 sets the clock to zero. Lines in the 1000's and 1100's are the sorts. Line 1500 reads and prints the clock; and the subroutine in the 2000's generates a random file to be sorted (each record consists of two random letters and a blank).
Let's look at the bubble sort. Why is it so weak? Primarily because many redundant comparisons are made, but also because records being moved are put down and picked up at each step. There really are better ways to sort which are just as easy.
The bubble sort (Program 1) uses one trick to make the "standard" bubble sort a little faster. Each pass through the file moves the largest remaining out-of-place record to its correct position. Also, we might be lucky and find some records already sorted. Remember that we use a flag to signal if another pass through the file is necessary. The trick is to use that flag to identify the location of the last swap made (line 1040). We never need examine past that point again; so, as shown in the program, FLAG and TOP limit the search. The bubble still isn't good enough.
Insertion Sort
I'll use a card player sorting a hand of 13 cards to help you visualize what's going on in each sort.
Our right-handed card player does the insertion sort by holding the first dealt card in the left hand and the other 12 cards in the right. Notice that the first card is already "inserted" in the sorted file in the left hand. He or she examines the next card to be sorted, initially card number two, and compares it to the cards in the left hand, initially just the first card. If card two is bigger, it remains card two as it is placed in the left hand; if smaller, card one is shifted to become card two, and card two from the right hand becomes card one in the left.
Each step, then, compares the next card to be inserted (from the right hand) with the last card in the left hand. If the new card is larger, it becomes the last card; if not, the old card in the left hand is moved one space lower, and the new card is compared with the next old card in line. This last step is repeated until the new card is inserted.
Now what is the worst case for this sort? A file that must be inverted. Each card must be compared with every card in the left hand, and every card in the left hand must be moved in each step. Best case? When the file is in order except for a new entry at the end (new last card).
Some people defend using the bubble sort when it's used to add a record to an already sorted file, but the insertion sort is faster at this, too. Just put the new record at the end of the file (new record number N) and change the loop indices (line 1000) to "FOR J = N TO N" and less than one pass through the sort will correctly place the new record.
Program 2 is an insertion sort written in Atari BASIC. Lines 1000-1100 are the sort itself; the rest of the lines follow the same convention described for the bubble sort.
Selection Sort
The selection sort is just as easy. This time, the card player holds all the cards in the right hand and scans from left to right for the smallest. The smallest card is extracted, placed in the left hand as card one, and the cards in the right hand are shifted to the right to fill the gap caused by the extracted card. The cards in the right hand are now numbered two to thirteen. The process repeats: scan the cards in the right hand, extract the smallest, and add it at the end of the cards in the left hand. Shift cards in the right hand to the right to remove the gap. When only one card remains in the right hand, it is the largest, and the sort is finished.
The worst case for this sort is also a file that must be inverted. Each card that is selected is the last one in the set of unsorted cards.
Let's look at the differences in these algorithms. In the insertion sort, we examined a sorted sub-list and insert a new record; in the selection sort, we examine an unsorted sub-list and select a new record. Suppose you are interested in the first ten items in a 100-item file. Which routine would you use? The selection sort of course, stopping after the tenth item is found.
If you implement the selection algorithm exactly as stated above to sort string variables, you'll find that shifting the "cards" in the right hand to remove the gap is inconvenient. (Try shifting a string of, say, ten characters five spaces to the right. If you don't know what will happen, try A$(6,16) = A$(1,10) and see what the result is.)
A Couple Of Tricks
Atari BASIC loves to shift strings to the left, so we'll modify the sort algorithm to take advantage of this. All we do is hold the unsorted cards in the left hand and put the extracted cards in the right hand. The gap is removed by shifting cards in the left hand to the left. Take a look at Program 3, a modified selection sort. There are a couple of tricks there. The variable TAIL defined in line 1000 locates the last record in the file S$. This location is the spot in our right hand where the selected card (record) will be placed.
The second trick is using the variable LAST to remember information from the last examination pass through the left hand. It is set to the next-to-the-smallest item in the list, so it has a head start on our next examination search. It is easy to save this information during the search.
Note that we save time on every other search (unless there are ties – then we save more) because we have to reset the flag in case we do not hit a swap. Line 1090 extracts the selected record, line 1100 moves the entire right side of the file one record to the left in one fell swoop, and the selected record is put at the tail. Lines 1140 to 1160 put the last record in its place at the end.
What would the bubble sort look like to our card player? He would examine cards one and two, and swap them if necessary. He would then compare cards two and three, swapping if needed. The process continues with cards three and four, four and five, and so on, to 12 and 13. Finished? Not yet. If any pair of cards were swapped, the process is repeated from the start. Have you ever sorted cards this way? Would you?
Modified Insertion Sort
The string-moving trick in the selection sort suggested that the same trick could be applied to the insertion sort. This results in the modified insertion sort. This results in the modified insertion sort (Program 4), where the sorted file is on the right of the string and the unsorted part of the file is on the left. The first record is always the record to be inserted, and when the insertion spot is found, the string up to the insertion spot is shifted to the right, over the first record.
This is a fast program; unfortunately, it is no longer as stable as the first three programs. It can be made stable by adding an artificial record to the file which is guaranteed to be the last record for any search key (no ties), since the instability occurs only with the last record in the file. To examine the stability of these sorts, sort first with both keys (KB and KF) equal to two, and then sort with both equal to one.
There is another way to make the modified insertion sort stable, and that is to pick the record to be inserted from the end of the unsorted part of the list (record J instead of record 1) and remove the equal sign from the sort test in line 1020. This results in a slower program than the modified insertion sort shown.
Powering Up
A short set of runs of the four programs (with no PRINT statements and with N = 50) gave average times of 80.8 seconds for the bubble, 48 for the insertion, 34 for modified selection, and 23.3 for modified insertion. The programs can be powered, made faster. One easy way is to precompute the constant part of the test in each sort statement. In the insertion sort, for instance, add line 1015 HOLD$ = S$((J-1)*LREC + KB, (J-1)*LREC + KF) and substitute HOLD$ for the right side of the test in line 1020.
If the above descriptions of the sort algorithms aren't clear to you, try sorting a hand of cards according to the rules. Then execute the programs as listed. If it will help, print out the loop indices at each step to see what's going on and how the tricks work to save a few searches here and there. If you're going to use these routines in another program, take out the REMs and print statements for more speed. Better yet, code the sort you need in machine language.
There are more efficient (and more complex) sorts: Shell's sort, Quicksort, and Heapsort, for examples. A quite complete study and reference on sorting (and searching) is the third volume of Donald E. Knuth's The Art of Computer Programming (Addison-Wesley, Reading, Mass., 1973).