Building Your Own
Random File Manager
Michael D. Lipay
There are several approaches to handling computer files (collections of data). Among the fastest and best is the random access disk file which uses special techniques to quickly locate any piece of information from anywhere within the entire file.
This tutorial explains how random access can be achieved and examines alternative ways to process data files. It includes a sample program, written in Applesoft BASIC, but which can easily be adapted to work on other computers using Microsoft BASIC.
Besides protecting earth from aliens, a main purpose of a computer is processing information. This data processing can be anything from keeping track of your stamp collection to maintaining a running inventory for your business. When it becomes necessary to retain the information long after the computer has been turned off, tape or disk storage is used.
Magnetic storage devices are capable of storing information indefinitely (provided they are kept clean and away from magnetic fields). Basically, there are two types of magnetic storage devices available to the micro computer user - tape and disk. Both devices are capable of storing large amounts of information, and do so in groups called files. A file is a collection of related information, and the user has three primary types of files to select from:
I) Sequential Tape Files
II) Sequential Disk Files
III) Random Access Disk Files
II) Sequential Disk Files
III) Random Access Disk Files
Which of the three you decide to use for a given program will depend on many factors. Each has its own advantages and disadvantages; they are discussed here in an effort to help you select the best one for your needs.
Sequential Tape Files
If you have large amounts of data which you do not need to process frequently, then tape files should be considered. Tapes can store vast amounts of data in a relatively compact space, and at a very low price. Tapes serve as an excellent medium to keep a backup of disk programs and files. The big drawback to using tapes is that they are slow, so make sure you have plenty of time.
Sequential Disk Files
Sequential disk files are best if you have small amounts of data to process. The files have the advantages of being faster than tape and more space conservative than random access files. Probably the only disadvantage of sequential disk files is the slowness of updating large files. In order to change a single record on a sequential file, you must copy all records to a work file, changing any records desired along the way, then delete the old file and rename the work file. This could be as time consuming as tape files, were it not for the speed of the disk.
Random Access Disk Files
Large volumes of data which must be updated with any frequency should be held in random access files. This type of file lets you easily update any given record without having to process or read through any other record on the file. It also has disadvantages such as requiring all records to be of the same, fixed length and needing to know where on the file a particular record is located.
There are several methods available to help you to determine where a particular record is located on a random access file. John Hudson covered the HASH/LINK method in the March 1982 issue of COMPUTE!. He did an excellent job; and if you desire to learn more about it, I suggest that you read this article. The HASH/LINK method does have some problems. For example:
I) If you fill the overflow area, you
will have to reorganize the file again.
II) As soon as you initialize the random file, you take up more space than you may need.
III) Successive "collisions" can greatly increase access time (rec 100 links to rec 212, rec 212 links to rec 487, rec 487 links to...).
IV) Expanding the main and overflow areas of the file may require major program revisions (deciding the main area should be 2000 recs instead of 1000 recs will require changes to your hashing logic), as well as requiring you to reload the file.
V) Sequential (ascending or descending) processing is almost impossible.
VI) If you need to "key" on an alphabetic field (such as a name), you must first convert it to a numeric value.
VII) Once the file has been created, it is impossible to select an alternate key (e.g., a file is hashed on the last name, but you need a report in social security number order).
VIII) Deleting a record requires several Read/ Write steps to keep the link field updated. Once a record has been deleted, the position that it occupied on the file is unusable, since all adds occur at the end of the file.
II) As soon as you initialize the random file, you take up more space than you may need.
III) Successive "collisions" can greatly increase access time (rec 100 links to rec 212, rec 212 links to rec 487, rec 487 links to...).
IV) Expanding the main and overflow areas of the file may require major program revisions (deciding the main area should be 2000 recs instead of 1000 recs will require changes to your hashing logic), as well as requiring you to reload the file.
V) Sequential (ascending or descending) processing is almost impossible.
VI) If you need to "key" on an alphabetic field (such as a name), you must first convert it to a numeric value.
VII) Once the file has been created, it is impossible to select an alternate key (e.g., a file is hashed on the last name, but you need a report in social security number order).
VIII) Deleting a record requires several Read/ Write steps to keep the link field updated. Once a record has been deleted, the position that it occupied on the file is unusable, since all adds occur at the end of the file.
In the rest of this article I will cover an alternate method known as Indexed-Sequential Access Method (ISAM).
ISAM
ISAM can solve all the problems associated with HASH/LINK files, but it has some problems of its own. ISAM works on the principle that it is faster to search memory than a disk. Unfortunately, before you can search memory, you must have something in it, and this is the problem with ISAM.
ISAM works by loading the desired "key" field of each record in a file into an array. This is done by placing the key field of the first record into the first position of the array, the key from the second record into the second position of the array, etc. Once the array has been loaded, you simply search the array for the desired key; its position in the array is the record number for the random access file. Described below are the procedures necessary for the most common types of file processing:
I) ADD A RECORD
a) Search the array to determine if the
record already exists.
b) Move the new "key" to the end of the, array, or to the first "open" position in the array.
c) Use the position number of the array to write the record to the file.
II) DELETE A RECORDb) Move the new "key" to the end of the, array, or to the first "open" position in the array.
c) Use the position number of the array to write the record to the file.
a) Find the key in the array.
b) "Open" the entry in the array by moving a "dummy" key into the array (such as zeros).
c) Write the dummy values to the file.
III) CHANGE A RECORDb) "Open" the entry in the array by moving a "dummy" key into the array (such as zeros).
c) Write the dummy values to the file.
a) Find the key in the array.
b) Use the position number to read in the record.
c) Make your change to the record (even change the key).
d) Write the new record to the file using the position number.
e) If you changed the key, move the new key into the array.
IV) PROCESS SEQUENTIALLY BY KEYb) Use the position number to read in the record.
c) Make your change to the record (even change the key).
d) Write the new record to the file using the position number.
e) If you changed the key, move the new key into the array.
a) Sort the array into the desired
order (ascending or descending).
b) Process the records sequentially through the array.
V) PROCESS BY A DIFFERENT KEYb) Process the records sequentially through the array.
a) Load the array with the new keys
from the file.
b) Process normally using the new array.
b) Process normally using the new array.
Listed below are sample programs, written in Applesoft, which illustrate ISAM programming techniques. The programs are shells which can easily be modified to suit your own purposes. Note that all branch instructions bypass the REM statements; thus, if you want to key the program in without remarks, no line numbers will have to be changed. Variables used in the programs are:
D$ - Control-D (disk access)
IA - Index Array
IE - Index End (last entry used)
IP - Index Pointer (entry number for the part searched for)
IO - Index Open (entry number for first "open" or empty record)
FOUND - Switch to indicate if part searched for is in the index:
0 - part not in index
1- part in index
PART - Part number being
searched for1- part in index
10-13 This section goes to a one-time routine to load the index array with the desired key field (in this case a part number).
100-114 Display the options available in a menu format.
120-122 This gets the option into a string. Then, using the VAL command, goes to the appropriate routine. Note that if zero, a non-numeric character, or a number greater than five is entered, the menu is displayed again.
200-215 The index array is searched sequentially in this section. If the key is found, the following values are returned:
FOUND =1
IP = Entry in array
for desired key
IO = First open entry in array (entry with key of zero)
If the key is not found, the following values are returned:IO = First open entry in array (entry with key of zero)
FOUND = 0
IO = First open entry in array
Note on lines 212 and 213 the method used to exit from the FOR/NEXT
loop. This is the method suggested by Apple to exit the loop from other
than normal completion. Its purpose is to prevent ?OUT OF MEMORY errors
from occurring as a result of too many "open" loops.IO = First open entry in array
300-324 ADD A PART
310 Accepts the part number to be added to the file.
311 Goes to the routine to search the index. If the part already exists (FOUND =1), an error message is displayed and control is returned to the menu.
321-322 The new part is written to the master file using the open entry pointer (IO) as the record number.
323 If the new part is added to the end of the file, the number of the last entry (IE) is updated.
324 Returns to the menu.
400-424 DELETE A PART
410 Accepts the part to be deleted.
411 Goes to the search routine. If the part is not on file (FOUND =0), an error message is displayed and control is transferred to the menu.
420 The part is removed from the index by making the entry zero.
421-422 The part is removed from the master file.
423 If the part was the last one in the array, the ending pointer (IE) is reduced by one.
424 Return to the menu.
800-813 UPDATE INDEX POINTER
810-811 Write the number of the last entry in the index to record zero of the master file.
812 Closes the master file.
813 Stops the program.
900-930 LOAD THE INDEX ARRAY 910 Initially sets up variables.
911 Sets up an error routine to handle end-of-data and not-found conditions.
912 Opens the master file.
913-914 Read the number of the last record on the master file.
915 Turns off the error routine, dimensions the index array to allow up to ten records to be added to the end of the array (this can be changed to allow for more expansion).
916 If no records exist on the.master, control goes to the menu.
920 Sets up the error routine.
921-924 Load the key field (part number) into the array.
930 Turns the error routine off, returns to the menu.
The second program offers a different method of handling the index. Type in lines 10-630 from Program 1, then add the lines from Program 2. In this program the index is kept on a sequential disk file, for speed of loading the array.
800-833 Save the index array.
810 Check the index change switch; if it is zero, the index has not changed and does not have to be rewritten. Control goes to 832.
811 Deletes the index file.
820-823 Write the array to the index file.
830-831 Write the number of the last entry in the index to record zero of the master file.
832 Closes the master file.
833 Stops the program.
900-940 LOAD THE INDEX ARRAY
910 Initially sets up variables.
920 Opens the master file.
921 Sets up the error routine.
922-923 Read the number of entries in the index file.
930 Sets up a new error routine.
931 Dimensions the index array (with expansion of 10).
932-934 Read the index file into the array.
935 Turns the error routine off and closes the index.
940 Turns control over to the menu.
Program 1: ISAM
10 REM
11 REM CALL INDEX LOAD ROUTINE
12 REM
13 GOTO 910
100 REM
101 REM SELECT OPTION
102 REM
110 HOME : PRINT "1) ADD PART"
111 PRINT "2) DELETE PART"
112 PRINT "3) CHANGE PART"
113 PRINT "4) DISPLAY PART"
114 PRINT "5) STOP"
120 PRINT : INPUT "SELECT OPTION: ";OPT'$
121 ON VAL (OPT$) + 1 GOTO 110,310,410,510
,610,810
122 GOTO 110
200 REM
201 REM SEARCH INDEX ARRAY
202 REM
210 IO = IE + 1: IF IE = 0 THEN FOUND = 0:
RETURN
211 FOR I = 1 TO IE
212 IF IA(I) = PART THEN IP = I:I = IE + 1:
NEXT :FOUND = 1: RETURN
213 IF IA(I) = 0 AND 10 = IE + 1 THEN 10 =
I: NEXT
214 NEXT I
215 FOUND = 0: RETURN
300 REM
301 REM ADD A PART
302 REM
310 INPUT "ENTER NEW PART NUMBER: ";PART
311 GOSUB 210: IF FOUND = 1 THEN PRINT "PA
RT ALREADY ON FILE": GOTO 110
320 IA(IO) = PART
321 PRINT D$;"WRITE MASTER,R";IO
322 PRINT PART: PRINT D$
323 IF ID > IE THEN IE = 10
324 GOTO 110
400 REM
401 REM DELETE A PART
402 REM
410 INPUT "ENTER PART TO BE DELETED: ";PARK
411 GOSUB 210: IF FOUND = 0 THEN PRINT "PA
RT IS NOT ON FILE": GOTO 110
420 IA(IP) = 0
421 PRINT D$;"WRITE MASTER,R";IP
422 PRINT 0: PRINT D$
423 IF IP = IE THEN IE = IE - 1
424 GOTO 110
500 REM
501 REM CHANGE A PART
502 REM
510 INPUT "ENTER PART TO BE CHANGED: ";PART
511 GOSUB 210: IF FOUND = 0 THEN PRINT "PA
RT IS NOT ON FILE": GOTO 110
520 PRINT D$;"READ MASTER,R";IP
521 INPUT PART: PRINT D$
530 REM CODING TO CHANGE PART
540 IA(IP) = PART
541 PRINT D$;"WRITE MASTER,R";IP
542 PRINT PART: PRINT D$
543 GOTO 110
600 REM
601 REM DISPLAY PART
602 REM
610 INPUT "ENTER PART NUMBER: ";PART
611 GOSUB 210: IF FOUND = 0 THEN PRINT "PA
RT IS NOT ON FILE": GOTO 110
612 PRINT D$;"READ MASTER,R";IP
613 INPUT PART: PRINT D$
620 REM CODING TO DISPLAY PART
630 GOTO 110
800 REM
801 REM UPDATE INDEX POINTER
802 REM
810 PRINT D$;"WRITE MASTER,RO"
811 PRINT IE
812 PRINT D$;"CLOSE MASTER"
813 END
900 REM
901 REM LOAD INDEX ARRAY
902 REM
910 D$ = CHR$ (4):IE = 0:IP = 0:10 = 0:FOUN
D = 0:PART = 0
911 ONERR GOTO 915
912 PRINT D$;"OPEN MASTER,L25"
913 PRINT D$;"READ MASTER,RO"
914 INPUT IE: PRINT D$
915 POKE 216,0: DIM IA(IE + 10)
916 IF IE = 0 GOTO 110
920 ONERR GOTO 924
921. FOR I = 1 TO IE
922 PRINT D$;"READ MASTER,R";I
923 INPUT IA(I)
924 NEXT I: PRINT D$
930 POKE 216,0: GOTO 110
Program 2: Index Array Routine
800 REM
801 REM SAVE INDEX
802 REM
810 IF IC = 0 GOTO 832
811 PRINT D$;"DELETE INDEX"
820 PRINT D$;"OPEN INDEX"
821 PRINT D$;"WRITE INDEX"
822 FOR I = 1 TO IE: PRINT IA(I): NEXT I
823 PRINT D$;"CLOSE INDEX"
830 PRINT D$;"WRITE MASTER,RO"
831 PRINT IE
832 PRINT D$;"CLOSE MASTER"
833 END
900 REM
901 REM LOAD INDEX ARRAY
902 REM
910 D$ = CHR$ (4):IE = 0:IP = 0:IC = 0:IO =
0:FOUND = 0:PART = 0
920 PRINT D$;"OPEN MASTER,L25"
921 ONERR GOTO 930
922 PRINT D$;"READ MASTER,RO"
923 INPUT IE
930 ONERR GOTO 935
931 DIM IA(IE + 10)
932 PRINT D$;"OPEN INDEX"
933 PRINT D$;"READ INDEX"
934 FOR I = 1 TO IE: INPUT IA(I): NEXT I
935 POKE 216,0: PRINT D$;"CLOSE INDEX"
940 GOTO 110