Chapter 14 - Grouping Data in Arrays




For reasons that will become apparent, arrays are often used in conjunction with an earlier subject, loops. To illustrate the need for arrays, let's pretend you're working on a space invaders game. Somewhere near the end of the code, you need to see if there are any aliens left alive so you can start a new level. If each alien has a variable to say whether it's still alive, you could do this:
REM ...
NewLevel=TRUE : REM Assume new level
IF Alien1Alive THEN NewLevel=FALSE
IF Alien2Alive THEN NewLevel=FALSE
IF Alien3Alive THEN NewLevel=FALSE
IF NewLevel THEN
  REM setup new level
ENDIF
REM ...
The first line assumes a new level is required. The program then inspects each AlienXAlive flag in turn and if this alien is still alive, resets the NewLevel flag to false. If it completes its checks and the flag is still true, all the aliens must be dead so line 'em up and start again. Question: what happens if there are 40 aliens? Or 100? Using this method, that's a lot of lines of code. There must be a better way:
DIM AlienAlive(50)
REM ...
NewLevel=TRUE : REM Assume new level
FOR I%=1 TO 50
  IF AlienAlive(I%) THEN NewLevel=FALSE
NEXT I%
IF NewLevel THEN
  REM setup new level
ENDIF
REM ...
The first thing is the keyword DIM. DIM stands for DIMension and is an instruction to the computer to set aside an area of memory for an array variable. The number in the brackets after the name tells BASIC how big the array is. In this case, we are telling the computer we have 50 aliens, so it reserves a location for each of them, numbered 1 to 50. Previously, we have let BASIC define our variables when first used, but an array is different. Because the size of an array is defined, BASIC can perform a check each time it is used in code. If we try to access an element outside the array, BASIC will tell us by halting the program and producing a friendly message (i.e. it crashes). Although not compulsory, it is good practice to declare all arrays at the top of the program. This puts them all together in one place and gets all the memory allocations out of the way when the program starts up.

Once declared, we can use the array pretty much as we do any other variable. Each individual element is accessed by specifying its number, for example AlienAlive(1) or AlienAlive(39), or, as in the line IF AlienAlive(I%) ..., by using the counter in a FOR loop to decide which element we wish to inspect. By using this method, we can check as many aliens as we wish, all in the same number of lines of code.
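
As a further sketch of the same idea, the loop can count the survivors instead of just setting a flag. The three aliens marked as alive and the Count% variable are invented purely for the demonstration:

```bbcbasic
REM Count how many aliens are still alive
DIM AlienAlive(50)

REM Assumed test data: pretend three aliens survived
AlienAlive(3)=TRUE
AlienAlive(17)=TRUE
AlienAlive(42)=TRUE

Count%=0
FOR I%=1 TO 50
  IF AlienAlive(I%) THEN Count%=Count%+1
NEXT I%
PRINT "Aliens left: ";Count%

END
```

Changing 50 to 100 in the DIM and FOR lines is all it takes to cope with a bigger invasion.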

Here is a complete example which stores grades for a number of pupils and then does something with the data:

REM Student grade program
DIM Grade%(5)
      
REM First, collect the data
FOR I%=1 TO 5
  PRINT "Enter grade for student ";I%;": ";
  INPUT Grade%(I%)
NEXT I%
      
REM Work out the average
Total%=0
FOR I%=1 TO 5
  Total%=Total%+Grade%(I%)
NEXT I%
REM Print the average
PRINT "The average grade was ";Total%/5
      
REM Find the minimum grade
Minimum%=999
FOR I%=1 TO 5
  IF Grade%(I%)<Minimum% THEN
    Minimum%=Grade%(I%)
  ENDIF
NEXT I%
REM Print the minimum
PRINT "The minimum grade was ";Minimum%
      
END
Make room for 100 pupils if you want by changing the relevant lines, but you might get bored entering all that data!

BBC BASIC actually gives you an extra element, so when you declare Grade%(5) you get 6 elements: Grade%(0) to Grade%(5). In practice, it's often easier to ignore the first element because it's human nature to think in terms of pupil 1, pupil 2 etc, not pupil 0. But it's there if you want it. All the elements of an array are set to 0 (or to empty strings in the case of a string array) when it is declared. Once declared, you can't resize the array by re-declaring it later on. It may help to think of an array as a collection of boxes, so if we entered grades of 40, 80, 60, 70 and 55 for the students above, this is how they would be stored:
 

Element   0   1   2   3   4   5
Value     0  40  80  60  70  55
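
The extra element 0 and the zero starting values can be checked with a short test. This is only a sketch, and the array name Box% is invented for the purpose:

```bbcbasic
REM Element 0 exists and every element starts at 0
DIM Box%(5)

PRINT Box%(0)
PRINT Box%(5)

REM Element 0 can be used like any other
Box%(0)=99
PRINT Box%(0)

END
```

This should print 0, 0 and then 99.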

You can declare arrays with more than one dimension. If you think about a grid (like the imaginary one you use to place characters on the screen) we would have a two dimensional array:
DIM Grid%(5,5)
This would define a grid 6 cells by 6 cells (remember element 0). This is how it would look in our box diagram:

  0 1 2 3 4 5
0            
1            
2            
3            
4            
5            
   
Each row has 6 columns, as can be seen from above. These are accessed by Grid%(0,0),Grid%(0,1),Grid%(0,2) ... Grid%(0,5) then Grid%(1,0),Grid%(1,1) ... Grid%(1,5) etc. up to Grid%(5,5). Into each of these locations we can put a value; that's 36 locations in all. As you can see, this gives us a powerful way to group information.
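
Two nested FOR loops are the usual way to visit every cell. In this sketch the names Row% and Col%, and the value stored in each cell, are arbitrary choices for illustration:

```bbcbasic
REM Fill a 6 by 6 grid, one row at a time
DIM Grid%(5,5)

FOR Row%=0 TO 5
  FOR Col%=0 TO 5
    REM Store a value that records the cell's position
    Grid%(Row%,Col%)=Row%*10+Col%
  NEXT Col%
NEXT Row%

REM Row 2, column 3 should hold 23
PRINT Grid%(2,3)

END
```

The outer loop picks the row and the inner loop works along it, so the cells are filled in the same order as they are listed above.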

You can define an array with 3 dimensions:

DIM Grid%(10,10,10)
If you think of a two dimensional grid as a page, we've just declared 11 pages, each containing a grid 11 elements square. If you're working with the demo version, arrays are a superb way to run out of memory. Think about it: there are 11*11*11 = 1331 elements in this array, so be wary of getting too carried away. Unlike some other BASICs, BB4W lets you declare as many dimensions as you want, but in practice programmers rarely use more than 3.

Initialising Arrays

READ, DATA and RESTORE

Arrays can be used to hold data, but one of the problems is getting the data into the array. In the grade program above, entering is fine because every time you run it, you would probably want to enter different values. Consider this:

REM Days in a month
DIM Month%(12)
 
REM First, collect the data
FOR I%=1 TO 12
  PRINT "Enter days in month ";I%;": ";
  INPUT Month%(I%)
NEXT I%
 
REM Now ask for month
INPUT "Enter month number: " M%
PRINT "Month ";M%;" has ";Month%(M%);" days."
 
END
Sort of defeats the object really, doesn't it? BASIC, of course, has got there before us and provides a way to set up variables and arrays that are going to be the same each time.
REM Days in a month
DIM Month%(12)
 
REM First, collect the data
FOR I%=1 TO 12
  READ Month%(I%)
NEXT I%
 
REM Now ask for month
INPUT "Enter month number: " M%
PRINT "Month ";M%;" has ";Month%(M%);" days."
 
END
DATA 31,28,31,30,31,30,31,31,30,31,30,31
That's much better. There is a pair of keywords here, READ and DATA. DATA contains just that: a collection of data, numeric or string, that can be used in the program. What the data is and the order you put it in is entirely up to you. When the program encounters a READ statement, it goes off and finds the next DATA statement. It then reads the value into the variable given, a little like INPUT. BASIC remembers where it got up to, so the next time it encounters READ, it carries on from where it left off.

Obviously, there should be at least as many pieces of data as there are READ instructions carried out (counting every time READ is called in a loop) or the program gets upset. You are not constrained to using READ only with arrays; it is perfectly acceptable to set single variables using this method, but with a loop it is possible to initialise an array with very few lines of code.
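
As a small illustration of READ with ordinary variables, the sketch below sets up two of them from a DATA line. The names Lives% and Title$ and the values are invented for the example:

```bbcbasic
REM READ works with ordinary variables too
READ Lives%
READ Title$
PRINT Title$;" starts with ";Lives%;" lives"

END
DATA 3,"Space Invaders"
```

There are two READs and two items of data, so the counts match.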

DATA statements can be split in any way you choose, for example, we could have written:

DATA 31,28,31,30,31
DATA 30,31,31,30,31
DATA 30,31
or even:
DATA 31
DATA 28
DATA 31
DATA 30
DATA 31
DATA 30
DATA 31
DATA 31
DATA 30
DATA 31
DATA 30
DATA 31
You're the programmer, so it's up to you. Just make sure that there are as many items as there are READs.

You can mix and match items as long as you read the correct type into the correct variable.

REM Days in a month
DIM Month%(12), Name$(12)
 
REM First, collect the data
FOR I%=1 TO 12
  READ Month%(I%)
  READ Name$(I%)
NEXT I%
 
REM Now ask for month
INPUT "Enter month number: " M%
PRINT Name$(M%);" has ";Month%(M%);" days."
 
END
DATA 31,January,28,February,31,March
DATA 30,April,31,May,30,June
DATA 31,July,31,August,30,September
DATA 31,October,30,November,31,December
DATA statements can be placed anywhere in the program; BASIC will just ignore them until told to use them. They are usually placed at the bottom of the program after END so they don't clutter the code, with the exception given below.

There are occasions when it is necessary to reread a set of data. Suppose you had a default set of values which could be reset from an option in a menu. To force the data pointer back to a specific place, we use the RESTORE command. RESTORE has a few options. The first has no argument and resets the data pointer to the first DATA statement in the program.
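
The sketch below shows the no-argument form at work by reading the same three values twice. The array name Preset% and the values themselves are assumptions made for the example:

```bbcbasic
REM Read the same DATA twice using RESTORE
DIM Preset%(3)

FOR Pass%=1 TO 2
  REM Send the data pointer back to the first DATA statement
  RESTORE
  FOR I%=1 TO 3
    READ Preset%(I%)
  NEXT I%
  PRINT "Pass ";Pass%;": ";Preset%(1);" ";Preset%(2);" ";Preset%(3)
NEXT Pass%

END
DATA 10,20,30
```

Take out the RESTORE line and the second pass runs out of data.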

Another use is to specify a line number. (Line numbers are discussed in Appendix A. If you're not familiar with them, the general consensus these days is that you are not missing much but might like to zip off there and give them a glance.) Using RESTORE with a line number will reset the data pointer to the first item on the line given. This can be useful for specifying alternate sets of data.

 10 REM Months in English and French
 20 DIM Month$(12)
 30 REM First, collect the data
 40 INPUT "Do you want French? (y/n) " Ans$
 50 IF Ans$="N" OR Ans$="n" THEN
 60   RESTORE 170
 70 ELSE
 80   RESTORE 210
 90 ENDIF
100 FOR I%=1 TO 12
110   READ Month$(I%)
120 NEXT I%
130 REM Now ask for month
140 INPUT "Enter month number: " M%
150 PRINT "Month ";M%;" is ";Month$(M%)
160 END
170 DATA January,February,March
180 DATA April,May,June
190 DATA July,August,September
200 DATA October,November,December
210 DATA Janvier,Fevrier,Mars
220 DATA Avril,Mai,Juin
230 DATA Juillet,Aout,Septembre
240 DATA Octobre,Novembre,Decembre
The line number can be calculated if required.

To remove the line numbers, RESTORE gives us yet another option. In this we specify an offset from the line containing the RESTORE instruction (NOT the line with the first DATA statement). The number given tells BASIC the number of lines to move forward from the current position. To indicate that we are using an offset and not a line number, the number must be preceded by a +. Here is the above program without line numbers using this method. Examine the numbers in the RESTORE lines and count forwards to see where each one points to.

REM Months in English and French
DIM Month$(12)
REM First, collect the data
INPUT "Do you want French? (y/n) " Ans$
IF Ans$="N" OR Ans$="n" THEN
  RESTORE +11
ELSE
  RESTORE +13
ENDIF
FOR I%=1 TO 12
   READ Month$(I%)
NEXT I%
REM Now ask for month
INPUT "Enter month number: " M%
PRINT "Month ";M%;" is ";Month$(M%)
END
DATA January,February,March
DATA April,May,June
DATA July,August,September
DATA October,November,December
DATA Janvier,Fevrier,Mars
DATA Avril,Mai,Juin
DATA Juillet,Aout,Septembre
DATA Octobre,Novembre,Decembre
You can only go forwards here; try specifying a negative offset and BASIC will complain. Lines here are actual physical lines, including blank ones and REMs, not just those containing code. As can be seen, if the program were to be extended, we could easily lose track of the offsets and create chaos. Perhaps this method is best employed when the DATA is close to the RESTORE statements:
REM ...
IF Ans$="N" OR Ans$="n" THEN
  RESTORE +4
ELSE
  RESTORE +6
ENDIF
DATA January,February,March
DATA April,May,June
DATA July,August,September
DATA October,November,December
DATA Janvier,Fevrier,Mars
DATA Avril,Mai,Juin
DATA Juillet,Aout,Septembre
DATA Octobre,Novembre,Decembre
In any of the above methods, if there is no data at the line given by RESTORE, BASIC finds the next line that contains DATA and uses that instead.

Initializing without READ and DATA

It is possible to initialize an array directly in code. This saves DATA and READ statements and is more in keeping with the way languages like C do this sort of thing. The first example initializes all the elements after the array has been declared. Don't forget that MyArray%(3) has 4 elements 0 to 3 so we need 4 values.

REM Inline initialization
DIM MyArray%(3)
      
MyArray%() = 1,2,3,4
FOR I%=0 TO 3
  PRINT MyArray%(I%)
NEXT I%
 
END
This can be done at any time, not just when the array has been declared.
REM Inline initialization
DIM MyArray%(3)
      
MyArray%() = 1,2,3,4
FOR I%=0 TO 3
  PRINT MyArray%(I%)
NEXT I%
      
MyArray%() = 5,6,7,8
FOR I%=0 TO 3
  PRINT MyArray%(I%)
NEXT I%
      
END
If you supply fewer values than there are elements, only that many elements are set; the rest keep whatever they held before.
REM Initialize the first three elements
DIM MyArray%(3)
      
MyArray%() = 1,2,3
FOR I%=0 TO 3
  PRINT MyArray%(I%)
NEXT I%
 
END
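To see that an element left out of the list keeps its old value, a sketch like this one puts 9 in the last element first (a value picked only so it stands out) and then supplies three values:

```bbcbasic
REM A short list leaves the remaining elements alone
DIM MyArray%(3)

REM Put a recognisable value in the last element
MyArray%(3)=9

REM Now supply only three values
MyArray%() = 1,2,3
FOR I%=0 TO 3
  PRINT MyArray%(I%)
NEXT I%

END
```

The first three values printed should be 1, 2 and 3, with element 3 still holding 9.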
However, and this is really useful, you can preset an entire array if only one value is given. Great for initializing big arrays.
REM Set each element to 55
DIM MyArray%(100)
      
MyArray%() = 55
FOR I%=0 TO 100
  PRINT MyArray%(I%)
NEXT I%
 
END
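Multi-dimensional arrays are not a problem, as long as you get the order correct: the values are listed with the rightmost subscript changing fastest. So in the example, the first four values fill MyArray%(0,0) to MyArray%(0,3), the next four fill MyArray%(1,0) to MyArray%(1,3), and so on. The line has also been split. This is not a problem, but you must still include a comma at the end of the split line, just as you would if it was continuous.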
REM Initializing a multi-dimensional array
DIM MyArray%(2,3)
 
MyArray%() = 1,2,3,4,10,20,30, \
\ 40,100,200,300,400
FOR I%=0 TO 2
  FOR J%=0 TO 3
    PRINT MyArray%(I%,J%)
  NEXT J%
NEXT I%
 
END
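To check which value ends up where, a sketch like the following picks out a few individual elements. The expected values in the REMs assume the values are stored a row at a time, so that the right-hand subscript changes fastest:

```bbcbasic
REM Where do the values go in a 2D array?
DIM MyArray%(2,3)

MyArray%() = 1,2,3,4,10,20,30,40,100,200,300,400

PRINT MyArray%(0,3) : REM expect 4
PRINT MyArray%(1,0) : REM expect 10
PRINT MyArray%(2,1) : REM expect 200

END
```

Laying the values out in groups of four, as here, makes it easier to see which row each one belongs to.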
Did somebody say strings? You can do these as well, but then, you'd expect that.
REM Initializing a string array
DIM Month$(12)
 
Month$() = "","January","February", \
\ "March","April","May","June", \
\ "July","August","September", \
\ "October","November","December"
 
FOR I%=1 TO 12
  PRINT Month$(I%)
NEXT I%
 
END
When do you use inline and not READ/DATA? It's completely up to you. One of the advantages of the inline method is that it gets round the dreaded line number dependency when using RESTORE. On the other hand, if you have large amounts of DATA, then you may prefer to keep it from clogging up the body of the code.

Finding the size of an array

The DIM statement can be used as a function. It can have one or two arguments passed to it. The first parameter is always the name of the array. When used with just the name, DIM will respond with the number of dimensions of the array.

REM Finding the size of an array
DIM Array1D(10)
DIM Array2D(10,9)
DIM Array3D(10,9,8)
      
PRINT "Array","Dimensions"
PRINT "Array1D",DIM(Array1D())
PRINT "Array2D",DIM(Array2D())
PRINT "Array3D",DIM(Array3D())
 
END
As previously stated, it is possible to have an array name which is the same as a variable so we use the array name followed by the empty brackets to distinguish it from an ordinary variable.

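Once you know the number of dimensions, it is possible to ask DIM for the size of each by passing the particular dimension number as the second parameter. The figure returned is the highest subscript in that dimension (the number used in the DIM statement), so counting element 0 there is one more element than this.

REM Finding the size of an array
DIM Array3D(10,9,8)
PRINT "Highest subscript in dimension 2 is "; \
\       DIM(Array3D(),2)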
END
To combine the above two, we can now find the number of dimensions and the size of each one.
REM Finding the size of an array
DIM Array3D(10,9,8)
      
N%=DIM(Array3D())
FOR I%=1 TO N%
  PRINT "Dimension:";I%;
  PRINT " Highest subscript:";DIM(Array3D(),I%)
NEXT I%
 
END
We'll mention this use of DIM again in the section on procedures and functions which is coming up soon, whereupon you will be able to see why we would want to do this. Until then, just bear it in mind.
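
In the meantime, here is one modest use: letting DIM supply the loop limit so that the size appears in only one place. This is a sketch with invented names (Temp%, Last%) and made-up values, and it relies on DIM returning the highest subscript of the dimension:

```bbcbasic
REM Let DIM supply the loop limit
DIM Temp%(4)
Temp%() = 12,15,11,18,14

REM Highest subscript of the first (and only) dimension
Last%=DIM(Temp%(),1)

Total%=0
FOR I%=0 TO Last%
  Total%=Total%+Temp%(I%)
NEXT I%

PRINT "Elements: ";Last%+1
PRINT "Total: ";Total%

END
```

If the DIM line is changed to hold more values, the loop follows without further editing.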

Exercise

Here are the figures for the first six months' sales of triple fruit chocolate covered syrup and treacle flavour ice lollies from one local newsagent:

    January     105
    February    261
    March       482
    April       195
    May         347
    June        626

Set an array to hold these values. Then add to the program so it loops through the values to find and print:
a) the month number for the lowest sales;
b) the month for the highest sales;
c) the total sales.
You can use the same FOR loop to achieve all three or do them separately: your choice.

Modify the program to have an array of month names, initialize it and adapt the above program to display real names for the months.

© Peter Nairn 2006