Computers: Universe: Basic Arrays

Static vs Dynamic arrays

UniVerse Basic supports two kinds of array: dimensioned arrays (also called static arrays), and dynamic arrays. They have in common merely the fact that they are arrays: that is, they can hold more than one value. They are otherwise entirely different in their use and function.

(The difference between static and dynamic arrays has nothing to do with the differences between static and dynamic hashed files. The similarity in terminology is entirely coincidental.)

Dimensioned arrays

A dimensioned array is very similar to the array structures found in other languages.

A dimensioned array (uniquely amongst UniVerse Basic variables) must be declared before it can be used, using the DIM statement:

DIM DEMO.ARRAY(10)

This statement dimensions the array DEMO.ARRAY to store 10 values. The values can be individually addressed using a subscript in round brackets. The elements are numbered from 1 to 10 (C and Pascal programmers please note: not 0 to 9). Each individually addressed element can be treated much as any variable can.

DEMO.ARRAY(1) = 'This is a value for element 1'
DEMO.ARRAY(1) := ' with a bit appended'
PRINT DEMO.ARRAY(1)

To assign the same value to all elements of a dimensioned array in one go, use the MAT statement:

MAT DEMO.ARRAY = ''

You can create one or two dimensional arrays using DIM. To create a two dimensional array and address its elements, use code like the following:

DIM TWO.DIM.ARRAY(10,10) ; * TWO.DIM.ARRAY contains 100 elements
TWO.DIM.ARRAY(1,1) = 'This is a value for element 1,1'
TWO.DIM.ARRAY(1,1) := ' with a bit appended'

(For some reason, the special term vector is sometimes applied to a one dimensional array, and matrix to a two dimensional one.)

You can change the number of elements in a dimensioned array while your program is running, simply by reusing DIM with the same array name. An example follows:

DIM MATRIX(3,3)
MAT MATRIX = 99
DIM MATRIX(4,2)

The state of the MATRIX array is now as follows:

MATRIX

1

2

(3)

1

99

99

(lost)

2

99

99

(lost)

3

99

99

(lost)

4

(unassigned)

(unassigned)

Counting squares in the table above reveals:

Six elements have subscripts valid in both the old and new versions of MATRIX. They are 1,1, 1,2, 2,1, 2,2, 3,1 and 3,2. They all retain their value of 99..

Three elements in the original MATRIX had subscripts outside the range of the new matrix: 1,3, 2,3 and 3,3. The values these elements contained are simply lost, and the memory used to store them is simply returned to the operating system.

Two elements in the new MATRIX are themselves new: 4,1 and 4,2. These elements are 'unassigned'. Reading their values will cause a run time error until some value has been assigned to them.

Dynamic arrays

A dynamic array is simply a long string divided into sections by the occurence of certain delimiter characters.

There are three levels of delimiter, known as the field mark (ASCII character 254), the value mark (ASCII character 253) and the subvalue mark (ASCII character 252). If these names and values seem familiar, it is because they exactly correspond to the markers used to separate fields, values and subvalues in a record in a UniVerse file (see The UniVerse Paradigm).

There is no equivalent to the DIM statement which applies to dynamic arrays. As a dynamic array is merely a string, you can begin just by assigning a value to it. You can use the UniVerse Basic system variables @FM, @VM and @SM to evaluate to field, value, and subvalue markers.

DYNARRAY = '[F1]' : @FM : '[F2V1]' : @VM : '[F2V2S1]' : @SM : '[F2V2S2]'

As a dynamic array is merely a string, you can also use it wherever you would use a string. For instance, you could:

PRINT DYNARRAY ; * Prints [F1]þ[F2V1]ý[F2V2S1]ü[F2V2S2]

(As characters 252, 253 and 254 lie outside the range of ASCII 'printable characters' (32, space, to 126, ~), the way they are displayed will depend on the terminal or printer you are using. The odd characters shown above are dependent on the font used to display this page: they may be different on your own equipment should you run these examples. If you are particularly unlucky, they may even set printer or display modes, and cause the equipment to behave oddly: the truth is that these markers are not generally meant to be displayed.)

You can access the fields in a dynamic array using a subscript in 'vector' or 'pointy' brackets (as I've heard them called) like this:

PRINT DYNARRAY<1> ; * Prints [F1]
PRINT DYNARRAY<2> ; * Prints [F2V1]ý[F2V2S1]ü[F2V2S2]

To access the values in the second field individually, use two subscripts within the pointy brackets:

PRINT DYNARRAY<2,1> ; * Prints [F2V1]
PRINT DYNARRAY<2,2> * Prints [F2V2S1]ü[F2V2S2]

Finally, and predictably, three subscripts allow you to access individual subvalues:

PRINT DYNARRAY<2,2,1> ; * Prints [F2V2S1]
PRINT DYNARRAY<2,2,2> * Prints [F2V2S2]

As you can see, a dynamic array can be used to store up to three 'dimensions' of information. Note also that using 'out of range' subscripts causes no run time error when accessing fields, values, or subvalues: the dynamic array simply yields an empty string if the subscript you are looking for 'doesn't exist'. The code below illustrates this point:

PRINT DYNARRAY<1,2> ; * Prints an empty string, ie. nothing
PRINT DYNARRAY<1000> * * Prints an empty string, ie. nothing
PRINT DYNARRAY<99,472,293> * * Prints an empty string, ie. nothing

The subscript notation can also be used to assign values to a dynamic array, either overwriting existing values, or creating new ones.

DYNARRAY<1> = '[F1V1]' ; * Overwrites field 1 of DYNARRAY
PRINT DYNARRAY<1> ; * Prints [F1V1]
DYNARRAY<1,2> = '[F1V2]' ; * Adds second value to field 1, creating a value mark in the process
PRINT DYNARRAY<1> ; * Prints [F1V1]ý[F1V2]
DYNARRAY<10> = '[F10]' ;* Adds a string of field marks to position [F10] in field 10
PRINT DYNARRAY ; * Prints [F1V1]ý[F1V2]þ[F2V1]ý[F2V2S1]ü[F2V2S2]þþþþþþþþ[F10]

You can insert values into a dynamic using the INS statement:

INS '[NEWFIELD]' BEFORE DYNARRAY<10;>
PRINT DYNARRAY ; * [F1V1]ý[F1V2]þ[F2V1]ý[F2V2S1]ü[F2V2S2]þþþþþþþþ[NEWFIELD]þ[F10]

Note the difference here between using INS to load field 10 and using ...<10>=.... A simple assignment would have overwritten the value in field 10. INS actually inserts a new field at that position, pushing the value [F10] out into field 11.

You can also delete values using DEL:

DEL DYNARRAY<10;>
PRINT DYNARRAY ; * [F1V1]ý[F1V2]þ[F2V1]ý[F2V2S1]ü[F2V2S2]þþþþþþþþ[F10]

The string F10 is now back in field 10.

If you wish to find a string within your dynamic array, you can do so using the LOCATE statement:

* Finds [F10] at field number 10 and prints '10'
LOCATE '[F10]' IN DYNARRAY<1> SETTING POSITION THEN
PRINT POSITION
ELSE
PRINT 'Not found'
END

Note the <1> after the DYNARRAY array name in this code. This indicates that the search should begin with field 1, but more importantly it indicates that the string being searched for is a complete field. Compare the following:

* Fails to find [F1V2] because it is not a complete field, and prints 'Not found'
LOCATE '[F1V2]' IN DYNARRAY<1> SETTING POSITION THEN
PRINT POSITION
ELSE
PRINT 'Not found'
END

To find [F1V2], you would have to search at value level:

* Finds [F1V2] as the second value in field 1 and prints 2
LOCATE '[F1V2]' IN DYNARRAY<1,1> SETTING POSITION THEN
PRINT POSITION
ELSE
PRINT 'Not found'
END

The <1,1> not only defines the starting position for the search, it indicates that the search must find a value within field 1. The general principle remains that you can only search 'one level down': you can search an entire dynamic array for a complete field, a field for a value, or a value for a subvalue. As the result is returned as a single number and not a set of coordinates, it could not be otherwise. How would LOCATE report finding [F1V2] in field 1, value 2, by returning a single number?

LOCATE really comes into its own, however, when dealing with a field only list (containing no values or subvalues): particularly if the fields are in sorted order. Consider the following:

NEWARRAY = 1 : @FM : 2 : @FM : 4 : @FM : 5
LOCATE 5 IN NEWARRAY<1> BY 'AR' SETTING POSITION ELSE PRINT '5 absent' ; * Prints nothing
PRINT POSITION ; * Prints 4
LOCATE 3 IN NEWARRAY <1> BY 'AR' SETTING POSITION ELSE PRINT '3 absent' ; * Prints '3 absent'
PRINT POSITION ; * Prints 3

First, to explain the BY 'AR' clause. This optional clause informs LOCATE that it is dealing with a sorted list: the AR denotes the sort order: ascending right-justified. AL stands for ascending left justified, DR for descending right justified, and DL for descending left justified.

One advantage of the BY clause is that it can speed up your search. If your list is sorted AR, and you are looking for the value 3, UniVerse knows that it can give up the search as soon as it reaches the first string higher than 3: which it does when it gets to 4. It therefore doesn't have to search the entire dynamic array for a value that is not there unless the value it is looking for happens to be bigger than all the fields already in the array.

More importantly, though, it adds an interesting twist to the action of the SETTING clause in the case where the string searched for cannot be found: as the variable following SETTING is set to the position the string would have been in if it had been found. By using INS to insert the string in this position, the dynamic array can be maintained in a sorted order without the need for a sorting algorithm.

This is a very neat and widely used trick. The code runs as follows:

LOCATE STRING IN SORTEDARRAY<1> BY SORTORDER SETTING POSITION ELSE
INS STRING BEFORE SORTEDARRAY<POSITION>
END

This code will maintain a sorted list of STRING values automatically.

Comparison of dimensioned and dynamic arrays

From all this, it will seem that dynamic arrays are vastly more flexible than dimensioned arrays: and they are. They are supported by a LOCATE statement, for which dimensioned arrays have no equivalent, and allow strings to be inserted into the middle of an array by shuffling other strings to the right, which dimensioned arrays cannot do. They do not require each element to be initialised, do not cause run time errors if accessed 'out of range', and do not require resizing to accomodate more or fewer strings: all of which dimensioned arrays do. They even offer three dimensions in place of the maximum of two dimensions supported by dimensioned arrays.

Why, then, should one ever use dimensioned arrays? While there are several reasons, one stands head and shoulders above the others: because they are faster. UniVerse can address a single element in a dimensioned array almost as quickly as a single variable. The fields, values, and subvalues of a dynamic array require clever searches along long strings counting field, value, and subvalue markers to address. While they are cleverly implemented, this will never be as fast as a dimensioned array.

In general, I would make the following recommendations:

1. Where each element you are going to store has a specific and unique meaning, and is wedded to its numeric position, use dimensioned arrays. This is precisely the situation you are in when you read a record from a UniVerse data file. Field 1 of the record may be the customer name, field to the product code, and so on. The values of these fields do not require sorting, and it would be meaningless to 'insert' elements into such an array. The number of elements you will require to hold the record is not going to change during the life time of the program. All these factors suggest dimensioned arrays as the natural home for records read into memory. UniVerse Basic offers two statements to read a record into memory: READ which will read one into a dynamic array, and MATREAD which will read it into a dimensioned array. For most data files, MATREAD is appropriate.

2. Don't store values and subvalues in dynamic arrays: store fields only. By keeping your dynamic array structure flat, or 'one-dimensional', you allow the dynamic array to be searched for values, you simplify the business of inserting and deleting values, and you make your code easier to understand and maintain. Rather than creating an array called INVOICES which stores fields, each of which has an invoice number in value 1 and invoice value in value 2, create two dynamic arrays, called INVOICE.NUMBERS and INVOICE.VALUES. Whenever you insert a field into one, insert a corresponding field into the other. Whenever you delete a field from one, delete the same field from the other. This has the added advantage that as each dynamic array is half the size, it is far quicker to process.

3. In short, use dimensioned arrays for structures built from elements of data with different meanings, and dynamic arrays for lists of a single entity. If you must process lists of structures, combine the two: create a dimensioned array called INVOICES, element one of which is a dynamic array full of invoice numbers (stored as fields), element two of which is a dynamic array full of the corresponding invoice values, and so on.