Computers: Universe: Indexing Fields

So far in this course, the enquiries you have been running have been extremely simple. They haven't applied any criteria to reduce the number of records displayed: you have been displaying all the records in each file you've enquired on.

Later, you will discover how to list only records which match certain criteria: such as (for example) all the CUSTOMER.INVOICES made out to a particular client, or for over a certain amount. If you apply such a condition to your enquiry, UniVerse simply reads all the records on the file, and includes only those which are appropriate. But what if your file contained 20,000 records and you were only expecting 3 or 4 to be displayed? You'd be waiting a long time for a very short enquiry: long enough for UniVerse to read 20,000 records, and decide that 19,997 of them aren't needed.

This is a problem common to all databases, of course. To get round it, most databases allow you to create indexes (or, as some people have it, 'indices'), Roughly speaking, an index is a file turned upside down: you use the value of a chosen data field as the key, and then read the keys corresponding to that value as data. For instance, an index on the CUSTOMER.NAME would allow you to use the name 'United Boxgirders' as the key, and read all the CUSTOMER.INVOICE keys which carry this company name.

UniVerse supports a rather nice indexing system: reasonably efficient, and (compared to some) very simple to use.

First, we need a file to play with. We'll create a copy of the VOC file to experiment with:

>CREATE.FILE COPY.OF.VOC Creating file "COPY.OF.VOC" as Type 30. Creating file "D_COPY.OF.VOC" as Type 3, Modulo 1, Separation 2. Added "@ID", the default record for RetrieVe, to "D_COPY.OF.VOC". >

Then we need to load it with records. We can make copies of all the records in the VOC file using the COPY command. Though we wont be covering this in detail until later, its format is fairly transparent:

>COPY FROM VOC TO COPY.OF.VOC ALL 825 records copied.

(The number of records in your VOC file may be different, of course).

As all the records in the COPY.OF.VOC file are the same as those in the VOC file, they must have the same overall structure. If they have the structure, they can use the same dictionary. You can therefore copy all the records from the VOC dictionary to the COPY.OF.VOC dictionary to. You must add the keyword OVERWRITING though, because there is already an @ID record on the dictionary of COPY.OF.VOC which was added automatically during the CREATE.FILE (see above), and you want this overwritten, not preserved.

>COPY FROM DICT VOC TO DICT COPY.OF.VOC ALL OVERWRITING 19 records copied.

You now have a COPY.OF.VOC file that is almost identical to the original VOC the only differences being between their names (of course), and their file types: as the VOC is static, while COPY.OF.VOC is dynamic.

Imagined that you wished to list only the menu pointers, or 'M' type records, in your COPY.OF.VOC file (see Other VOC Records). You can do this by adding a WITH clause to your LIST command. (WITH hasn't been explained yet, but it is not difficult to see the basic idea here: it is described in more detail later in the course).

>LIST COPY.OF.VOC WITH TYPE = 'M' LIST COPY.OF.VOC WITH TYPE = "M" 15:47:52 08-18-99 PAGE 1 NAME.......... TYPE DESC.......................... CONV.ACCT.MENU M Menu - UniVerse Account Importation PRINT.ADMIN.ME M Menu - UniVerse Spooler NU Administration BASIC.CONT M Menu SP.JOB M Menu CONSYS.MENU M M SP.MOD M Menu ACCT.CONV M Menu SP.STATUS M Menu MENUS M MENU selector - Invoke the MENU maintenance menu 9 records listed. >

In order to display these results, LIST had to read every record in the VOC file: all 800 or so of them, simply to reject 791 as irrelevant. On most systems, the delay while this happened would have been imperceptible, but if you are running on modest hardware you may have noticed a pause before the enquiry finished. But no matter how powerful your computer, by the time this file reached 80,000 or even 800,000 records in size, you'd have noticed the delay. If the number of M type COPY.OF.VOC records remained in ration, you'd now be reading 800,000 records and rejecting 791,000 of them.

To avoid this, you can create an index. The command to do so is extremely simple:

>CREATE.INDEX COPY.OF.VOC TYPE >

The command is extremely quiet: you are simply returned to the > prompt once it has done its thing. You have created the index, but it is initially empty. To bring the index up to date:

>BUILD.INDEX COPY.OF.VOC TYPE Locking 'COPY.OF.VOC' file for exclusive use. Starting SSELECT for file 'COPY.OF.VOC index TYPE'. Compiling "@Ak.0". IF F1 [ 1 , 1 ] = P THEN F1 [ 1 , 2 ] ELSE F1 [ 1 , 1 ] Compiling "@INDEX.TYPE". IF F1 [ 1 , 1 ] = P THEN F1 [ 1 , 2 ] ELSE F1 [ 1 , 1 ] ; @1 : ( char ( 251 ) ) : @Ak.ID 825 record(s) selected to SELECT list #0. Clearing Index File INDEX.000 Starting DATA processing for index 'TYPE'! ********************************* 825 total processed. Updating INDEX.MAP flags... Index build of TYPE complete. File 'COPY.OF.VOC' Unlocked. >

As you can see, by contrast BUILD.INDEX is rather noisy. Thankfully, though, you can ignore it's rather cryptic output. The index has now been brought up to date.

But the best has yet to come. From now on, this index will be updated automatically whenever a record in COPY.OF.VOC is added, changed, or deleted. You need make no changes to your application: even UniVerse Basic programs which update COPY.OF.VOC written before the index was created will automatically update the index whenever they write to COPY.OF.VOC. In fact, these indexes require so little effort that its easy to forget that they are there, which is sometimes a problem, as old, little used, or obsolecent indexes are not always removed, but (like all indexes) still require disk space to store and CPU and I/O time to keep up to date.

It's a good idea to periodically check the indexes on a file: particularly if you are about to add a new index to it. You can do so using LIST.INDEX:

>LIST.INDEX COPY.OF.VOC ALL Alternate Key Index Summary for file COPY.OF.VOC File........... COPY.OF.VOC Indices........ 1 (0 A-type, 0 C-type, 0 D-type, 1 I-type, 0 SQL, 0 S-type) Index Updates.. Enabled, No updates pending Index name Type Build Nulls In DICT S/M Just Unique Field num/I-type TYPE I Not Reqd Yes Yes S L N IF F1[1,1]='P' T HEN F1[1,2] ELSE F1[1,1] >

Finally, if you do need to remove an index, the syntax runs:

>DELETE.INDEX COPY.OF.VOC TYPE Removing index file TYPE. >

So how do you take advantage of your index? In most cases, you need do nothing. An enquiry like the one you used above, LIST COPY.OF.VOC WITH TYPE = 'M', will automatically consult the TYPE index if there is one. This also means that any existing application code does not usually have to be modified to take advantage of the new indexes: it will use them if it finds them and gain the performance improvements automatically.

I say 'usually' because there are a few UniVerse Basic statements which are designed to consult indexes directly, and will only work if they are there: you will learn about these later. But these are purely optional: any existing query, whether executed from UniVerse Basic or from the command line, will automatically use indexes if they will help.

Given that they are so easy to create and maintain, why not just index everything and have consistently speedy enquiries? Indexes do carry two costs:

1. Updating an index takes a certain amount of CPU and I/O time. Each index you add increases the amount of work your system has to do when it writes a record.

2. Indices occupy disk space: a file with many indices may grow to several times its original size, and similarly multiply all its costs of storage, maintenance and backup.