Easy databases. (data base management systems) (includes related articles and product listing)
by Tom Campbell
Every day, we're engulfed with important details to sort and file away. Phone numbers, appointments, seminars, records, and new ideas flood us. It doesn't matter whether we're small business owners, PTA leaders, club and co-op members, or coin collectors; the Information Age has touched us all. There's so much to keep up with that it can turn the most energetic, organized, dedicated newcomer into a tired, frustrated cynic in no time--that is, unless you have a database management system (DBMS) looking after things for you.
No Reason for Fear
Don't let the word database scare you away. Database managers are easier to use than ever, and if you can draw a picture onscreen, you can use just about any of the popular DBMSs on the market today. In fact, some of them are downright fun to play with. True, a few years ago database managers were forbidding and dull, but now they're one of the most competitive arenas in the software world. Software developers have learned that it takes more than horsepower to bring new users into the fold.
In fact, you may be using a DBMS already. If you run Windows and use Cardfile
as a phone book/dialer, you're using a simple DBMS. Consider the list of BBS
phone numbers and modem settings in your communications program--that's a
DBMS. Spreadsheet jockeys who know their way around @HLOOKUP and @VLOOKUP are
using the database portion of their spreadsheets (remember that Lotus 1-2-3
originally was advertised as a combination plate: spreadsheet, database,
graphics, Coke, and fries). And if you've ever battled with your word
processor's mail merge, you've been dealing with a DBMS, too. (Do you end a
secondary merge field with {END MERGE} or Database Details
Broadly speaking, a DBMS is any program that stores information by category
and lets you get at that information in an orderly manner. Parts inventories,
mailing lists, accounting systems, and order retrieval systems are all
databases and require a DBMS. The categories are normally called fields or
columns. The group of fields (for example, last name, first name, street
address, ZIP code, and so on) is called a database or a table, as illustrated
in the accompanying figure.
Ever notice how Microsoft registration cards are divided into boxes, one per
letter, like this?
Last name:!C!A!M!P!E!L!L!!!!!!!
Microsoft asks you to print and to employ a couple of odd conventions (the
tail of the Q points up instead of down) for its optical character recognition
system, which automatically translates your printing into computer-readable
format. In fact, systems like this can also link directly to a database
manager. You're limited to a certain number of character boxes because of the
limitations of database managers; most of them restrict the amount of
information that can be stored in records or fields. Each customer record is
given only 200 bytes (or 2000), and the fields within are also stored in fixed
sizes (30 characters for the last name, 25 for the city name, and so on). Of
course, the person designing the database determines those sizes.
The reason a DBMS imposes this rather severe limitation is speed--a recurrent
theme in database management systems. Without being able to assume that
records are of equal size, a DBMS would have no choice but to search through
each record until it found the one it was looking for. It's much, much faster
for a DBMS to know that, in a database with 120-byte records, record 100 is at
position 12000 in the file. Both hard and floppy drives are programmed to be
able to position anywhere on the disk in a very short time.
This speed obviously comes at the cost of disk space. Most American surnames
fit into 10 or 15 characters, but foreign and hyphenated names can be much
longer. Do you want to alienate your customers by starting letters to Mrs.
Adamkiewicz-Stanislaw with "Dear Mrs. Adamkiewic" because you can't afford the
50-percent slack such generosity would cost with your 20MB database, or do you
want to eat the disk space and shell out $600 for a new drive? Most databases
for small businesses, clubs, or home use turn out to be of an easily
manageable size. If you have 100 people in your community band and each record
uses 200 bytes, the database will be less than 20K in size--easily small
enough to fit on even the humblest floppy disk. If you run a stationery store
with 6000 customers on your mailing list for offices in the surrounding urban
areas and each customer record fits in 220 bytes, your database will weigh in
at only 1.3 megs and will still back up to a single high-density 3 1/2-inch
disk.
Many DBMSs that use fixed-length records store only one record in RAM at a
time, leaving the rest on disk, so the database capacity is limited by disk
space instead of RAM. This contrasts to word processors, which are often
limited to available RAM or, worse, the 64K-segment limitation of the
15-year-old 8088 microprocessor that came with the orginal IBM PC.
Most database managers allow the information they store to be restricted. For
example, a numeric field allows for storage only of digits, plus signs, minus
signs, and decimal points. A character field can store any information that
can be printed--letters, numbers, punctuation marks, whatever. A date field
contains only allowable numbers for months (1-12), days (1-31), and years
(typically 1900 and up). A logical field can only store T for Ture and F for
False (or sometimes Y for Yes and N for No).
Restricting the kind of information that can be entered into a field confers
several advantages to the DBMS user. First, most databases automatically force
the user to input only that type of data, so an absent-minded data entry
person won't be able to enter ZIP code (numeric) into the state field
(character). Second, it speeds indexing.
An index is a copy of one or more key fields (fields you want to sort by, for
example, last name or ZIP code) that's designed for very fast access. If you
index a file by its ZIP code, the database manager can create a separate index
file with a copy of the ZIP code for each record stored in a quick-access
format. Computers can sort numbers faster than letters, so the indexing
mechanism can take advantage of that trait and convert its copy of the ZIP
codes, which are stored as printable characters in your record, into binary
values for the index file. That way, printing your customer list by ZIP code
will be sped up enormously, perhaps a hundred or a thousand times.
The designers of dBASE III Plus recognized that some databases would do well
with the ability to store free-form information, at the cost of limiting how
that information could be searched. For example, if you want to store a log of
a client's dealings with your customer service department, you'll quickly
realize that most records leave this field empty, but the ones for which it is
used can easily spill over your database's maximum record size of 4000
characters. The answer to this problem is memo fields, pioneered by dBASE.
Memo fields take up ten bytes per record no matter what. That's the bad news.
The good news is that those ten bytes are used to refer to a location in a
separate memo file that can store up to 64K of information per memo field.
(FoxPro and some other dBASE-compatible databases manage to allot even larger
memo fields.) Memos can't be sorted or used in indexes, because indexes copy
the contents of each indexed field and they can't be searched easily--but when
you need 'em, you need 'em bad. Many database managers that claim to be dBASE
compatible don't support memo fields, whereas others, such as FoxPro and
Clipper, go dBASE one better by offering beefed-up support for memo fields.
Fox for the Mac, for example, uses memo fields to store MacPaint images!
Family Relations
There are two kinds of fixed-length DBMSs (that term includes database
managers that use memo fields): flatfile and relational. A relational DBMS is
one that lets you use several databases at once, connected by common fields.
An example is a customer list that's related to accounts receivable by
customer ID. Scroll through the customer list, and you'll see how much each
customer owes you because the DBMS knows to look up each account as you
scroll. (The inventor of relational database design theory, Edward F. Codd,
rightly disputes this simplified example and wrote an entire book explaining
what relational databases really are, but we're using the popular definition
here for the sake of discussion.) Database managers without this ability are
known as flat-file managers, because they use only one database at a time.
In general, relational database managers are much more capable than flat-file
managers and by definition can do everything flat-file managers can, but
they're usually more expensive and more difficult to learn.
Indy Car or Family Sedan?
A DBMS that uses fixed-length records isn't the only game in town, of course.
You might think of it as the family sedan of databases, which can do just
about any task you can throw at it with varying degrees of success--big
capacity, reasonably fast sorting and indexing, pretty flexible, and nicely
paired with the performance characteristics of a disk drive. Three other
varieties of DBMSs exist--network, hierarchical, and object-oriented, but they
aren't popular on PCs. A fifth variety, free-firm databases, has been around
for some time and occupies a small but solid niche in the PC world. Lotus
Agenda and AskSam are well-known examples of free-form databases.
While you can make free-form databases look like traditional row-and-column
database managers, why bother? Their true power is precisely in not forcing
you to such a narrow organizational paradigm. Instead, you're encouraged to
make up freestyle associations; Agenda knows that the phrase next Wednesday is
the same as December 9 and can spot other such connections without your
explicitly asking. Free-form DBMSs usually store all information in RAM, so
they can be sharply limited in the amount of information they store.
Two hybrid DBMSs are HyperPad, a sort of character-mode HyperCard for DOS, and
ToolBook, a Windows application that also looks like HyperCard. Fields aren't
typed, essentially doing the job of both character and memo fields, and they
may contain up to 32,000 characters. But nonetheless, they can be sorted and
searched, and they don't waste disk space. Each comes with a wonderfully rich
programming language and terrific sample applications that you're encouraged
to modify for your own use. HyperPad boasts much snappier operation and works
well even on the lowliest 8088 DOS machine. ToolBook is unpleasant on anything
less than a 386 with four megs running at 33 MHz. If you plan to sort a
10,000-name mailing list in ToolBook, plan to leave your computer running
overnight, but that's not really what ToolBook is for. If you want to
prototype a Windows application or put together a visual database fast,
ToolBook is without peer. Both HyperPad and ToolBook are stable, mature
products.
Gone Fishing
With all the options available, you're sure to find a DBMS that suits your
style and needs. The most difficult challenge database managers pose may well
be choosing the best one for your needs. If you find this to be the case,
consider setting up an appointment with a computer consultant, who will
evaluate your habits and system capabilities. But don't delay. Information
keeps pouring into your life; the sooner you get it managed, the sooner you
can take that vacation you've been putting off.