The Beginner's Page
Richard Mansfield
Assistant Editor
Checksum, Terabytes, And Disaster Avoidance
In many ways, your brain is an ideal data storage device. It is in a dust-free case, it can hold an estimated twelve-and-one-half-million terabytes (12,500,000,000,000,000,000 eight-bit bytes; an average microcomputer disk, by comparison, holds about 170 thousand bytes), it regulates its own temperature, and it uses about as much electricity as a twenty-watt lightbulb. All in all, an impressive memory.
Until we can manufacture memory devices of this excellence, we will have to follow some rules to make sure that our data and programs are safely stored on tape or disk. Most of our computers rely on memory chips which hold only a few K. The "K" means kilobyte, 1,024 bytes. This is not much, really. One kilobyte could hold about 175 English words, less than a double-spaced, typewritten page. To hold this page of COMPUTE! we would need about 6K of RAM. In an 8K computer, that would leave only 2K for a word processor program to allow corrections, additions, and everything else.
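The storage figures quoted so far can be checked with a little arithmetic. The short Python sketch below is only an illustration (Python is our choice here, not something this column uses), and it assumes a terabyte of exactly one million million bytes, which is what the long byte count above implies.

```python
# Back-of-the-envelope check of the storage figures quoted in the text.
BYTES_PER_K = 1024                    # one kilobyte, as defined in the text
TERABYTE = 10**12                     # assumption: a decimal terabyte

brain_bytes = 12_500_000 * TERABYTE   # twelve-and-one-half-million terabytes
disk_bytes = 170_000                  # "about 170 thousand bytes" per disk
disks_per_brain = brain_bytes // disk_bytes

words_per_k = 175                     # the column's estimate
bytes_per_word = BYTES_PER_K / words_per_k

page_bytes = 6 * BYTES_PER_K          # about 6K for one magazine page
free_in_8k = 8 * BYTES_PER_K - page_bytes

print(brain_bytes)                    # 12500000000000000000
print(disks_per_brain)                # disks needed to match one brain
print(round(bytes_per_word, 2))       # 5.85
print(page_bytes, free_in_8k)         # 6144 2048
```

At roughly six bytes per word (letters plus a space), the 175-words-per-K estimate is a reasonable one, and a 6K page leaves just 2,048 bytes free in an 8K machine.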
The future of memories looks bright, though. 64K on a single chip will be available to us fairly soon, and even greater densities, at lower prices, seem inevitable. In fact, there is a possibility that memory cells might actually be grown, like mushrooms. Efforts are now underway to create protein memory cells. But, for now, we must do without unlimited, inexpensive memory. For now, we compose programs and enter data into a limited RAM and then SAVE what we've created onto cassette tapes or disks.
The word SAVE implies a kind of safety, a secure storage. It can be secure, but you should observe some precautions. Last month we looked into the management of files. Normally, a file of data is typed into the computer, SAVEd as a file, and then used by a program or programs. The data is kept on a disk or a tape because the computer wipes its RAM clean each time power is turned off or each time a new program or set of data comes in.
Backup
Redundancy is an important feature of SAVEing. On your part, this means keeping a backup copy of each program or file. When you write a program (or buy one), the first thing to do is to make a second tape or disk copy of it and put it in a cool place in a dust-free, plastic box. Dirt, smoke, heat or extreme cold, and the oils on fingers are all enemies of magnetic data because both tapes and disks are made of thin plastic, which is easily deformed.
Other dangers are vacuum cleaners, TVs, and nearly anything else which uses electricity and can generate magnetic fields. These fields can remagnetize (erase) tapes and disks, so you cannot safely put a cassette on top of a TV or a refrigerator.
Computers can help us by using their own redundant method of data backup. When a program is sent to a tape machine, some computers record the entire program twice. Then, when the program is LOADed back into the computer, the two versions can be compared. The computer then can use the "best" version if there are differences. How does it know which is best?
Data is transferred very fast and many things can degrade it. Often, a checksum is used to see if the data made the trip intact. There are various checksum schemes, but here's a simple one. Imagine that we were sending the word face, typed in capital letters, to a cassette. The computer would send the numbers 70–65–67–69–271. The letters of the alphabet are each given a code number in computing (the ASCII code). Uppercase A is the number 65, B = 66, C = 67, D = 68, E = 69, F = 70 and on up. Computers work only in numbers. The word face means nothing to the computer; it is merely a pattern of numbers. It can print the pattern, alphabetize it (which, to a computer, is merely putting the numbers in numerical order), search for it in a paragraph, and all the rest, without ever thinking of the word as anything other than a particular number sequence.
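To see the code numbers for yourself, here is a small Python sketch (our choice of language for illustration). It spells the word in capitals, since the codes quoted above (65 for A, and so on) are the uppercase ones. The extra words CAFE and ACE are made up for the sorting demonstration.

```python
# Turn a word into its ASCII code numbers and back again.
word = "FACE"
codes = [ord(ch) for ch in word]           # ord() gives a character's code number
print(codes)                               # [70, 65, 67, 69]

rebuilt = "".join(chr(n) for n in codes)   # chr() goes the other way
print(rebuilt)                             # FACE

# Alphabetizing is just comparing code numbers, letter by letter.
alphabetized = sorted(["FACE", "CAFE", "ACE"])
print(alphabetized)                        # ['ACE', 'CAFE', 'FACE']
```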
So, it is easy to see why the computer sends 70 65 67 69 271 to the cassette. The number 271 is the sum of the previous numbers. While sending them to the tape, the computer is also adding them up and sending the total at the end. Then, when LOADing, it also adds them up and checks its sum against the one that comes in from the tape. If the sums are not the same, then there was an error in the data transfer. An error of addition is nearly as impossible for a computer as taking a wrong turn would be for a roller coaster. It has been known to happen, but we can be almost certain that it will never happen to you. The computer can be virtually sure that mismatched checksums are the result of bad data on the tape.
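The sending and checking steps can be sketched in a few lines of Python. This is only an illustration of the simple scheme described here, not the routine any particular computer uses, and the function names with_checksum and verify are our own inventions.

```python
def with_checksum(text):
    """Return the code numbers for text, followed by their sum."""
    codes = [ord(ch) for ch in text]
    return codes + [sum(codes)]

def verify(received):
    """Add up the received codes and compare with the checksum sent last."""
    *codes, checksum = received
    return sum(codes) == checksum

sent = with_checksum("FACE")
print(sent)                 # [70, 65, 67, 69, 271]
print(verify(sent))         # True

damaged = sent.copy()
damaged[1] = 66             # pretend the tape turned the A into a B
print(verify(damaged))      # False
```

A plain sum like this one cannot catch every error. Two numbers that trade places on the tape, for example, still add up to the same total, which is one reason other checksum schemes exist.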
This is how it knows which of the two versions it recorded on tape is best. If version one had a bad checksum on the word face, but a good checksum on the word lift, it could keep the word lift, but wait for the word face in the second version. Checksums are done on longer samples than individual words, but the technique is the same.
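Here is a rough Python sketch of that choosing process. It assumes each word travels as its own block with a checksum at the end, which is a simplification made for illustration. The names make_block, checksum_ok, decode, and recover are hypothetical.

```python
def make_block(word):
    """Code numbers for one word, followed by their sum."""
    codes = [ord(ch) for ch in word]
    return codes + [sum(codes)]

def checksum_ok(block):
    *codes, total = block
    return sum(codes) == total

def decode(block):
    return "".join(chr(n) for n in block[:-1])

def recover(version_one, version_two):
    """Keep each good block from version one; otherwise use version two."""
    words = []
    for first, second in zip(version_one, version_two):
        if checksum_ok(first):
            words.append(decode(first))
        elif checksum_ok(second):
            words.append(decode(second))
        else:
            raise ValueError("both copies of a block are damaged")
    return words

version_one = [make_block("FACE"), make_block("LIFT")]
version_two = [make_block("FACE"), make_block("LIFT")]
version_one[0][0] = 71      # damage FACE in the first version only

print(recover(version_one, version_two))   # ['FACE', 'LIFT']
```

If the same block is damaged in both versions, nothing can be recovered, so the sketch reports an error instead of guessing.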
Computer Wrestling
All of this is an effort, by the computer as well as by the computerist, to protect data. If you make a backup and the computer records each program twice, there are four copies of a program or file. There are two more ways to prevent problems: scratchpad SAVEs and respect for your computer.
When you write your first database program you might want to consider what you are up against. Building a database means typing in lots of records. You do not want to do it twice. Last month we set up a database management system which would permit instant indexing of COMPUTE! articles by author or by topic. If you are planning to type hundreds of records (each subject-author-issue number is a record) you don't want to work for hours only to have a fuse blow or someone trip over the computer's electric cord. In a flash, your data is destroyed.
To avoid this, it's a good idea to keep a cassette or disk which is labelled "SCRATCH." It is a temporary scratchpad which is left in the tape or disk drive and SAVEd to every half hour while you rest your fingers.
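A program can do the scratchpad SAVE for you. The Python sketch below is one possible arrangement, not a prescription: it saves after every few records instead of every half hour, it stores the records in a JSON file, and the names save_scratch, enter_records, and scratch.json are all our own assumptions. The sample records are invented placeholders.

```python
import json
import os
import tempfile

def save_scratch(records, path):
    """Write every record typed so far to the scratch file."""
    with open(path, "w") as f:
        json.dump(records, f)

def enter_records(new_records, path, every=10):
    """Collect records, saving to the scratch file after each batch."""
    records = []
    for record in new_records:
        records.append(record)
        if len(records) % every == 0:
            save_scratch(records, path)
    save_scratch(records, path)          # final save when typing is done
    return records

# Invented placeholder records: subject, author, issue number.
sample = [{"subject": "topic %d" % i, "author": "someone", "issue": i}
          for i in range(1, 26)]
scratch_path = os.path.join(tempfile.mkdtemp(), "scratch.json")
entered = enter_records(sample, scratch_path, every=10)

with open(scratch_path) as f:
    print(len(json.load(f)))             # 25
```

With a batch size of ten, a power failure costs at most nine records of typing.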
Finally, the machines themselves, the computers and disk/tape drives, deserve respect. This means gentle treatment. We all know someone who has problems with machines: knobs break off, keyboards malfunction, things jam and fail. Such people are frustrated by constant "bad luck" with machinery, but, if you watch them make a tape copy, you'll see what's wrong. They move quickly, they force a balky lid down, they fight their machinery. To further compound the problem, this same personality type usually avoids instruction manuals. They don't learn that placing electronic devices in direct sunlight, transferring finger oil via disks to drive heads, and plugging in peripherals with the power on all invite disaster. We all have our faults, but computer wrestling is an expensive fault. Repairs are slow and costly. Computer technicians are in short supply.
Transistorized devices are among the most reliable machines man has ever built. A bit of caution and care will keep your data intact and your machine out of the repair shop, at least until we can buy those disposable terabyte protein box memories for $1.