Variable speech. Herb Friedman.
Like the computer, the human brain can assimilate information much faster than information can be fed in. In fact, when listening to speech, the brain works at about one-half to one-third of capacity and it gets bored, often causing the listener to lose track of what is going on. Experiments have shown that the brain works most efficiently if the information rate through the ears--via speech--is the "average" reading rate, which is about 200-300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100-150 wpm.
In short, the brain works at about one-half of its assimilation capacity when listening to speech, whether it is a classroom lecture, an educational cassette tape, or even a TV show. Experiments have also shown that the brain tends to wander when working well under its capacity, so the listener often ends up tuning out altogether.
Speed Increases Understanding
Comprehension is not only increased if we speed up the rate of speech, but we can assimilate two or three times the amount of information in the same time. Instead of listening to an educational tape for one hour, by doubling the speech rate--which is called "speech compression"--we could hear the same information in only 30 minutes, and the brain would comprehend more because it wasn't watching the clouds drift by.
In fact, this is exactly what is done in some TV commercials and by many large companies such as IBM and Sperry, who use "speech compression" in their training tapes.
Their trainees spend 50 percent or less of the normally expected time listening to tapes. For example, they actually spend less than 30 minutes listening to tapes that took an educator 60 minutes to record. In the case of TV commercials, speech compression allows the sponsor to almost double the size of the sales pitch he can throw at you: in effect, he is broadcasting two commercials for the price of one.
You might expect that speech compression, which is technically termed VSC, for Variable Speech Control, can be done by simply increasing the playback speed of a tape recording perhaps two or three times. But when you do this, the recorded frequencies and the "rate of speech" are increased in proportion to the increase in tape speed. If the tape speed is doubled, the rate of speech and the frequency response are both doubled, producing the "Donald Duck" effect in which the voice gets so high-pitched and rapid it becomes impossible to comprehend. Increasing the playback speed, therefore, is not the way to compress speech.
Trimming Information
The way to effect VSC without affecting either the frequency response or the rate of speech is actually to remove small sections--snippets--of information. Research has shown that if minute bits of information are randomly removed from a string of words, the brain achieves from full to 80 percent comprehension--the exact degree of comprehension determined by the amount of information removed. In fact, early experimenters in speech compression physically removed the snippets when running their experiments. They would record a string of words on tape and edit random snippets with a razor blade.
Now if snippets--no matter how small--are removed from the recording, the total length of time is reduced. If the snippets add up to 50 percent of the original tape length, the amount of time required to hear the edited tape is reduced by 50 percent, yet there is no apparent increase either in the rate of speech or the frequency range.
Even though the speech compression is 50 percent, the playback sounds natural to the listener. On the other hand, if we had attained 50 percent compression by doubling the playback speed of the original tape recording, the reproduction would sound like a chattering chipmunk.
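The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in any VSC product: the 8000 samples-per-second rate, the 200 Hz test tone, and the function names are all hypothetical, and the snippet length is deliberately chosen as a whole number of cycles so the splices join cleanly (real speech would not be so cooperative).

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)


def tone(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Return a sine tone as a list of samples."""
    n = int(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def speed_up(samples, factor=2):
    """Naive fast playback: keep every `factor`-th sample.

    Played back at the same sample rate, this shortens the recording
    but also multiplies every frequency by `factor`.
    """
    return samples[::factor]


def remove_snippets(samples, snippet_len, keep_every=2):
    """Razor-blade editing: keep one snippet out of every `keep_every`.

    The kept snippets are untouched, so their pitch does not change;
    only the total running time shrinks.
    """
    out = []
    for start in range(0, len(samples), snippet_len * keep_every):
        out.extend(samples[start:start + snippet_len])
    return out


def estimate_freq(samples, rate=SAMPLE_RATE):
    """Estimate frequency by counting upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate / len(samples)
```

Calling `speed_up(tone(200, 1.0))` and `remove_snippets(tone(200, 1.0), snippet_len=160)` both yield half as many samples as the original, but `estimate_freq` reports roughly 400 Hz for the first and roughly 200 Hz for the second. The zero-crossing estimate is crude and is only meant to show the direction of the effect.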
In fact, VSC is so effective that it can be used for other things besides speech. As an example, JVC uses the technique in their Vidstar model 6700 video cassette recorder (VCR) for a fast scan of the TV picture. Though the tape is running at a fast-wind speed, the user can view a reasonable facsimile of the picture, rather than a "hash" of color streaks.
How It Works
Obviously, no one is editing every tape with a razor blade to make a VSC recording--certainly not video tape. Using digital technology and large scale integrated circuits, VSC is done electronically, and at a budget price. A complete VSC system can be packaged on a small printed circuit board, as shown in the photographs. In fact, a VSC system can be integrated within the cabinet of an ordinary portable cassette recorder, and the entire device can be retail priced at less than $200.
The first thing that is done to electronically reduce the playback time is to reproduce the cassette recording at a higher speed than it was recorded.
Let's assume that twice the speed is an X2 factor that doubles the pitch (frequency) of everything recorded on the tape. Next, we feed the X2 playback through a preamplifier and on to a VSC controller that removes snippets of the signal as shown in the chart diagram.
Note that for illustration we show eight cycles (from a recorded tone) originally recorded in 40 milliseconds. On playback the eight cycles reproduce in 20 msec. The VSC controller electronically removes four of the eight cycles, leaving us with an electrical signal (still within the VSC) of four cycles and a four-cycle space, with the four remaining cycles occupying 10 msec.
If we were to feed this signal out of a speaker it would sound like someone was strangling a chipmunk. Before the signal leaves the VSC, however, it is passed through a BBD (bucket brigade device) that serves as a time delay, which literally stretches the 10 msec. signal out to 20 msec. Whatever is left of the signal gets stretched back to its original frequency, but the reproduction time now is only 20 msec., one-half the original recording time of 40 msec. The "stretch" fills in most of the gap between cycles four and nine; the remaining gap is extremely small and is not noticeable if the compression rate is 50 percent or less.
Now this might appear complex, or even incorrect at first reading, but if you re-read the foregoing while referring frequently to Figure 1, you will see that we have taken a signal that originated in 40 msec. and reproduced it at the same frequency in only 20 msec.
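For readers who prefer to check the arithmetic, the walk-through can be restated as a short calculation. This is a sketch only: the function and key names are hypothetical, and the rule that the controller keeps a fraction 1/speed of the fast playback is an assumption generalized from the X2 example in the text.

```python
def compression_timeline(recorded_ms=40.0, cycles=8, speed=2.0):
    """Trace the tone example through each stage of compression.

    Assumes (as in the X2 example) that the controller keeps 1/speed
    of the cycles and that the delay stage stretches them by `speed`.
    """
    tone_hz = cycles * 1000.0 / recorded_ms       # pitch as recorded
    fast_ms = recorded_ms / speed                 # running time at high speed
    fast_hz = tone_hz * speed                     # pitch at high speed
    kept_cycles = cycles / speed                  # cycles left after trimming
    kept_ms = kept_cycles * 1000.0 / fast_hz      # time those cycles occupy
    stretched_ms = kept_ms * speed                # after the delay stage
    restored_hz = kept_cycles * 1000.0 / stretched_ms
    return {
        "tone_hz": tone_hz,
        "fast_ms": fast_ms,
        "fast_hz": fast_hz,
        "kept_cycles": kept_cycles,
        "kept_ms": kept_ms,
        "stretched_ms": stretched_ms,
        "restored_hz": restored_hz,
    }
```

With the default arguments the eight cycles correspond to a 200 Hz tone, the fast playback runs 20 msec. at 400 Hz, the four surviving cycles occupy 10 msec., and the stretch brings them back to 200 Hz over 20 msec., which is half the original 40 msec.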
Imagine if you will an entire speech processed through the VSC. It too would be reproduced in half the time at the same rate of speech. No Donald Ducks, no chipmunks; just twice the information in half the time. Only if VSC is attempted at a factor greater than X2.5 would the speech become choppy, with a loss of comprehension.
Just as VSC can be used to compress speech, so too can it expand speech, up to 100 percent. In other words, if a speech takes 60 seconds to record, VSC can take up to 120 seconds to reproduce it. This time, instead of speeding up the tape, creating a gap, and then tightening up the gap, the tape is placed at, say, one-half speed, thereby halving the recorded frequencies. The VSC unit then eliminates half of each sample, creating a gap while restoring the original frequency.
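In sample terms, the net effect of expansion can be sketched as follows. This is a simplified illustration, not the circuit's actual behavior: it assumes the gap after each restored segment is filled by playing that same segment again, and the function name and `segment_len` parameter are hypothetical.

```python
def expand(samples, segment_len, repeats=2):
    """Stretch running time by repeating each short segment.

    Each segment is emitted `repeats` times at its original pitch,
    so the output is `repeats` times as long as the input.
    """
    out = []
    for start in range(0, len(samples), segment_len):
        segment = samples[start:start + segment_len]
        out.extend(segment * repeats)
    return out
```

For example, `expand([0, 1, 2, 3, 4, 5], segment_len=2)` returns `[0, 1, 0, 1, 2, 3, 2, 3, 4, 5, 4, 5]`: nothing inside a segment is altered, but the whole sequence takes twice as long to play.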
At this point we have a restoration of the original waveform with gaps in between. The VSC then repeats the same waveform in the gap: in essence, everything is being repeated. The ear hears the original at the original speech rate, but now it takes twice as long to listen from beginning to end. Of what value is expanded speech? We'll get to that soon.
Servo Control
It should be obvious that the electronic frequency restoration of the VSC must in some way be tied to the increase (or decrease) in tape playback speed. After all, if the tape has been set for a X1.5 speed increase and the VSC is restoring the playback frequency at a X2.0 ratio the output sound is going to be very bassy. The problem is eliminated by simply tying the tape speed to the VSC restoration controller through a servo or "tachometer" cassette drive motor, similar to those used for high fidelity speed-controlled turntables. The motor has a built-in tachometer that sends a feedback signal to the VSC electronics.
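The mismatch described above is simple to quantify. The sketch below assumes the output pitch is just the tape speed factor divided by the restoration ratio; the function names are hypothetical and are not taken from any VSC documentation.

```python
def output_pitch_ratio(tape_speed, restoration_ratio):
    """Pitch of the output relative to the original recording.

    The tape raises pitch by `tape_speed`; the VSC lowers it by
    `restoration_ratio`. A result of 1.0 means the pitch is correct.
    """
    return tape_speed / restoration_ratio


def output_tone_hz(recorded_hz, tape_speed, restoration_ratio):
    """Frequency at the speaker for a tone recorded at `recorded_hz`."""
    return recorded_hz * output_pitch_ratio(tape_speed, restoration_ratio)
```

With the tape at X1.5 and the restoration at X2.0, `output_pitch_ratio(1.5, 2.0)` gives 0.75, so a 200 Hz tone would come out at 150 Hz, which is the "bassy" result. When the two settings are locked together the ratio is 1.0 at any speed.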
The VSC compares the tacho feedback with its control signal to the drive motor and makes the necessary correction to ensure the frequency restoration is directly proportional to the tape speed. If, for example, you are using a VSC Corp. Speech Controller cassette recorder, you will find that a single control, calibrated for a VSC rate of 0.6 to 2.5, sets both the cassette and decoder rates, which are interlocked. If one drifts slightly the other drifts the same amount, thereby maintaining frequency stability of the output sound (±1%).
Talking Books
Earlier, we mentioned a use for expanded speech, which implies other uses than simply listening to an educational tape or a talking book. The same principle of voice expansion and compression can be used to change the pitch of an input signal, say to make it easier to understand the speaker.
For example, VSC can be used simply to lower or raise the pitch of a voice. This is easily accomplished by opening the servo lock between the drive motor of the recorder and the VSC. Then the controller of the VSC is used to move the sound up or down in frequency because the motor speed will not change as the VSC is adjusted.
Among the other uses for VSC are: film and video editing (for comprehensible audio during high speed scanning); high speed transmission of signals through limited bandwidth circuits (the signal can be "pushed" or moved into the most effective frequency range); time-compression of radio and TV announcements; movement of incoming sounds into an intelligible range for persons with frequency-selective hearing loss...and, well, the uses of VSC are limited only by your imagination.
For additional information on VSC products and a complete PC board assembly suitable for experimentation write to The Variable Speech Control Corp., 185 Berry St., San Francisco, CA 94107.