All About PET/CBM Character Sets
Louis F. Sander
Pittsburgh, PA
Commodore's unique assortment of graphics characters, combined with its numerous ROM sets and keyboard configurations, makes the various PET and CBM character sets maddeningly hard to comprehend. Occasional inaccuracies in published documentation have confused the situation even more. But as of today, the mystery is over: this article describes all the PET/CBM character sets in detail, and shows how they relate to each other and to the standard ASCII character set used by many other manufacturers. This information will be useful to any PET/CBM owner who wants to get past the beginning stages of programming, and invaluable to anyone using the IEEE bus or the user port to communicate with a non-Commodore device.
First, some definitions. Many computer devices can display a group of symbols, or characters, on paper or on a CRT. The symbols so displayed are called printing characters, and they consist of letters, numbers, punctuation marks, special characters, etc.
Within a given piece of hardware, each character is represented by a pattern of bits, which can be stored, manipulated, and transmitted electrically. The binary numbers corresponding to these bit patterns are called character codes. In the PET and CBM, all character codes are 8-bit binary numbers, and they are usually referred to by their decimal equivalents.
For example, the code for a PRINTed asterisk (*) is 0010 1010 binary, or 42 decimal. Eight bits allow 256 different codes, which can be represented as decimal numbers in the range 0-255 inclusive. A given code can represent different characters in different machines, or even within one machine, depending on context. In the PET/CBM, for example, different codes are used to put a given character on the screen by PRINTing or by POKEing.
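If you want to check conversions like this with a modern tool, here is a minimal Python sketch that shows a code in the binary and decimal forms used above. It is not PET BASIC, and the function name `describe_code` is ours. The asterisk has the same code, 42, in ASCII and in the PET/CBM PRINT set, so Python's `ord` gives the right value here. That is not true of every character.

```python
def describe_code(code: int) -> str:
    """Format an 8-bit character code as grouped binary plus decimal."""
    if not 0 <= code <= 255:
        raise ValueError("PET/CBM character codes are 8 bits: 0-255")
    bits = format(code, "08b")  # zero-padded to 8 binary digits
    return f"{bits[:4]} {bits[4:]} binary = {code} decimal"


print(describe_code(ord("*")))  # 0010 1010 binary = 42 decimal
print(2 ** 8)                   # 256 distinct codes fit in 8 bits
```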
Some character codes do not represent a printed character at all. Instead, they instruct the hardware to take a certain non-printing action. These codes are called control codes or control characters. RETURN, CURSOR DOWN, and RVS are some familiar PET/CBM control actions. If you have ever made your machine do a RETURN by executing the statement PRINT CHR$(13), you have used a control code (the 13) to generate a control action (the RETURN).
A device's character set is its complete set of printing and control characters, along with their associated codes. Many computer devices use a standard character set called ASCII, pronounced ask-ee, which stands for American Standard Code for Information Interchange.
ASCII and the PET/CBM character sets have quite a bit in common, but there are large differences between them which have to be resolved whenever a PET/CBM is to communicate with an ASCII device. The information in this article will allow you to resolve these differences quickly and accurately in your own programs.
The Printed Set
Now let's look in depth at the PET/CBM character sets. To keep things simple, we'll first investigate the printed character set, or the complete set of symbols that PET/CBM can display on its screen. The Character Set Demo program will allow us to do just that. Type it in and RUN it right now, being sure to include the semicolon at the end of line 210. If you have an 80-column machine, you need to substitute line 310 for line 200.
If everything has been entered properly, you'll see 256 evenly-spaced characters on the screen. You'll also see the notation "59468 = 12" (or 14), indicating the current contents of memory location 59468. Press any key several times, and observe that the notation alternates between 12 and 14, and, as it does, some of the displayed characters alternate as well. As you press a key, the demo program is changing the contents of 59468, and PET/CBM is changing certain printed characters as that happens. No character codes are being altered at all.
We are demonstrating that every PET/CBM has two sets of printing characters. A given character code produces a character from one set or the other, depending on the number POKEd into 59468. A 12 in that location produces what is often called the "standard" set of printing characters. It is the same in all PET/CBMs, and we will call it Character Set S, for "standard." POKEing 59468 with a 14 produces what is often called the "alternate" character set. This name is ambiguous, because there are two different alternate character sets, and which one you have depends on the ROM set installed in your machine. In this article, we'll call the alternate character set in the original PET ROMs Character Set AO, for "alternate, original," and the alternate character set in all other ROMs Character Set A. These two alternate sets contain the same characters, but in a different order, as we will see later on.
About 75% of the characters in all three sets are identical. Sets S, A, and AO differ only in the characters produced by the alphabetic keys A through Z, and in four other characters, all graphics. At power-on, PET/CBMs with graphics keyboards have Character Set S enabled, while machines with business keyboards have Character Set A.
Character sets can be switched by POKEing 59468 with 12 or 14, or with other numbers as well. Any number whose binary representation has the form XXXX 110X (where X may be either 0 or 1) produces Character Set S, while any other number produces your machine's alternate character set. In machines that have the GRAPHIC and TEXT commands, these commands can also be used to switch character sets.
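The XXXX 110X rule is a bit-mask test. Only bits 3, 2, and 1 matter, and they must read 1, 1, 0. The Python sketch below applies that test to every possible POKE value. The function name is hypothetical, and the code encodes the rule as stated above rather than modeling the hardware.

```python
def selects_standard_set(value):
    """True if POKE 59468,value gives Character Set S (binary XXXX 110X)."""
    if not 0 <= value <= 255:
        raise ValueError("POKE values are 0-255")
    # Keep bits 3, 2, 1 (mask 0000 1110) and require them to be 1, 1, 0.
    return (value & 0b00001110) == 0b00001100


standard = [v for v in range(256) if selects_standard_set(v)]
print(len(standard))                                       # 32
print(selects_standard_set(12), selects_standard_set(14))  # True False
```

There are 16 combinations of the four high bits and two values of the low bit, so 32 values select Set S. 12 and 13 are among them; 14 is not.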
Now back to the demo program. Without touching the keyboard, study the characters displayed on your screen. Notice that there are 256 characters, all different, and that the first 128 of them are repeated in reverse field to make up the second 128. (There may seem to be two identical SPACE characters, but there aren't – the second one is SHIFTED SPACE, and your computer treats it as a separate character altogether.) This is the complete set of printing characters from the currently activated set. In other words, you are looking at every character your machine can display at this moment.
Now press any key and study the characters in the other set. Again, there are 256 unique symbols, 128 regular and 128 in reverse field. Press a key several times, and notice which characters change as the character sets are toggled. If you count them, you'll find 60 characters that change – 30 regular and 30 reverse field. Note which ones they are, and notice that certain combinations of characters can never be on the screen at the same time, the HEART and the lowercase "s," for example.
You have now seen every character that your machine is able to display. All other PET/CBMs have the same printing characters, but some machines reach them in a slightly different way. Altogether, there are 316 different characters (the 256 of one set plus the 60 that appear only in the other), 256 of them available at any one time.
Since we've now looked at the complete repertoire of printing characters, let's look further into character codes, the other part of the character set. A character can be displayed on PET/CBM's screen in one of three ways: by POKEing a code into a screen memory location, by pressing a key, or by executing a PRINT statement. Additionally, your machine can send characters to, or receive them from, devices connected to the IEEE, user, recorder, and memory expansion ports. In every case, character codes are used to specify which character is to be displayed, recorded, or transmitted.
The Screen And CHR$ Sets
Our demonstration program POKEd characters to the screen, using the 256 character codes from 0 to 255 inclusive, which produced 256 different printed symbols. POKEing a 1 gave an A, a 2 gave a B, and so on through all the printed characters. This particular combination of codes and characters is valid only for screen POKEing, and is summarized in Table 1. We'll call it the Screen POKE Character Set.
All other character manipulation in the PET/CBM uses a completely different group of codes to print these same characters, and it is summarized in Table 2. Many of the printing characters and control functions in this set can be activated directly from the keyboard, and all of them can be activated by using the CHR$ function. We will call this the CHR$ Character Set.
Some people call it PET ASCII, but that terminology is misleading – PET/CBM's CHR$ character set has twice as many codes as ASCII, and only about half of the 128 ASCII codes have the same meaning in the ASCII and CHR$ character sets!
All PET/CBM keyboard and PRINT operations use the CHR$ character set; it is also used whenever characters are sent to or received from external devices such as printers, files, or modems. If you tell PET to send an asterisk to your printer, it will, in fact, send 0010 1010, or 42 in decimal notation. And whenever PET receives a 42, whatever the 42 may have represented in the sending device's character set, PET interprets it as an asterisk.
There are 256 CHR$ codes, numbered from 0 to 255 inclusive, and the CHR$ character set differs substantially from the POKE set, although both can be used to display the same symbols. Here are the essential differences:
- Very few characters have identical POKE and CHR$ codes.
- There are no CHR$ codes for reverse field characters. Instead, the RVS ON/OFF key or its corresponding CHR$ codes are used to produce them.
- The CHR$ set includes 14 control characters (28 in 80-column machines and newer 40-column machines) in addition to its 128 printing characters.
- Since there are 256 CHR$ codes, and only 128 + 14 = 142 CHR$ characters (156 in some machines), many of the CHR$ codes have no meaning at all in the PET/CBM, and in many cases one printed character has two different CHR$ codes!
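Because so few codes match, it helps to see the relationship as arithmetic. The Python sketch below converts a CHR$ printing code to the screen POKE code for the same symbol. It assumes the range-by-range offsets shown in the comments, which have not been checked against every ROM set. The function name is hypothetical, so verify any result against Tables 1 and 2 before relying on it, especially with the original ROMs. Reverse field is handled the way the screen set is laid out, by adding 128.

```python
def chr_to_screen(code, reverse=False):
    """Screen POKE code for the symbol that CHR$(code) prints, or None.

    The offsets below are assumptions; verify them against Tables 1 and 2.
    """
    if 32 <= code <= 63:        # space, digits, punctuation: same code
        screen = code
    elif 64 <= code <= 95:      # @, A-Z and five symbols: 0-31
        screen = code - 64
    elif 96 <= code <= 127:     # second codes for the 192-223 characters
        screen = code - 32
    elif 160 <= code <= 191:    # SHIFTED SPACE and graphics: 96-127
        screen = code - 64
    elif 192 <= code <= 223:    # shifted letter keys etc.: 64-95
        screen = code - 128
    else:
        return None             # control codes and ranges not covered here
    # Screen codes 128-255 are the reverse-field copies of 0-127.
    return screen + 128 if reverse else screen
```

With these offsets, `chr_to_screen(65)` returns 1, matching the demo program, where POKEing a 1 gave an A. Both 97 and 193 return 65, an example of one printed character having two CHR$ codes.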
TABLE 1 – SCREEN POKE CHARACTER SETS FOR PET/CBM
TABLE 2 – CHR$ CHARACTER SETS FOR PET/CBM
TABLE 3 – AMERICAN STANDARD CODE FOR INFORMATION INTERCHANGE (ASCII)
Table 3 shows the standard ASCII character set. It is laid out like Table 2, to make the ASCII and PET/CBM character sets easy to compare. Study the two tables carefully, and you'll see that PET/CBM has all but seven of the ASCII printed characters (codes 94-96 and 123-126), though often with different character codes.
You'll also notice that ASCII, being a seven-bit code, has no character codes above 127, and lacks many of PET/CBM's printing characters.
Because there are so many ASCII control codes, most ASCII keyboards use a special CONTROL key, similar to the SHIFT key, to generate them. CTRL A often sends a 1 code (SOH), CTRL B a 2 code (STX), CTRL C a 3 code (ETX), etc. Also, the meanings of the ASCII control codes, established with commercial message traffic in mind, are almost completely foreign to PET/CBM.
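On such keyboards, the CONTROL key typically reduces a letter's code to the range 1-26. Here is a minimal Python sketch of that convention. The helper name is hypothetical, and real terminals vary.

```python
ASCII_CONTROL_NAMES = {1: "SOH", 2: "STX", 3: "ETX", 13: "CR"}


def ctrl_code(letter):
    """Code a typical ASCII keyboard sends for CTRL plus a letter."""
    letter = letter.upper()
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single letter A-Z")
    return ord(letter) - 64  # 'A' is 65, so CTRL A -> 1, CTRL B -> 2, ...


for key in "ABCM":
    code = ctrl_code(key)
    print(f"CTRL {key} -> {code} ({ASCII_CONTROL_NAMES[code]})")
```

CTRL M comes out as 13, the same code PET/CBM uses for RETURN.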
No wonder it's sometimes hard to use non-Commodore devices with your machine! But now that you have Tables 2 and 3, you can write programs for perfect conversions between ASCII and PET/CBM codes. Table 2 shows you exactly what code PET/CBM sends when a given character is transmitted, and Table 3 shows you exactly how an ASCII device will interpret that code. Conversely, Table 3 shows you the intended character for every ASCII code your machine receives from outside, while Table 2 shows which code it must be converted to in order to produce the same character inside your PET/CBM.
Some Example Conversions
A few examples will illustrate the conversions. Suppose that your PET, with Character Set A enabled, is connected through a modem to an ASCII terminal, and that you are sending messages back and forth. The ASCII terminal sends the lowercase letter "a." Table 3 shows that the code actually transmitted will be 97 decimal, or 0110 0001. If your PET receives that code and PRINTs it on the screen, Table 2 shows that it will be displayed as an uppercase "A," not the lowercase letter that was sent!
So you'll need some software in your PET that converts received ASCII input to CHR$ format before displaying it. In this case, whenever PET receives a 97, the program should convert it to a 65 before PRINTing it. Of course, the program should also be smart enough to convert (or not convert) any of the other ASCII codes between 0 and 127 so that they give the proper display on your PET.
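Here is a minimal Python sketch of such receiving software. It assumes Character Set A is enabled and covers only the cases discussed here:

- Lowercase ASCII letters move down 32, so 97 becomes 65.
- Uppercase ASCII letters map to the shifted-letter codes 193-218, which Character Set A displays as capitals.
- RETURN, space, digits, and most punctuation pass through unchanged.

Everything else is simply dropped. The function name and that policy are assumptions; a complete program would consult Tables 2 and 3 for each remaining code.

```python
def ascii_to_pet_set_a(code):
    """CHR$ code to PRINT for one received ASCII code, or None to drop it."""
    if 97 <= code <= 122:          # ASCII a-z
        return code - 32           # -> 65-90, lowercase in Set A
    if 65 <= code <= 90:           # ASCII A-Z
        return code + 128          # -> 193-218, capitals in Set A
    if code == 13 or 32 <= code <= 64 or code in (91, 93):
        return code                # RETURN, space, digits, punctuation
    return None                    # anything else: not mapped in this sketch


received = [72, 105, 33, 13]       # ASCII "Hi!" followed by RETURN
print([ascii_to_pet_set_a(c) for c in received])  # [200, 73, 33, 13]
```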
Going the other way, suppose that you press the unshifted "b" key on your PET, and want the distant ASCII terminal to see it as a lowercase "b." Table 2 tells us that your PET will send a 66, which Table 3 tells us the ASCII terminal will interpret as an uppercase "B," which is not at all what you want. So your program has to convert the 66 to a 98 before transmitting it, and to make conversions on any other transmitted characters where it's appropriate.
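The sending side can be table-driven. This Python sketch again assumes Character Set A and handles letters only. It builds a 256-entry translation table once with `bytes.maketrans` and applies it with `bytes.translate`:

- Unshifted letter codes 65-90 become ASCII lowercase 97-122.
- Shifted letter codes 193-218 become ASCII uppercase 65-90.

All other codes pass through untouched, which a real program would need to refine. The names are hypothetical.

```python
# Unshifted letters 65-90 -> ASCII a-z; shifted letters 193-218 -> ASCII A-Z.
PET_LETTERS = bytes(range(65, 91)) + bytes(range(193, 219))
ASCII_LETTERS = bytes(range(97, 123)) + bytes(range(65, 91))
PET_TO_ASCII = bytes.maketrans(PET_LETTERS, ASCII_LETTERS)  # 256-byte table


def pet_to_ascii_set_a(data):
    """Translate codes about to be sent; non-letter codes are left as-is."""
    return bytes(data).translate(PET_TO_ASCII)


print(pet_to_ascii_set_a([66]))           # b'b': the 66 -> 98 conversion
print(pet_to_ascii_set_a([200, 73, 33]))  # b'Hi!'
```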
If you study Tables 2 and 3, you'll be able to determine every sending and receiving conversion, and to write your programs accordingly. If the remote device has a character set different from standard ASCII (many of them do), all you need to do is compare it to Table 2, and you'll be able to program the conversions.
100 REM *** CHARACTER SET DEMO ***
120 REM
130 REM SHOWS EVERY PET/CBM CHARACTER
140 REM (KEY PRESS CHANGES CHAR. SET)
150 REM
160 PRINT "{CLEAR}"
170 FOR CH = 0 TO 255
180 POKE(32768 + 2 * CH + 40 * INT(CH/20)), CH
190 NEXT CH
200 FOR I = 1 TO 23 : PRINT : NEXT
210 PRINT TAB(32) "59468=1";
220 IF PEEK(59468) = 14 THEN 250
230 POKE 59468, 12 : POKE 33767, 50
240 GET A$ : IF A$ = "" THEN 240
250 POKE 59468, 14 : POKE 33767, 52
260 GET A$ : IF A$ = "" THEN 260
270 GOTO 230
280 REM
290 REM ** LINE 200 FOR 80 COL. CBM'S:
300 REM
310 FOR I = 1 TO 11 : PRINT : NEXT
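Here is how the demo lays out the screen:

- Line 160's PRINT "{CLEAR}" has no semicolon, so it leaves the cursor on row 1.
- The 23 PRINTs in line 200 then move the cursor to row 24, the bottom line.
- Line 210 prints "59468=1" there in columns 32-38.
- Lines 230 and 250 POKE the final digit into column 39 at address 33767. Screen codes 50 and 52 are "2" and "4".

Ending line 210 with a semicolon presumably keeps the cursor from moving past the bottom line and scrolling the display. The Python sketch below checks that arithmetic. It reproduces the formula from line 180 and splits an address into row and column. The function names are hypothetical, and the screen is 40 columns unless stated.

```python
SCREEN_START = 32768  # first byte of screen memory, as POKEd in the listing


def demo_address(ch):
    """Address that line 180 POKEs for character code ch (0-255)."""
    return SCREEN_START + 2 * ch + 40 * (ch // 20)


def row_col(address, width=40):
    """Split a screen address into (row, column) for a given line width."""
    return divmod(address - SCREEN_START, width)


print(row_col(demo_address(20)))   # (2, 0): each 20-code row skips a line
print(row_col(demo_address(255)))  # (24, 30): last character, bottom row
print(row_col(33767))              # (24, 39): the digit POKEd by line 230
print(row_col(33767, width=80))    # (12, 39) on an 80-column screen
```

On an 80-column machine, the same formula packs the rows together. The 1 + 11 PRINTs from line 160 and line 310 put the notation on row 12, next to the POKEd digit.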