Words

Sunday, April 29th, 2012

Earlier this year, someone asked me a question about document editing, and in a split second, my brain did one of those weird image fly-by things through 40 years of technology. “Hey, I could write a little posting about this,” I thought. So here I am eight weeks later, with a half-finished combination of geek memoir and technical explanation, which has already grown past a thousand words. So I’m posting this half, with noble intentions of finishing the second half in a much shorter interval.

Like my other quasi-historical postings (computers, lasers), the usual caveats apply – my comments are based on my memories of my perceptions, which, in turn, are the product of my environment. Oh, I think most of it is pretty accurate. I just feel obligated to warn you, in the spirit of truth.

Bits

Classical computer discussions usually begin with a thing called a “bit”, short for binary digit. Computers started with binary data because it is easier for a computer to do a simple thing many times real fast than to try to do slower, more complicated operations, like decimal math. So, in the beginning was the bit. Mathematically, one or zero. Conceptually, there or not-there. Switchwise, on or off. Magnetically, flux or no-flux. Optically, light or dark. In a punch card or paper tape, hole or not-hole. On an electrical conductor, voltage or not-voltage. Bits can be represented in many media.

But what can you do with a bit?

For our purposes, with a little thought and effort, you can make a series of bits mean whatever you want. At first bits were used to represent numbers as in this table…

Decimal Binary
1 1
2 10
3 11
4 100

… and then characters

A 10000
B 10001
C 10010

… and with the addition of the space character (“ ”), words, and with the “new line” character, paragraphs.

(The image to the right shows a binary rendition of the preceding two lines. Optional exercise for the reader: Why don’t the bold words show up differently?)

So, bits can be used to represent Information.
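For the curious, here is a rough Python sketch of both ideas: numbers as bits, and characters as bits. The function names `to_binary` and `text_to_bits` are my own, made up for illustration. The character codes come from Python's built-in `ord()`, which for plain English letters matches the ASCII standard described in the next section, so they will not match the small illustrative character table above.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, lowest first
        n //= 2
    return "".join(reversed(digits))


def text_to_bits(text):
    """Write each character as 7 bits (assumes plain English text, codes below 128)."""
    return " ".join(format(ord(ch), "07b") for ch in text)


# Reproduce the decimal/binary table.
for n in range(1, 5):
    print(n, to_binary(n))

# Two letters and the space between them.
print(text_to_bits("A B"))
```

The space character gets a bit pattern of its own, just like the letters do, which is the whole point: to the machine, a gap between words is one more code.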

But to share your information with another person or computing machine, you must both agree on the meaning of the strings of bits. Just like a secret code, except not secret in this case.

ASCII

In a giant step for computer geekdom during the 1960s, a national standards body created and approved the American Standard Code for Information Interchange (ASCII), usually pronounced “ass-key”. It is probably unnecessary to point out that ASCII focused on the English language as practiced in the USA. Since those early provincial days, the computing infrastructure has expanded to support many languages, not only those that use a similar alphabet, but also languages that use different character sets, and even those that read right-to-left. Commerce is a great equalizer. But for simplicity, I am going to stick with American English in this post.

Built on the technology of the telegraph, then the teletype, ASCII is an under-appreciated standard (IMNSHO) that was a crucial enabler for the rapid spread of computer technology. (Other similar under-appreciated technologies include the RS-232 serial interface and the Hayes Modem command set.)

In the early days, computer memory and data transmission were scarce and expensive, so the first character codes used as few as 5 and 6 bits, which means you could only represent 32 or 64 symbols. Eventually 7-bit ASCII became the accepted standard. Seven bits fit nicely on the 7-channel magnetic tape in use at the time, and the 8th bit available in most computer memory systems was used as a parity bit, a rudimentary way to check for errors in data storage or transmission.
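Here is a small Python sketch of how a parity bit works. It assumes even parity (the eighth bit is set so that the total number of 1 bits comes out even) and puts the parity bit in the top position; real systems used both even and odd parity, so treat the details as one plausible arrangement. The function names are mine.

```python
def add_even_parity(code):
    """Return an 8-bit value: the 7-bit code plus a parity bit on top."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit code")
    ones = bin(code).count("1")
    parity = ones % 2              # 1 if the count of 1 bits is odd
    return (parity << 7) | code


def parity_ok(byte):
    """True if the 8-bit value contains an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))        # 'C' is 1000011, three 1 bits
print(format(sent, "08b"))              # 11000011
print(parity_ok(sent))                  # True
print(parity_ok(sent ^ 0b00000100))     # False: one bit flipped in transit
```

The check is rudimentary in exactly the sense described above: it notices a single flipped bit, but two flipped bits cancel out and slip through.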

Seven bits can be used to represent 128 symbols, including the 52 letters of the alphabet (upper and lower case), ten digits, miscellaneous mathematical and grammatical symbols, plus the critically important non-ink-consuming characters (space, tab, end-of-line), one audible character (BEL, which sounded a tone or bell), and some miscellaneous symbols used for data transmission, of which my two favorites are ACK and NAK (acknowledge and not-acknowledge).
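Counting the codes is a quick way to check the arithmetic: seven bits give 2^7 = 128 combinations, of which 95 are printable (space included) and 33 are control codes. The Python sketch below does the tally; the `CONTROL_NAMES` table is just a handful of codes I picked out for illustration, not the full list.

```python
import string

# A few of the control codes mentioned above, with their ASCII numbers.
CONTROL_NAMES = {6: "ACK", 7: "BEL", 9: "TAB", 10: "LF (new line)", 21: "NAK"}

codes = range(128)                                  # every 7-bit value
printable = [c for c in codes if 32 <= c <= 126]    # space through tilde
control = [c for c in codes if c < 32 or c == 127]  # includes DEL at 127

print(len(codes), len(printable), len(control))     # 128 95 33

letters = [c for c in printable if chr(c) in string.ascii_letters]
digits = [c for c in printable if chr(c) in string.digits]
print(len(letters), len(digits))                    # 52 10

for code, name in sorted(CONTROL_NAMES.items()):
    print(f"{code:3d}  {code:07b}  {name}")
```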

Plain text

So now we have a set of agreed-upon symbols that can be used to represent information. These bit-based information elements can be stored in computer memory, recorded onto magnetic media, punched into paper tape, and transmitted electrically. But not yet easily seen. (Some of the early computers featured consoles that displayed the binary contents of selected storage registers. A person who could read and interpret those was truly a geek’s geek, or perhaps a mutant.)

Teletype Model 33 ASR

For reading purposes, we need a visual representation of the characters, which we now call a font. The word comes, interestingly, from the French fondre, “to melt” (the same root as fondue), because the original printer’s fonts were cast from molten metal. (See font.) Over the centuries, monks, calligraphers and designers of moveable type have given us a variety of practical and artistic letter renditions to choose from.

But the early efforts were much more utilitarian. Original output devices, such as the Teletype Model 33, printed on paper. The “font” was determined by the typeface molded into the metal print head. The teletype “font” was selected for readability on inexpensive paper using an inked cloth ribbon that probably should have been replaced months ago.

Sample Teletype Output

Later, so-called “dot matrix” printers were developed that composed letters from, well, a matrix of dots, but the font was still preset within the printer.
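To make the “matrix of dots” idea concrete, here is a toy Python sketch. The 5-by-7 pattern for the letter A is one I made up for illustration; it is not copied from any real printer, and real devices stored their patterns in hardware rather than in strings.

```python
# A made-up 5x7 glyph for the letter "A"; each string is one row of dots.
GLYPH_A = [
    "01110",
    "10001",
    "10001",
    "11111",
    "10001",
    "10001",
    "10001",
]


def render(glyph, dot="#", blank=" "):
    """Turn rows of '1'/'0' into printable rows of dots and blanks."""
    return "\n".join(
        "".join(dot if bit == "1" else blank for bit in row) for row in glyph
    )


print(render(GLYPH_A))
```

Notice that the character code itself says nothing about the shape. The code only selects which preset pattern the device draws.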

The purpose of these printers was to provide the user with a visual representation of the computer’s output. As a side effect, the paper copy provided a backup in case the computer lost information (which they did quite often), and provided a means of sharing information. Remember, early computers were isolated machines, with no e-mail, internet connections, or even removable storage. Data was input by typing, and output by printers to volumes and volumes of paper.

An ecological and economical advance was the cathode-ray tube (CRT) terminal, which used the same technology as the television to represent computer characters on a glass screen by turning a beam of electrons on and off as it swept across a phosphorescent layer. The characters usually glowed in an oddly-appealing orange or green hue, but the size and shape of the characters were preset in the terminal.

These terminals followed the lead of the teletype, and presented character data on a screen divided into 24 or 25 lines, each typically 80 characters wide. And the font? A matrix of electronic dots.

Plain text

All of the information, systems, and devices mentioned so far used “plain” ASCII text – there were no real options for bold, italic, or underlined characters, although some clever programmers figured out how to create a slight bold effect by printing a word, then backspacing the teletype head back to the beginning of the word, and reprinting it. There were no font options unless the printer had multiple fonts designed into its circuits. Even then, changing fonts required a manual selection process.
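The backspace trick is easy to sketch in Python, since backspace is itself an ASCII character (code 8). The function name `overstrike_bold` is mine, and this only builds the character stream; on a modern screen the second pass simply overwrites the first, so you need ink and paper to see any bold.

```python
BACKSPACE = "\b"  # ASCII code 8: move the print head back one column


def overstrike_bold(word):
    """Return the word, enough backspaces to get back to its start, then the word again."""
    return word + BACKSPACE * len(word) + word


line = "This is " + overstrike_bold("important") + " text."
print(repr(line))  # repr() shows the backspaces as \x08 instead of acting on them
```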

This is a good time to note that the fonts used on computers, like those on typewriters, were all non-proportional – a skinny lower-case “i” occupied the same horizontal space as a fat capital “W”. The complexities, and benefits, of proportional spacing, ligatures (which, for example, join the f with the i) and kerning, were the domain of typesetters in the world of printing. Kerning, by the way, is the precise spacing of proportional letters to produce a pleasing visual effect, most obvious in large newspaper headlines, and most often violated in amateur signs, hand-made posters, and ransom notes.

For simplicity and predictability, early computing, display, and printing devices used the much simpler non-proportional spacing. This is why you could state unequivocally that there were, say, 80 characters in a line – the printer was mechanically indexed to space the letters that way.
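One practical consequence of fixed spacing is that layout becomes simple counting. As a sketch, assuming an 80-column line, centering a title is nothing more than subtracting and dividing by two; the name `center_line` and the constant `LINE_WIDTH` are mine, for illustration.

```python
LINE_WIDTH = 80  # assumed columns per line


def center_line(text, width=LINE_WIDTH):
    """Pad with leading spaces so the text sits in the middle of a fixed-width line."""
    if len(text) > width:
        raise ValueError("text is wider than the line")
    left = (width - len(text)) // 2
    return " " * left + text


print(center_line("Words"))

# With non-proportional spacing, a line of 80 skinny letters and a line
# of 80 fat ones occupy exactly the same width on the page.
print(len("i" * LINE_WIDTH) == len("W" * LINE_WIDTH))
```

With a proportional font this arithmetic stops working, because the width of a line depends on which letters are in it.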

Plain text is exactly that – a numerical representation of characters used to create language. Plain.

Editing

So now we have a character set, and a means of storing, displaying, and printing the characters. We now need a tool to enter and edit our strings of characters. Early typewriters allowed you to enter characters, but there was no way to edit the text, leading eventually to the invention of Wite-Out, and also setting the stage for several generations of jokes about Wite-Out on computer monitors.

Fortunately, since computing systems were used to write programs, they already included text editors, and these could be used to edit words, paragraphs, and entire treatises, although the appearance of the output was still very dependent on the device used to render the text. Early text editors included the tersely-named ed, ex, and vi, followed by EMACS (my personal favorite).

So, armed with a character set, a text editor, and a computing and storage device, an enterprising writer can record and edit thoughts, ideas, and information.

Whew

And for now, we shall leave our enterprising writer in plain-text-only mode. That’s really not such a terrible place to be – there is really quite a lot one can accomplish with plain text.

Next time, we’ll figure out how to get from plain text to the feature-rich word processing tools we enjoy, and abuse, today.