Storing Text Digitally
An excerpt from "The Sydnie Emails", written Feb 4, 2008.
Copyright (c) 2008, Kevin Farley
OK, think about this: what you are calling "digits" and "letters" are nothing more than symbols used to represent concepts of numbers and language elements. The "numbers" or "digits" are the language-dependent symbols associated with quantity and counting, while the "letters" are the language-dependent symbols associated with language utterances.
So when you think about it globally, there is nothing intrinsic to the letter "k" that denotes a "k" sound. We, as English speakers, associate that letter with the sound made at the beginning of saying "k". And similarly, we associate the letter "w" with a sound that has no relationship to its name; it merely represents a known sound. So then the "letters" of the English alphabet are used to represent some language-dependent sound.
And when we want to store some information in a computer, we need to be able to associate the language-dependent symbols with some computer-usable patterns of 0s and 1s. And really, that is all that is done. Each letter of the alphabet, some punctuation, the numbers, and a few other "characters" are simply associated with binary values in the computer.
Think of it like a lookup table. You want to store the symbol "k", so you need to define a unique value for that symbol so that every time you see that value, it will only ever mean "k". Do the same for each letter of the alphabet, including both upper case and lower case characters (think about it: capitalizing a letter may not change the sound it makes, but it is uniquely different in meaning and in what it represents).
The result is a map where you can look up a symbol (character/letter) to get its representation, or using the value of the representation, you can look up the symbol.
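If you have a programming language handy, you can see that two-way lookup in action. Here is a little sketch in Python (just my choice of language, nothing special about it); its built-in ord() and chr() functions perform this exact symbol-to-value and value-to-symbol lookup for you:

# A tiny hand-built two-way lookup table for a few symbols.
symbol_to_value = {'k': 107, 'w': 119, 'A': 65, 'a': 97}
value_to_symbol = {v: s for s, v in symbol_to_value.items()}

print(symbol_to_value['k'])   # 107 -- symbol in, value out
print(value_to_symbol[107])   # k   -- value in, symbol out

# The built-ins agree with the hand-built table:
print(ord('k'), chr(107))     # 107 k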
Now the most widespread of such mappings is the ASCII code. This is the world's most recognized standardized character map, but it only maps English characters (gee, I wonder who invented the entire computing industry). This mapping code has been around a long time. Google it sometime if you are interested.
The basic ASCII chart assigns letters, numbers, punctuation, and some special characters to the values 0 through 127, which is 128 values in all. There is an extended character set that uses the values 128 through 255, but that is another matter altogether. Also, because they wanted to keep the range of values for characters to something that can be stored in a single "byte" (8 binary digits/bits), all character mappings must be less than or equal to 255, which is the maximum value you can store in 8 binary digits (equivalent to 11111111).
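You can check that arithmetic yourself; in Python it looks like this:

print(2 ** 8 - 1)            # 255 -- 8 bits give 256 patterns, 0 through 255
print(int('11111111', 2))    # 255 -- all eight bits set
print(bin(255))              # 0b11111111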
Note: The Unicode character mapping set contains what are known as "wide characters", meaning they can be larger than a single byte. Most often they are two bytes wide, which allows up to 65536 unique values as opposed to the 256 unique values available to single-byte ASCII characters. Some Unicode encodings use characters up to 4 bytes wide.
The first 32 values (literally 0, 1, 2, ... 31) are assigned to "control characters". Do you know what happens every time you press "control-c" to copy something? The keyboard generates a key scan code that is translated into the numeric value 3 by the keyboard device driver. The software interprets this value 3 to mean "copy the highlighted text to the copy buffer/clipboard". There is nothing magic about "control-c"; it's the mapping that makes the magic.
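There is a neat pattern behind why "c" maps to 3: in the classic ASCII convention, holding control while pressing a letter keeps only the low five bits of the letter's value. A quick Python sketch of that convention (the helper name ctrl is just mine):

def ctrl(letter):
    # Keep only the low five bits of the letter's ASCII value.
    return ord(letter.upper()) & 0x1F

print(ctrl('c'))   # 3  -- the value behind "control-c"
print(ctrl('a'))   # 1
print(ctrl('z'))   # 26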
So starting with numeric value 32 (0x20) through 127 (0x7f) you have your "printable" characters. They are called printable because they result in some character you can see (with the exception of space and delete, which are technically not seen). The base-10 digits, starting from 0, are mapped to values 48 (0x30) through 57 (0x39). The upper case letters, starting from 'A', are mapped to values 65 (0x41) through 90 (0x5a). The lower case letters, starting from 'a', are mapped to values 97 (0x61) through 122 (0x7a).
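You can verify those ranges, and see a nice side effect of the layout, with a few lines of Python:

print(ord('0'), ord('9'))    # 48 57
print(ord('A'), ord('Z'))    # 65 90
print(ord('a'), ord('z'))    # 97 122

# Upper and lower case differ by exactly 32 (a single bit, 0x20),
# which is why ASCII case conversion is so cheap:
print(ord('a') - ord('A'))   # 32
print(chr(ord('A') + 32))    # a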
So then, when the name "Sydnie Pye" is stored in the computer, it is actually stored as a sequence of numeric values in binary digits, like the following:
01010011 <-- S
01111001 <-- y
01100100 <-- d
01101110 <-- n
01101001 <-- i
01100101 <-- e
00100000 <-- space
01010000 <-- P
01111001 <-- y
01100101 <-- e
Alternatively, I could have simply written:
0x53 0x79 0x64 0x6e 0x69 0x65 0x20 0x50 0x79 0x65
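You don't have to take my word for the table above; a short Python loop reproduces it:

name = "Sydnie Pye"
for ch in name:
    # Show each character's value in 8-bit binary and in hex.
    print(format(ord(ch), '08b'), hex(ord(ch)), '<--', repr(ch))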
By standardizing the way the characters (letters) are represented in the computer, all the computers in the world can accurately store and recall that name.
There are other mappings of alphabets and characters to numeric values. One of the older ones is EBCDIC, an old IBM standard still in use to some extent. The modern standard now being adopted globally is called Unicode. In Unicode, characters are not necessarily a single byte; instead, each character requires from 1 to 4 bytes depending on the specific encoding, and there are several.
This was needed because some of the Asian writing systems (most notably Kanji) have no simple equivalents to our English letters. Also, because ASCII is tuned for English and related languages (most European languages, but not Russian and other Cyrillic-based languages), it's not suitable for encoding all the intricacies of more complex writing systems.
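If you want to see that variable width in action, UTF-8 (the most common of those Unicode encodings) is easy to poke at from Python. Plain English letters still take one byte each, while other characters take two, three, or four (the sample characters here are just my picks):

for text in ('k', '\u00e9', '\u6f22', '\U0001D11E'):
    encoded = text.encode('utf-8')
    print(repr(text), '->', encoded.hex(), '->', len(encoded), 'byte(s)')

# 'k' -> 6b       -> 1 byte, identical to its ASCII value
# 'é' -> c3a9     -> 2 bytes
# '漢' -> e6bca2   -> 3 bytes (a kanji/Han character)
# '𝄞' -> f09d849e -> 4 bytes (the musical G clef)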
So then the answer is "yes, binary numbers are used to store textual information in a computer."
I do not say "letters" because that is a language-dependent attribute. Asian writing systems like Kanji do not have letters; they have glyphs. And technically speaking, the English alphabet has glyphs too, we just call them letters.