Thursday, March 08, 2012

Counting With Letters? No Way!



An excerpt from "The Sydnie Emails", written Jan 31, 2008
Copyright (c) 2008, Kevin Farley


When you say "count with letters too", I assume you are talking about working with digits other than 0 through 9 and that means number bases beyond 10. Recall that binary has only 2 digits, 0 and 1.
Think about it. We have 10 "numbers" because we have a base 10 number system. In English, we have assigned the "symbols" 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 to the quantities zero through nine, and a digit's position in a number tells you which power of 10 it gets multiplied by. Semantically we call the symbols we assign to represent numeric quantities "numbers". But that is more of a grammar thing and not a math thing. The math thing is to call them "digits".
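For example, the decimal number 345 is really shorthand for a sum of powers of 10:
345 = 3*10^2 + 4*10^1 + 5*10^0 = 300 + 40 + 5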
In math, a symbol is used to represent a quantity, an operation on quantities, an unknown quantity, or a property. But that is all these symbols are: representations of a concept. Digit symbols represent quantities, and digit positions represent powers of the base of the number system.
So then grammatically in English, using our base 10 number system we only have 10 symbols for the digits 0-9. But the symbols can be anything. Look at the ancient Maya: their number system was based on 20, and their "numbers" were glyphs built from combinations of bars and dots. I suppose they counted on their toes too, hence the base 20 system ;)
So instead of being based on powers of 10 representing digit positions of 1, 10, 100, 1000..., the Mayans numbering system was based on powers of 20 which means that the digit positions (if they had them) would be 1, 20, 400, 8000, 160000...
Now we can still count in Mayan using their glyphs. But also we can count in Mayan using English "symbols" instead of the glyphs. We can start by using the "number symbols" 0 through 9, and then (borrowing from the computing world and hexadecimal) we can start with the "letter symbols" A, B, C, etc.
Thus our Mayan digits are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, G, H, I, J
Now we also know from Deep Thought that the ultimate answer to the universe, to life, to... everything... is 42 (in decimal).
But the ultimate answer in Mayan is 22. Why?
Because 2 * 20^0 + 2 * 20^1 = 2*1 + 2*20 = 2 + 40 = 42
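If you want to check that, or play with other bases, here is a quick Python sketch of the idea (nothing fancy, just repeated division by the base, using the letters A through J for the digits above 9; the to_base name is just for illustration):

# Convert a non-negative integer to a string of digits in the given base.
# Digits above 9 use letters, just like hex (and our letter-based Mayan notation).
DIGITS = "0123456789ABCDEFGHIJ"

def to_base(n, base):
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(DIGITS[n % base])   # the remainder is the next digit, lowest first
        n //= base
    return "".join(reversed(digits))

print(to_base(42, 20))   # prints 22 -- the ultimate answer in Mayan
print(to_base(42, 16))   # prints 2A
print(to_base(42, 2))    # prints 101010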
Now since I brought it up, let's talk about hexadecimal, which is to say the base 16 number system.
Though binary is used (conveniently enough) for computing because of the nature of electrical switching and the on/off detection of electrons in circuits, programmers rely most heavily on "hex" math. The reason is simple: strings of binary digits are too cumbersome to keep track of (how many 1s in a row can you look at before you lose track?). So a shorthand representation of binary numbers is needed.
Why not just use decimal? Well for starters, 10 is not a power of 2. What do I mean by that?
Binary is based on powers of 2 and decimal is based on powers of 10. To convert between base 2 and base 10 requires some mental agility (or calculator) as the digits don't "line up". I will explain that.
If I have the decimal number 117, that is the binary number 1110101. Now I can't look at any sequence of those binary digits (bits) and "mathematically see" any digit of 117. Meaning, I can't look at the string of bits and see a substring of bits that mean 100 and a substring of bits that mean 10 and a substring of bits that mean 7.
Well technically I can look at 1110101 and see 117 because I have done this for over 2 decades, but that is an entirely different matter.
So then you want to use a numbering system that shortens the representation of binary numbers but is readily convertible. As a programmer, I want to look at the number and be able to immediately see the bits underneath.
Now if I use a number system whose base is itself a higher power of 2, I can achieve that: the basis of the number system is still 2, but each digit stands for a fixed-size group of binary digits.
Early in the days of computing, programmers started using "octal" representation, which is based on the base 8 number system. In octal you only have the digits 0 through 7, because remember, you do not have the base number in your digit set.
Using octal, that decimal number 117 becomes 0165 (1*64 + 6*8 + 5*1 = 117), or 001110101 in binary. I prepended the number with a 0 because that is standard practice in computing to mark a number as octal; decimal numbers are not normally written with a leading 0.
So if I look at the digits of 0165 I see 5, which is "101" in binary, 6, which is "110" in binary, and 1, which is "001" in binary. Thus we have:
 1   6   5
001 110 101

See how you can visualize the bits? Each octal digit represents a string of 3 bits. I can look at the octal digit and I only have to do the bit conversion for 8 values total, which is represented in 3 bits. When you use decimal, the base 10 digits don't allow such simple visualization. You can't simply write the decimal digits 117 and have the underlying binary pattern fall out sequentially.
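Here is a little Python sketch of that octal grouping, using a made-up helper I'll call octal_groups (each octal digit expands to exactly 3 bits):

# Expand each octal digit of a number into its 3-bit binary group.
def octal_groups(n):
    for digit in oct(n)[2:]:              # oct(117) gives '0o165'; skip the '0o' prefix
        print(digit, format(int(digit), '03b'))

octal_groups(117)
# prints:
# 1 001
# 6 110
# 5 101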
To see the failure of decimal, just look at the lowest digit of 117, which is 7. In binary, the 7 is represented as 111 because that is 1*2^0 + 1*2^1 + 1*2^2 = 1*1 + 1*2 + 1*4 = 7. But clearly the bottom three binary digits of 117 are 101 and not the expected 111. This is because 10, the base of decimal, is not a power of 2 (the base of binary).
If we were to simply use the decimal digits like I did the octal digits we would have the following:
 1   1   7
001 001 111 <<-- WRONG!

And that would actually be the value 79 in decimal, not 117. So clearly, decimal does not lend itself readily to binary visualization.
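You can check that mismatch with a couple of lines of Python:

print(int('001001111', 2))   # 79  -- the naive per-digit expansion of 117
print(int('1110101', 2))     # 117 -- the real binary value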
While octal is all well and good and an improvement for handling binary numbers, we want something even more compact that still lets us visualize the bits the way octal does. So we look to the next power of 2: we went from 2 to 8 (skipping 4), and the next is 16. That leads us to hex numbers in base 16.
So to use hex, I need 16 digits. English only has 10 "numbers", so we proceed on to the letters like with the Mayan example. So my base digit set in hex is:
0 1 2 3 4 5 6 7 8 9 A B C D E F

Which yields decimal values 0 through 15 inclusive.
Now, to distinguish a number in hex from those in octal and decimal, programmers typically prefix the number with "0x". This is the magic sign to tell us that we are looking at a hex number.
Now back to the decimal value 117. When we convert that number to hex we get 0x75 because 7 * 16^1 + 5 * 16^0 = 7*16 + 5*1 = 112 + 5 = 117.
Now remember the visualization thing? The hex digit 7 is "0111" in binary and the hex digit 5 is "0101" in binary. Thus we have:
  7   5
0111 0101

Now see again how we can visualize the bits?
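The same kind of sketch works for hex, with a similarly made-up hex_groups helper, since each hex digit expands to exactly 4 bits:

# Expand each hex digit of a number into its 4-bit binary group.
def hex_groups(n):
    for digit in format(n, 'X'):          # format(117, 'X') gives '75'
        print(digit, format(int(digit, 16), '04b'))

hex_groups(117)
# prints:
# 7 0111
# 5 0101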
So back to the original question of using letters, let's look at a much larger number, say 0x7EA6CF82.
In binary that is: 1111110101001101100111110000010
In octal that is: 017651547602
In decimal that is: 2124861314
In hex that is: 0x7EA6CF82
In Mayan that is: 0m1D407D5E

Now for the hex visualization:
  7    E    A    6    C    F    8    2
0111 1110 1010 0110 1100 1111 1000 0010

With each hex digit, the programmer can "see" the underlying bit patterns. As programmers, we instinctively know (after doing it a while) that "F" is 15 and "C" is 12. We also know that 15 is "1111" and 12 is "1100".
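If you want to verify those conversions yourself, Python will happily do the grunt work (the base 20 line reuses the to_base sketch from earlier, repeated here so this snippet stands on its own):

DIGITS = "0123456789ABCDEFGHIJ"

def to_base(n, base):                # same repeated-division idea as before
    digits = ""
    while n > 0:
        digits = DIGITS[n % base] + digits
        n //= base
    return digits or "0"

n = 0x7EA6CF82
print(format(n, 'b'))    # 1111110101001101100111110000010
print(format(n, 'o'))    # 17651547602
print(n)                 # 2124861314
print(format(n, 'X'))    # 7EA6CF82
print(to_base(n, 20))    # 1D407D5E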
Now the question that may have popped into your head: who uses numbers that big?
Programmers do, all the time. It's not the "data" that is usually that large, it's the memory addresses.
A regular PC has anywhere from 128 MB to 1 GB or more of RAM. A megabyte of RAM is actually 1048576 bytes. This is because 1 kilobyte (KB) is 1024 bytes, and a megabyte (MB) is 1024 KB. So 1024 * 1024 = 1048576. So then a gigabyte (GB) of RAM is 1024 MB, or 1073741824 bytes.
Why 1024 and not 1000? Because 1024 is a power of 2 (it is 2^10 to be specific). Remember, computing uses a base 2 number system at its lowest level, and 1000 is a decimal concept. But since 1024 is close to 1000, we borrow the "kilo" prefix; and since 1024 * 1024 is a little over 1 million, we borrow the "mega" prefix. The same goes for "giga", where a GB of RAM is actually more than 1 billion bytes.
So if you are talking about memory, the prefix kilo means 1024 and mega means 1024*1024. But when you are talking about CPU clock speed of a computer, that is a different matter. A 500 MHz CPU has a clock speed of 500 million cycles per second where the M for mega means 1,000,000. Also a 3 GHz processor is running at 3 billion cycles per second where G for giga means 1,000,000,000.
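Here is that arithmetic laid out in code, if you like seeing the two sets of prefixes side by side:

# Binary prefixes (powers of 2) versus the decimal SI prefixes (powers of 10).
print(2**10, 10**3)    # 1024        1000        kilo
print(2**20, 10**6)    # 1048576     1000000     mega
print(2**30, 10**9)    # 1073741824  1000000000  giga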

As a side note, disc drive manufacturers do not use 1024 as the multiplier, they use the smaller 1000 instead. So that 60 GB hard drive is smaller than 60 GB of RAM, because 60 * 1000 * 1000 * 1000 is less than 60 * 1024 * 1024 * 1024.
Why do they do that? Marketing. It is almost a bait and switch, and most people don't know the difference. But in reality, a 100 GB hard disc drive has roughly 7.4 GB less than one would think (100*1073741824 - 100*1000000000 = 7374182400). But I digress...
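The quick check in code:

# A "100 GB" drive counted two ways.
marketing_bytes = 100 * 1000**3      # drive makers: 1 GB = 1,000,000,000 bytes
memory_bytes    = 100 * 1024**3      # memory style: 1 GB = 1,073,741,824 bytes
print(memory_bytes - marketing_bytes)   # 7374182400 bytes short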



RAM is random access memory, and to use it, each byte must be individually accessible. To access memory, each byte has a unique address. That address is simply a one-up number. So the very first byte of RAM has memory address 0 and the last byte of a 1 GB RAM chip has memory address 1073741823.
That last address is 1 less than the total number of locations because, remember, despite how we all learned to count as children, the first of anything mathematically is really item 0, not item 1.
Another piece of this is that nearly all personal computers today use virtual memory, which is a really long discussion that is beyond what you need to get into at this time -- or ever ;)
Simply put, virtual memory means the computer can act like it has 4 GB of RAM even if it only has 64 MB; it just uses a hard disc to swap sections of RAM in and out.
To get addresses for 4 GB you need numbers in the range from 0 to 4294967295 (4 * 1073741824 = 4294967296 locations, numbered starting at 0).
And because programmers are always looking at (virtual) memory addresses, we always, daily, perpetually, and in all other ways, have to deal with really really large numbers.
So that last virtual memory location, 4294967295, is 11111111111111111111111111111111 in binary, 037777777777 in octal, and 0xFFFFFFFF in hex.
And since each digit of the hex string is exactly represented by 4 binary digits (bits), the hex version is the optimal way of looking at really really large numbers in computing.
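One last sketch, for the top of that 4 GB address range:

top = 4 * 1024**3 - 1        # the last address in 4 GB of (virtual) memory
print(top)                   # 4294967295
print(format(top, 'b'))      # thirty-two 1s
print(format(top, 'o'))      # 37777777777
print(format(top, 'X'))      # FFFFFFFF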
In summary, the point of having letters is just to get more digits than 0 through 9, which are needed for number systems beyond base 10.
Now I am sure that all of this is well beyond your basic question. But I am the computer guy and the math guy and since I like this stuff, I like to explain it. Thanks for putting up with this long-winded explanation.


