
“More bullshit has been written about entropy than about any other physical quantity.” David Beeman

The entropy theory of information is still quite new to me, having heard it briefly explained several weeks ago. It stems from a theory developed by a bloke called Claude Shannon back in the 1940s, when information was still an undefined ‘thing’ that was communicated, stored and processed. Shannon came up with a theory that gave a definition to the word ‘information’: it is basically the change from a state of uncertainty to a state of certainty. The greater this change, the more information has been communicated. If the receiver knew in advance what the message was, little or no information is carried.

Entropy itself is a measure of disorder, and the overall entropy in the universe is steadily increasing as everything moves from a state of order to a state of chaos. It’s got something to do with the Second Law of Thermodynamics, and perhaps it’s the reason we perceive time as moving in only one direction. Information is about reducing entropy, so we can make deductions about something with a higher degree of certainty.
One of the ways we can do this is to encode it with the minimum number of bits required to represent a particular state from all possible ones. For example, the colour of a given square on a chess board is either black or white, so there’s just one bit of entropy – 0 or 1. The position of a piece on a board of 64 squares could be represented with 64 bits (one bit per square), but if we use the binary number system we can reduce this to just 6 bits (2⁶ = 64) and communicate it more efficiently. In another example, the next digit in the number 1.3333… carries zero bits of entropy, because we know for certain what it will be.
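As a quick sketch of the idea, here’s a small Python function (my own illustration, not from any particular library) that works out the minimum number of bits needed to distinguish between n equally likely states:

```python
import math

def min_bits(n_states: int) -> int:
    """Minimum number of bits needed to distinguish n equally likely states."""
    return math.ceil(math.log2(n_states))

print(min_bits(2))   # colour of a square: 1 bit
print(min_bits(64))  # one square out of 64: 6 bits
```

Ceiling is used because we can only send whole bits: 65 states would need 7 bits even though log₂ 65 is just over 6.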
This is roughly how information is encoded and exchanged between two points on a very basic Shannon-Weaver network. The subject of entropy and information theory is comprehensive and heavy on mathematics, but the following are useful formulae:

Shannon entropy is calculated by

H = log2(n), where n is the number of equally probable messages.

The amount of information carried by a message is the same quantity, written another way:

I = -log2(1/n) = log2(n)
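These formulae are straightforward to check numerically. A minimal Python sketch (the function name is mine) confirms that the two forms give the same answer:

```python
import math

def entropy_bits(n_messages: int) -> float:
    """Shannon entropy H = log2(n) for n equally probable messages."""
    return math.log2(n_messages)

n = 64
print(entropy_bits(n))    # H = log2(64) = 6.0 bits
print(-math.log2(1 / n))  # I = -log2(1/64), also 6.0 bits
```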

There’s a disadvantage to using the minimum number of bits, as redundancy provides a level of fault tolerance in a message. In a similar way, redundant characters in the English language, such as vowels, aren’t strictly needed for communication, but without them a single mis-spelled character could render a word unreadable. Redundant characters enable us to guess what the most likely word was and correct the error accordingly. For the same reason, compressed data, which has the redundant bits removed, is affected far more severely by errors.

Entropy is also important in cryptography, as a key’s strength depends on how much uncertainty an attacker faces. A 128-bit key has a higher degree of entropy than a 64-bit one, and so is much harder to deduce. In contrast, if we knew the key for a five-character code lock was an English word, and there were roughly 128 possible five-letter words, there would be only 7 bits of entropy (log2 128 = 7).
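The numbers above can be checked directly. In this sketch the figure of 128 candidate words is the article’s hypothetical, not a real word count:

```python
import math

# Entropy of uniformly random keys of different lengths.
print(math.log2(2 ** 128))  # 128.0 bits of entropy
print(math.log2(2 ** 64))   # 64.0 bits of entropy

# A five-letter lock whose key is one of ~128 English words (hypothetical figure).
print(math.log2(128))       # 7.0 bits: vastly easier to guess
```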