Friday, September 26, 2014

The Basic Idea of Lossy Compression

Since around 1980, when the Internet followed closely on the heels of personal computers, we have lived in a world in which music and images are sent back and forth in highly compressed form.

How is a text file stored?

A text file basically consists of characters.  Long ago, the computer folks got together and decided that each character on a typewriter would be represented by a number.  Here are a few of them:

“A”=65, “B”=66, “C”=67, “D”=68, “E”=69, “F”=70, “G”=71, “H”=72,
“I”=73, “J”=74, “K”=75, “L”=76, “M”=77, “N”=78, “O”=79, “P”=80,

... and so on ...

“X”=88, “Y”=89, “Z”=90,
That takes care of all the capital letters.  After these come a few odd characters, namely "[", "\" and so on.  Soon we come to the lower-case letters:

“a”=97, “b”=98,
...
“x”=120, “y”=121, “z”=122, “{”=123, “|”=124,
...
and so on; you get the idea.  The entire correspondence (it is called the ASCII code) can be found here.  So if you made a text file that contained the single word zab, it would contain essentially the numbers 122, 97, 98, appropriately organized so that there is absolutely no mistaking it for 1269798, so don't worry.  Modern text files are usually word-processor files, so they contain, in addition, italics information, boldface information, etc, etc, so they're bigger.  (They also contain a lot of information of interest mainly to Microsoft Corporation, which adds to their size a tiny bit, e.g. This file was created in Word; aren't we awesome?)
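If you have Python on your machine, you can check this correspondence yourself: the built-in functions ord and chr convert between characters and their numbers.

    # Convert each character of "zab" to its number, and back again.
    text = "zab"
    numbers = [ord(ch) for ch in text]   # ord() gives the number for a character
    print(numbers)                       # prints [122, 97, 98]

    restored = "".join(chr(n) for n in numbers)   # chr() goes the other way
    print(restored)                      # prints zab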

Lossless Compression

Even text files can be made smaller by studying them carefully.  This is what happens when you use something like WinZip to compress your file.  Such utilities use tricks to represent the most common letter in your text by a very short code, the next most common letter by a slightly longer code, and so on.  Working backwards, your file can be expanded to be exactly the way it was.  This sort of thing is called lossless compression, because the document can be reconstructed exactly at the receiving end.  This procedure, called encoding, depends on creating very compact lists of numbers to represent lists of numbers that are not so compact.

Make no mistake: the procedure is highly complex, and consists of multiple methods, each of which makes the resulting file smaller and smaller.
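One classic version of the short-codes-for-common-letters trick is called Huffman coding: you repeatedly merge the two rarest symbols into a little tree, and the shape of the tree determines the codes.  Here is a minimal sketch in Python, just to give the flavor (utilities like WinZip layer several methods together):

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # One heap entry per distinct character: [frequency, tie-breaker, subtree].
        heap = [[count, i, ch] for i, (ch, count) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        i = len(heap)
        while len(heap) > 1:
            # Merge the two rarest subtrees into a pair.
            lo = heapq.heappop(heap)
            hi = heapq.heappop(heap)
            heapq.heappush(heap, [lo[0] + hi[0], i, (lo[2], hi[2])])
            i += 1
        # Walk the finished tree: going left appends "0", going right appends "1".
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    print(huffman_codes("this is an example of huffman coding"))
    # Common letters (like the space) get short codes; rare letters get long ones.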

GIF Files

CompuServe, an early online-services company, invented a method for compressing certain types of pictures using lossless compression; this method is still widely used today.  The compressed files have the extension "gif".  Here's essentially how it's done.  (I'm simplifying; the real scheme, called LZW, is more elaborate, but the spirit is the same.)

(1) A picture is first processed so that it has only 256 colors.  Many pictures look okay when the number of colors is reduced; for instance, a chart or diagram will probably contain only 5 or 6 colors to begin with.  The colors are numbered 0, 1, 2, and so on, up to 255.  This numbered list is called the palette for the picture.  (It's a great feature that each picture's palette is individualized.  But still, restricting the number of colors is a sacrifice.)

(2) Next, the picture is divided into little 8×8 pixel blocks, and each block is encoded individually.  Each of the 64 pixels is assigned its color number, so it boils down to compressing these 64 numbers as much as possible.  Suppose the rectangle of numbers looks like this:

11122233
11223333
12233232
12333343
13344434
...
and so on.

(3) At this point, a clever trick called run-length encoding is used.  The numbers are stretched out in a single long line, starting at the top left and snaking back and forth along the diagonals: first the corner pixel, then the two pixels on the next diagonal, then the three on the diagonal after that, reversing direction each time.  Following this path, you get the number list:
1,1,1,1,1,1,2,2,2,1,2,2,2,2,1,2,3,3,3,3,...
As you can see, this method of scanning the picture optimizes the likelihood of long strings of repeated colors.  Such a list of numbers would be encoded something like:
1(6),2(3),1,2(4),1,2,3(4)
and so on.  This means: "six 1's, followed by three 2's, followed by a single 1, then four 2's ..."

There is obviously great potential to get a very small file indeed, if there are large stretches all of a single color.  If you ever had to store a GIF file of mostly white with a few black dots, you would have found it minuscule!
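Here is a quick Python sketch of run-length encoding, applied to the zigzag list above (the decoder shows that nothing is lost):

    def run_length_encode(values):
        # Collapse each run of equal values into a [value, run length] pair.
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return runs

    def run_length_decode(runs):
        # Expand the runs back into the original list -- nothing is lost.
        out = []
        for v, n in runs:
            out.extend([v] * n)
        return out

    scanned = [1,1,1,1,1,1,2,2,2,1,2,2,2,2,1,2,3,3,3,3]
    print(run_length_encode(scanned))
    # prints [[1, 6], [2, 3], [1, 1], [2, 4], [1, 1], [2, 1], [3, 4]]
    assert run_length_decode(run_length_encode(scanned)) == scanned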

Lossy Compression of a List of Numbers

In "lossy" compression (which means there is some loss; a typical computer type term...) the picture is approximated.  It is such a good approximation that nobody really minds.

First, let's see how a single row of 16 numbers can be approximated.  (Remember, this is a very rough explanation for laymen.)

Let's illustrate with a strip from a black-and-white photo.  The actual method is 2-dimensional, and not 1-dimensional, but the principles are very similar.  Suppose the strip of image resulted in

1, 3, 5, 8, 11, 15, 19, 24, 30, 36, 43, 51, 59, 68, 77, 87

Remember, these numbers are supposed to represent colors, which usually vary smoothly across a photograph.  Since these are single numbers, rather than triples, they represent a grayscale image.  (Each pixel of a color image is represented by three numbers.  But you can easily imagine that a color picture can be split into three grayscale pictures.  If I show you how a grayscale image can be stored lossily, you can easily see how the full color picture is stored.)  Just to get a feel for what these sixteen numbers might mean, picture a chart where each number is represented by the height of a bar instead of a pixel: a staircase of bars climbing smoothly from 1 up to 87.
Now, each number is actually stored as bits; that is, as a row of zeroes and ones.  Just keep this in mind.

First, we store the average of the entire list of numbers, which is ... 34, to the nearest integer.  So we put that in as our first number, in storing the “picture”.

34.

Now, we subtract 34 from each number, which gives us a new list:
-33, -31, -29, -26, -23, -19, -15, -10, -4, 2, 9, 17, 25, 34, 43, 53.

Let’s average the left 8 numbers, and the right 8 numbers:  we get -23 (roughly), and 22 (roughly).  We store these, as well.  So now, we have:
34, -23, 22.

Now, we subtract -23 from the first 8 numbers (that is, we add 23), and subtract 22 from the second 8 numbers.  We get 16 much smaller numbers:
-10, -8, -6, -3, 0, 4, 8, 13, -26, -20, -13, -5, 3, 12, 21, 31

We average the numbers in groups of 4:  we get (roughly) -7, 6, -16, and 17.  We store these, too:
34, -23, 22, -7, 6, -16, and 17.

We subtract -7 from the first four numbers, 6 from the next four, and so on.  We end up with even smaller numbers than before:

-3, -1, 1, 4, -6, -2, 2, 7, -10, -4, 3, 11, -14, -5, 4, 14

At this stage, we can simply stop.  If we restore the entire list of 16 numbers, pretending that the list contained all zeros at this stage, you know we're only going to be off by, well, 14, at the most!  We could do another step, and be off by 7 at the most.
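By the way, a computer can do these averaging steps in a few lines, and the result is reminiscent of what mathematicians call a wavelet transform.  Here is a Python sketch (the function name is mine) that reproduces the arithmetic above:

    def average_transform(values):
        # Store the average of each group, subtract it out, halve the group size.
        stored = []
        residual = list(values)
        group = len(values)
        while group >= 4:                        # stop while the leftovers are small
            for start in range(0, len(residual), group):
                block = residual[start:start + group]
                avg = round(sum(block) / group)  # average, to the nearest integer
                stored.append(avg)
                for k in range(start, start + group):
                    residual[k] -= avg
            group //= 2
        return stored, residual

    strip = [1, 3, 5, 8, 11, 15, 19, 24, 30, 36, 43, 51, 59, 68, 77, 87]
    stored, leftovers = average_transform(strip)
    print(stored)     # prints [34, -23, 22, -7, 6, -16, 17]
    print(leftovers)  # prints [-3, -1, 1, 4, -6, -2, 2, 7, -10, -4, 3, 11, -14, -5, 4, 14]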

The fun part would be to compare the "picture" we assemble using these numbers with the original list.

Starting with all zeros, we add -7 to the first four, 6 to the next four, -16 to the next four, and 17 to the last four, and we get
-7, -7, -7, -7, 6, 6, 6, 6, -16, -16, -16, -16, 17, 17, 17, 17.

Now we add -23 to the first 8, and 22 to the last eight:
-30, -30, -30, -30, -17, -17, -17, -17, 6, 6, 6, 6, 39, 39, 39, 39.

Finally, we add 34 to every number:
4, 4, 4, 4, 17, 17, 17, 17, 40, 40, 40, 40, 73, 73, 73, 73

Compare with the original file:
1, 3, 5, 8, 11, 15, 19, 24, 30, 36, 43, 51, 59, 68, 77, 87


As you can see, it is horrible!  I was feeling pretty depressed, thinking that this was a terrible example to show you, but a list of 16 numbers is just too small to illustrate lossy compression properly.  So I started again with 32 numbers: I took one overall average, then two averages, then four, then eight, and threw out everything after that.  (In other words, I pretended that after I took the last eight averages, I was left with all zeroes.)
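For completeness, here is the reconstruction step done by machine, continuing the sketch above: start from zeros and add the stored averages back in, coarsest first.

    def average_reconstruct(stored, length):
        # Undo average_transform, pretending the leftovers were all zero.
        out = [0] * length
        group = length
        index = 0
        while group >= 4:
            for start in range(0, length, group):
                for k in range(start, start + group):
                    out[k] += stored[index]   # add this group's average back in
                index += 1
            group //= 2
        return out

    approx = average_reconstruct([34, -23, 22, -7, 6, -16, 17], 16)
    print(approx)  # prints [4, 4, 4, 4, 17, 17, 17, 17, 40, 40, 40, 40, 73, 73, 73, 73]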

Here is a graph of the original file (in blue), and the compressed file (in red).  As you can see, the compressed data tends to look blocky, just as you've noticed JPEG files do.

Finally, let me explain why this procedure results in even greater space savings than you might imagine.

Suppose the color intensity of the original picture varies from a pixel that's pure black (0) to one that is pure white (255).  All the pixels must have values between these two extremes.  The average, too, must lie between these two numbers.  These sorts of numbers take 8 bits (of zeros and ones) to express in binary.

Usually, the average lands somewhere in the middle, around 128.  More importantly, a picture whose pixel values stay close to their averages will compress well; one whose values swing wildly will compress poorly.

Once you subtract the average out, the remaining numbers typically lie between -128 and 127, which can again be represented by 8 bits.

Once the second set of averages is subtracted out, the numbers will lie between -64 and 63, which need only 7 bits!  The next set of averages will lie between -32 and 31, which can be represented by 6 bits!  And once you get to the point where you decide the itty-bitty leftovers don't need to be stored at all, they take up no space whatsoever.

If your picture had 32 pixels, uncompressed, you would need 32×8 = 256 bits to store it.

Compressed using the primitive method described above, the first average is 8 bits, the next two averages are 7+7=14 bits, the next four averages are 6+6+6+6=24 bits, and the next eight averages are 5+5+5+5+5+5+5+5=40 bits.  So the whole package would only take up 8+14+24+40 = 86 bits.  Additionally, it is possible to cheat by storing those averages only approximately!  It is still done with binary digits, but it is comparable to storing highly rounded versions of the averages, which take even less space.  This is done cautiously, based on how the eye perceives the image, and on whether a minor misrepresentation of the average color of a block, or sub-block, makes a big difference to the appearance of the image.
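A one-line sanity check of that count (using the rough bit widths above, not exact bounds):

    # One 8-bit average, two 7-bit, four 6-bit, and eight 5-bit averages.
    print(1 * 8 + 2 * 7 + 4 * 6 + 8 * 5)   # prints 86, versus 32 * 8 = 256 bits raw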

With larger files, the savings are enormously greater.  With photographs, if the colors vary smoothly across the picture, the savings will be great; but if there is a lot of detail, you just can't throw out the last remaining residues.  More and more averages need to be retained, and they all contribute to the size of the file.

The actual method, though basically similar to what was shown above, uses sines and cosines (a tool called the discrete cosine transform) to make the calculations easier.  This article in Wikipedia might make more sense now that you have read this introduction.  Even if you just look at some of the pictures, you can get a handle on how it's done, completely disregarding the mathematics.

Arch
