Let’s Get Technical
by James Antonakos
Electronic Theories and Applications From A to Z
Getting More for Less —
A First Look at Digital Data Compression
Data compression has been
around for a long time. If
you’ve heard the expression, “A picture is worth a thousand
words,” then you are familiar with the
basic principle of data compression:
replacing one set of symbols with
another, smaller set. A high-resolution
photograph describes an object far
better than a mere thousand words could.
Today — with practically everything represented by 1s and 0s in
some form or another — there is a
great need for digital data
compression. Why? The answer is
time and space. Oh, the Universe is
pretty old and really big, but we still
want to transmit data from one
place to another quickly and we
want to use as little storage space
as possible.
If you are downloading a file, you
want the download to take place as
fast as possible. The time required to
download the file depends — among
other things — on the Internet connection speed and the size of the file.
A file size of 2.5 MB will require more
time to download than a 245 KB file.
What if, however, there was a
way to convert the 2.5 MB file into a
245 KB file? Then, the transfer would
take place quickly and the required
storage space would be significantly
reduced. Even better, after the file is
downloaded, the compressed data
can be expanded back into the
original 2.5 MB.
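The time savings can be worked out directly. A minimal sketch, assuming a hypothetical link speed of 512 kilobits per second (the article does not specify one):

```python
LINK_SPEED_BPS = 512_000          # assumed link speed: 512 kilobits/second

def download_seconds(size_bytes: int, speed_bps: int = LINK_SPEED_BPS) -> float:
    """Ideal transfer time: bytes times 8 bits, divided by the link speed."""
    return size_bytes * 8 / speed_bps

original = int(2.5 * 1024 * 1024)   # the 2.5 MB file
compressed = 245 * 1024             # the 245 KB file

print(round(download_seconds(original), 1))    # → 41.0 seconds
print(round(download_seconds(compressed), 1))  # → 3.9 seconds
```

Whatever speed is assumed, the ratio of the two times equals the ratio of the two file sizes, roughly 10:1 here.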
First, let us think about logical
compression. This has to do with the
way we represent the data being
stored before compression. One
method may be better than another,
in terms of its built-in logical
compression. For example, suppose
a programmer has written code to
write sensor data to an output file.
The programmer used a simple loop
of code to write each 16-bit value as a
16-character ASCII string of “0” and “1” characters,
such as:
1001001111101010
which requires 16 bytes of file space.
If the programmer were to spend
a little more time in the output loop,
then the 16-bit string could be
converted into a four-character
hexadecimal string, such as:
93EA
Figure 1. A portion of the Morse Code alphabet.
saving 12 bytes of
file storage space,
compressing the output file by 75%, and
giving us 4:1 compression. The price
we paid was to spend
more time on the
processing. With the
fast computers available today, we
can afford the
processing time and make use of
very complex techniques to obtain
good compression rates.
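The binary-to-hexadecimal conversion just described can be sketched in a few lines. The function names are illustrative, not from the article:

```python
def pack_bits(bitstring: str) -> str:
    """Convert an ASCII string of '0'/'1' characters into hexadecimal,
    four bits per hex digit: one stored byte per four bits instead of
    one byte per bit, a 4:1 reduction."""
    return format(int(bitstring, 2), "0%dX" % (len(bitstring) // 4))

def unpack_bits(hexstring: str) -> str:
    """Expand back into the original bit string -- the compression is lossless."""
    return format(int(hexstring, 16), "0%db" % (len(hexstring) * 4))

print(pack_bits("1001001111101010"))   # → 93EA
print(unpack_bits("93EA"))             # → 1001001111101010
```

Running both directions confirms that no information is lost, only the representation changes.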
As we just saw, logical
compression leads to physical
compression. A good example of a
logically-compressed code is Morse
Code. This code was invented by
Samuel Morse in 1844 for use with
telegraphs. Letters, digits, and
punctuation symbols are defined by
dots and dashes in the code. Figure
1 shows the Morse Code for the
letters A to Z.
Notice that the most frequently-used letter — E — is represented by a
single dot. Other popular letters,
such as A and I, require only two
symbols (dot-dash and dot-dot,
respectively). Letters not frequently
used, such as X, Q, and Z, contain
four symbols each in their codes.
Samuel Morse built data compression into his code by designing it so
that frequently used letters required
the fewest symbols and, thus, were
transmitted more quickly than the
symbols for less frequently used
letters.
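The variable-length idea is easy to see by tabulating code lengths for a few letters from Figure 1; the dictionary below covers only the letters discussed in the text:

```python
# A few entries from the Morse Code alphabet; '.' is a dot, '-' is a dash.
MORSE = {
    "E": ".",    "T": "-",             # most frequent: one symbol
    "A": ".-",   "I": "..",            # common: two symbols
    "X": "-..-", "Q": "--.-", "Z": "--..",  # rare: four symbols
}

# Frequent letters get short codes; rare letters get long ones.
for letter in "ETAIXQZ":
    print(letter, MORSE[letter], len(MORSE[letter]))
```

Assigning the shortest codes to the most frequent letters minimizes the average transmission time per letter, the same principle later formalized in Huffman coding.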
You can experiment with digital
data compression yourself by
working through the following steps
to design your own binary version of
the Morse Code. In the example
described in these steps, a tone
generator and speaker are controlled
by a single bit. When the bit is high,
the speaker emits a tone. When the
bit is low, the speaker is quiet.
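One way to sketch this single-bit keying scheme in software: emit a stream of 1s and 0s, where each 1 is one unit of tone and each 0 is one unit of silence. The timing below (a dash lasting three dot units, with one silent unit between symbols) is the standard Morse convention, assumed here rather than taken from the article:

```python
DOT, DASH, GAP = "1", "111", "0"   # dash = 3 dot units; 1 silent unit between symbols

def keying_bits(code: str) -> str:
    """Bit stream that keys the tone generator: 1 = tone on, 0 = silent."""
    return GAP.join(DASH if symbol == "-" else DOT for symbol in code)

print(keying_bits(".-"))   # the letter A: dot, gap, dash → 10111
```

Clocking these bits out to the tone-generator control line at a fixed rate would produce the dot and dash sounds.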
1. Assign binary patterns to dots and
dashes. Since a dash is longer than a
SEPTEMBER 2004