There are three components to any sound: frequency, amplitude, and
phase. These three factors completely define any and all sounds.
Hearing, by contrast, is an extremely complicated mechanical, biological, and neurological process.
This is seen when parts of this system
are damaged. (A person may have
perfect hearing but not be able to
understand speech, for example.)
The best estimate is that hearing is
a constructive process. That is, the brain
takes data and applies rules and functions to that data to build a representation of the sonic world. These rules and
functions are unbelievably subtle and
complex. Mother Nature has spent
millions of years developing the process
of hearing. Unfortunately, she didn’t
leave any documentation behind. (One
of the more unusual aspects of hearing
is “objective tinnitus.” Tinnitus is a ringing or noise that is heard even though it has no exterior source. It’s fairly common.
Objective tinnitus is a ringing or noise
that can actually be heard coming from
the patient’s ear by a second person.)
Sound localization is usually
attributed to two factors as they apply
to both ears: phase and loudness.
Localization is poor in humans and is
usually limited to a “precision” of 10
degrees of angle or more. (Note that loudness and pitch are the perceptions associated with amplitude and frequency, respectively.) Higher-frequency sounds are attenuated by the head,
so there is an amplitude reduction
heard by the ear that is in the sound
shadow. This effect starts around 100
Hz and improves at higher frequencies.
Phase, as noted above, is related to the difference in the time at which the sound reaches each ear. Localization of low-frequency sounds is usually attributed
to binaural phase because the shadow
effect doesn’t work well at low frequencies. Phase is thought to contribute little to sound localization at 10,000 Hz, but its contribution grows as the frequency drops.
From about 1,000 Hz and below, most
localization is associated with phase.
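To put rough numbers on this, the interaural time difference can be estimated with the widely used spherical-head (Woodworth) approximation. The sketch below is purely illustrative; the head radius and speed of sound are assumed round values rather than figures taken from the references.

    import math

    SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at room temperature
    HEAD_RADIUS = 0.0875     # m, an assumed typical head radius

    def itd_seconds(azimuth_deg: float) -> float:
        """Interaural time difference from the Woodworth spherical-head
        approximation: ITD = (r / c) * (theta + sin(theta))."""
        theta = math.radians(azimuth_deg)
        return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

    for az in (5, 10, 45, 90):
        print(f"{az:3d} degrees -> ITD of about {itd_seconds(az) * 1e6:5.1f} microseconds")

Even the 10 degree “precision” quoted above corresponds to a time difference of well under 100 microseconds, which makes the phase sensitivity discussed below a little less surprising.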
It is useful to compare the ear’s
sensitivity to differences in loudness,
pitch, and binaural phase. (Note that
normal hearing spans 16 Hz to about
20,000 Hz, with levels ranging from the threshold of hearing at 0 dB up to about 140 dB, where sounds become painful.) Most people can hear a
difference if the amplitude of a sine
wave changes by about 0.25 dB or
about 3% under ideal conditions. This
is quite variable depending upon the
frequency and initial amplitude of the
sound. But 3% is a fairly large change.
It is a similar story for frequency. In
this case, most people can hear a
frequency shift of about 0.2% (also
variable depending on frequency and
amplitude). This is better than the
amplitude sensitivity but it is still not
all that impressive.
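For readers who want to check the arithmetic, the 0.25 dB figure converts directly to a linear amplitude ratio. The snippet below assumes the standard 20·log10 amplitude convention; the 1,000 Hz example tone is chosen only for illustration.

    def db_to_amplitude_ratio(db: float) -> float:
        """Convert a level change in dB to a linear amplitude ratio."""
        return 10 ** (db / 20.0)

    ratio = db_to_amplitude_ratio(0.25)
    print(f"0.25 dB -> amplitude ratio {ratio:.4f} (about a {(ratio - 1) * 100:.0f}% change)")

    f0 = 1000.0   # an arbitrary test frequency
    print(f"A 0.2% shift at {f0:.0f} Hz is only about {f0 * 0.002:.0f} Hz")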
Binaural phase is very different. If
two clicks are presented at the same
time through headphones, the sound
appears localized in the middle of the
head. This is not surprising to anyone
who has ever used headphones. By
delaying one click, the sound can be
made to move from side to side. This
is also not too surprising. What is
surprising is that if the click is delayed
“by as little as 0.000012 seconds, the
sound image moves towards the ear
which received the click first” (Ref 1).
This is only 12 microseconds! The fact
that any organic system can detect a
12 microsecond difference is nearly
beyond belief.
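To appreciate how small 12 microseconds is in digital terms, the sketch below writes a stereo click pair to a WAV file using only Python’s standard library. Even at an assumed 192 kHz sample rate, one sample lasts about 5.2 microseconds, so the 12 microsecond delay has to be rounded to roughly two samples; this is only an approximation of the experiment quoted from Ref 1.

    import struct
    import wave

    SAMPLE_RATE = 192_000                        # Hz; one sample is still ~5.2 microseconds
    DELAY_SAMPLES = round(12e-6 * SAMPLE_RATE)   # the 12 us delay, rounded to ~2 samples

    n_frames = SAMPLE_RATE // 10                 # 100 ms of audio
    click_pos = n_frames // 2                    # place the click in the middle

    frames = bytearray()
    for i in range(n_frames):
        left = 30000 if i == click_pos else 0                    # undelayed channel
        right = 30000 if i == click_pos + DELAY_SAMPLES else 0   # delayed channel
        frames += struct.pack("<hh", left, right)                # 16-bit interleaved

    with wave.open("click_pair.wav", "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)                        # 2 bytes = 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(bytes(frames))

    print(f"Right channel delayed by {DELAY_SAMPLES / SAMPLE_RATE * 1e6:.1f} microseconds")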
This ultra-sensitivity to phase
strongly suggests that it is an important
aspect in hearing. It seems unlikely that
such a system developed by chance.
However, that was binaural phase
sensitivity. Is there something similar
for monaural phase? Initially, the answer seems to be no, because delaying an ordinary sound by about 40 ms is not perceived as an echo. Nevertheless, this initial supposition turns out to be wrong.
In 1978, for my Master’s thesis at
the University of Buffalo, I developed
a simple procedure that converted
any sound into a series of pulses
(somewhat like the firing of a nerve cell).
This was a monaural experiment using
one small open-air speaker in place
of headphones. The pulses were
identical in amplitude and width
(frequency components). Therefore,
no information could be conveyed by
the components of amplitude or
frequency. (Information requires a
change in the medium in order to be
transmitted. This is defined as bandwidth. Any non-varying medium, like a
DC voltage, has a bandwidth of zero
and cannot carry any information.)
The only parameter that varied
was the time interval between the
pulses. This is defined as phase.
Simply put, the machine took sound and
removed all of the frequency and
amplitude parameters while retaining
only the phase parameter. Phase precision was controlled by adjusting the
minimum time allowable between the
pulses. A short period between the
pulses permitted a greater possible
number of pulses per second (
depending upon the input signal) and vice
versa. The results were consistent with
the binaural phase measurements.
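The original 1978 hardware is not described here in enough detail to reproduce exactly, but a rough software sketch of the general idea might look like the following: emit identical fixed-height, fixed-width pulses at (for example) positive-going zero crossings, with an adjustable minimum interval standing in for the phase-precision control. The trigger rule, pulse width, and interval values below are arbitrary illustrative choices, not the ones used in the thesis.

    import numpy as np

    def to_pulse_train(signal, sample_rate,
                       pulse_width_s=50e-6, min_interval_s=100e-6):
        """Replace a waveform with fixed-height, fixed-width pulses.

        A pulse is emitted at each positive-going zero crossing, but never
        sooner than min_interval_s after the previous pulse, so the only
        surviving information is the timing (phase) of the pulses.
        """
        pulse_width = max(1, int(pulse_width_s * sample_rate))
        min_interval = max(1, int(min_interval_s * sample_rate))
        out = np.zeros_like(signal)
        last_pulse = -min_interval
        for i in range(1, len(signal)):
            rising = signal[i - 1] < 0.0 <= signal[i]
            if rising and i - last_pulse >= min_interval:
                out[i:i + pulse_width] = 1.0   # identical amplitude and width
                last_pulse = i
        return out

    sr = 48_000
    t = np.arange(sr) / sr
    test_tone = np.sin(2 * np.pi * 220 * t) * np.sin(2 * np.pi * 3 * t)
    pulses = to_pulse_train(test_tone, sr)

Lowering min_interval_s allows more pulses per second and therefore finer phase resolution, which mirrors the trade-off described above.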
In this case, the intelligibility of
speech was measured to determine
how much information could be carried by phase. This value turned out to
be 100% with high pulse rates. While
the elimination of amplitude and
frequency components distorted the
speech, the intelligibility was identical
to the control. As the pulse rate was
lowered, the intelligibility fell (as did
the bandwidth). This allowed the comparison of intelligibility to pulse rate.
The result was that there was a
1% change in intelligibility with a 14
microsecond change in the phase
(delay between pulses). This result would seem to support the notion that phase is important in the processes of hearing and speech perception. (Note that 14
microseconds corresponds to an
acoustic path length of about 0.185”.
A sine wave with a period of 14
microseconds is equivalent to a signal
with a frequency of over 71,000 Hz.)
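Both of those conversions are easy to check. The snippet below assumes a speed of sound of about 343 m/s; a slightly lower assumed value reproduces the 0.185 inch figure.

    SPEED_OF_SOUND_M_S = 343.0      # assumed; the 0.185" figure implies roughly 336 m/s
    INCHES_PER_METER = 39.37

    dt = 14e-6                      # the 14 microsecond phase step from the text

    path_length_in = SPEED_OF_SOUND_M_S * dt * INCHES_PER_METER
    equivalent_freq_hz = 1.0 / dt

    print(f"Acoustic path length: {path_length_in:.3f} inches")             # about 0.189"
    print(f"A 14 microsecond period corresponds to {equivalent_freq_hz:,.0f} Hz")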
Speakers and Phase
So what does all this have to do