The following four-part blog series gives an overview of the phenomenon of masking. It is written for the musician, not the audiologist. The first three parts (upwards spread of masking, downwards spread of masking, and temporal masking) relate to the function and structure of the cochlea and associated neural structures, whereas the last part (phase) refers to the acoustics of any room. Strictly speaking, phase issues are not related to “masking” in the typical sense, but they can be viewed as masking in a more general sense, since they can be responsible for a deletion of important information.

Despite the masking noise being low frequency (left side of the graph), its effects are felt well into the upper frequency regions due to “upwards spread of masking”. Figure courtesy of

Other than pure tones, which have energy at only one frequency, all speech and all music are characterized by a relatively wide band of frequencies with time-varying acoustic properties. This is just a nice way of saying that speech and music have energy components that change from moment to moment in both their sound level and their frequencies.

When this sound energy enters the ear, several things happen in a strict series of events. High frequency sounds above 1500 Hz are enhanced by the presence of the pinna, or outer ear. This feeds into a resonance of about 15-20 dB at roughly 2700 Hz that is the result of the 25-30 mm long outer ear canal. In the next stage, the eardrum and the associated middle ear bones and structures attenuate (or lessen) the lower frequency sounds below 1000 Hz. And to complicate things further, the middle ear stapedius reflexes attenuate sounds over about 85-90 dB SPL. Moving from the middle ear to the inner ear, the cochlea serves as a Fourier spectrum analyzer and breaks the sound vibrations into well-defined bands, much like the notes arrayed across the piano keyboard. And finally, these bands of energy are transformed neurologically into electrical impulses that are routed up to the auditory cortex in the brain.
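For the curious, the link between canal length and resonance can be roughed out with a few lines of Python. This is only a sketch under simplifying assumptions: it treats the ear canal as a uniform rigid tube, open at the outer end and closed by the eardrum, and it assumes a speed of sound of 343 m/s. Neither assumption comes from the figures above, and the function name is made up for this illustration.

```python
# Rough sketch: treat the ear canal as a tube open at one end and closed
# (by the eardrum) at the other, which resonates near a quarter wavelength.
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C


def quarter_wave_resonance_hz(length_mm: float) -> float:
    """Lowest resonance of a tube closed at one end: f = c / (4 * L)."""
    length_m = length_mm / 1000.0
    return SPEED_OF_SOUND_M_PER_S / (4.0 * length_m)


for length_mm in (25.0, 30.0):
    f = quarter_wave_resonance_hz(length_mm)
    print(f"{length_mm:.0f} mm canal -> about {f:.0f} Hz")
```

The idealized tube comes out somewhat higher than 2700 Hz, but in the same neighborhood. A real ear canal is neither uniform nor rigid, so close agreement should not be expected from so simple a model.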

And did I say “finally”?  To make matters even more complicated, there is a feedback loop back to the cochlea which serves to enhance the amplitude of the lower level sounds.

All of this makes up a normally functioning hearing mechanism.

For someone with an inner ear hearing loss, such as one due to aging or noise/music exposure, the sensitivity and other properties of the narrow band filters, or piano notes, in the inner ear become altered, and the feedback loop that serves to enhance softer sounds is lost. These are two reasons why people with a minor amount of cochlear damage may say “Oh, I can hear you OK, it’s just that people mumble”, and indeed for these people, speaking a bit louder or turning up the music volume slightly will substantially improve things.

An important feature of most forms of hearing loss is that the outer ear and middle ear still function normally: higher frequencies are enhanced (outer ear), and lower frequencies are attenuated (middle ear).

In the human (and mammalian) cochlea, sound vibrations are set up as a piano keyboard with the high pitched treble notes represented near the “front end” of the cochlear spiral and the lower pitched bass notes nearer to the inside of the cochlear spiral.  This is like a piano keyboard that is backwards. 

The lower pitched bass notes near the left side of the piano keyboard are in a well-protected and enviable part of the cochlea: the nerve endings in the ear associated with these lower pitched sounds are situated in the innermost part and as such are the least prone to being damaged by a lifetime of loud vibrations. In contrast, the higher pitched sounds are represented near the outer periphery and are the first to be damaged by a lifetime of loud sounds.


It is as if, during an explosion, a person who is unfortunate enough to be sitting near a window of a house is much more susceptible to damage than those lucky souls who happen to be in the basement in a bomb shelter.

One feature of the cochlea is that higher pitched sounds need to travel only a short distance along the backwards piano keyboard before the relevant cochlear nerve endings are activated. In contrast, for the lower frequency bass notes, sound vibrations need to travel almost the entire length of the cochlea into the inner portions of the spiral, leaving vibrational disturbances in their wake. It’s as if the lower bass notes disrupt the entire frequency range of the cochlea, whereas the higher pitched notes have disturbances that are more localized to their frequency regions.

Another reason is that all signals have a masking pattern that is “asymmetrical”. A signal at a certain frequency will of course mostly mask that particular frequency, but it will also mask lower frequency sounds a little and the adjacent higher frequency sounds quite a bit. That is, a 1000 Hz signal will be picked up a little in the channel or band for 990 Hz but a lot in the channel or band for 1010 Hz.
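To make the asymmetry concrete, here is a toy model in Python. It is a sketch, not a measured masking curve: it assumes the masker's influence drops off in a straight line (in dB per octave) on each side, and the two slope values are invented purely to show a steep fall-off below the masker and a gentle one above it. The function and constant names are likewise hypothetical.

```python
import math

# Illustrative slopes only (not measured values): the masker's influence
# falls off steeply below its frequency and more gently above it.
SLOPE_BELOW_DB_PER_OCTAVE = 90.0
SLOPE_ABOVE_DB_PER_OCTAVE = 30.0


def spread_level_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Level the masker contributes at probe_hz under a two-slope model."""
    octaves = math.log2(probe_hz / masker_hz)
    if octaves < 0:
        # Probe is below the masker: steep fall-off.
        return masker_db - SLOPE_BELOW_DB_PER_OCTAVE * (-octaves)
    # Probe is at or above the masker: gentle fall-off.
    return masker_db - SLOPE_ABOVE_DB_PER_OCTAVE * octaves


for probe in (500.0, 1000.0, 2000.0):
    level = spread_level_db(1000.0, 80.0, probe)
    print(f"{probe:.0f} Hz: {level:.0f} dB")
```

With these made-up slopes, an 80 dB masker at 1000 Hz still contributes a substantial level an octave above itself but essentially nothing an octave below, which is the lopsided pattern described above.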

For this reason, lower frequency bass notes tend to activate some elements of the higher frequency channels in the cochlea, whereas the opposite is not as true.  When lower pitched bass notes tend to activate some of the higher pitched channels we call this masking.

Specifically, when lower pitched sound activates the higher pitched channels or filters, this is called upwards spread of masking.

This is not necessarily a bad thing.

For every sound that we hear, speech or music, there is activation in the cochlea at the pitches associated with that sound, but energy from lower pitched notes is also included in that band of sound. It’s as if a C on the piano is also transmitted to the brain on the same channel as the D or E above it.

When we do hear an E on the piano, it is made up not only of the sounds associated with the note E but also of some of the energy that derives from the notes just below that E.

In the normally functioning ear, this is part of the beauty of music and speech: lower frequency sound energy erroneously but beneficially contributes to the perception of the note or sound that we are trying to perceive.

In an ear with a hearing loss due to aging or noise/music exposure, these lower pitched notes tend to spread up in frequency even more and in some cases cover up some of the important energy that would normally be perceived at that higher pitched note or sound. That is why people with cochlear (or sensorineural) hearing loss may have greater than average difficulty hearing speech and appreciating music in noisier locations.

Too little upwards spread of masking is bad and too much upwards spread of masking is bad.  Just the right amount results in what we normally consider to be the correct perception and appreciation of the sound or music.

In part 2 of this blog series, the opposite (but not really… that’s a hint…) of upwards spread of masking will be discussed.

In part one of this blog series, the drawbacks of dropping a piezo-electric microphone were discussed. While dropping a microphone looks cool, the crystals in these microphones were very brittle, and dropping one would turn it into an “ex-microphone”.

These days, microphones tend to be either dynamic or capacitor. And if you are over age 50, you will remember that capacitor microphones used to be called condenser microphones.

Three pin XLR connector for a capacitor microphone. Figure courtesy of

A dynamic microphone is actually quite simple in its design. A metallic diaphragm, or a plastic diaphragm with a magnetic substance on it, sits in the center of a coil of wire. The input to the microphone causes the diaphragm to move, and this sets up a current in the surrounding coil of wire. The current is sent to a pre-amplifier and becomes the input to whatever recording or PA system is being used. This is actually something that could have been designed a million years ago if we knew about electricity back then. There is nothing too special about a dynamic microphone.

Three pin connector of a capacitor microphone. Figures courtesy of

The diaphragm of a dynamic microphone is quite light, but relatively speaking is heavier than that of a capacitor microphone…and this has some importance for music and other quick percussive sounds.

In contrast, a capacitor microphone uses a transducer that is very much like a capacitor; hence the name. A capacitor is a device where two parallel plates sit side by side. The rear plate is charged, and as such a capacitor microphone requires a power source to keep it charged. The front plate is a very light diaphragm that moves in synchrony with the input sound or music. High levels of sound cause the diaphragm to move closer to the charged rear plate, and quieter levels produce less diaphragm movement. A current is transmitted from the charged rear plate that is a replica of the movement of the light microphone diaphragm.
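The way a moving plate turns into an electrical signal can be sketched with the textbook parallel-plate formula. The short Python example below is an idealization: it assumes the charge on the plates stays fixed while the gap changes, and the capsule dimensions and the 48 V resting voltage are hypothetical numbers picked only for illustration, not taken from any particular microphone.

```python
import math

EPSILON_0 = 8.854e-12  # permittivity of free space, farads per metre


def capacitance_f(area_m2: float, gap_m: float) -> float:
    """Parallel-plate capacitance: C = epsilon_0 * A / d (air gap assumed)."""
    return EPSILON_0 * area_m2 / gap_m


# Hypothetical capsule dimensions, chosen only for illustration.
area = math.pi * (0.006 ** 2)  # 12 mm diameter diaphragm
rest_gap = 25e-6               # 25 micrometre gap at rest
rest_voltage = 48.0            # assumed voltage across the plates at rest

# Charge on the plates at rest; the sketch assumes it stays constant.
charge = capacitance_f(area, rest_gap) * rest_voltage


def plate_voltage(gap_m: float) -> float:
    """Voltage across the plates for a fixed charge: V = Q / C."""
    return charge / capacitance_f(area, gap_m)


for gap in (24e-6, 25e-6, 26e-6):
    print(f"gap {gap * 1e6:.0f} um -> {plate_voltage(gap):.2f} V")
```

In this idealized picture the voltage across the plates rises and falls in step with the gap, so the electrical signal traces out the motion of the diaphragm.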

Structure and design of a dynamic microphone. Figure courtesy of

There are two differences between a dynamic microphone and a capacitor microphone. A capacitor microphone needs a power source, and because of its lighter diaphragm, it can transduce higher levels of music with better fidelity. For speech and quieter non-percussive music, there should be no difference.

Typical maximum transduction limits on dynamic microphones are in the 112-115 dB SPL range and those of capacitor microphones are in the 130-135 dB SPL range.
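Because decibels are logarithmic, the gap between these two ranges is larger than it looks. The Python sketch below converts dB SPL to sound pressure in pascals using the standard 20 micropascal reference; the helper name is made up for this illustration, and the two levels are simply the upper ends of the ranges quoted above.

```python
REFERENCE_PRESSURE_PA = 20e-6  # standard 0 dB SPL reference pressure in air


def spl_to_pascals(level_db_spl: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return REFERENCE_PRESSURE_PA * 10 ** (level_db_spl / 20.0)


for label, level in (("dynamic", 115.0), ("capacitor", 135.0)):
    pressure = spl_to_pascals(level)
    print(f"{label}: {level:.0f} dB SPL is about {pressure:.1f} Pa")

# A 20 dB difference corresponds to a tenfold difference in sound pressure.
ratio = spl_to_pascals(135.0) / spl_to_pascals(115.0)
print(f"pressure ratio: {ratio:.1f}")
```

In other words, a microphone that tops out 20 dB higher is coping with ten times the sound pressure before it runs out of room.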

Another difference is that a capacitor microphone, having a relatively light diaphragm, can respond more quickly to a sudden percussive sound than a dynamic microphone. For percussion, the capacitor microphone diaphragm can move quickly, but there can be some delay (caused by inertia) with the more massive diaphragm used in dynamic microphones.

While this is an issue with music, there is no difference for speech. Even the most rapid speech sounds, such as the aspiration after a stop or an affricate, can be easily handled by a dynamic microphone.

And oh yes, why is an XLR cable connector called an XLR cable connector?

Since capacitor microphones require a power source (to keep the rear plate charged), power has to reach the microphone through the cable. In the common “phantom power” arrangement, the supply voltage rides on the same two pins that carry the signal, and the third pin is the ground. When these connectors first became available, the Cannon Corporation had a useful 3 pin connector in their X-series of connectors. This explains the “X”. The cable would frequently be pulled out of the microphone so a Latch (L) was added… and this explains the “L”. After some use, the pins would frequently become bent and useless, so the manufacturer encased the connector pin receptacle in Rubber (R), and this is where the “R” of XLR comes in.