Masking and Music – Downwards spread of masking – part 2

Marshall Chasin
March 28, 2017

The following 4-part series of posts, overviewing the phenomenon of masking, is written for the musician, and not the audiologist.  The first three parts (upwards spread of masking, downwards spread of masking, and temporal masking) relate to the function and structure of the cochlea and associated neural structures, whereas the last part (phase) refers to the acoustics of any room.  Strictly speaking, phase issues are not related to “masking” in the typical sense but can be viewed as masking in a more general sense, since they can be responsible for a deletion of important information. In part 1, the characteristics of upwards spread of masking were discussed.  In part 2, the phenomenon of downwards spread of masking is discussed.

The 1950s and 1960s saw a lot of basic research into the characteristics of masking of all types, with names such as Dr. Lois Elliot and her colleagues, mostly at Northwestern University in Chicago, featuring prominently.

Downwards spread of masking is really the same thing as the upwards spread of masking phenomenon that was discussed in part 1 of this blog series… so why a separate blog and why a change in name?

That is a very good question!

In short, downwards spread of masking is upwards spread of masking created by lower frequency intermodulation distortion products.

Now the longer form, in English.

We do know that the cochlea in the inner ear, made up of thousands of nerve endings, serves to transmit sounds to the brain (and also to receive feedback).  And, from part 1, we know that these nerve endings are at well-defined locations, much like a backwards piano keyboard with the high pitched notes on the left side where sound enters from the middle ear and with the low pitched bass notes on the right, which are embedded deeply inside several spiral turns of the cochlea.  Low pitched bass notes, when they are perceived, need to travel the full length of the cochlea, passing the areas responsible for the higher pitched notes before they finally get to the low frequency location.  This is why low pitched bass notes tend to cover up or mask some of the neurological activity associated with higher pitched treble notes.  Some of this is good, but too much is bad.

This is just “upwards spread of masking” but backwards… sort of… Figure courtesy of

If that is indeed how the cochlea works, then how is “downwards spread of masking”, as the phrase suggests, possible?  Do high pitched notes also mask out the lower pitched bass notes?

The answer is yes and no.

The “no” part of the answer is that low pitched bass notes still mask out higher pitched treble notes, but the “yes” part is that the inner ear creates distortion.

Distortion is a great word because it can be so confusing.  It is true that the word “distortion” connotes something that is negative.  A highly distorted recording sounds worse than a “better” recording.  But like “upwards spread of masking” where some is good, but too much is bad, some distortion is good … and too much (or too little) is bad.

Well, it turns out that having no distortion means that the intelligibility and clarity of a signal are significantly degraded.

Distortion is not a random activity; it is well-defined and can be characterized mathematically.  For example, harmonic distortion of 1000 Hz means that not only is 1000 Hz correctly transmitted, but energy at 2000 Hz, 3000 Hz, and so on is also transmitted to a certain extent.  That is, with harmonic distortion, it is the higher frequency multiples (integer or odd integer) of the “primary” that are also created.  If the amplitude of these higher frequency sound components is high, then we say that there is high harmonic distortion; if they are still present but low in amplitude, then we say that there is low harmonic distortion.
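For readers who like to see the arithmetic, here is a minimal Python sketch that simply lists the harmonics of a primary. It is an illustration of the multiplication only, not a model of the cochlea; the function name `harmonics` and the choice of five components are assumptions made for this example, and nothing here says how loud each harmonic would be.

```python
def harmonics(primary_hz, count=5, odd_only=False):
    """Return `count` frequencies: the primary and its harmonic multiples.

    Hypothetical helper for illustration. With odd_only=True only the
    odd integer multiples (1x, 3x, 5x, ...) are returned.
    """
    if odd_only:
        multiples = range(1, 2 * count, 2)   # 1, 3, 5, ...
    else:
        multiples = range(1, count + 1)      # 1, 2, 3, ...
    return [primary_hz * n for n in multiples]


print(harmonics(1000))                  # integer multiples of 1000 Hz
print(harmonics(1000, odd_only=True))   # odd integer multiples only
```

With a 1000 Hz primary, the first list starts at 1000, 2000, 3000 Hz, matching the example in the text.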

Our auditory systems are designed to have a certain level of harmonic distortion: too much is not good, and too little is also not good.  There is some data suggesting that in senior citizens who have significant hearing loss (and also poorer overall vascular blood flow), cochlear function is more linear, so that not enough distortion is created, with an associated loss of energy cues.

Another type of distortion that occurs in our normally distorting cochlea is “intermodulation distortion”.

This type of distortion requires two or more closely spaced “primaries” or notes to be generated. Not only will there be harmonic multiples of each of the two notes, but also arithmetical combinations of the two notes such as 2f1-f2.  In this example, 2f1-f2 means that there will be an additional distortion product at 2 times the frequency of the first note (f1) minus the frequency of the second note (f2).  If the two frequencies are 1000 Hz and 1100 Hz, then 2f1-f2 is 2 × 1000 - 1100 = 900 Hz.
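The same calculation can be written as a short Python sketch. The function name `intermod_2f1_minus_f2` is made up for this illustration; the code only does the arithmetic for the 2f1-f2 product described above and does not attempt to model how strong that product would be in a real ear.

```python
def intermod_2f1_minus_f2(f1_hz, f2_hz):
    """Frequency (Hz) of the 2f1 - f2 intermodulation distortion product.

    Hypothetical helper for illustration; f1_hz is the lower primary
    and f2_hz the higher one.
    """
    return 2 * f1_hz - f2_hz


f1, f2 = 1000, 1100                      # the two primaries from the text
product = intermod_2f1_minus_f2(f1, f2)

print(product)                 # 900
print(product < min(f1, f2))   # True: the product sits below both primaries
```

Whenever f2 is higher than f1, this product lands below f1 by the same amount that separates the two primaries.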

This 900 Hz (intermodulation) distortion product is lower pitched than the initial 1000 Hz note, and it serves to mask the 1000 Hz (and possibly the 1100 Hz) note(s) by “upwards spread of masking”.

So “downwards spread of masking” is the result of intermodulation distortion of higher pitched components that create lower frequency sounds that in turn move upward to mask higher frequency sounds. 

I recall reading an article in the early 1980s where the author referred to this as being due to “combinatorial aggregate”.  This is a really neat phrase that I try to use as often as I can at parties but it really only means that these “combination tones” (or intermodulation distortion products) add up together (like an aggregate) to then contribute to upwards spread of masking.

And, as with “upwards spread of masking”, too little or too much “downwards spread of masking” can be detrimental.

It’s no wonder that nobody has yet been able to build a replica of the human cochlea!

In part 3 of this blog series, “temporal masking” will be discussed.
