Speech is not a broadband signal… but music is

We tend to be biased, both in our training and in the technologies we use, toward looking at things in terms of spectra or frequencies.  Phrases such as “bandwidth” and “long term average speech spectrum” show this bias. The long term average speech spectrum, which is averaged over time, is indeed a broad bandwidth spectrum made up of lower frequency vowels and higher frequency consonants, but as a description of speech at any one moment, this is actually false.

At any point in time, speech is either low frequency OR high frequency, but not both.  A speech utterance over time may be LOW FREQUENCY VOWEL, then HIGH FREQUENCY CONSONANT, then SOMETHING ELSE, but speech will never have both low frequency emphasis and high frequency emphasis at any one point in time.

Speech is sequential.  One speech segment follows another in time, but never at the same time.

In contrast, music is broadband and very rarely narrow band. 

With the exception of percussion sounds, music is made up of a low frequency fundamental note (or tonic) and then a series of progressively higher frequency harmonics whose amplitudes and exact frequency location define the timbre.  It is impossible for music to be a narrow band signal.

It is actually a paradox that (1) hearing aids have one receiver that needs to have similar efficiency in both the high frequency and the low frequency regions and (2) musicians “prefer” in-ear monitors and earphones that have more than one receiver.  If anything, it should be the opposite.  I suspect that the musicians’ preference for more receivers (and drivers) is a marketing element where “more” may be perceived as “better”.

At any one point in time, musicians should be wearing a single receiver, single microphone, and single bandwidth in-ear monitor.  This will ensure that what is generated in the lower frequency region (the fundamental or tonic) will have a well-defined amplitude (and frequency) relationship with the higher frequency harmonics.  This can only be achieved with a truly single channel system: a “less is more” solution.

This same set of constraints does not hold for speech.  If speech contains a vowel (or nasal, collectively called sonorants), it is true that there are well-defined harmonics that generate a series of resonances or formants, but for good intelligibility one only needs to have energy up to about 3500 Hz.  Indeed, telephones only carry information up to about 3500 Hz.  If speech contains a sibilant consonant (‘s’, ‘sh’, ‘th’, ‘f’, …), one of the class known as obstruents, there are no harmonics and minimal sound energy below 2500 Hz.  Sibilant consonants can extend beyond 12,000 Hz, but have essentially no energy below 2500 Hz.

Speech is either low frequency sonorant (with well-defined harmonics) or high frequency obstruent (no harmonics); at any one point in time it’s one or the other, not both.  Music must have both low and high frequency harmonics, and the exact frequencies and amplitudes of the harmonics provide much of the definition to music.

This also has ramifications for the use of frequency transposition or shifting.

It makes perfect sense to use a form of frequency transposition or shifting for speech.  This alters the high frequency bands of speech where no harmonics exist.  Moving a band of speech (e.g. ‘s’) to a slightly lower frequency region will not alter any of the harmonic relationships.

But for music, which is defined only by harmonic relationships in both the lower and the higher frequency regions, frequency transposition or shifting will alter these higher frequency harmonics.
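The point about harmonic relationships can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the 440 Hz fundamental, the 2000 Hz cutoff, and the 300 Hz downward shift are made-up values, and `shift_above` is a hypothetical, deliberately simplified linear shift, not the algorithm used by any particular hearing aid.

```python
def harmonics(f0_hz, count):
    """Return the first `count` harmonics of a fundamental: f0, 2*f0, 3*f0, ..."""
    return [f0_hz * n for n in range(1, count + 1)]


def shift_above(freqs_hz, cutoff_hz, shift_hz):
    """Move every component at or above the cutoff down by a fixed amount.

    Hypothetical, simplified stand-in for frequency shifting.
    """
    return [f - shift_hz if f >= cutoff_hz else f for f in freqs_hz]


def is_harmonic_series(freqs_hz, f0_hz, tol=1e-9):
    """True if every component is an integer multiple of the fundamental."""
    return all(abs(f / f0_hz - round(f / f0_hz)) < tol for f in freqs_hz)


f0 = 440.0                                   # assumed fundamental (illustrative)
original = harmonics(f0, 8)                  # 440 Hz up to 3520 Hz
shifted = shift_above(original, 2000.0, 300.0)  # assumed cutoff and shift

print(is_harmonic_series(original, f0))  # True
print(is_harmonic_series(shifted, f0))   # False
```

With these assumed numbers, the fifth harmonic moves from 2200 Hz to 1900 Hz, which is no longer a whole-number multiple of 440 Hz. A band of noise such as ‘s’ has no such multiples to disturb, which is why the same shift is harmless for speech.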

Clinically for a music program, if there are sufficient dead cochlear regions or severe sensory damage, reducing the gain in a frequency region is the correct approach, rather than changing the frequency for a small group of harmonics.


This is a guest blog based on the theme that some is good, but too much is not.  I have invited Pieter van ‘t Hof (pvthof@dynamic-ear.com), who is the Manager of Research and Development at Dynamic Ear Company, to contribute this week.

Pieter van ‘t Hof


Suppose you are at a cocktail party which has just started. Only a couple of people are around and sound levels are low. Someone comes up to you and says: “What do you have on your ears? Why would you be wearing hearing protection? It is so quiet here.” And after a closer look: “By the way, they look awesome, like earrings!” You say: “Well, thank you! The thing is: I really like to wear my hearing protectors. They fit very comfortably in my ear. Because they are able to transfer humidity, my ears feel fresh and the sound is just as normal, only at a lower level. As a matter of fact, I’d rather be here with hearing protection than without, because I understand you better now…. Occlusion? No, I don’t feel occluded at all.”

And then you could argue that you feel less fatigue when you go back home after wearing them. Furthermore, you hear conversation better at a cocktail party, because your brain can’t filter out the background noise that is omnipresent, but your hearing protection can. It reduces the noise to a lower level. The ratio between the speech signal and the noise is still the same, but at the lower level your brain is better able to discriminate. But this is far too technical … after all, it’s only a party!

The point is that this is how hearing protection should be known. It should be a luxury, like jewelry on your ear.  You shouldn’t need to remove your hearing protectors at work because they are uncomfortable to wear or because you can’t understand your co-worker. You should expect more from them.

The problem with the NRR:

However, there is one big issue: it is called overprotection. The ratings of hearing protection products (HPD) are derated. You could design the best HPD that scores well on all aspects, but if it doesn’t get a Noise Reduction Rating (NRR) of about 21 dB or more, you are nowhere. According to OSHA, the effective protection is (NRR − 7)/2. Extensive studies have been done on several classes of hearing protectors. And, yes, in some cases it is so hard to fit the products in the ear correctly that some derating might be needed. But this is not because derating is inherently needed for all HPD; it is because some products were designed so poorly that most people aren’t able to place them in the ear correctly. And now we have to wear much more protection than needed.

Universal size uniform attenuation earplugs

If you work in an environment with a time weighted average noise level of 99 dB(A), you would need 14 dB of effective protection to bring the level back to the safe level of 85 dB(A). With the derating above, this means that you would need a product having an NRR of 35 dB. Only a couple of those products exist. In order to arrive at a product having this NRR, you would need to subtract 2 times the standard deviation from the average attenuation values (at each octave band). If you achieved a standard deviation of 3 dB, which is low for such a product, you would need an HPD having an average attenuation of 43 dB. So you need 14 dB and what you get is 43 dB. If no earplugs with an NRR of 35 dB are available, you would need to wear an earplug and, on top of that, a set of earmuffs.
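The first two steps of this example can be reproduced as a small calculation. This is only a sketch of the arithmetic quoted in the text: the derating formula (NRR − 7)/2 and the 85 dB(A) target come from the post, while the function names `effective_protection` and `required_nrr` are hypothetical and chosen for illustration. The step from the NRR to the 43 dB average attenuation is not modelled here.

```python
def effective_protection(nrr_db):
    """Derated protection as quoted in the text: (NRR - 7) / 2."""
    return (nrr_db - 7.0) / 2.0


def required_nrr(twa_dba, target_dba=85.0):
    """NRR needed so that the derated protection brings the level down to target.

    Inverts the derating formula: NRR = 2 * needed + 7.
    """
    needed_db = twa_dba - target_dba
    return 2.0 * needed_db + 7.0


twa = 99.0                      # time weighted average from the example, dB(A)
needed = twa - 85.0             # effective protection required
nrr = required_nrr(twa)         # label rating required under derating

print(needed)                     # 14.0
print(nrr)                        # 35.0
print(effective_protection(nrr))  # 14.0
```

The same functions show how quickly derating eats into a label rating: under this formula, an NRR of 21 dB yields only 7 dB of effective protection.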

No wonder people don’t like to wear HPD at work. No wonder they take them off as soon as they can. They are very much overprotected.

But it could be different. It could feel like the cocktail party for the worker in the machine shop. It all starts with good design of HPD. The HPD should be so easy to use that you will obtain a good acoustical seal, especially after training. Then derating is no longer needed. People could use a product with a 15 dB NRR in the example above and would be safe. They could communicate with their colleagues and their ears would feel great.