Speech is not a broadband signal… but music is.

Marshall Chasin
October 10, 2017

We tend to be biased, both in our training and in the technologies that we use. We tend to look at things based on spectra or frequencies. Phrases such as “bandwidth” and “long term average speech spectrum” show this bias. The long term average speech spectrum, which is averaged over time, is indeed a broad bandwidth spectrum made up of lower frequency vowels and higher frequency consonants, but the impression it gives of speech as a broadband signal is actually false.

At any point in time, speech is either low frequency OR high frequency, but not both. A speech utterance over time may be LOW FREQUENCY VOWEL then HIGH FREQUENCY CONSONANT then SOMETHING ELSE, but speech will never have both low frequency emphasis and high frequency emphasis at any one point in time.

Speech is sequential.  One speech segment follows another in time, but never at the same time.
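As a toy illustration of why an average over time can look broadband even when no single moment is, here is a minimal sketch. The frame labels, the two-band split, the threshold and every energy value are made-up assumptions for illustration, not measurements.

```python
# Toy model: each frame of an utterance is (label, low_band_energy, high_band_energy).
# All numbers are hypothetical and chosen only to show the averaging effect.
frames = [
    ("vowel", 1.0, 0.0),
    ("s", 0.0, 0.6),
    ("vowel", 0.9, 0.0),
    ("sh", 0.0, 0.5),
]


def long_term_average(frames):
    """Average the low band and high band energies over all frames."""
    n = len(frames)
    avg_low = sum(low for _, low, _ in frames) / n
    avg_high = sum(high for _, _, high in frames) / n
    return avg_low, avg_high


def is_broadband(low, high, threshold=0.1):
    """Call a spectrum broadband if both bands exceed the (assumed) threshold."""
    return low > threshold and high > threshold


avg_low, avg_high = long_term_average(frames)

# The long term average has energy in both bands...
print("average looks broadband:", is_broadband(avg_low, avg_high))

# ...but no individual frame does.
print("any single frame broadband:",
      any(is_broadband(low, high) for _, low, high in frames))
```

In this toy utterance the averaged spectrum passes the broadband check while every individual frame fails it, which is the sense in which the long term average can mislead.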

In contrast, music is broadband and very rarely narrow band. 

With the exception of percussion sounds, music is made up of a low frequency fundamental note (or tonic) and then a series of progressively higher frequency harmonics whose amplitudes and exact frequency locations define the timbre.  It is impossible for music to be a narrow band signal.
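To make the spread of a single note concrete, the following sketch lists idealized harmonics as integer multiples of a fundamental. The 220 Hz fundamental and the 8000 Hz upper limit are assumptions for illustration, and real instruments can deviate slightly from exact integer multiples.

```python
def harmonic_series(fundamental_hz, upper_limit_hz):
    """Return idealized harmonics (integer multiples) up to upper_limit_hz."""
    harmonics = []
    n = 1
    while n * fundamental_hz <= upper_limit_hz:
        harmonics.append(n * fundamental_hz)
        n += 1
    return harmonics


# Hypothetical example: a note with a 220 Hz fundamental, examined up to 8000 Hz.
note = harmonic_series(220.0, 8000.0)

print("number of harmonics:", len(note))
print("lowest component (Hz):", note[0])
print("highest component (Hz):", note[-1])
```

Even this one note has components from 220 Hz up to nearly 8000 Hz, all present at the same moment.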

It is actually a paradox that (1) hearing aids have one receiver that needs to have similar efficiency in both the high frequency and the low frequency regions and (2) musicians “prefer” in-ear monitors and earphones that have more than one receiver.  If anything, it should be the opposite.  I would suspect that the musicians’ preference for more receivers (and drivers) is a marketing element where “more” may be perceived as “better”.

At any one point in time, musicians should be wearing a single receiver, single microphone, and single bandwidth in-ear monitor.  This will ensure that what is generated in the lower frequency region (the fundamental or tonic) will have a well-defined amplitude (and frequency) relationship with the higher frequency harmonics.  This can only be achieved with a truly single channel system. A “less is more” solution.

This same set of constraints does not hold for speech.  If speech contains a vowel (or nasal, collectively called a sonorant), it is true that there are well-defined harmonics that generate a series of resonances or formants, but for good intelligibility one only needs to have energy up to about 3500 Hz.  Indeed, telephones only carry information up to about 3500 Hz.  If speech contains a sibilant consonant, one of the obstruents (‘s’, ‘sh’, ‘th’, ‘f’, …), there are no harmonics and minimal sound energy below 2500 Hz.  Sibilant consonants can extend beyond 12,000 Hz, but have essentially no energy below 2500 Hz.

Speech is either a low frequency sonorant (with well-defined harmonics) or a high frequency obstruent (no harmonics); at any one point in time it’s one or the other, never both.  Music must have both low and high frequency harmonics, and the exact frequencies and amplitudes of the harmonics provide much of the definition to music.

This also has ramifications for the use of frequency transposition or shifting.

It makes perfect sense to use a form of frequency transposition or shifting for speech.  This alters the high frequency bands of speech where no harmonics exist.  Moving a band of speech (e.g. ‘s’) to a slightly lower frequency region will not alter any of the harmonic relationships.

But for music, which is defined only by harmonic relationships in both the lower and the higher frequency regions, frequency transposition or shifting will alter these higher frequency harmonics.
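A minimal sketch of this effect, assuming a deliberately simplified shift that lowers every component above a cutoff by a fixed number of Hz. The function names, the 440 Hz fundamental, the 2000 Hz cutoff and the 300 Hz shift are all hypothetical, and commercial frequency lowering algorithms work differently in detail.

```python
def lower_high_band(components_hz, cutoff_hz, shift_hz):
    """Lower every component above cutoff_hz by shift_hz (simplified linear shift)."""
    return [f - shift_hz if f > cutoff_hz else f for f in components_hz]


def harmonic_numbers(components_hz, fundamental_hz):
    """Express each component as a multiple of the fundamental."""
    return [f / fundamental_hz for f in components_hz]


# Hypothetical note: a 440 Hz fundamental plus its next seven idealized harmonics.
fundamental = 440.0
note = [fundamental * n for n in range(1, 9)]

shifted = lower_high_band(note, cutoff_hz=2000.0, shift_hz=300.0)

print("before:", harmonic_numbers(note, fundamental))
print("after: ", harmonic_numbers(shifted, fundamental))
```

Before the shift every component is a whole-number multiple of the fundamental. After it, the components that sat above the cutoff are no longer whole-number multiples, so they are no longer harmonics of the note. A noise band such as ‘s’ has no such multiples to disturb.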

Clinically, for a music program, if there are sufficient dead cochlear regions or severe sensory damage, reducing the gain in a frequency region is the correct approach, rather than changing the frequency for a small group of harmonics.


  1. Hmmm… Why then do I have 3 speakers in my stereo/multi channel speaker cabinets for listening to music? Maybe it is because one speaker won’t cover the entire spectrum of music. Balanced armature receivers do not necessarily cover the entire bandwidth for music. Two can. More than two is marketing. Earbuds with a single conventional speaker have a wide bandwidth capability but are too large for a conventional hearing aid, though they are good for in the ear monitors with a belt pack.

  2. Isn’t the voice signal composed of harmonics? Doesn’t voice have a timbre? Why should hearing impaired people be deprived of perceiving it? Isn’t voice a musical instrument? And in this case, does voice suddenly become a broadband sound when it comes to singing? Are plosive bursts completely deprived of low frequencies?

    1. Marshall Chasin Author

      Any vocal utterance, whether it’s speech or singing, is “sequential”. The human voice does have significant low frequency information (vowels, nasals, liquids, and much of the fundamental energy of the sonorants) as well as high frequency obstruents such as fricatives, affricates, stops, and plosives, BUT at any one point in time, speech is either low frequency (sonorant) or high frequency (obstruent) but never both at the same time. At any one point in time, the speech or vocal signal is narrow band (either low frequency or high frequency but never both).

      In contrast, instrumental music is always low frequency (fundamental) and higher frequency (harmonics) at the same time. Instrumental music is always broadband.

    2. Marshall Chasin Author

      All plosives (e.g. stops and affricates) are high frequency and are devoid of harmonic structure (similar to fricatives). Only the sonorants (vowels, nasals, and /l/ and /r/ liquids) have harmonic structures.
