What can “music and hearing aids” teach us about “speech and hearing aids”?

Clearly the most important stimulus that a hearing aid design engineer needs to be concerned about is speech.  Understandably, music is a secondary concern.  There are many similarities between music and speech, so a circuit or software program designed for speech will not actually be that different from one designed for music.

Both speech and music have wideband spectra and carry information over a large frequency range.  Speech is more limited in that there is no speech energy below the fundamental frequency (about 125 Hz for men and about 250 Hz for women), but both speech and music range up beyond the top note of the piano keyboard (>4000 Hz).  Both speech and music have intense parts (vowels and the forte notes) and both have quieter parts (sibilants and pianissimo notes).

Two major differences between speech and music, at least as far as hearing aids are concerned, are the overall intensity and the “crest factor”.  Even quiet music tends to be more intense than the more intense components of speech.  The crest factor (also called the Colgate factor... that’s a joke, by the way) is the difference in dB between the peak of the signal and its average, or RMS (root mean square), level.  Traditional estimates for the crest factor of speech are around 12 dB, meaning that the peaks of speech are about 12 dB more intense than the average, or RMS, level of the long-term speech spectrum.  This derives from some work done by Sivian and White back in the 1930s as well as more recent work by Cox and her colleagues in 1988.  The crest factor is important for the analysis of hearing aids in a test box.
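For readers who like to see the arithmetic, here is a minimal sketch of the peak-versus-RMS calculation in Python.  The function name and the sine-wave test signal are my own illustrative choices, not anything taken from a standard or from a hearing aid test box.

```python
import math

def crest_factor_db(samples):
    """Return the peak-to-RMS ratio of a sequence of samples, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# One second of a 100 Hz sine wave sampled at 8000 Hz (arbitrary test values).
sine = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]

# A pure sine wave has a peak that is sqrt(2) times its RMS, or about 3 dB.
sine_crest = crest_factor_db(sine)
```

A steady sine wave sits at about 3 dB, so a 12 dB figure for speech reflects how much "peakier" speech is than a steady tone.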

When we test the function of a hearing aid according to the appropriate ANSI standard (S3.22-2003) we use the crest factor of 12 dB all of the time.  The reference test gain is the OSPL90 value minus 77 dB.  Well, 77 dB is 65 dB (average speech level) + 12 dB... the 12 dB rears its head!

In contrast to speech, musical instruments are less damped than the human vocal tract, so the crest factor of music is typically greater, at 18 to 20 dB.  That is, the peaks are peakier relative to the average than they are for speech.  In previous blogs I have discussed the -6 dB rule, which states that for a music program the OSPL90 and the gain need to be set 6 dB lower than for a speech-in-quiet program.  The reasoning is that the crest factor for music is about 6 dB greater than for speech.

But perhaps we are wrong.

The analyses of the crest factor use an analyzing window of 125 msec.  This makes sense historically because of the limits of our auditory system.  However, when dealing with hearing aids, the concern is not our auditory system but the analog-to-digital converter in the hearing aid, which can be overdriven by inputs that are too intense.  There is nothing about 125 msec that is relevant to the operating characteristics of the analog-to-digital converter.  This component of the hearing aid receives input from a number of sources regardless of their temporal characteristics.

Table 1 shows measured crest factors for some speech samples and for some music samples.  These were obtained using an audio analysis program called Adobe Audition (previously called Cool Edit), and a comparison was made between the peak amplitude and the total RMS power (both found under the “Statistics” section for those who want to see for themselves).

Stimulus     Peak Amplitude (dB)   Total RMS Power (dB)   Crest Factor (dB)
Speech #1    -0.92                 -21.97                 21.05
Speech #2    -5.53                 -17.99                 12.46
Speech #3    -3.65                 -17.6                  13.95
Music #1     -8.62                 -19.35                 10.73
Music #2     -5.0                  -15.28                 10.28
Music #3     -0.98                 -22.65                 21.67
Music #4     -2.45                 -21.88                 19.43

Table 1.  The crest factor determined using a 125 msec analyzing window for several speech samples and for several music samples.
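Each crest factor in Table 1 is just the peak amplitude minus the total RMS power, assuming both are reported in dB.  A short sketch that redoes the subtraction with the values copied from the table (the variable and function names are mine):

```python
# (peak amplitude in dB, total RMS power in dB), copied from Table 1.
table_1 = {
    "Speech #1": (-0.92, -21.97),
    "Speech #2": (-5.53, -17.99),
    "Speech #3": (-3.65, -17.6),
    "Music #1": (-8.62, -19.35),
    "Music #2": (-5.0, -15.28),
    "Music #3": (-0.98, -22.65),
    "Music #4": (-2.45, -21.88),
}

def crest_from_levels(peak_db, rms_db):
    """Crest factor in dB is the peak level minus the RMS level."""
    return round(peak_db - rms_db, 2)

crest_factors = {
    name: crest_from_levels(peak_db, rms_db)
    for name, (peak_db, rms_db) in table_1.items()
}

for name, crest in crest_factors.items():
    print(f"{name}: {crest:.2f} dB")
```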

It is clear that some speech samples have crest factors on the order of 12 dB but one clearly is in excess of 20 dB.  The same variability can be seen with music.  Some crest factors are on the order of 20 dB, but some are also around 10 dB.  So even with a 125 msec analyzing window the crest factor is more variable than suggested in the literature (including in my own work).

Table 2 shows the same speech stimulus (Speech #2 in Table 1), but with the crest factor calculated as a function of analyzing window length (from 500 msec down to 25 msec).

Window (msec)   500     400     300     200     125     100     50      25
Speech #2       12.46   12.48   12.46   12.45   12.46   13.22   16.68   16.68

Table 2: Crest factor calculations for speech stimulus #2 measured with varying time analysis windows from 500 msec down to 25 msec.
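To get a feel for why the window length matters, here is a rough sketch under a loudly stated assumption: I am treating the "peak" as the highest short-term RMS found in any non-overlapping window, relative to the overall RMS.  That is only one possible definition and is not necessarily how Adobe Audition applies its window, and the synthetic burst signal is made up for illustration, so do not expect it to reproduce the numbers in Table 2.

```python
import math

def windowed_crest_factor_db(samples, window):
    """Highest short-term RMS (non-overlapping windows) re: overall RMS, in dB.

    Assumed definition for illustration only; incomplete trailing windows
    are ignored.
    """
    overall_ms = sum(s * s for s in samples) / len(samples)
    max_window_ms = 0.0
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        window_ms = sum(s * s for s in chunk) / window
        max_window_ms = max(max_window_ms, window_ms)
    # Mean-square values are power-like, so use 10*log10.
    return 10 * math.log10(max_window_ms / overall_ms)

# Synthetic signal: a 50-sample loud burst followed by 950 quiet samples.
signal = [(-1) ** n * (1.0 if n < 50 else 0.1) for n in range(1000)]

crest_by_window = {
    w: windowed_crest_factor_db(signal, w) for w in (500, 250, 100, 50)
}
```

With this made-up signal the shorter windows give the larger values, because a long window averages the brief burst together with the quiet samples around it.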

It appears that the crest factor varies as a function of how it is measured.  Sudden peaks can be greater if measured with a 50 msec time window than with a 125 msec one.  The level of a person’s own voice at their hearing aid, plus the crest factor, can be in excess of what modern day analog-to-digital converters can handle without distortion. (To be fair, not all crest factors for speech increased by this much [about 4 dB in this example], but many did.)
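The headroom argument can be sketched in a few lines.  The two crest factors come from Table 2, but the own-voice level and the converter limit below are hypothetical placeholder values chosen only to illustrate the arithmetic; they are not measurements or specifications.

```python
def peak_level_db(average_level_db, crest_factor_db):
    """Estimated peak level: the average level plus the crest factor."""
    return average_level_db + crest_factor_db

def exceeds_limit(average_level_db, crest_factor_db, input_limit_db):
    """True if the estimated peak is above the converter's input limit."""
    return peak_level_db(average_level_db, crest_factor_db) > input_limit_db

# Hypothetical placeholder values, not measured data.
HYPOTHETICAL_OWN_VOICE_DB_SPL = 80.0
HYPOTHETICAL_ADC_LIMIT_DB_SPL = 95.0

# Crest factors for Speech #2 from Table 2 (125 msec and 50 msec windows).
overdriven_125_msec = exceeds_limit(
    HYPOTHETICAL_OWN_VOICE_DB_SPL, 12.46, HYPOTHETICAL_ADC_LIMIT_DB_SPL
)
overdriven_50_msec = exceeds_limit(
    HYPOTHETICAL_OWN_VOICE_DB_SPL, 16.68, HYPOTHETICAL_ADC_LIMIT_DB_SPL
)
```

With these placeholder numbers the 125 msec crest factor stays under the limit while the 50 msec one goes over it, which is the whole point: the same speech can look safe or unsafe depending on the window used.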

Depending on how the crest factor is measured, excessive levels can be obtained even for speech.  So what can “music and hearing aids” teach us about “speech and hearing aids”?  Not a lot, but it does suggest that the same improvements necessary to optimize the fidelity of amplified music may be required in order to optimize the quality of one’s own voice when talking.

About Marshall Chasin

Marshall Chasin, AuD, is a clinical and research audiologist who has a special interest in the prevention of hearing loss for musicians, as well as the treatment of those who have hearing loss. He has other special interests such as clarinet and karate, but those may come out in the blog over time.