What can music tell us about speech?

Marshall Chasin
October 2, 2012

Over the last month or so I have been receiving emails (and the occasional phone call, and even one fax… remember those?) about a recent post on my blog about what music can tell us about speech.  Given some of the questions, I feel that it’s worthwhile to clarify my statements.  Also, I have written an article about this issue that will appear in next month’s issue of Hearing Review (www.hearingreview.com, if you don’t receive a free hard copy in the mail).

The study of speech as an input to a hearing aid has provided a wealth of information so that we can consider the realm of music as an input to a hearing aid, but few people consider the converse.

Here is the short form and then we will go to the slightly longer form:

We were wrong in saying that the crest factor of speech is 12 dB.

And now here is the longer form:

If you recall, the crest factor is the difference in decibels between the instantaneous peak in a signal and its RMS (or average) value.  Traditionally the crest factor of speech has been taken to be 12 dB (the peaks are 12 dB greater than the RMS value) and this has been based on the work of Sivian and White (1933).  Sivian and White wisely chose an analysis window of 125 msec, or 1/8 of a second, and this was quite reasonable since the time constants of our auditory system are on the order of 125 msec.  Shorter analysis windows would not make sense.

It turns out that the calculated crest factor is a function of the length of the analysis window: the shorter the window, the greater the instantaneous peak that is measured, and so the higher the crest factor.
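To see the window effect in numbers, here is a minimal Python sketch.  It rests on assumptions that are not from the original measurements: the "peak" is taken as the highest RMS level found in any non-overlapping window of the chosen length, and the test signal is a made-up quiet tone with one short loud burst, not real speech.  The function names and the 8000 Hz sample rate are hypothetical choices for illustration only.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def crest_factor_db(samples, sample_rate, window_ms):
    """Highest short-window RMS level relative to the long-term RMS, in dB.

    Assumption: the "peak" is the largest RMS found in any non-overlapping
    window of length window_ms. This is one plausible reading, not
    necessarily the procedure behind the figures quoted in this post.
    """
    win = max(1, round(sample_rate * window_ms / 1000))
    starts = range(0, len(samples) - win + 1, win)
    peak = max(rms(samples[s:s + win]) for s in starts)
    return 20 * math.log10(peak / rms(samples))


# Hypothetical test signal: a quiet 200 Hz tone with one loud 20 msec burst.
SAMPLE_RATE = 8000
signal = []
for n in range(2 * SAMPLE_RATE):
    t = n / SAMPLE_RATE
    amplitude = 1.0 if 1.0 <= t < 1.02 else 0.1
    signal.append(amplitude * math.sin(2 * math.pi * 200 * t))

# The same signal analyzed with progressively shorter windows.
for window_ms in (500, 125, 25):
    value = crest_factor_db(signal, SAMPLE_RATE, window_ms)
    print(window_ms, "msec window:", round(value, 2), "dB")
```

With a long window the burst is averaged together with the quiet tone around it, so the measured peak is low.  With a short window the burst fills most of the window, so the measured peak, and with it the crest factor, goes up.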

But wait a minute. We just discussed that it doesn’t make sense to have an analysis window less than 125 msec.  Keep in mind that this figure is related to the limitations and temporal characteristics of our auditory system.  We are not talking about our auditory system here.  We are talking about what is presented to the microphone and analog-to-digital (A/D) converter of a hearing aid.  These hearing aid components have no such “125 msec restriction”.  Modern hearing aid microphones and other hearing aid components can respond with very short delays and windows.  The 125 msec window of analysis becomes meaningless.

Before we go on, let’s review how the crest factor enters our everyday clinical lives.  We see this whenever we calculate the reference test gain on a hearing aid when it is tested electro-acoustically.  The reference test gain is the OSPL-90 minus “77 dB”.  If the OSPL-90 is 117 dB SPL then the reference test gain is 40 dB.  Measures such as the frequency response, distortion, and internal noise are performed at the reference test gain setting for the hearing aid.

Why 77 dB?  This is 65 dB SPL + 12 dB.  The 65 dB SPL value is average conversational speech at 1 meter and the 12 dB is the crest factor.  So, the crest factor is ubiquitous in our clinical lives.
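The arithmetic above can be written out as a small Python sketch.  The function name and parameter names are hypothetical; the 65 dB SPL and 12 dB defaults are simply the values given above.

```python
def reference_test_gain_db(ospl90_db_spl, speech_level_db_spl=65.0, crest_db=12.0):
    """Reference test gain = OSPL-90 minus (average speech level + crest factor)."""
    return ospl90_db_spl - (speech_level_db_spl + crest_db)


# The example from the text: an OSPL-90 of 117 dB SPL, so 117 - (65 + 12).
print(reference_test_gain_db(117.0))
```

Keeping the crest factor as a separate parameter makes it plain where the 12 dB assumption sits inside the "77 dB" figure.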

The crest factor typically rears its ugly head when we are dealing with instrumental music as an input to a hearing aid.  Because of the lower levels of inherent damping in hard-walled musical instruments, the crest factor is greater: the instantaneous peaks are greater than those which would emanate from the highly damped human vocal tract.  Crest factors for musical instruments have traditionally been quoted as closer to 18-20 dB, compared with 12 dB for speech.

The following table shows the crest factor calculated for the same speech sample with different analysis windows (and this is where music informs us about speech).  The top row is the length of the analysis window in msec and the bottom row is the respective crest factor in dB.  A value of 12.46 dB for a 125 msec analysis window is typically what is used when testing hearing aids, but when the issue is the input to a hearing aid, analysis windows on the order of 50 msec are not unusual (with associated crest factors of almost 17 dB).

Window length (msec):   500    400    300    200    125    100    50     25
Crest factor (dB):      12.46  12.48  12.46  12.45  12.46  13.22  16.68  16.68

Given a sufficiently short time analysis window, the instantaneous peak is greater relative to the RMS of the speech signal such that the crest factor now is on the order of 16-17 dB and not merely 12 dB.

The immediate response is “So what?!”  Even 65 dB SPL + 17 dB is only 82 dB SPL, and this is far lower than the upper limit of modern hearing aids to transduce inputs without appreciable distortion; it is far less than 95 dB SPL.  And people who say this are absolutely correct, except when we are talking about the level of the hearing aid wearer’s own voice.

Average conversational speech at 1 meter is indeed on the order of 65 dB SPL but a person’s own voice (since it is only inches from their own hearing aid microphones) can easily be 80-85 dB SPL.  And 85 dB SPL + 17 dB = 102 dB SPL.  This input value is far in excess of the capability of most modern hearing aids.

A hearing aid wearer’s own voice therefore can easily overdrive the capability of the A/D converter with modern technology.
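The comparison can be laid out as a short Python sketch.  The function names are hypothetical, and the 95 dB SPL limit is simply the figure quoted above, used here as a working assumption rather than a specification for any particular hearing aid.

```python
INPUT_LIMIT_DB_SPL = 95.0  # assumed upper input limit, the figure quoted in the text


def peak_input_db_spl(speech_level_db_spl, crest_db):
    """Peak level reaching the microphone: average speech level plus crest factor."""
    return speech_level_db_spl + crest_db


def overdrives_front_end(speech_level_db_spl, crest_db, limit_db_spl=INPUT_LIMIT_DB_SPL):
    """True if the peak input exceeds the assumed limit of the hearing aid front end."""
    return peak_input_db_spl(speech_level_db_spl, crest_db) > limit_db_spl


# (description, average level in dB SPL, crest factor in dB), all taken from the text.
cases = [
    ("Other talker at 1 meter, 12 dB crest factor", 65.0, 12.0),
    ("Other talker at 1 meter, 17 dB crest factor", 65.0, 17.0),
    ("Own voice at 80 dB SPL, 17 dB crest factor", 80.0, 17.0),
    ("Own voice at 85 dB SPL, 17 dB crest factor", 85.0, 17.0),
]

for description, level, crest in cases:
    peak = peak_input_db_spl(level, crest)
    status = "over the limit" if overdrives_front_end(level, crest) else "within the limit"
    print(description + ":", peak, "dB SPL,", status)
```

Only the own-voice cases cross the assumed limit, which is the whole point: the larger crest factor matters little for someone else's speech at 1 meter but a great deal for the wearer's own voice.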

Solutions that address this with music as an input to a hearing aid (less sensitive microphones, analog compressors prior to the A/D converter, and auto-ranging dynamic ranges within the A/D converter) would therefore be of great use for speech as well.  Not someone else’s speech, but the hearing aid wearer’s own speech.

The study of music and hearing aids can tell us a lot about the study of speech and hearing aids… at least for the hearing aid wearer’s own voice, and this is true regardless of whether the hearing aid wearer sings or just talks.
