I know that it is a great pastime for all of us to take a few moments and go through our old journals to see if we missed any important articles. Well, I ran into this old article which I had marked for future reading and had even downloaded as a PDF. It was sitting in a file called READ THIS. Obviously I don’t follow directions well. It is called “Impact of three hours of discotheque music on pure-tone thresholds and distortion product otoacoustic emissions” and appeared in the October 2010 issue of JASA.


My initial reaction was “I am actually not sure how this 2010 publication could have seen the light of day”, given the ream of articles published in the mid-2000s showing that the effects of music and noise exposure resolve over a number of hours (temporary threshold shift, or TTS) if only cochlear phenomena are examined, but that there can be remaining neural pathologies which do not resolve. Using cochlear measures is like using a sledgehammer where a fine-toothed comb is probably a better tool.

Puretone thresholds and otoacoustic emission results assess the sensitivity and the function of the cochlea but are insensitive to permanent neural pathologies that may be the result of loud music or loud noise.

Then, my second reaction started to sink in.  Well, if we do see changes in these blunt sledgehammer-like measures, then something major may be going on and perhaps we shouldn’t throw the baby out with the bathwater.


In this article, subjects were exposed to 3 hours of disco music. The effects of this rather brief exposure were measured with high-resolution pure-tone TTS measures in the 3500-4500 Hz region, as well as with otoacoustic emission testing using the same “before” and “after” TTS paradigm.

The intent of the article was not really to find an effect (which would have been evident at the neural level, manifesting as a delayed wave I or an altered SP/AP ratio). Rather, it was to see if otoacoustic emissions (which are relatively easy to do, as opposed to the more laborious and subjective pure-tone thresholds) were sufficiently correlated with those thresholds.

Indeed they were correlated once some calibration issues were controlled. This bodes well for using high resolution otoacoustic emission testing to assess some parameters of temporary threshold shift. Of course, the puretone threshold elevation and the changes in otoacoustic emission testing resolved after a period of time.

The point is that we have something that we can use other than puretone threshold shifts to assess temporary hearing loss or TTS, and we are OK as long as we understand the limitations of TTS measures.

I could have titled this blog “Don’t throw the baby out with the bathwater”, since that was exactly what I was doing. And this is such an awful phrase that I promise never to use it again!

One of the larger embarrassments of the 1960s and 1970s, other than bell bottoms, was the systemic sexism in the broadcast industry. Using pseudoscience, women were told that they could not be broadcasters because the pitch of their voices was too high. Specifically, if the speaker’s fundamental frequency was high, as in a woman’s or child’s voice, then the harmonics were spaced further apart, and this sparseness of energy and auditory cues supposedly made the voice less intelligible.

The first part of the statement is true (wider spaced harmonics) but there is little evidence for the second part (less intelligibility).

The following illustrative example may be helpful: 

In a man’s voice, such as mine, the fundamental frequency is 125 Hz, so the harmonics are at 250 Hz, 375 Hz, 500 Hz, 625 Hz, 750 Hz, 875 Hz, 1000 Hz, and so on; there are harmonics at integer multiples of 125 Hz. For this man’s voice, up to 1000 Hz there are 8 energy cues (the fundamental and 7 harmonics).

In a woman’s voice, the fundamental frequency is 250 Hz, so the harmonics are at 500 Hz, 750 Hz, 1000 Hz, and so on; there are harmonics at integer multiples of 250 Hz. For this woman’s voice, up to 1000 Hz there are only 4 energy cues (the fundamental and 3 harmonics).
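The two counts above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `energy_cues` is made up for this example, and it simply lists the fundamental and its integer multiples up to a chosen limit.

```python
def energy_cues(f0_hz, limit_hz):
    """Return the fundamental and its integer multiples up to limit_hz (inclusive)."""
    # Hypothetical helper for illustration; every "cue" is k * f0 for k = 1, 2, 3, ...
    return [f0_hz * k for k in range(1, int(limit_hz // f0_hz) + 1)]

man = energy_cues(125, 1000)    # example male voice from the text
woman = energy_cues(250, 1000)  # example female voice from the text

print(len(man), man)
print(len(woman), woman)
```

The lists only count where energy can exist; they say nothing about how intelligible the resulting speech is.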

So far, this is all true given these two examples. Now the train goes off the rail.

Based on this science, broadcast executives argued (or perhaps implicitly felt) that a man’s voice has twice as many energy cues (8 vs. 4) and so should be twice as intelligible (or at least more intelligible) than a woman’s voice. This is where the pseudoscience comes in.


More specifically, the question that needs to be addressed is: how far apart can harmonics be spaced before this contributes to a degradation in intelligibility, or even a degradation in quality?

We seem to be able to understand women’s and children’s voices just as well as men’s voices, and in many cases the intelligibility may be better. Once we are dealing with a hearing loss this may be an issue but even then, the necessary experiments have never been done.

A July 2015 study was published in the Journal of the Acoustical Society of America (Express Letters) with the telling name “The phonological function of vowels is maintained at fundamental frequencies up to 880 Hz” by Daniel Friedrichs and his colleagues.

In this study, the authors systematically created synthetic vowel sounds with the required formants always at the correct frequency location, but with the underlying harmonic structure being more and more sparsely distributed. It is really the underlying energy of the harmonics that defines the formant structure. Formants don’t really exist on their own, other than as the vocal tract acting to “filter” the various harmonics by amplifying some of them (near the formant frequency) and attenuating or lessening others (that are further from the formant frequency).
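The “filtering” idea can be sketched numerically. The snippet below is a rough illustration, not the synthesis method used in the study: it assumes a single formant modeled as a simple second-order resonance, and the names `resonance_gain`, `filtered_harmonics`, and the sharpness value `q=5.0` are all invented for this example.

```python
import math

def resonance_gain(freq_hz, formant_hz, q=5.0):
    """Gain of a simple second-order resonance centered near formant_hz (assumed model)."""
    r = freq_hz / formant_hz
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)

def filtered_harmonics(f0_hz, formant_hz, limit_hz=1000):
    """Pair each harmonic of f0_hz (up to limit_hz) with the gain the formant gives it."""
    pairs = []
    for k in range(1, int(limit_hz // f0_hz) + 1):
        freq = k * f0_hz
        pairs.append((freq, resonance_gain(freq, formant_hz)))
    return pairs

# 125 Hz fundamental shaped by one formant at 500 Hz
shaped = filtered_harmonics(125, 500)
strongest = max(shaped, key=lambda pair: pair[1])
for freq, gain in shaped:
    print(freq, round(gain, 2))
```

In this toy model the harmonic sitting on the formant frequency gets the largest boost and harmonics further away get progressively less, which is all a “formant” amounts to in the spectrum.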


In the vowel [a] as in ‘father’, for example, the first formant is at 500 Hz. The authors started with a low fundamental frequency (pitch) and gradually increased it, and with it the spacing of the harmonics, up to 880 Hz. At this point one might expect that people could no longer identify the vowel, since there would seem to be too little underlying harmonic structure to auditorily define the various formants. At a fundamental frequency of 880 Hz (even above the falsetto range), the first harmonic would be at 1760 Hz and the second one at 2640 Hz. Given that the first formant (F1) in this example is at 500 Hz, there was no harmonic information to define F1 and barely anything to define F2, yet people were able to correctly identify the vowel with such a sparse harmonic structure.
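The arithmetic behind this example is easy to check. The sketch below assumes the 500 Hz first formant quoted in the text; the helper name `components` and the 3000 Hz upper limit are arbitrary choices made for this illustration.

```python
def components(f0_hz, limit_hz):
    """Fundamental plus its integer multiples up to limit_hz (hypothetical helper)."""
    return [f0_hz * k for k in range(1, int(limit_hz // f0_hz) + 1)]

F1_HZ = 500  # first formant of [a], as quoted in the text

parts = components(880, 3000)                      # energy at 880, 1760, 2640 Hz
at_or_below_f1 = [f for f in parts if f <= F1_HZ]  # nothing at or below F1
nearest_gap = min(abs(f - F1_HZ) for f in parts)   # distance from F1 to the closest component

print(parts, at_or_below_f1, nearest_gap)
```

With an 880 Hz fundamental, the closest spectral component sits 380 Hz above the first formant, so nothing in the signal lands on F1 itself.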

Women’s voices have a fundamental frequency from about 150 Hz to 300 Hz, which is much lower than this rarefied 880 Hz break-down point, so the behavior of the executives in the broadcast industry in the 1960s and 1970s was indeed systemic sexism.