One of the larger embarrassments of the 1960s and 1970s, other than bell bottoms, was the systemic sexism in the broadcast industry. Using pseudoscience, women were told that they could not be broadcasters because the pitch of their voices was too high. Specifically, the claim was that if the speaker’s fundamental frequency was high, as in a woman’s or child’s voice, then the harmonics were spaced further apart, and this sparseness of energy and auditory cues made the voice less intelligible.
The first part of the statement is true (wider spaced harmonics) but there is little evidence for the second part (less intelligibility).
The following illustrative example may be helpful:
In a man’s voice, such as mine, the fundamental frequency is 125 Hz so the harmonics are at 250 Hz, 375 Hz, 500 Hz, 625 Hz, 750 Hz, 875 Hz, 1000 Hz, and so on; that is, there are harmonics at integer multiples of 125 Hz. For this man’s voice, up to 1000 Hz there are 8 energy cues (the fundamental and 7 harmonics).
In a woman’s voice, the fundamental frequency is 250 Hz so the harmonics are at 500 Hz, 750 Hz, 1000 Hz, and so on; that is, there are harmonics at integer multiples of 250 Hz. For this woman’s voice, up to 1000 Hz there are only 4 energy cues (the fundamental and 3 harmonics).
So far, this is all true given these two examples. Now the train goes off the rails.
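The counting in these two examples can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic above; the helper name `components_up_to` is made up for illustration, and the 1000 Hz cut-off is simply the one used in the examples.

```python
def components_up_to(f0_hz, limit_hz):
    """Return the fundamental and its harmonics (integer multiples of f0) up to limit_hz."""
    count = int(limit_hz // f0_hz)  # how many multiples of f0 fit at or below the limit
    return [f0_hz * k for k in range(1, count + 1)]

man = components_up_to(125, 1000)    # 125 Hz fundamental from the first example
woman = components_up_to(250, 1000)  # 250 Hz fundamental from the second example

print(len(man), man)      # 8 [125, 250, 375, 500, 625, 750, 875, 1000]
print(len(woman), woman)  # 4 [250, 500, 750, 1000]
```

Doubling the fundamental frequency halves the number of components below any fixed cut-off, which is the only part of the argument that the arithmetic actually supports.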
Based on this science, broadcast executives argued (or perhaps implicitly felt) that a man’s voice has twice as many energy cues (8 vs. 4) and so should be twice as intelligible, or at least more intelligible, than a woman’s voice. This is where the pseudoscience comes in.
More specifically, the question that needs to be addressed is: how far apart can harmonics be spaced before the spacing contributes to a degradation in intelligibility, or even a degradation in quality?
We seem to be able to understand women’s and children’s voices just as well as men’s voices, and in many cases the intelligibility may be better. Once we are dealing with hearing loss this may be an issue, but even then the necessary experiments have never been done.
A July 2015 study was published in the Journal of the Acoustical Society of America (Express Letters) with the telling name “The phonological function of vowels is maintained at fundamental frequencies up to 880 Hz” by Daniel Friedrichs and his colleagues.
In this study, the authors systematically created synthetic vowel sounds with the required formants always at the correct frequency location, but with the underlying harmonic structure more and more sparsely distributed. It is really the underlying energy of the harmonics that defines the formant structure. Formants don’t really exist on their own; they arise from the vocal tract acting to “filter” the various harmonics by amplifying some of them (those near the formant frequency) and attenuating others (those further from the formant frequency).
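The filtering idea can be illustrated with a toy model. The sketch below is not the synthesis method used in the study; it assumes a single hypothetical formant at 500 Hz, modelled as a simple second-order resonator with an arbitrarily chosen sharpness (`Q = 5`), and applies it to the harmonics of a 125 Hz voice.

```python
import math

FORMANT_HZ = 500.0  # assumed formant centre frequency (hypothetical)
Q = 5.0             # assumed sharpness of the resonance (hypothetical)

def resonance_gain(freq_hz, centre_hz=FORMANT_HZ, q=Q):
    """Magnitude response of a simple second-order resonator (toy vocal-tract filter)."""
    r = freq_hz / centre_hz
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)

f0_hz = 125.0
harmonics = [f0_hz * k for k in range(1, 9)]  # 125 Hz up to 1000 Hz

# Harmonics near 500 Hz are boosted; those far above it are attenuated.
for f in harmonics:
    print(f"{f:6.0f} Hz  gain {resonance_gain(f):.2f}")
```

In this toy model the harmonic at 500 Hz receives the largest gain and the gain falls away on either side, so the "formant" shows up only as a pattern imposed on whatever harmonics happen to be present.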
In the vowel [a] as in ‘father’, for example, the first formant is at 500 Hz. The authors started with a low fundamental frequency (pitch) and gradually increased it, and with it the spacing of the harmonics, up to 880 Hz. One might expect that by this point listeners could no longer identify the vowel, since there is so little underlying harmonic structure left to auditorily define the various formants. At a fundamental frequency of 880 Hz (even above the falsetto range), the first harmonic would be at 1760 Hz and the second at 2640 Hz. Given that the first formant (F1) of this vowel is at 500 Hz, there was no harmonic information to define F1 and barely anything to define F2, yet people were still able to correctly identify the vowel with such a sparse harmonic structure.
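To see just how sparse the structure is at 880 Hz, the same counting used for the 125 Hz and 250 Hz voices can be repeated. This is a rough sketch only: the 3000 Hz upper limit is an arbitrary choice for illustration, and the 500 Hz value for F1 is the one quoted above.

```python
def components_up_to(f0_hz, limit_hz):
    """Return the fundamental and its harmonics (integer multiples of f0) up to limit_hz."""
    count = int(limit_hz // f0_hz)
    return [f0_hz * k for k in range(1, count + 1)]

F1_HZ = 500            # first formant of [a] as quoted in the text
UPPER_LIMIT_HZ = 3000  # arbitrary upper limit chosen for this illustration

for f0_hz in (125, 250, 880):
    components = components_up_to(f0_hz, UPPER_LIMIT_HZ)
    at_or_below_f1 = [f for f in components if f <= F1_HZ]
    print(f0_hz, len(components), at_or_below_f1)
```

At 880 Hz only three components (880, 1760, and 2640 Hz) fall below 3000 Hz and none lies at or below 500 Hz, compared with four such components for the 125 Hz voice and two for the 250 Hz voice.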
Women’s voices have a fundamental frequency of about 150 Hz to 300 Hz, which is far below this rarefied 880 Hz limit, so the behavior of broadcast industry executives in the 1960s and 1970s was indeed systemic sexism.