Steve Bornstein, Ph.D., C.C.C./Audiology
Associate Professor
University of New Hampshire
Department of Communication Sciences and Disorders
Abstract
Children with Central Auditory Nervous System (CANS) Dysfunction have been observed to have several potential deficits, such as difficulty with temporal tasks, degraded speech, time-compressed speech, and auditory pattern recognition. However, perhaps the greatest overall deficit is difficulty perceiving speech when there are background or competing stimuli. One probable reason for this difficulty is that important speech perceptual cues may be missing or inconsistent, thus preventing a child with a compromised Central Auditory Nervous System from being able to develop strong perceptual saliency or phonological awareness of the speech code. One possible way of overcoming the reduction of speech cues in background noise would be the use of Sound Field Systems. In Part 1 of this two-part article, the difficulties faced by children with CANS Dysfunction, factors influencing the clarity of speech, multiplicative distortion effects, and the relationship to general speech acoustics and perception will be discussed as a theoretical rationale for the use of Sound Field Systems with children who have CANS dysfunction. In Part 2, specifics of the speech spectrum, the concepts of intrinsic and extrinsic redundancy, and further discussion of multiplicative distortion effects will be presented, as well as research evidence supporting the use of Sound Field Systems with children who have CANS dysfunction.
Introduction
A classic principle put forth by Mark Ross has been that a child’s peripheral hearing loss itself is the genesis of all the other difficulties that a child with hearing loss may have [1,2,3]. This simple, but elegant and accurate, principle implies that all other rehabilitative efforts would be negated or minimized if speech were not first at a level of detectability.
This principle can also be extended to children who have Central Auditory Nervous System (CANS) Dysfunction, often referred to as Central Auditory Processing Disorder (CAPD). Although the problem with CANS Dysfunction is not one of detectability in quiet environments, detectability is still an issue when children are listening in background noise. Although several deficient processes have been put forth as causing the overall listening and processing difficulties of children with CANS Dysfunction, it is known that children and adults with CANS Dysfunction have a primary difficulty understanding speech in background noise. This difficulty of listening in noise is at least in part a problem of diminished or absent audibility of speech cues due to masking effects. For a child with CANS Dysfunction, the cues in speech acoustics responsible for accurate perception may be audible in quiet, but might not be when there is background noise.
Various deficits have been associated with CANS Dysfunction, such as poor auditory pattern recognition, various temporal difficulties, and problems with degraded acoustic signals [4]. However, even if the most elegant intervention plans are designed to try to alleviate these deficits, the effects of these plans will be minimized if a child’s auditory system is trying to decode speech in background noise.
With the preceding two paragraphs in mind, the purpose of this paper is to discuss speech acoustics and perception, factors influencing speech acoustics in typical environments, the effects of these factors on speech perception, and the theoretical rationale for the use of Sound Field FM systems in educational environments.
Factors Influencing the Clarity of Speech
There are several factors that will influence the clarity of speech for children with CANS Dysfunction: the intensity of the background noise, reverberation, distance, a speaker’s rate of speech, other acoustic distortions such as the reduction or elimination of primarily high frequencies, temporal masking, the intrinsic distortion within the auditory system itself, and the anatomical and/or physiological loss of redundant pathways within the Central Auditory Nervous System (loss of “intrinsic redundancy”). It is also important to remember the “Multiplicative Distortion” principle described in detail by Harris [5].
The “Multiplicative Distortion” Principle
Harris [5] stated that the combined effects of distortions are more than a simple additive function. For example, suppose someone has the capability of correctly identifying 100% of monosyllabic words under ideal conditions. Now suppose that a level of background noise alone is introduced that would reduce the person’s performance so that they are only able to identify 80% of the words. Then suppose that speech is made faster, such as with time-compressed speech without any background noise or any other distortions, to a point where they are again only able to identify 80% of the words. If one then introduced that same level of noise and that same level of time compression together, one would logically expect the person’s identification score to decrease by an additive total of 40%, or a 60% score. In actuality what happens is that the score would be reduced by more than 40%, due to the effects being “multiplicative”, and with the effects probably being greater for people with sensorineural hearing loss and people with CANS Dysfunction.
Harris [5] used the term “multiple-cueing” to describe acoustic redundancy in speech, meaning the simultaneous presence of different acoustic stimuli, any one of which alone can carry intelligibility. He preferred the term “multiple cueing” for acoustic redundancies and reserved the term “redundancy” for linguistic and contextual redundancies (which also help to decode a spoken message). Harris also demonstrated that when different types of distortions are combined, each of which alone may have little or no effect, the cumulative effect on speech intelligibility will be greater than the sum of the individual effects (the multiplicative hypothesis), and that this combination can produce a dramatic decrease in speech intelligibility. Using what now seem like crude methods, because more sophisticated signal manipulations were not possible at the time, Harris had different talkers speak hyponasally, speak while eating, speak rapidly, have their speech interrupted, or have their speech reverberated, each alone and in various combinations. Percent correct scores using sentences were obtained for 29-75 listeners. For one talker using only hyponasal speech, listeners scored approximately 99%, and when the only distortion was this talker eating while speaking, listeners scored approximately 94%. However, when this talker spoke hyponasally while eating, the combination yielded a score of only approximately 40%. In another example, the talker speaking rapidly alone yielded a score of approximately 98%, but when rapid speech was paired with hyponasal speech (99% alone) the intelligibility score decreased to approximately 69%. With another talker, combining fast speech (mean score of 98% alone) with speech while eating (mean score of 94% alone) yielded a mean score of only 15%. Harris also presented other interesting findings. Of particular interest is what happened when three distortions were combined.
For example, for one talker, speaking hyponasally and rapidly together resulted in very little decrement, with a mean listener intelligibility score of approximately 88%. However, adding a 4-second reverberation time, which alone produced a mean score of 89%, resulted in a mean score of only 26%. Another interesting finding from this study was that although effects were similar for some talkers, effects were different for other talkers. In other words, the particular speech characteristics, or distortions, introduced by a certain talker might decrease speech intelligibility for a listener, while a different talker might not produce the same level of distortions. This implies that the potential benefits seen with Sound Field Systems may also be partly determined by the teacher(s) in combination with other acoustic distortions created by the acoustic environment.
Lacroix, Harris, and Randolph [6] corroborated Harris’ earlier findings examining 400 young men with normal hearing using a sentence intelligibility test that was low-pass filtered alone, time-compressed alone, interrupted alone, and noise-masked alone, as well as in different combinations of these distortions. Again, a simple additive combination of distortions did not predict the actual scores, but rather the effect was multiplicative. For example they found that low-pass filtering alone had no effect on speech recognition until higher frequencies were eliminated down to 1000 Hz. They also found that the mean scores for the other distortions in isolation, using the parameters they incorporated, ranged from 88.5% (1000 Hz low-pass filtering) to 93.3% (interrupted speech) compared to 100% scores without distortions. When combining the distortions of low-pass filtered speech and interrupted speech an additive effect would predict an 11.5% reduction for the low-pass filtering plus a 6.7% reduction for the interrupted speech therefore yielding an additive reduction of 18.2% in the speech recognition score. However, in this study the actual reduction when combining these two distortions was 64.2% – a multiplicative effect.
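The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration using the figures quoted above; the function name `additive_reduction` and the variable names are assumptions made for this sketch and do not come from the study itself.

```python
def additive_reduction(*isolated_scores, baseline=100.0):
    """Sum the reductions (in percentage points) that each distortion
    produces on its own, relative to the undistorted baseline score."""
    return sum(baseline - score for score in isolated_scores)


# Mean scores in isolation, as quoted in the text (percent correct).
low_pass_1000_hz = 88.5
interrupted_speech = 93.3

# Reduction predicted if the two distortions simply added together.
predicted = additive_reduction(low_pass_1000_hz, interrupted_speech)

# Reduction actually reported when the two distortions were combined.
observed = 64.2

# How far the observed reduction exceeds the additive prediction.
excess = observed - predicted

print(f"Additive prediction: {predicted:.1f} percentage points")
print(f"Observed reduction:  {observed:.1f} percentage points")
print(f"Excess over additive: {excess:.1f} percentage points")
```

With these inputs the additive prediction is 18.2 percentage points, so the observed reduction of 64.2 points exceeds it by about 46 points, which is the gap the term "multiplicative" is meant to capture.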
Lacroix and Harris [7] also presented evidence corroborating Harris’ earlier findings regarding multiplicative distortions. They examined sentence intelligibility in persons with normal hearing who had frequency limitations artificially imposed by low-pass filtering, and in persons with high-frequency hearing loss, when the speech was distorted by acceleration, by noise, or by interruption, both in isolation and in combinations. Using the parameters they incorporated, they found that when information was missing above 3000 Hz the predicted decrease in speech recognition scores from adding all three distortions would be 4.7%, but the actual decrease was 7.5%. When information was missing above 2000 Hz the predicted decrease was 3.1% and the actual decrease was 13.5%. When information was missing above 1000 Hz the predicted decrease was 10.8% but the actual decrease was 32.7%.
The classic study demonstrating this in children with sensorineural hearing loss was by Finitzo-Hieber and Tillman [8], using conditions of background noise and reverberation. As an example of the deleterious effects of combining distortions, in this study children with hearing loss on average were able to identify 88% of monosyllabic words under ideal conditions. However, under a fairly typical classroom setting of a +6 dB signal-to-noise ratio (S/N Ratio) and a 0.4 second Reverberation Time (RT), the average score was 55%. Under a not unusual situation of a 0 dB S/N Ratio and a 1.2 second RT, the average score was 15%.
Further evidence of multiplicative effects was shown by Bornstein [9], Bornstein and Musiek [10], and Wilson et al. [11].
While the various factors previously mentioned will influence speech perception in children with CANS Dysfunction, the only three that can be directly controlled are noise, reverberation, and distance, and these can be controlled by either Personal or Sound Field Systems. It is known clinically and has been shown in studies that people with sensorineural hearing loss are affected more by background noise than typically hearing people [12] [13] [14]. The generally accepted value for the signal-to-noise ratio needed for children with sensorineural hearing loss to achieve maximum performance is around +26 dB, while for children with typical hearing it is around +12 dB [3]. It has also been shown that speech perception begins to significantly deteriorate at Reverberation Times of 0.4 seconds and higher for children with sensorineural hearing loss [8]. It should be noted that a 0.4 second reverberation time is better than what is achieved in most schools.
To this author’s knowledge there are no similar data for children with CANS Dysfunction. However, given the clinical reports and other research data available on children with CANS Dysfunction, it is a logical conclusion that they also require greater signal-to-noise ratios than typically hearing children. It is also clear that when poor signal-to-noise ratios are combined with other distortions, there should be a dramatic decrease in speech recognition ability above and beyond a simple additive effect.
Overall Speech Intensity, Signal-To-Noise Ratio, and Distance
It must be recognized that speech intensity is not a simple value in many respects. A typical statement regarding speech intensity would be “The average intensity of average overall conversational speech spoken with an average vocal effort at a distance of three feet by a male talker is approximately 65 dB SPL”. Thus, the different and changing intensities of speech should be looked at as being on a continuum and not dichotomously. For example, it is typically stated that female voices are 3 dB less intense than male voices, and children’s voices are 3 dB less intense than female voices. But obviously, this will change depending upon the exact speaker and their vocal effort. Therefore, it is also generally stated that the intensity of speech ranges from 60-70 dB SPL. However, this also does not adequately represent the complex patterns of intensity changes in average ongoing speech, where the overall intensity might be as low as 35 dB SPL and as high as 80 dB SPL irrespective of whether the talker is a male, a female, or a child. The most important point to realize is that the acoustical variables comprising speech are highly variable and lie along a continuum, and therefore children with peripheral hearing loss or CANS Dysfunction do not receive a consistent acoustic signal due to their auditory systems and/or continuously changing environmental factors.
However, for argument’s sake, let’s adopt a value of 65 dB SPL as the overall intensity of speech measured at a distance of three feet. What does this mean for the signal-to-noise ratio faced by children in classrooms? Of course the noise levels in classrooms also vary widely depending on factors such as their construction, where they are located, the number of children in the room, and the teacher’s style. However, a ballpark value of 60 dBA SPL is a good approximation of many real-life situations [3] [15] [16] [17]. This would mean that even at a short distance of 3 feet, the signal-to-noise ratio would be +5 dB, well below what is needed for children with CANS dysfunction to even begin to approach their maximum performance. But the situation becomes even worse when one factors in the “Inverse Square Law”. In simple terms, the Inverse Square Law states that every time one doubles the distance from a sound source the intensity theoretically decreases by 6 dB. Therefore, if speech is at 65 dB SPL at a distance of three feet, then for a child who is 6 feet away from a talker, the talker’s intensity would be 59 dB SPL and there would be a -1 dB S/N Ratio.
Below are more examples:
Distance from Talker | Speech Intensity | Background Noise | S/N Ratio |
3 feet | 65 dBA SPL | 60 dBA SPL | +5 dB |
6 feet | 59 dBA SPL | 60 dBA SPL | -1 dB |
12 feet | 53 dBA SPL | 60 dBA SPL | -7 dB |
24 feet | 47 dBA SPL | 60 dBA SPL | -13 dB |
It is not at all uncommon for students to have to listen to teachers or other students at distances of 12 feet or 24 feet, putting them in situations where the overall intensity level of the background noise is greater than the overall intensity level of the speech, specifically a -7 dB S/N Ratio and a -13 dB S/N Ratio respectively.
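The values in the table can be reproduced with a short Python sketch. It assumes an idealized free field in which level falls by 20·log10(d/d_ref) dB, which is roughly 6 dB per doubling of distance; real classrooms with reverberation will deviate from this. The function names `speech_level_db` and `snr_db` and their default parameters are hypothetical, chosen only to mirror the 65 dB speech level at 3 feet and the 60 dB noise level adopted above.

```python
import math


def speech_level_db(distance_ft, ref_level_db=65.0, ref_distance_ft=3.0):
    """Idealized free-field speech level at distance_ft.

    Assumes the Inverse Square Law: the level drops by
    20 * log10(distance / reference distance) dB, i.e. about 6 dB
    for every doubling of distance. Reverberation is ignored.
    """
    return ref_level_db - 20.0 * math.log10(distance_ft / ref_distance_ft)


def snr_db(distance_ft, noise_db=60.0):
    """Signal-to-noise ratio, assuming constant background noise in the room."""
    return speech_level_db(distance_ft) - noise_db


# Print the speech level and S/N ratio at each distance used in the table.
for distance in (3, 6, 12, 24):
    level = speech_level_db(distance)
    ratio = snr_db(distance)
    print(f"{distance:>2} feet: speech {level:.0f} dB, S/N Ratio {ratio:+.0f} dB")
```

The exact drop per doubling is about 6.02 dB, so the computed values agree with the table once rounded to the nearest decibel.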
Thus it can be seen that in typical classrooms signal-to-noise ratios are very often not great enough for children with peripheral hearing loss, and probably children with CANS Dysfunction, to function maximally without the use of either Personal or Sound Field FM or Infrared systems. This situation becomes even more difficult for children when distortions are combined, especially since the effects are usually multiplicative and not additive.
Although the deleterious effects of acoustic distortions on overall speech perception present a compelling rationale for the use of Sound Field Systems for children with CANS dysfunction, the case becomes even more compelling when considering the specifics of the speech spectrum. This will be discussed in Part 2, as well as research evidence that exists supporting the use of Sound Field Systems with children who have CANS dysfunction.
Steve Bornstein, Ph.D., C.C.C./A is an associate professor at the University of New Hampshire in the Department of Communication Sciences and Disorders. He earned his Ph.D. from the University of Connecticut. Prior to coming to UNH he was on the faculties at the University of Memphis and Columbia University. Steve previously was also an adjunct associate professor at Dartmouth Medical School in the Department of Otolaryngology. He has had interest, and has published, both in areas related to Central Auditory Nervous System function and Aural Rehabilitation. He currently has a developing interest in the applications of fMRI to the study of Central Auditory Nervous System function. The department at UNH has a nascent research program in brain imaging, particularly in the area of motor speech.
References
- Ross M. Implications of Audiologic Success. J. Am. Acad. Audiol. 3, 1-4 (1992)
- Ross M. The Detection Factor. New York League for the Hard of Hearing Review 3, 8-10, 14 (1989)
- Ross M, Brackett D, Maxon AB. Auditory management principles. In: Ross M, Brackett D, Maxon AB, editors. Assessment and Management of Mainstreamed Hearing-Impaired Children: Principles and Practices. Austin: Pro-Ed: 1991, 181-219
- American Speech-Language-Hearing Association Task Force on Central Auditory Processing Consensus Development. Central auditory processing: Current status of research and implications for clinical practice. American Journal of Audiology 5(2), 41-54 (1996)
- Harris JD. Combinations of Distortion in Speech: the Twenty-Five Percent Factor by Multiple-Cueing. Arch. Otolaryngol. 72, 227-232 (1960)
- Lacroix P.G., Harris J.D. & Randolph K.J. Multiplicative Effects on Sentence Comprehension for Combined Acoustic Distortions. J. Speech Hear. Res. 22, 259-269 (1979)
- Lacroix P.G. & Harris J.D. Effects of High-Frequency Cue Reduction on the Comprehension of Distorted Speech. J. Speech Hear. Dis. 44(2), 236-246 (1979)
- Finitzo-Hieber T. & Tillman T.W. Room Acoustics Effects on Monosyllabic Word Discrimination Ability for Normal and Hearing-Impaired Children. J. Speech Hear. Res. 21, 440-458 (1978)
- Bornstein S.P. Time-Compression and Release from Masking in Adults and Children. J. Am. Acad. Audiol. 5, 89-98 (1994)
- Bornstein S.P. & Musiek F.E. Recognition of Distorted Speech in Children with and Without Learning Problems. J. Am. Acad. Audiol. 3, 22-32 (1992)
- Wilson R.H., Preece J.P., Salamon D.L., Sperry J.L. & Bornstein S.P. Effects of Time Compression and Time Compression Plus Reverberation on the Intelligibility of Northwestern University Auditory Test No. 6. J. Am. Acad. Audiol. 5, 269-277 (1994)
- Crandell C., Smaldino J. & Flexer C. Sound Field Amplification: Applications to Speech Perception and Classroom Acoustics. Clifton Park: Thomson Delmar Learning (2005)
- Killion M. SNR Loss: I Can Hear What People Say but I Can’t Understand Them. Hear Rev. 4, 10, 12, 14 (1997)
- Johnson C. Children’s Phoneme Identification in Reverberation and Noise. J. Speech Lang. Hear. Res. 43, 144-157 (2000)
- Bess F., Sinclair J. & Riggs D. Group Amplification in Schools for the Hearing Impaired. Ear Hear. 5, 138-144 (1986)
- Crandell C. & Smaldino J. An Update of Classroom Acoustics for Children with Hearing Loss. Volt. Rev. 1, 4-12 (1995)
- American National Standards Institute. Acoustic Performance Criteria, Design Requirements and Guidelines for Classrooms. ANSI S12.60-2002. New York: American National Standards Institute (2002)