Time Compressed Speech

Dr. Frank Musiek
June 7, 2017

Herbert Jay Gould, Ph.D.
School of Communication Sciences and Disorders, The University of Memphis

 

Compressed speech has been studied since the early 1950s (Garvey 1953) and has been suggested as a tool for the diagnosis of APD since at least 1977 (Beasley and Freeman 1977).
The general consensus is that abnormal performance on compressed speech material reflects a deficit in the contralateral hemisphere (Keith 2002). Time-compressed speech has been used by a number of clinicians as part of the central test battery and was recommended as part of the test battery by ASHA (ASHA Task Force on Central Auditory Processing Consensus Development 1996). It was not part of the recommended battery in a more recent consensus conference (Jerger and Musiek 2000); however, that conference recommended that temporal functions be assessed, and at least one author has interpreted this to possibly include compressed speech (Keith 2002). To understand what compressed speech indicates about the auditory system, it is important to understand the methods that have been used for compressing speech.

Compressed speech that is not altered in frequency is created by removing short durations of the signal and splicing the remaining parts together. A 60% compressed signal therefore has 60% of the original signal removed (Fairbanks, Everitt et al. 1954). The duration of the removed intervals affects intelligibility; typically, intervals between 0.01 and 0.08 seconds are discarded. Intervals longer than 0.16 seconds can remove entire phonemes and result in lower intelligibility (Fairbanks and Kodman 1957; Lee 1972).
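As a rough sketch, this cut-and-splice procedure might look like the following Python function. The function and its parameter names are hypothetical illustrations, not any historical implementation; `ratio` is the fraction of the signal removed and `discard` is the duration in seconds of each removed interval.

```python
def time_compress(signal, sample_rate, ratio=0.6, discard=0.04):
    """Toy cut-and-splice time compression (illustrative sketch).

    From each cycle of the signal, discard an interval of `discard`
    seconds (within the 0.01-0.08 s range noted in the text) and keep
    the rest, so that `ratio` of the original is removed overall.
    """
    d = max(1, int(discard * sample_rate))       # samples per discarded interval
    window = max(d + 1, int(round(d / ratio)))   # keep + discard per cycle
    keep = window - d                            # samples retained per cycle
    out = []
    for start in range(0, len(signal), window):
        out.extend(signal[start:start + keep])   # splice retained pieces together
    return out
```

With `ratio=0.6` the output is roughly 40% as long as the input, matching the convention that a "60% compressed" signal has 60% of its duration removed.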

A method of visualizing time compression is illustrated by a string of characters A1,A2,A3,A4,B1,B2,B3,B4,C1,C2,C3,C4. Each letter in the string represents a syllable and each number within the string represents a sub-portion of the acoustic signal of the syllable.
If sections A3, B3, and C3 are removed, 25% of the signal will be missing, but gaps will be left where those sections were removed. This would be similar to the interrupted speech studied by Miller and Licklider (Miller and Licklider 1950). Attaching the A4, B4, and C4 sections directly to the ends of the A2, B2, and C2 sections, respectively, closes the gaps and results in a time-compressed signal. The resulting overall signal is shorter; however, the remaining portions are acoustically intact and unchanged.
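The string illustration above can be sketched in a few lines of Python. This is purely a toy demonstration using the labels from the text, not an audio operation.

```python
# Each letter is a syllable; each number is a sub-portion of that
# syllable's acoustic signal, as in the illustration above.
segments = ["A1", "A2", "A3", "A4",
            "B1", "B2", "B3", "B4",
            "C1", "C2", "C3", "C4"]

# Discard every "3" sub-portion (A3, B3, C3): 25% of the signal.
kept = [s for s in segments if not s.endswith("3")]

# Splicing the remainder end to end leaves no gaps, only a shorter signal.
compressed = ",".join(kept)
print(compressed)  # A1,A2,A4,B1,B2,B4,C1,C2,C4
```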

In 1953, Garvey performed this operation on reel-to-reel tape recordings, carefully measuring, cutting, and splicing tape in the manner described above to create the first time-compressed tape (Garvey 1953). Because this process was tedious, the Fairbanks compressor was developed at the University of Illinois in 1954 (Fairbanks, Everitt et al. 1954). The compressor had a rotating drum with four playback heads situated at 90-degree intervals. As a tape with the original material passed over the rotating drum, the heads would engage and disengage the tape. The length of time between one head disengaging and the next head engaging resulted in the discard of small sections of the original recording. The relative rates of the original tape and the rotational speed of the drum determined the compression ratio.

Time compression creates several difficulties with the acoustic signal. The first is that the cut points are random relative to the waveform. This means that more often than not a cut falls at a non-zero crossing, so the abutted portions of the signal have different amplitude characteristics. The result is a series of low-level clicks in the signal and a corresponding reduction in the signal-to-noise ratio (SNR). As the compression ratio increases, the SNR decreases. Rotary-head devices such as the Fairbanks compressor mitigated this problem because the head gradually engaged and disengaged the tape, providing a slightly ‘gated’ effect with a small rise and fall time on each segment. Digital processing for compressed speech, which began in 1972, reintroduced the hard cut and its distortion (Lee 1972). This was remedied by applying rise/fall gates to the retained signal sections.
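The rise/fall gating remedy can be sketched as follows. This is a toy example on a list of samples; the linear ramp shape and its length are illustrative assumptions, not values from any particular compressor.

```python
def gate(segment, ramp_len=20):
    """Apply short linear rise/fall ramps ('gates') to a retained
    segment so the splice points no longer produce hard clicks."""
    n = len(segment)
    out = list(segment)
    for i in range(min(ramp_len, n)):
        w = i / ramp_len          # 0 -> 1 linear ramp weight
        out[i] *= w               # fade in at the segment start
        out[n - 1 - i] *= w       # fade out at the segment end
    return out

seg = [1.0] * 100                 # a flat dummy segment
g = gate(seg)
print(g[0], g[10], g[50])         # 0.0 0.5 1.0
```

The ends of each retained segment now rise from and fall to zero, so adjacent segments can be abutted without an amplitude discontinuity.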

A second problem is also associated with the random nature of the cut-and-splice operation: no two compressions of a taped signal are the same. The classic normative data for compressed speech are for the NU-6 lists recorded by Rintelmann (Beasley, Schwimmer et al. 1972; Konkle, Beasley et al. 1977). However, these norms are valid only for the original set of tapes and direct copies of that tape. To my knowledge, the original tapes are no longer in existence, and it is probable that any tape copies of the original have ghosted through (print-through), making them unusable (I may still have a digitized copy in my laboratory). I have recreated various compressions of the Rintelmann NU-6 recording and have never obtained an exact match to the originals.

The advent of computers and digital media for acoustic signals has altered the methods for compressing speech material. Sections can now be removed selectively: for instance, steady-state portions of vowels or periods of silence between words can be discarded. Similarly, the cuts can be made so that the realigned portions are matched in phase and amplitude, eliminating the clicks that occurred previously. This means that compression no longer needs to degrade the signal-to-noise ratio. Longer pauses between words can be shortened, reducing the overall duration of the speech without any reduction in content. These types of compression cannot be compared to the old compression techniques, in that the SNR is typically much better and intelligibility is improved. Finally, time-compressed speech is most likely not a measure of the auditory system's temporal resolution ability, since the retained sections are not temporally altered. It is more likely assessing the individual's ability to perform auditory closure on the missing information.
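One way a digital editor can avoid the splice click described above is to move each intended cut point to a nearby zero crossing, so the abutted portions join at matching amplitude. A minimal sketch, with a hypothetical helper function on a synthetic sine wave:

```python
import math

def nearest_upward_zero_crossing(signal, target):
    """Return the sample index of the upward zero crossing nearest to
    `target`, or `target` itself if none is found (toy sketch)."""
    best, best_dist = None, None
    for i in range(1, len(signal)):
        if signal[i - 1] < 0 <= signal[i]:        # negative-to-positive crossing
            d = abs(i - target)
            if best is None or d < best_dist:
                best, best_dist = i, d
    return best if best is not None else target

# A 50 Hz sine sampled at 8 kHz stands in for a voiced speech segment.
sig = [math.sin(2 * math.pi * 50 * t / 8000) for t in range(800)]
cut = nearest_upward_zero_crossing(sig, 100)      # desired cut near sample 100
```

Cutting both segments at upward zero crossings means the spliced waveform is continuous in amplitude at the join, which is the essence of the phase- and amplitude-matched splicing described above.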

 

References

  1. ASHA Task Force on Central Auditory Processing Consensus Development (1996). “Central Auditory Processing: Current Status of Research and Implications for Clinical Practice.” American Journal of Audiology 5: 42-54.
  2. Beasley, D., S. Schwimmer, et al. (1972). “Intelligibility of time compressed CNC monosyllables.” Journal of Speech and Hearing Research 15: 340-350.
  3. Fairbanks, G., W. Everitt, et al. (1954). “Method for Time or Frequency Compression-Expansion of Speech.” TRANS IRE-PGA 2: 7-11.
  4. Fairbanks, G. and F. Kodman (1957). “Word Intelligibility as a Function of Time Compression.” Jour. Acoust. Soc. Am. 29: 636-641.
  5. Garvey, W. (1953). “The intelligibility of Speeded Speech.” Journal of Experimental Psychology 45(2): 102-108.
  6. Jerger, J. and F. Musiek (2000). “Report of the Consensus Conference on the Diagnosis of Auditory Processing Disorders in School-Aged Children.” Journal of the American Academy of Audiology 11: 467-474.
  7. Keith, R. W. (2002). “Standardization of the time compressed sentence test.” Journal of Educational Audiology 10: 15-20.
  8. Konkle, D., D. Beasley, et al. (1977). “Intelligibility of time-altered speech in relation to chronological aging.” Journal of Speech and Hearing Research 20: 108-115.
  9. Lee, F. (1972). “Time Compression and Expansion of Speech by the Sampling Method.” Journal of the Audio Engineering Society 20(9): 738-743.
  10. Miller, G. A. and J. C. R. Licklider (1950). “The Intelligibility of Interrupted Speech.” J Acoust Soc Am 22(2): 167-173.