Suprathreshold Speech Recognition

Suprathreshold refers to speech presented above the auditory threshold of the listener. Speech recognition is generally defined as the percentage of words or sentences that can be accurately heard by the listener. For example, a patient who could correctly repeat 40 out of 50 words presented would have 80% speech recognition. Because speech is a complex and continually varying signal requiring multiple auditory discrimination skills, it is not possible to accurately predict an individual's speech recognition from the pure-tone audiogram (Marshall and Bacon, 1981). Measurement of suprathreshold speech recognition allows clinicians to assess a patient's speech communication ability in a controlled and systematic manner. The results can help clinicians distinguish between different causes of hearing loss and plan and evaluate audiological rehabilitation programs.
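The percentage score in the example above can be computed directly. The following is a minimal sketch; the function name `recognition_score` is chosen here for illustration and is not part of any standard test protocol.

```python
def recognition_score(num_correct: int, num_presented: int) -> float:
    """Return the speech recognition score as a percentage of items repeated correctly."""
    if num_presented <= 0:
        raise ValueError("num_presented must be positive")
    if not 0 <= num_correct <= num_presented:
        raise ValueError("num_correct must be between 0 and num_presented")
    return 100.0 * num_correct / num_presented


# The example from the text: 40 of 50 words repeated correctly.
print(recognition_score(40, 50))  # 80.0
```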

Speech is a complex acoustic signal that varies from moment to moment: from shouting to whispering, from clear speech in quiet to difficult to understand speech in high ambient noise. Figure 1 shows the expected frequency and intensity of speech sounds for speech spoken at a conversational level in quiet. In general, the vowel sounds contain lower frequency information and are more intense, while consonant sounds contain higher frequency information and are produced at a lower intensity level. Which sounds are audible depends on the listener's audiometric thresholds as well as on the speaker and the level at which the words are spoken. For example, the sounds of shouted speech would be shifted to a higher intensity level and have a slightly different pattern across frequency (Olsen, 1998). Figure 1 also shows audiograms for two different listeners. The first listener has a moderately severe hearing loss. This listener would likely hear few, if any, conversational speech sounds and would be expected to have very poor speech recognition. The second listener has normal hearing in the low frequencies falling to a mild hearing loss in the high frequencies. This listener might hear some, but not all, of the speech sounds. The inability to hear high-frequency consonants would likely result in less than 100% speech recognition for a conversational-level signal.

Figure 1. Audiogram showing expected frequency and intensity of speech sounds. Illustrative hearing thresholds are also shown for a listener with a moderately severe hearing loss (circles) and for a listener with normal hearing in the low frequencies falling to a mild loss in the high frequencies (triangles). In each case, speech sounds falling below the hearing threshold (i.e., at higher intensity levels) are audible to the listener; speech sounds falling above the hearing threshold (i.e., at lower intensity levels) are inaudible.
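The audibility comparison for the two listeners in Figure 1 can be sketched in code. All numbers below are hypothetical values chosen to resemble the two cases described; they are not read from Figure 1, and using a single speech level per frequency is a simplification, since real speech spans a range of levels at each frequency.

```python
# Hypothetical conversational speech levels (dB HL) by frequency (Hz).
# Illustrative only; not taken from Figure 1.
SPEECH_LEVELS_DB_HL = {250: 45, 500: 50, 1000: 45, 2000: 35, 4000: 25}


def audible_frequencies(thresholds_db_hl, speech_levels_db_hl=SPEECH_LEVELS_DB_HL):
    """Return the frequencies at which the speech level meets or exceeds the threshold."""
    return [
        freq
        for freq, level in sorted(speech_levels_db_hl.items())
        if level >= thresholds_db_hl[freq]
    ]


# Hypothetical audiograms (dB HL) for the two kinds of listener described in the text.
moderately_severe = {250: 60, 500: 60, 1000: 65, 2000: 65, 4000: 70}
sloping_mild = {250: 10, 500: 10, 1000: 15, 2000: 30, 4000: 40}

print(audible_frequencies(moderately_severe))  # []
print(audible_frequencies(sloping_mild))       # [250, 500, 1000, 2000]
```

Under these assumed values, the first listener hears none of the speech energy, while the second loses only the highest-frequency region, where low-intensity consonants fall.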

Most commonly, the material used to measure speech recognition is a list of monosyllabic words. Typically each word is preceded by a carrier phrase, such as "Say the word ___." Most available monosyllabic word lists are open set; that is, the listener is not restricted to a predetermined list of possible responses. A number of standard lists have been developed with vocabulary levels appropriate for adults (Hirsh et al., 1952; Tillman and Carhart, 1966) or children. The lists exclude words that would be unfamiliar to most people, and word selection is balanced to maintain similar levels of difficulty across lists. Additionally, each list is phonetically balanced; that is, the sounds in the words occur in the same proportion as in everyday speech. Test sensitivity is enhanced by using a larger number of items, such as 50 instead of 25 words (Thornton and Raffin, 1978).
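One way to see why more items improve sensitivity is to treat each word as an independent trial with the same probability of a correct response, so that the score behaves as a binomial variable. The sketch below computes only the resulting standard deviation of the percentage score; it is a simplified illustration under that assumption, not a reproduction of the critical-difference procedure in Thornton and Raffin (1978).

```python
import math


def score_standard_deviation(true_score_percent: float, num_items: int) -> float:
    """Standard deviation (in percentage points) of a binomially distributed score."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    p = true_score_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / num_items)


# A listener whose underlying score is 80%, tested with 25 and with 50 words.
print(round(score_standard_deviation(80, 25), 2))  # 8.0
print(round(score_standard_deviation(80, 50), 2))  # 5.66
```

Doubling the list length shrinks the expected scatter by a factor of about 1.4, so a given change in score is less likely to be due to chance alone.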

Sentence tests are also available. These tests are more like "real speech" and thus presumably able to provide a closer estimate of real-life communication performance. However, sentence tests incorporate additional factors besides simple audibility. With sentences, the listener may be relying on linguistic, prosodic, or contextual cues in addition to auditory information. To limit contextual cues, sentence lists are available that use neutral context (Kalikow, Stevens, and Elliot, 1977) or linguistically meaningless word combinations (Speaks and Jerger, 1965). Sentences also place greater demands on higher-level processes such as auditory memory, which may be a particular problem in older listeners (Chmiel and Jerger, 1996).

For standard clinical testing, the participant is seated in a sound booth and listens to speech presented to one ear at a time through earphones. Speech may also be presented through a speaker, although this does not provide information specific to each ear. Such sound field testing can be used to quantify the effects of amplification. Recorded speech materials are preferred for consistency, although speech recognition tests are also administered using monitored live voice, during which the tester speaks to the participant over a microphone while monitoring her vocal strength. The entire list of words is presented at the same level. After each word, the participant responds by repeating or writing the word. The speech recognition score is then expressed as the percentage of correct words at the given presentation level in each ear.

Figure 2. Representative performance-intensity functions, expressed as percent of words correct at each presentation level, for a listener with normal hearing and for three listeners with different types of hearing loss.

Although most often presented in quiet, these materials may also be administered in noise. Many recordings include a background of multitalker babble that mimics a more realistic listening situation; this increases the degree to which test results characterize the listener's everyday communication abilities. For example, listeners with sensorineural loss require a more favorable signal-to-noise ratio than listeners with normal hearing or conductive hearing loss (Dubno, Dirks, and Morgan, 1984).
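The signal-to-noise ratio referred to above is the difference, in decibels, between the speech level and the noise level. The following is a minimal sketch; the function names and the example levels are illustrative assumptions, not values from any particular test recording.

```python
def snr_db(speech_level_db: float, noise_level_db: float) -> float:
    """Signal-to-noise ratio in dB: positive when speech is more intense than the noise."""
    return speech_level_db - noise_level_db


def noise_level_for_snr(speech_level_db: float, target_snr_db: float) -> float:
    """Noise level (dB) needed to present speech at the target signal-to-noise ratio."""
    return speech_level_db - target_snr_db


# Hypothetical example: speech at 60 dB with babble at 55 dB.
print(snr_db(60, 55))               # 5
# Babble level needed for a more favorable +10 dB ratio at the same speech level.
print(noise_level_for_snr(60, 10))  # 50
```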

When administering and interpreting suprathreshold speech recognition tests it is important to consider not only the test environment but also the physical, linguistic, cognitive, and intellectual abilities of the listener. If the listener is unable to respond verbally or in writing, tests are available where the listener can choose among a set of picture responses (Ross and Lerman, 1970). Although most often used with children, these tests are also appropriate for adults with spoken word deficits, including dysarthria or apraxia. One limitation of such closed-set tests is that the chance of guessing correctly is higher when only a fixed number of choices is available. However, scoring accuracy may be higher than with open-set tests because there are fewer chances for misinterpretation of the response. For listeners who are not proficient in English, recorded materials are available in a number of other languages (provided, of course, that the tester has sufficient knowledge of the test language to interpret responses).
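The effect of guessing in a closed-set test can be quantified. The sketch below computes the score expected from guessing alone and applies the standard psychometric correction for guessing; that correction is not described in this article and is included here only as an illustration. The four-picture response set is a hypothetical example.

```python
def chance_score(num_choices: int) -> float:
    """Percentage correct expected from random guessing among equally likely choices."""
    if num_choices < 2:
        raise ValueError("a closed-set test needs at least two choices")
    return 100.0 / num_choices


def chance_corrected_score(observed_percent: float, num_choices: int) -> float:
    """Rescale an observed score so that chance maps to 0% and perfect maps to 100%."""
    chance = chance_score(num_choices) / 100.0
    observed = observed_percent / 100.0
    corrected = (observed - chance) / (1.0 - chance)
    # Scores at or below chance are reported as 0 rather than as negative values.
    return 100.0 * max(0.0, corrected)


# Hypothetical four-picture response set.
print(chance_score(4))                          # 25.0
print(round(chance_corrected_score(70, 4), 1))  # 60.0
```

An open-set test has a chance level near zero, which is why the same raw percentage tends to mean more on an open-set list than on a closed-set one.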

An important consideration is the presentation level. If multiple levels are tested, the percentage correct increases with increasing presentation level in a characteristic pattern (Fig. 2). This is referred to as the performance-intensity (PI) function. The rate of improvement depends on the test material as well as patient characteristics. Easier material (e.g., sentences containing contextual cues) results in a greater rate of improvement with increases in level than more difficult material (e.g., nonsense words). The highest score the listener achieves across presentation levels is referred to as the PB max, or maximum score for phonetically balanced words. A normal-hearing listener typically achieves 100% speech recognition at levels 30-40 dB above the speech reception threshold (SRT). Sensorineural hearing loss may restrict the PB max to below 100%. Listeners with conductive hearing loss generally achieve 100% recognition, although they require a higher presentation level than would a normal-hearing listener. PI functions for listeners with retrocochlear loss may demonstrate disproportionately low scores as well as a phenomenon called rollover, in which performance first improves with increasing presentation level and then degrades as the presentation level continues to increase.
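A PI function can be represented as a mapping from presentation level to percent correct, from which the PB max and the degree of rollover follow. The sketch below uses a commonly cited rollover index, (PB max - PB min) / PB max, where PB min is the lowest score at any level above the one that produced PB max; this index is not defined in the article itself, and the two PI functions shown are hypothetical data.

```python
def pb_max(pi_function):
    """Return (level, score) for the highest score; ties go to the lowest level."""
    level = min(pi_function, key=lambda lv: (-pi_function[lv], lv))
    return level, pi_function[level]


def rollover_index(pi_function):
    """Return (PB max - PB min) / PB max, with PB min taken above the PB max level."""
    max_level, max_score = pb_max(pi_function)
    scores_above = [score for lv, score in pi_function.items() if lv > max_level]
    if max_score == 0 or not scores_above:
        return 0.0
    return (max_score - min(scores_above)) / max_score


# Hypothetical PI functions: presentation level (dB HL) -> percent correct.
normal = {20: 40, 30: 80, 40: 100, 50: 100}
with_rollover = {40: 20, 50: 60, 60: 80, 70: 64, 80: 40}

print(pb_max(normal), rollover_index(normal))                # (40, 100) 0.0
print(pb_max(with_rollover), rollover_index(with_rollover))  # (60, 80) 0.5
```

Note that rollover can only be detected if at least one level above the PB max level was tested, which is one reason testing at a single level can be misleading.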

In the clinic, speech recognition testing is often done at only one or two levels in each ear to minimize test time. One common approach is to select one or more levels relative to the speech reception threshold. Selection of the specific presentation level is generally based on providing adequate speech audibility, particularly at frequencies containing important consonant information. An alternative approach is to present speech at the level the listener deems most comfortable. Because the listener's most comfortable level may not be the same level at which she obtains a maximum score, testing exclusively at the most comfortable level can lead to erroneous conclusions about auditory function (Ullrich and Grimm, 1976; Beattie and Warren, 1982).
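Selecting a level relative to the speech reception threshold amounts to adding a fixed sensation level to the SRT. The sketch below illustrates this arithmetic; the cap `max_output_db_hl` is a hypothetical equipment or comfort limit introduced for the example, not a value from the article.

```python
def presentation_level_db_hl(
    srt_db_hl: float,
    sensation_level_db: float,
    max_output_db_hl: float = 100.0,  # hypothetical upper limit, assumed for illustration
) -> float:
    """Presentation level in dB HL: the SRT plus the sensation level, capped at the limit."""
    return min(srt_db_hl + sensation_level_db, max_output_db_hl)


# A listener with an SRT of 20 dB HL tested 40 dB above threshold.
print(presentation_level_db_hl(20, 40))  # 60
# With an SRT of 70 dB HL the same sensation level exceeds the assumed limit.
print(presentation_level_db_hl(70, 40))  # 100.0
```

The second case shows why a fixed sensation level does not guarantee adequate audibility for listeners with greater hearing loss: the intended level may simply not be reachable.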

In summary, measurement of suprathreshold speech recognition is an important part of an audiometric examination. Test results can be affected by a number of factors, including the participant's pure-tone sensitivity, the amount of distortion produced by the hearing loss, the presentation level of the speech, the type of speech material, the presence or absence of background noise, and even the participant's age. A detailed understanding of these factors is important when interpreting test results and drawing conclusions about an individual's overall communication ability.

Pamela E. Souza

References

Beattie, R. C., and Warren, V. G. (1982). Relationships among speech threshold, loudness discomfort, comfortable loudness, and PB max in the elderly hearing impaired. American Journal of Otolaryngology, 3, 353-358.

Chmiel, R., and Jerger, J. (1996). Hearing aid use, central auditory disorder, and hearing handicap in elderly persons. Journal of the American Academy of Audiology, 7, 190-202.

Dubno, J. R., Dirks, D. D., and Morgan, D. E. (1984). Effects of age and mild hearing loss on speech recognition in noise. Journal of the Acoustical Society of America, 76, 87-96.

Hirsh, I. J., Davis, H., Silverman, S. R., Reynolds, E. G., Eldert, E., and Benson, R. W. (1952). Development of materials for speech audiometry. Journal of Speech and Hearing Disorders, 17, 321-337.

Kalikow, D. N., Stevens, K. N., and Elliot, L. L. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. Journal of the Acoustical Society of America, 61, 1337-1351.

Marshall, L., and Bacon, S. P. (1981). Prediction of speech discrimination scores from audiometric data. Journal of Speech and Hearing Research, 2, 148-155.

Olsen, W. O. (1998). Average speech levels and spectra in various speaking/listening conditions: A summary of the Pearson, Bennett, and Fidell (1977) report. American Journal of Audiology, 7, 1-5.

Ross, M., and Lerman, J. (1970). Picture identification test for hearing-impaired children. Journal of Speech and Hearing Research, 13, 44-53.

Speaks, C., and Jerger, J. (1965). Performance-intensity characteristics of synthetic sentences. Journal of Speech and Hearing Research, 9, 305-312.

Thornton, A. R., and Raffin, M. J. M. (1978). Speech-discrimination scores modeled as a binomial variable. Journal of Speech and Hearing Research, 21, 507-518.

Tillman, T. W., and Carhart, R. (1966). An expanded test for speech discrimination utilizing CNC monosyllabic words: Northwestern University Auditory Test No. 6 (USAF School of Aerospace Medicine Technical Report). Brooks Air Force Base, TX.

Ullrich, K., and Grimm, D. (1976). Most comfortable listening level presentation versus maximum discrimination for word discrimination material. Audiology, 15, 338-347.

Further Readings

Dubno, J. R., and Dirks, D. D. (1993). Factors affecting performance on psychoacoustic and speech-recognition tasks in the presence of hearing loss. In G. A. Studebaker and I. Hochberg (Eds.), Acoustical factors affecting hearing aid performance (pp. 235-253). Needham Heights, MA: Allyn and Bacon.

Hall, J. W., and Mueller, H. G. (1997). Speech audiometry. In Audiologists' desk reference. Vol. I. Diagnostic audiology: Principles, procedures, and practices (pp. 113-174). San Diego, CA: Singular Publishing Group.

Kirk, K. I., Pisoni, D. B., and Miyamoto, R. C. (1997). Effects of stimulus variability on speech perception in listeners with hearing impairment. Journal of Speech, Language, and Hearing Research, 40, 1395-1405.

Olsen, W. O., and Matkin, N. D. (1991). Speech audiometry. In W. F. Rintelmann (Ed.), Hearing assessment (pp. 39-140). Austin, TX: Pro-Ed.

Olsen, W. O., Van Tasell, D. J., and Speaks, C. E. (1997). Phoneme and word recognition for words in isolation and in sentences. Ear and Hearing, 18, 175-188.

Penrod, J. P. (1994). Speech threshold and recognition/discrimination testing. In J. Katz (Ed.), Handbook of clinical audiology (pp. 147-164). Baltimore: Williams and Wilkins.

Stach, B. A., Davis-Thaxton, M., and Jerger, J. (1995). Improving the efficiency of speech audiometry: Computer-based approach. Journal of the American Academy of Audiology, 6, 330-333.

Thibodeau, L. M. (2000). Speech audiometry. In R. J. Roeser, M. Valente, and H. Hosford-Dunn (Eds.), Audiology diagnosis (pp. 281-310). New York: Thieme.
