Speech Perception
Part A
Work in teams to compare the performance of two different automatic speech recognition (ASR) systems built into the “assistants” available on mobile phones (for example, Siri or Google Assistant). Your task is to test their performance under a variety of circumstances. Have each phone assistant recognize each of the sentences below under each of the circumstances listed, and record how accurately it transcribes each one.
What factors contribute to good ASR performance? Under what conditions is the ASR most likely to “break”?
Sentences:
1. Where is the nearest Chinese restaurant?
2. Where can I buy some insoles for plantar fasciitis?
3. Is it better to braise or to roast beef?
4. I’d like to find an expert in computational ichthyology who can untangle kite lines.
Circumstances:
A. A native speaker of English in a quiet room, speaking slowly.
B. A speaker of English with a heavy non-standard accent in a quiet room, speaking slowly.
C. A native speaker of English in a quiet room, speaking quickly.
D. A speaker of English with a heavy non-standard accent in a quiet room, speaking quickly.
E. A native speaker of English in an environment with heavy background noise (for example, near a fan or a busy road).
F. A native speaker of English in a coffee shop with conversations occurring at other tables.
G. A native speaker of English in a coffee shop with other people speaking simultaneously at the same table as the speaker.
H. A speaker of English with a heavy non-standard accent in a coffee shop with other people speaking simultaneously at the same table as the speaker.
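One possible way to track accuracy consistently across teams is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the assistant's transcript into the intended sentence, divided by the number of words in the intended sentence. The Python sketch below is only an illustration and is not a required part of the assignment. The function names normalize and word_error_rate are made up for this example, and the choice to ignore capitalization and punctuation is an assumption you may want to change.

```python
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation.

    Assumption: capitalization and punctuation do not count as errors.
    """
    text = text.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    return re.findall(r"[a-z0-9']+", text)


def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference length.

    reference:  the sentence the speaker intended to say
    hypothesis: the transcript produced by the phone assistant
    """
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference sentence must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference word is missing
                curr[j - 1] + 1,     # insertion: an extra word was transcribed
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Example with sentence 3 and a hypothetical transcript ("praise" for "braise"):
# one substitution out of nine reference words.
print(word_error_rate("Is it better to braise or to roast beef?",
                      "is it better to praise or to roast beef"))
```

A WER of 0 means a perfect transcript. Because insertions count as errors, WER can exceed 1 when the assistant produces many extra words. Recording one WER value per sentence, per circumstance, per assistant gives a table that can be compared directly across conditions.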
Part B
If you would like to learn more about how ASR systems work internally, you can watch the following 2017 lecture by Preethi Jyothi. (It is approximately an hour and a half long.)
https://www.microsoft.com/en-us/research/video/automatic-speech-recognition-overview/