Speech Technology for Health: From Technical Foundations to Applications

Hands-On Session

The hands-on session is a detailed walkthrough of training an ASR system for dysarthric speech recognition. The goal is to give attendees clear steps for using ASR toolkits for speech recognition in health, taking dysarthria as a case study.

ESPnet toolkit introduction & installation
ESPnet data preparation steps
TORGO recipe
How to configure & train ASR
How to do decoding (inference)
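
ESPnet recipes read Kaldi-style data directories, in which each utterance appears in a few plain-text index files (wav.scp, text, utt2spk, spk2utt). The data preparation step can be sketched as below. This is a minimal illustration, not the TORGO recipe itself: the speaker IDs, utterance names, audio paths, transcripts, and the helper name write_kaldi_data_dir are all hypothetical.

```python
import os
import tempfile
from collections import defaultdict


def write_kaldi_data_dir(utterances, out_dir):
    """Write wav.scp, text, utt2spk and spk2utt for a list of utterances.

    utterances: iterable of (speaker_id, utt_name, wav_path, transcript).
    Utterance IDs are prefixed with the speaker ID so that sorting by
    utterance ID also groups utterances by speaker, as Kaldi expects.
    """
    os.makedirs(out_dir, exist_ok=True)
    rows = sorted(
        (f"{spk}-{name}", spk, wav, text) for spk, name, wav, text in utterances
    )
    spk2utt = defaultdict(list)

    def open_out(filename):
        return open(os.path.join(out_dir, filename), "w", encoding="utf-8")

    with open_out("wav.scp") as wav_f, open_out("text") as text_f, \
            open_out("utt2spk") as u2s_f:
        for utt_id, spk, wav, text in rows:
            wav_f.write(f"{utt_id} {wav}\n")    # utterance ID -> audio file
            text_f.write(f"{utt_id} {text}\n")  # utterance ID -> transcript
            u2s_f.write(f"{utt_id} {spk}\n")    # utterance ID -> speaker ID
            spk2utt[spk].append(utt_id)

    with open_out("spk2utt") as s2u_f:
        for spk in sorted(spk2utt):
            s2u_f.write(f"{spk} {' '.join(spk2utt[spk])}\n")

    return [row[0] for row in rows]


# Hypothetical entries; real paths and transcripts come from the corpus.
example = [
    ("M01", "0002", "/path/to/M01_0002.wav", "THE QUICK BROWN FOX"),
    ("F01", "0001", "/path/to/F01_0001.wav", "STICK"),
]
out_dir = tempfile.mkdtemp()
utt_ids = write_kaldi_data_dir(example, out_dir)
```

In practice one such directory is written per data split (for example training, validation, and test), and the recipe's later stages consume them.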

Concepts and scenarios to use transfer learning
What you should prepare
Where to get the source model
How to fine-tune (transfer)
How to do decoding (inference)
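
One step that fine-tuning usually involves is initialising the new model from the source model while leaving any layer that no longer fits, such as an output layer sized for a different vocabulary, at its fresh initialisation. The toy sketch below illustrates that idea with nested lists standing in for real tensors. It is not ESPnet's loading code; the parameter names and the helpers shape_of and transfer_parameters are hypothetical.

```python
import copy


def shape_of(tensor):
    """Return the shape of a nested list, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(shape)


def transfer_parameters(source, target, skip_prefixes=()):
    """Copy source parameters into a copy of target where name and shape match.

    Parameters whose names start with one of skip_prefixes, that are missing
    from the source, or whose shapes differ keep the target's own values.
    Returns (merged_parameters, copied_names, kept_names).
    """
    merged = copy.deepcopy(target)
    copied, kept = [], []
    for name in target:
        excluded = any(name.startswith(prefix) for prefix in skip_prefixes)
        matches = (
            name in source and shape_of(source[name]) == shape_of(target[name])
        )
        if matches and not excluded:
            merged[name] = copy.deepcopy(source[name])
            copied.append(name)
        else:
            kept.append(name)
    return merged, copied, kept


# Source model: trained on typical speech with a 5-token vocabulary (toy sizes).
source_model = {
    "encoder.weight": [[0.5, -0.2], [0.1, 0.9]],
    "decoder.output.weight": [[0.3, 0.3]] * 5,
}
# Target model: same encoder, but a 3-token vocabulary for the new task.
target_model = {
    "encoder.weight": [[0.0, 0.0], [0.0, 0.0]],
    "decoder.output.weight": [[0.0, 0.0]] * 3,
}

merged, copied, kept = transfer_parameters(source_model, target_model)
```

Here the encoder weights are taken from the source model, while the output layer is left for the fine-tuning run to learn from the dysarthric speech data.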

Concepts of data augmentation
Speed perturbation
SpecAugment
How to apply these in ESPnet
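
The two augmentation methods can be sketched in a few lines. Speed perturbation resamples the waveform so that it plays faster or slower (factors such as 0.9, 1.0, and 1.1 are common), and SpecAugment masks random frequency bands and time spans of the spectrogram. The code below is a simplified illustration in plain Python, not ESPnet's implementation: it uses linear interpolation for resampling, omits SpecAugment's time warping, and all function names and default mask sizes are assumptions. In ESPnet2 recipes these are typically switched on through the speed_perturb_factors option of asr.sh and the specaug settings of the training config, which should be checked against the recipe in use.

```python
import random


def speed_perturb(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                      # position in the input signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


def spec_augment(spec, rng, num_freq_masks=2, max_freq_width=2,
                 num_time_masks=2, max_time_width=3, fill=0.0):
    """Apply frequency and time masking to a spectrogram.

    spec: list of frames (time axis), each a list of frequency-bin values.
    Returns a masked copy; the input is left unchanged.
    """
    out = [list(frame) for frame in spec]
    n_frames = len(out)
    n_bins = len(out[0]) if out else 0

    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = fill               # mask a band of frequency bins

    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [fill] * n_bins          # mask a span of frames

    return out


waveform = [float(i % 10) for i in range(100)]   # toy signal
slow = speed_perturb(waveform, 0.9)              # longer than the original
fast = speed_perturb(waveform, 1.1)              # shorter than the original

spectrogram = [[1.0] * 8 for _ in range(20)]     # 20 frames x 8 bins
augmented = spec_augment(spectrogram, random.Random(0))
```

Speed perturbation is applied to the audio before feature extraction and enlarges the training set, whereas SpecAugment is applied to the features on the fly during training.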

TORGO - word and sentence utterances from control speakers and speakers with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS)
UASpeech - isolated words from control speakers and speakers with cerebral palsy (CP)