Hands-On Session

The hands-on session will include a detailed walkthrough of training an ASR system for dysarthric speech recognition. The goal is to give attendees clear steps for using ASR toolkits for speech recognition in health applications, taking dysarthria as a case study.



  • ESPnet toolkit introduction & installation
  • ESPnet data preparation steps
  • TORGO recipe
  • How to configure & train ASR
  • How to do decoding (inference)
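As a rough illustration of the data preparation step listed above: ESPnet recipes consume Kaldi-style data directories containing the files wav.scp, text, utt2spk and spk2utt. The sketch below writes those four files from a list of utterances. The function name, the dictionary keys, and the example IDs and paths are hypothetical and are not taken from the TORGO recipe; the actual recipe scripts may organise this differently.

```python
import os


def write_kaldi_data_dir(utterances, out_dir):
    """Write wav.scp, text, utt2spk and spk2utt into out_dir.

    Each utterance is a dict with the (hypothetical) keys
    'utt_id', 'spk_id', 'wav_path' and 'transcript'.
    """
    os.makedirs(out_dir, exist_ok=True)

    # Kaldi-style tools expect entries sorted by utterance ID.
    # Prefixing the utterance ID with the speaker ID keeps the
    # per-utterance and per-speaker orderings consistent.
    utts = sorted(utterances, key=lambda u: u["utt_id"])

    spk2utt = {}
    with open(os.path.join(out_dir, "wav.scp"), "w") as wav_scp, \
            open(os.path.join(out_dir, "text"), "w") as text, \
            open(os.path.join(out_dir, "utt2spk"), "w") as utt2spk:
        for u in utts:
            wav_scp.write(f"{u['utt_id']} {u['wav_path']}\n")
            text.write(f"{u['utt_id']} {u['transcript']}\n")
            utt2spk.write(f"{u['utt_id']} {u['spk_id']}\n")
            spk2utt.setdefault(u["spk_id"], []).append(u["utt_id"])

    # spk2utt is the inverse mapping: one line per speaker.
    with open(os.path.join(out_dir, "spk2utt"), "w") as f:
        for spk in sorted(spk2utt):
            f.write(f"{spk} {' '.join(spk2utt[spk])}\n")

    return out_dir
```

Each output file has one entry per line, keyed by utterance ID (or speaker ID for spk2utt), which is the layout the later training and decoding stages read from.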


  • Concepts and scenarios to use transfer learning
  • What you should prepare
  • Where to get the source model
  • How to fine-tune (transfer)
  • How to do decoding (inference)
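To make the fine-tuning idea above concrete, here is a minimal, toolkit-independent sketch of one common way to initialise a target model from a source model: copy every parameter whose name and shape match, and leave the rest (for example, an output layer sized for a different vocabulary) at its fresh initialisation. Parameters are represented as plain NumPy arrays in dictionaries; the function name and the parameter names are hypothetical, and this is not a description of how ESPnet itself loads pretrained models.

```python
import numpy as np


def transfer_matching_params(source, target):
    """Return target parameters with matching source parameters copied in.

    source, target: dicts mapping parameter name -> np.ndarray.
    A parameter is transferred only if the name exists in both
    models and the shapes are identical.
    """
    merged = dict(target)  # start from the target's own initialisation
    copied, skipped = [], []
    for name, value in source.items():
        if name in target and target[name].shape == value.shape:
            merged[name] = value.copy()
            copied.append(name)
        else:
            skipped.append(name)
    return merged, copied, skipped


# Hypothetical example: the encoder matches, the output layer does not
# because the target task uses a smaller vocabulary.
source = {
    "encoder.weight": np.ones((4, 4)),
    "output.weight": np.ones((4, 100)),
}
target = {
    "encoder.weight": np.zeros((4, 4)),
    "output.weight": np.zeros((4, 30)),
}
merged, copied, skipped = transfer_matching_params(source, target)
print("copied:", copied)
print("skipped:", skipped)
```

Skipping mismatched layers rather than failing is a design choice that suits the case where only the vocabulary differs; the skipped layers are then trained from scratch on the target data.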


  • Concepts of data augmentation
  • Speed perturbation
  • SpecAugment
  • How to apply these in ESPnet
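The two augmentation methods listed above can be sketched in a few lines of NumPy. This is an illustrative approximation under stated assumptions, not the ESPnet implementation: speed perturbation is done here by linear-interpolation resampling (real toolkits use a proper resampler), and the SpecAugment sketch applies only frequency and time masking, without time warping. All function names, parameter names and default values are hypothetical. In ESPnet itself these augmentations are normally enabled through recipe and training configuration options rather than hand-written code.

```python
import numpy as np


def speed_perturb(wave, factor):
    """Return the waveform played `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Uses linear interpolation as a simple stand-in for resampling.
    """
    n_out = int(round(len(wave) / factor))
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(wave)), wave)


def spec_augment(spec, rng, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20):
    """Zero out random frequency bands and time spans.

    spec: array of shape (time, freq). The input is not modified.
    """
    out = spec.copy()
    n_time, n_freq = out.shape

    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_freq)
        start = int(rng.integers(0, n_freq - width + 1))
        out[:, start:start + width] = 0.0

    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_time)
        start = int(rng.integers(0, n_time - width + 1))
        out[start:start + width, :] = 0.0

    return out
```

A typical use is to apply speed_perturb to the raw waveform with a small set of factors such as 0.9, 1.0 and 1.1 before feature extraction, and to apply spec_augment to each feature matrix during training only, passing a generator such as np.random.default_rng(0) for reproducibility.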


  • TORGO - word and sentence utterances from control speakers and speakers with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS)

  • UASpeech - isolated words from control speakers and speakers with cerebral palsy (CP)