Lectures


Syllabus:

Overview
  • Source-filter model of speech production
  • Prosody-, source-, and spectrum-based acoustic features
  • Speaker states and traits
Foundational Knowledge
  • Low-level descriptors
  • Multi-level aggregation and modeling (sketched below)
  • Toward end-to-end representations
Health Applications
  • Speech paralinguistics for assessment and diagnosis
  • Emerging topics
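
A minimal Python sketch of the low-level descriptor (LLD) extraction and functional-based aggregation listed above, assuming librosa and numpy as an illustrative toolchain (not necessarily the speaker's tools); the file path and the particular descriptors and functionals are assumptions.

  import numpy as np
  import librosa

  # Load an utterance (placeholder path).
  y, sr = librosa.load("speech.wav", sr=16000)

  # Frame-level LLDs: f0 (source/prosody), energy, and MFCCs (spectrum).
  f0, voiced_flag, voiced_prob = librosa.pyin(
      y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
  energy = librosa.feature.rms(y=y)[0]
  mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

  # Utterance-level aggregation: statistical functionals (here mean and
  # standard deviation) over each LLD track yield a fixed-length vector.
  def functionals(track):
      track = track[~np.isnan(track)]  # drop unvoiced frames (NaN f0)
      return [float(np.mean(track)), float(np.std(track))]

  features = functionals(f0) + functionals(energy)
  for coeff in mfcc:
      features += functionals(coeff)
  print(len(features), "utterance-level features")

Production paralinguistic feature sets (e.g., openSMILE's eGeMAPS) use many more LLDs and functionals, but the same two-stage structure.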

Speaker Bio: Chi-Chun Lee (Jeremy) is an Associate Professor in the Department of Electrical Engineering at National Tsing Hua University (NTHU), Taiwan. He received his B.S. and Ph.D. degrees in Electrical Engineering from the University of Southern California (USC), USA, in 2007 and 2012, respectively. He is an IEEE Senior Member. His research interests are in speech and language processing, affective computing, health analytics, and behavior signal processing. He serves or has served as an Associate Editor for the IEEE Transactions on Affective Computing (2020-), the IEEE Transactions on Multimedia (2019-2020), and Computer Speech and Language (2021-). He is a TPC member of the APSIPA IVM and MLDA committees. He has served as an area chair for INTERSPEECH 2016, 2018, and 2019; a senior program committee member for ACII 2017 and 2019; publicity chair for ACM ICMI 2018; late-breaking results chair for ACM ICMI 2023; sponsorship and special session chair for ISCSLP 2018 and 2020; and a guest editor for the Computer Speech and Language special issue on Speech and Language Processing for Behavioral and Mental Health. He led the team that won first place in the INTERSPEECH 2009 Emotion Challenge and first place in the Styrian Dialect and Baby Sound challenges at INTERSPEECH 2019. He is a co-author of best paper award winners and finalists at major conferences such as INTERSPEECH, IEEE EMBC, and APSIPA ASC, and of the most-cited paper published in 2013 in Speech Communication. He is the recipient of the Foundation of Outstanding Scholar's Young Innovator Award (2020), the CIEE Outstanding Young Electrical Engineer Award (2020), and the IICM K. T. Li Young Researcher Award (2020).



Syllabus:

Overview
  • What is speech enhancement (SE)
  • Deep learning (DL) based SE
  • SE for assistive hearing technologies
Foundational Knowledge
  • Traditional SE approaches
  • DL-based SE approaches (sketched below)
  • SE for assistive hearing technologies
Health Applications
  • Goal-driven SE for assistive hearing technologies
  • Multimodal assistive hearing technologies
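
A minimal sketch of one common DL-based SE formulation named above, mask-based enhancement in the time-frequency domain, assuming PyTorch; the network size, STFT settings, and random input are illustrative, not the speaker's actual models.

  import torch
  import torch.nn as nn

  N_FFT, HOP = 512, 128
  N_BINS = N_FFT // 2 + 1

  class MaskNet(nn.Module):
      # Estimates a [0, 1] time-frequency mask from the noisy magnitude.
      def __init__(self):
          super().__init__()
          self.rnn = nn.LSTM(N_BINS, 256, batch_first=True)
          self.out = nn.Linear(256, N_BINS)

      def forward(self, mag):  # mag: (batch, frames, bins)
          h, _ = self.rnn(mag)
          return torch.sigmoid(self.out(h))

  def enhance(noisy, model):
      # Analysis: STFT of the noisy waveform.
      win = torch.hann_window(N_FFT)
      spec = torch.stft(noisy, N_FFT, HOP, window=win, return_complex=True)
      mag, phase = spec.abs(), spec.angle()
      # Masking: scale the noisy magnitude, reuse the noisy phase.
      mask = model(mag.transpose(1, 2)).transpose(1, 2)
      enhanced = torch.polar(mag * mask, phase)
      # Synthesis: inverse STFT back to a waveform.
      return torch.istft(enhanced, N_FFT, HOP, window=win)

  noisy = torch.randn(1, 16000)  # stand-in for one second of noisy speech
  print(enhance(noisy, MaskNet()).shape)

Training would minimize a loss (e.g., L1 on magnitudes) between enhanced and clean signals; goal-driven variants for assistive hearing replace this generic loss with objectives tied to intelligibility or device constraints.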

Speaker Bio: Yu Tsao (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1999 and 2001, respectively, and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2008. From 2009 to 2011, he was a Researcher with the National Institute of Information and Communications Technology, Tokyo, Japan, where he engaged in research and product development in automatic speech recognition for multilingual speech-to-speech translation. He is currently a Research Fellow (Professor) and Deputy Director with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan. He is also a Jointly Appointed Professor with the Department of Electrical Engineering at Chung Yuan Christian University, Taoyuan City, Taiwan. His research interests include assistive oral communication technologies, audio coding, and bio-signal processing. He is currently an Associate Editor for the IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING and IEEE SIGNAL PROCESSING LETTERS. He received the Academia Sinica Career Development Award in 2017, the National Innovation Awards in 2018–2021, the Future Tech Breakthrough Award in 2019, and the Outstanding Elite Award of the Chung Hwa Rotary Educational Foundation in 2019–2020. He is a coauthor of a paper that received the 2021 IEEE Signal Processing Society (SPS) Young Author Best Paper Award.



Syllabus:

Overview
  • What are text-to-speech synthesis (TTS) and voice conversion (VC)
  • The analysis-mapping-synthesis pipeline (sketched below)
  • Technological advancements of TTS and VC
Foundational Knowledge
  • Acoustic modeling of TTS
  • Acoustic transformation of VC
  • Vocoding
Health Applications
  • Electrolaryngeal speech voice conversion
  • Other health-related TTS and VC applications
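
A minimal Python sketch of the analysis-mapping-synthesis pipeline named above, assuming librosa; the identity mapping stands in for a trained VC model, Griffin-Lim stands in for a (typically neural) vocoder, and all paths and parameters are illustrative.

  import librosa
  import soundfile as sf

  y, sr = librosa.load("source_speech.wav", sr=16000)  # placeholder path

  # Analysis: extract an intermediate representation (here a mel spectrogram).
  mel = librosa.feature.melspectrogram(
      y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80)

  # Mapping: a trained conversion model would transform the source features
  # toward the target speaker here; the identity map is a placeholder.
  converted = mel

  # Synthesis (vocoding): invert the features back to a waveform.
  # Griffin-Lim is used for simplicity; neural vocoders give higher quality.
  wav = librosa.feature.inverse.mel_to_audio(
      converted, sr=sr, n_fft=1024, hop_length=256)
  sf.write("converted_speech.wav", wav, sr)  # placeholder output path

TTS shares the back half of this pipeline: a text-conditioned acoustic model predicts the intermediate features that the vocoder turns into speech.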

Speaker Bios: Hsin-Min Wang received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1989 and 1995, respectively. In October 1995, he joined the Institute of Information Science, Academia Sinica, Taipei, Taiwan, where he is currently a Research Fellow. He also holds a joint appointment as a Professor in the Department of Computer Science and Information Engineering at National Cheng Kung University. He was an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing from 2016 to 2020. He currently serves as an Editorial Board Member of APSIPA Transactions on Signal and Information Processing. His major research interests include spoken language processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. He was a General Co-Chair of ISCSLP 2016 and ISCSLP 2018; a Technical Co-Chair of ISCSLP 2010, O-COCOSDA 2011, APSIPA ASC 2013, ISMIR 2014, and ASRU 2019; and an area chair of INTERSPEECH 2019 and INTERSPEECH 2020. He received the Chinese Institute of Engineers Technical Paper Award in 1995 and the ACM Multimedia Grand Challenge First Prize in 2012. He was an APSIPA Distinguished Lecturer for 2014–2015. He is a member of IEEE, ISCA, and ACM.

Yi-Chiao Wu received the B.S. and M.S. degrees in engineering from the School of Communication Engineering, National Chiao Tung University, Hsinchu, Taiwan, in 2009 and 2011, respectively, and the Ph.D. degree from the Graduate School of Informatics, Nagoya University, Nagoya, Japan, in 2021. He worked at Realtek, ASUS, Academia Sinica, and Nagoya University for a total of six years. He is currently a Postdoctoral Researcher at the Institute of Information Science, Academia Sinica, Taipei, Taiwan. His research focuses on speech generation applications based on machine learning methods, such as voice conversion and speech enhancement.



Syllabus:

Overview
  • What is automatic speech recognition (ASR)
  • Challenges for ASR
  • ASR in health applications
Foundational Knowledge
  • Overview of an ASR system (decoding sketched below)
  • Different acoustic modeling techniques
  • Different language modeling techniques
Health Applications
  • Low resource scenario
  • Speaker-specific vs. speaker-independent models
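
A minimal sketch of one building block from the ASR system overview above: greedy CTC decoding, which turns frame-level label posteriors from an acoustic model into text. CTC is one of several modeling choices covered in the syllabus; the alphabet and the hand-built posterior matrix below are illustrative, not output from a real model.

  import numpy as np

  BLANK = 0
  alphabet = ["<blank>", "h", "e", "l", "o"]

  def ctc_greedy_decode(posteriors):
      # posteriors: (frames, labels) scores from an acoustic model.
      best = np.argmax(posteriors, axis=1)
      out, prev = [], BLANK
      for label in best:
          # CTC collapsing rule: keep a label only if it is not blank
          # and differs from the previous frame's label.
          if label != BLANK and label != prev:
              out.append(alphabet[label])
          prev = label
      return "".join(out)

  # Frame labels that spell "hello" after collapsing: h h e _ l l _ l o
  frames = [1, 1, 2, 0, 3, 3, 0, 3, 4]
  posteriors = np.full((len(frames), len(alphabet)), 0.01)
  for t, lbl in enumerate(frames):
      posteriors[t, lbl] = 0.9
  print(ctc_greedy_decode(posteriors))  # -> hello

A full system would combine such acoustic scores with a language model during beam search; the low-resource and speaker-dependence issues listed above mainly shape how those two models are trained and adapted.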

Speaker Bio: Prasanta Kumar Ghosh received his Ph.D. in Electrical Engineering from the University of Southern California (USC), Los Angeles, USA, in 2011. Prior to that, he obtained his M.Sc. (Engineering) degree in Electrical Communication Engineering from the Indian Institute of Science (IISc), Bangalore, and his B.E. (ETCE) degree in Electronics from Jadavpur University, Kolkata, in 2006 and 2003, respectively. He was a Research Intern at Microsoft Research India, Bangalore, working on audio-visual speaker verification from March to July 2006. During 2011-2012, he was a researcher with IBM India Research Lab (IRL). He is currently an Associate Professor in the Department of Electrical Engineering (EE) at IISc.
Prasanta Kumar Ghosh was awarded the INSPIRE Faculty Fellowship by the Department of Science and Technology (DST), Govt. of India, in 2012. He received the best M.Sc. (Engg.) thesis award for the year 2006-07 in the Electrical Sciences Division at IISc. He was awarded the Center of Excellence in Teaching's Award for Excellence in Teaching in the EE category for the year 2010-11 at USC. He also received the Prof. Priti Shankar Teaching Award for Assistant Professor from the Indian Institute of Science (IISc), Bangalore, in 2017. His research interests include human-centered signal processing, and engineering model and technology development with applications to education and health care.