Past, Present and Future of Speech Technologies

Tuesday, May 28, 2024

Speech technologies started with a dream of machines that could speak and understand humans. From the introduction of the VODER at the NYC World's Fair in 1939 to the current proliferation of speech technologies throughout our everyday lives, the impact has been profound. Technologies such as voice search, voice-to-voice translation, automatic captioning of videos and meetings, call summarization, audio mining, and speaker verification are pervasive in our daily lives.
How did this start, how has it evolved, and what is the future? In this talk, I’ll give my personal view of where we are and where we are going, describe a little bit about my career and contributions in this journey, and conclude with some thoughts about how the current Large Language Models revolution will affect speech science, and how speech technologies can affect society in positive and negative ways.

In this talk, I’ll also muse about the role of foundational research vs applied research, the interplay between the two based on my experience, and also the role of senior leads in preparing the next generation of scientists, whether this happens in academic or industry environments.

 

Post Talk Link:  Click Here

Passcode: @sQn4h#m

Speaker/s

Pedro J. Moreno started his career in speech science with an MS in Telecommunications Engineering from Universidad Politécnica de Madrid. He then interned for 2 years at Bell Laboratories, working on ASR and speech-to-speech translation. After that he was awarded a Fulbright Scholarship to pursue his Ph.D. studies at Carnegie Mellon University. Pedro started his professional career at HP Labs and then joined Google Research in 2004. Pedro has most recently led the ASR R&D Team at Google (100 researchers and engineers). In his more than 20 years of experience in the field of speech science, he has led projects in:

- Speech-to-speech real-time translation
- Noise robustness in ASR
- Multimedia search engines (SpeechBot)
- Multimedia machine learning (audio, music, etc.)
- Internationalization of ASR, leading the large expansion of Google Voice Search to more than 100 languages
- Development of contextual modeling in ASR
- Development of multilingual and universal ASR systems
- Development of speech technologies for impaired speakers
- Research into the use of speech input for LLMs
- GenAI detection for trust and safety

His research focus has always been the application of foundational ideas to products to solve user needs.
