Speech

With the growing use of voice data in areas like virtual assistants, healthcare, and education, building inclusive and privacy-conscious speech technologies has become more important than ever. Our research focuses on developing a range of tools and models that support diverse speech applications—such as speech anonymization, stutter detection, and emotion recognition—while prioritizing accessibility, fairness, and real-world usability.

To achieve this, we combine digital signal processing techniques with deep learning models, enabling precise control over how speech is analyzed, transformed, and interpreted. This hybrid approach allows us to design systems that protect speaker identity while preserving key speech characteristics like emotion, prosody, and unique vocal traits—especially for speakers affected by age, health conditions, or speech disfluencies such as stuttering. We also leverage perception-inspired loss functions to guide model training in ways that reflect how humans perceive voice quality and variation.
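As an illustration of the idea behind perception-inspired loss functions, the sketch below compares two signals by the L1 distance between their log-magnitude spectrograms: log compression roughly mirrors human loudness perception, so deviations in quiet regions are not drowned out by loud ones. This is a minimal, self-contained example in NumPy, not the actual loss used in our models; the function name, frame sizes, and windowing choices are illustrative assumptions.

```python
import numpy as np

def log_spectral_loss(ref, est, frame=256, hop=128, eps=1e-6):
    """L1 distance between log-magnitude spectrograms of two signals.

    Log compression approximates human loudness perception, giving
    quiet spectral regions comparable weight to loud ones.
    (Illustrative sketch only, not the group's actual training loss.)
    """
    def stft_mag(x):
        # Frame the signal, apply a Hann window, take the magnitude FFT.
        n = 1 + (len(x) - frame) // hop
        win = np.hanning(frame)
        frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
        return np.abs(np.fft.rfft(frames, axis=-1))

    r, e = stft_mag(ref), stft_mag(est)
    return float(np.mean(np.abs(np.log(r + eps) - np.log(e + eps))))

# Identical signals give zero loss; spectrally different ones give a positive value.
t = np.linspace(0, 1, 4096, endpoint=False)
tone_a = np.sin(2 * np.pi * 220 * t)
tone_b = np.sin(2 * np.pi * 330 * t)
print(log_spectral_loss(tone_a, tone_a))       # 0.0
print(log_spectral_loss(tone_a, tone_b) > 0)   # True
```

In practice such a spectral term would be one component of a larger training objective, combined with task-specific losses (e.g., for anonymization or emotion preservation).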

Through extensive evaluations, real-world testing, and user-centered design, we aim to create speech technologies that perform robustly across different languages, accents, and speaking styles. Our goal is to enable secure, ethical, and inclusive voice-based interactions that serve a wide range of users and applications.


Collaborations

We collaborate with leading universities, research institutes, and companies on various speech-related projects, including:


Selected Publications

For more information or speech anonymization demo access, contact: Suhita Ghosh

Last Modification: 02.05.2025
Contact Person: Webmaster