Speech
With the growing use of voice data in areas like virtual assistants, healthcare, and education, building inclusive and privacy-conscious speech technologies has become more important than ever. Our research focuses on developing a range of tools and models that support diverse speech applications—such as speech anonymization, stutter detection, and emotion recognition—while prioritizing accessibility, fairness, and real-world usability.
To achieve this, we combine digital signal processing techniques with deep learning models, enabling precise control over how speech is analyzed, transformed, and interpreted. This hybrid approach allows us to design systems that protect speaker identity while preserving key speech characteristics like emotion, prosody, and unique vocal traits—especially for speakers affected by age, health conditions, or speech disfluencies such as stuttering. We also leverage perception-inspired loss functions to guide model training in ways that reflect how humans perceive voice quality and variation.
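To give a concrete sense of what a perception-informed training objective can look like, the sketch below implements a multi-resolution STFT loss in PyTorch, a widely used perceptually motivated loss for speech generation that compares signals at several time-frequency resolutions. This is a minimal illustration rather than the exact loss used in our systems; the FFT sizes, hop lengths, and equal weighting are illustrative assumptions.

```python
# Minimal sketch of a perception-informed loss: a multi-resolution STFT loss.
# Hypothetical settings throughout; not the exact objective used in our work.
import torch
import torch.nn.functional as F


def stft_magnitude(x: torch.Tensor, n_fft: int, hop: int) -> torch.Tensor:
    """Magnitude spectrogram of a batch of waveforms with shape (batch, samples)."""
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop, window=window,
                      return_complex=True)
    return spec.abs()


def multi_resolution_stft_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Compare converted and reference speech at several resolutions, mirroring
    the ear's trade-off between spectral and temporal detail."""
    resolutions = [(512, 128), (1024, 256), (2048, 512)]  # assumed (n_fft, hop) pairs
    loss = 0.0
    for n_fft, hop in resolutions:
        s_pred = stft_magnitude(pred, n_fft, hop)
        s_tgt = stft_magnitude(target, n_fft, hop)
        # Spectral convergence: penalizes large relative magnitude errors.
        sc = torch.norm(s_tgt - s_pred, p="fro") / torch.norm(s_tgt, p="fro")
        # Log-magnitude distance: emphasizes quieter components the ear still hears.
        mag = F.l1_loss(torch.log(s_pred + 1e-7), torch.log(s_tgt + 1e-7))
        loss = loss + sc + mag
    return loss / len(resolutions)


# Example usage with dummy batches of 1-second, 16 kHz waveforms:
# pred, target = torch.randn(4, 16000), torch.randn(4, 16000)
# loss = multi_resolution_stft_loss(pred, target)
```

Because the loss is computed on magnitude spectrograms at multiple resolutions rather than on raw samples, it tolerates small phase differences that are inaudible while still penalizing audible spectral artifacts.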
Through extensive evaluations, real-world testing, and user-centered design, we aim to create speech technologies that perform robustly across different languages, accents, and speaking styles. Our goal is to enable secure, ethical, and inclusive voice-based interactions that serve a wide range of users and applications.
Collaborations
We collaborate with leading universities, research institutes, and companies on various speech-related projects, including:
- Speech Anonymization
  Projects: Emonymous, AnonymPrevent, Medinym
  Partners: TU Berlin, DFKI Berlin, Charité – Universitätsmedizin Berlin, OVGU Junior Professorship Mobile Dialogsysteme
- Stutter Detection/Correction
  Projects: Anti-Stotter
  Partners: Otto-Friedrich-Universität Bamberg, PsyAiHance GmbH, Berlin
Selected Publications
- Anonymizing Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
  Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert and Sebastian Stober.
  In: Interspeech 2024 - Kos, Greece, September 1-5, 2024
- Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
  Suhita Ghosh, Arnab Das, Yamini Sinha, Ingo Siegert, Tim Polzehl and Sebastian Stober.
  In: Interspeech 2023 - Dublin, Ireland, August 20-24, 2023
- Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
  Suhita Ghosh, Tim Thiele, Frederick Lorbeer, Frank Dreyer and Sebastian Stober.
  In: NeurIPS Workshop 2024 (Audio Imagination: Workshop on AI-Driven Speech, Music, and Sound Generation) - Vancouver, Canada, December 10-15, 2024