Speech Anonymization
With the growing use of voice data in areas like virtual assistants and remote healthcare, speech anonymization has become essential for protecting speaker privacy. The goal is to mask personal identity in spoken audio while still keeping the message clear and useful for downstream tasks like transcription or emotion detection.
Our research explores new techniques for anonymizing speech that go beyond traditional methods. These approaches aim not only to hide identity but also to preserve important elements of speech such as emotion, prosody (rhythm and tone), and unique vocal traits, especially in voices affected by age or medical conditions. To achieve this, we combine digital signal processing techniques with deep learning models, allowing precise control over audio transformations while adapting to the complexity of real-world speech. We also use perception-inspired loss functions that guide training based on how humans naturally perceive differences in voice, helping improve quality, clarity, and emotional expressiveness.
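To illustrate what a perception-informed loss can look like in practice, the sketch below shows one common formulation: a multi-resolution log-mel spectrogram distance implemented with PyTorch and torchaudio. The mel scale and log compression roughly follow human frequency and loudness perception. All settings (sample rate, FFT sizes, number of mel bands) are illustrative assumptions, and this is a minimal sketch rather than the exact loss used in the publications listed below.

```python
# Minimal sketch (assumed settings, not the exact loss from the papers below) of a
# perception-informed reconstruction loss: a multi-resolution log-mel distance.
# The mel scale approximates the ear's frequency resolution, and the log
# compression approximates its roughly logarithmic loudness perception.
import torch
import torchaudio


class MultiResolutionMelLoss(torch.nn.Module):
    def __init__(self, sample_rate=16000, n_ffts=(1024, 2048), n_mels=80):
        super().__init__()
        # One mel-spectrogram extractor per FFT size, trading off time vs.
        # frequency resolution across scales.
        self.mels = torch.nn.ModuleList([
            torchaudio.transforms.MelSpectrogram(
                sample_rate=sample_rate,
                n_fft=n_fft,
                hop_length=n_fft // 4,
                n_mels=n_mels,
            )
            for n_fft in n_ffts
        ])

    def forward(self, converted: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # converted, reference: (batch, samples) waveforms at `sample_rate`.
        loss = converted.new_zeros(())
        for mel in self.mels:
            log_conv = torch.log(mel(converted) + 1e-5)
            log_ref = torch.log(mel(reference) + 1e-5)
            loss = loss + torch.nn.functional.l1_loss(log_conv, log_ref)
        return loss / len(self.mels)


# Usage: penalize perceptually salient differences between the anonymized output
# and the original recording during training of a voice-conversion model.
# loss_fn = MultiResolutionMelLoss()
# loss = loss_fn(anonymized_waveform, original_waveform)
```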
Through extensive evaluation and user studies, we show that our methods significantly improve both privacy protection and speech quality across a wide range of voices, languages, and use cases—without compromising intelligibility. This work opens the door to more secure and ethical applications of voice technology in healthcare, accessibility, and everyday digital interactions.
Selected Publications
- Anonymizing Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert and Sebastian Stober.
In: Interspeech 2024 - Kos, Greece, September 1-5, 2024
- Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh, Arnab Das, Yamini Sinha, Ingo Siegert, Tim Polzehl and Sebastian Stober.
In: Interspeech 2023 - Dublin, Ireland, August 20-24, 2023
- Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh, Tim Thiele, Frederick Lorbeer, Frank Dreyer and Sebastian Stober.
In: NeurIPS Workshop 2024 (Audio Imagination: Workshop on AI-Driven Speech, Music, and Sound Generation) - Vancouver, Canada, December 10-15, 2024