Hello, I am Rithesh Kumar, a Senior Research Scientist on the Speech AI team at Adobe Research. I lead speech generation research focused on controllable text-to-speech synthesis, automatic dubbing, and speech editing. My work centers on scaling diffusion models and developing efficient distillation algorithms for multilingual audio generation.

Previously, I was Technical Lead for Audio Research at Descript Inc., where I built and shipped 4+ text-to-speech models powering the flagship Overdub and Regenerate features—enabling ultra-realistic voice cloning and text-based audio corrections.

As a firm believer in the bitter lesson, my research goal is to develop scalable multi-modal models that bridge the reasoning and general intelligence of LLMs with diffusion models that can synthesize natural signals like audio, video, and images at the highest quality.

Currently, I live in Toronto, Ontario 🇨🇦.

Research to Product

Firefly/Text-to-Avatar
I developed distilled audio diffusion models powering zero-shot voice generation for avatars, enabling controllable text-to-speech across 25+ dialects at broadcast-quality 48 kHz resolution.
Deployed in production (beta) in 2025
Firefly/Translate Video and Audio
I developed audio diffusion models for voice translation, preserving speaker identity across 25+ languages and dialects with controllable accent synthesis.
Deployed in production (GA) in early 2025
Descript/Regenerate
I developed the Audio Codec, Audio Language Models and voice cloning technology that power Regenerate, enabling users to turn awkward audio cuts into smooth, natural-sounding edits.
Deployed in production in 2023
Descript/AI Voices
I developed and shipped 4+ research models behind the flagship Overdub feature (now AI Voices), enabling voice cloning and speech editing at full 44.1 kHz resolution.
First deployed in production in 2018
Education

I completed my MSc in Computer Science (specializing in Artificial Intelligence) at the Mila lab in Université de Montréal supervised by Yoshua Bengio. During my MSc, I had the excellent opportunity to intern at Lyrebird and Microsoft Research - Montréal.

Earlier, I graduated from SSN College of Engineering (affiliated to Anna University) with a Bachelor's in Computer Science and Engineering. I spent the final 2 years of my undergrad learning about deep learning, spending a summer at the Serre Lab at Brown University and collaborating with Prof. Yoshua Bengio at the Mila lab.

Select Publications

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li, Rithesh Kumar, Zeyu Jin
Poster Presentation - ICML 2025
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar*, Prem Seetharaman*, Alejandro Luebs, Ishaan Kumar, Kundan Kumar
Poster Presentation (Spotlight) - NeurIPS 2023
VampNet: Music Generation via Masked Acoustic Token Modeling
Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
Poster Presentation - ISMIR 2023
Chunked Autoregressive GAN for Conditional Waveform Synthesis
Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio
Poster Presentation - ICLR 2022
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Kundan Kumar*, Rithesh Kumar*, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, Aaron Courville
Poster Presentation - NeurIPS 2019
ObamaNet: Photo-realistic lip-sync from text
Rithesh Kumar, Jose Sotelo, Kundan Kumar, Alexandre de Brébisson, Yoshua Bengio
Oral Presentation - Machine Learning for Creativity and Design Workshop (NeurIPS 2017)
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Aaron Courville, Yoshua Bengio
Poster Presentation - ICLR 2017
Select Projects

Reproducing Neural Discrete Representation Learning
Rithesh Kumar, Tristan Deleu, Evan Racah
Final project - Representation Learning
Reproducing Handwriting Synthesis and Prediction
Rithesh Kumar
Open source project
Reproducing What You Get Is What You See: Visual Markup Decompiler
Rithesh Kumar, Rithesh Rohan, U. Sivashanmugam
Undergraduate Thesis