Rithesh Kumar

Hello, I am Rithesh Kumar, a Senior Research Scientist on the Speech AI team at Adobe Research. I lead speech generation research focused on controllable text-to-speech synthesis, automatic dubbing, and speech editing. My work centers on scaling diffusion models and developing efficient distillation algorithms for multilingual audio generation.

Previously, I was Technical Lead for Audio Research at Descript Inc., where I built and shipped 4+ text-to-speech models powering the flagship Overdub and Regenerate features—enabling ultra-realistic voice cloning and text-based audio corrections.

As a firm believer in the bitter lesson, my research goal is to develop scalable multi-modal models that bridge the reasoning and general intelligence of LLMs with diffusion models that can synthesize natural signals like audio, video, and images at the highest quality.

Currently, I live in Toronto, Ontario 🇨🇦.

Research to Product

Firefly/Text-to-Avatar

I developed distilled audio diffusion models powering zero-shot voice generation for avatars, enabling controllable text-to-speech across 25+ dialects at broadcast-quality 48 kHz resolution.

Deployed in production (beta) in 2025

Firefly/Translate Video and Audio

I developed audio diffusion models for voice translation, preserving speaker identity across 25+ languages and dialects with controllable accent synthesis.

Deployed in production (GA) early 2025

Descript/Regenerate

I developed the Audio Codec, Audio Language Models and voice cloning technology that power Regenerate, enabling users to turn awkward audio cuts into smooth, natural-sounding edits.

Deployed in production in 2023

Descript/AI Voices

I developed and shipped 4+ research models behind the flagship Overdub feature (now AI Voices), enabling voice cloning and speech editing at full 44.1 kHz resolution.

First deployed in production in 2018

Education

I completed my MSc in Computer Science (specializing in Artificial Intelligence) at the Mila lab at Université de Montréal, supervised by Yoshua Bengio. During my MSc, I had the excellent opportunity to intern at Lyrebird and Microsoft Research - Montréal.

Earlier, I graduated from SSN College of Engineering (affiliated with Anna University) with a Bachelor's in Computer Science and Engineering. I spent the final 2 years of my undergrad learning about deep learning, spending a summer at the Serre Lab at Brown University and collaborating with Prof. Yoshua Bengio at the Mila lab.

Select Publications

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis

Yinghao Aaron Li, Rithesh Kumar, Zeyu Jin

Poster Presentation - ICML 2025

High-Fidelity Audio Compression with Improved RVQGAN

Rithesh Kumar*, Prem Seetharaman*, Alejandro Luebs, Ishaan Kumar, Kundan Kumar

Poster Presentation (Spotlight) - NeurIPS 2023

VampNet: Music Generation via Masked Acoustic Token Modeling

Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo

Poster Presentation - ISMIR 2023

Chunked Autoregressive GAN for Conditional Waveform Synthesis

Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

Poster Presentation - ICLR 2022

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Kundan Kumar*, Rithesh Kumar*, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, Aaron Courville

Poster Presentation - NeurIPS 2019