Curriculum Vitae - Quan Wang

Current Occupation

Senior Staff Software Engineer & Tech Lead Manager
Google DeepMind
New York, NY

Contact

Email: quanrpi@gmail.com
Web: wangquan.me
LinkedIn: www.linkedin.com/in/wangquan

News

Enroll in my online courses on Speaker Recognition (English) and Speaker Diarization (English) on Udemy.
Enroll in my Speaker Recognition course (Chinese) on JiQiZhiXin.
My award-winning book “Voice Identity Techniques: From core algorithms to engineering practice” (Chinese) can be purchased here.

Current Role

Dr. Quan Wang is currently working on Gemini Audio in the GenAI department of Google DeepMind.

Prior to GenAI, Dr. Quan Wang led the Hotword Modeling team and the Speaker, Voice & Language team at Google DeepMind. The teams delivered a diverse set of server-side and on-device speech models to Google’s product ecosystem, including “Hey Google” spoken keyword spotting, voice match, language recognition, spoofed speech detection, speech enhancement, speaker diarization, and multilingual speech recognition. The server-side models power numerous speech features in Google Search, YouTube, Google Cloud, and Google Assistant, used by billions. The on-device models are deployed on billions of Android phones, tablets, Chromebooks, cars, and wearables across the globe.

Media Coverage

Book on Voice Identity Techniques: [博文视点] [语音杂谈] [机器之心] [载思考] [声纹圈]
On-device multilingual speech recognition for Pixel 8: [The Verge] [TechCrunch] [Android Authority] [9to5Google]
Speaker label for Recorder app:
- Official: [Google AI Blog] [Google AI Official tweet]
- English: [Android Authority] [9to5Google] [Research Snipers] [Real Mi Central] [Chrome Unboxed] [XDA] [engadget] [TechCrunch]
- Italian: [TuttoAndroid]
- Greek: [SecNews]
- Chinese: [机器之心]
Quick Phrases: [9to5Google] [Droid Life] [The Verge] [Voicebot.ai]
On-device language identification for Live Caption/Translate: [Google Pixel Blog]
Google Cloud Speaker ID
- Official: [Google Cloud Blog] [Homepage] [Google Cloud official tweet] [YouTube]
- English: [SiliconANGLE] [Techzine] [TheRegister] [BiometricUpdate]
- Dutch: [TechZine]
- Chinese: [TensorFlow公众号] [HiNet] [iThome]
VoiceFilter-Lite:
- Official: [Google AI Blog] [Google AI official tweet]
- English: [TheNextWeb] [apk9to5] [Somag News] [Silicon Canals] [AndroidFist] [eyerys] [voicebot.ai] [kouragoal.com] [Medium]
- Chinese: [机器之心] [搜狐] [新浪] [声纹圈] [Google News App] [新经网] [iThome] [HiNet]
- Spanish: [Nuevo Periodico] [nobbot]
- Turkish: [teknotalk] [webtekno] [TechInside] [Trendoa.com]
- Japanese: [webbigdata.jp]
- Russian: [Neurohive]
- Arabic: [aitnews]
Diarization and UIS-RNN:
- Official: [Google AI Blog] [Google AI official tweet]
- English: [VentureBeat] [SiliconAngle] [InfoQ] [OpenSourceForU] [Futurism]
- Chinese:
  - Full article: [量子位] [cnBeta] [EEPW电子产品世界] [Sina新浪科技] [iThome] [osChina开源中国] [机器之心快讯] [网易科技] [ChinaEmail中国邮箱网] [智东西快讯] [报价宝] [HiNet] [IT经理网] [贤集网]
  - Included in: [机器之心AI每日精选] [人工智能半月刊] [智东西早报]
- Russian: [dev.by]
- Italian: [tom’s HARDWARE]
- Vietnamese: [GENK]
- Japanese: [WebBigData]
VoiceFilter:
- English: [VentureBeat]
- Chinese: [机器之心] [新智元] [搜狐] [简书Tech blog]
- Russian: [Tproger]
Joint ASR frontend: [声纹圈]
Token-level SCD loss: [声纹圈]
Synth2Aug: [声纹圈]
SpeakerStew: [声纹圈] [语音杂谈]
Multispeaker Text-to-Speech: [机器之心] [Two Minute Papers]
Translatotron2:
- Official: [Google AI Blog]
- English: [VentureBeat] [voicebot.ai] [slator] [Analytics India Magazine] [MarkTechPost]
- Chinese: [TensorFlow公众号] [AI前线]
Translatotron:
- Official: [Google AI Blog] [Google AI official tweet]
- English: [VentureBeat] [TechCrunch] [CNET] [Android Central] [Engadget] [Gadgets] [Android Police] [slator]
- Chinese: [量子位] [cnBeta] [机器之心] [新智元]
Bolo Android App:
- Official: [Official Site] [Google Play] [Sundar’s tweet] [Google AI Education]
- English: [TechCrunch] [VentureBeat] [IndiaTimes] [NDTV] [CNN] [9to5Google]
- Chinese: [新浪科技] [博客园] [智能手机网] [科技新报]
Multi-language on Google Home: [Launch blog] [Google AI Blog]
Multi-user voice match on Google Home:
- Text-dependent: [Launch blog]
- Text-independent: [Blog on more smart speakers] [Enrollment UI launch blog] [engadget]
ASVspoof: [Google Blog] [9to5Google] [IBC365] [Digital Information World]
ICASSP 2018 speaker recognition papers: [机器之心]
MLFont:
- Official: [GoogleFonts official tweet]
- Chinese: [GooFan] [LanDianNews] [cnBeta]
AGSM: [HighBeam Research] [Issues in Computer Engineering: 2013 Edition]
Semantic Context Forests: [HighBeam Research]
COSBOS: [Technology Org] [PRWeb]

Education

2010/08 – 2014/10, Ph.D., Rensselaer Polytechnic Institute, NY, USA
- Signal Analysis and Machine Perception Laboratory (SAMPL)
- Department of Electrical, Computer, and Systems Engineering (ECSE)
- Advisor: Prof. Kim L. Boyer
- Thesis: Exploiting Geometric and Spatial Constraints for Vision and Lighting Applications
- GPA: 4.0/4.0
2006/08 – 2010/08, B.Eng. in Automation, Tsinghua University, Beijing, China
- Department of Automation, Class of Fundamental Sciences
- Advisor: Prof. Qionghai Dai
- Thesis: Implementation and Study of Light-Field-Based 3D Object Retrieval System
- Major GPA: 91.3/100

Work Experience

2015/11 – Current, Senior Staff Software Engineer and Tech Lead Manager, Google DeepMind, New York City, NY, USA
- Manager: Bhuvana Ramabhadran
- “OK Google” voice search & actions
- Speaker identification and speaker diarization
- Language recognition and diarization
- VoiceFilter source separation
- Learning-based font loading (MLFont)
2014/11 – 2015/10, Machine Learning Scientist, Amazon, Cambridge, MA, USA
- Manager: Dr. Shiv Vitaladevuni
- Amazon Firefly: Optical character recognition
- Amazon Echo: Speech recognition
2013/05 – 2013/08, Research Intern, IBM Almaden Research Center, San Jose, CA, USA
- Manager: Dr. Tanveer Syeda-Mahmood
- Automated segmentation and heart disease detection from echocardiogram images
- The Medical Sieve project (in Java)
2012/05 – 2012/08, Research Intern, Siemens Corporate Research, Princeton, NJ, USA
- Managers: Dr. Dijia Wu and Dr. Shaohua Kevin Zhou
- Learning-based automatic knee cartilage segmentation in 3D MR images (in C++)
2009/06 – 2009/07, Intern Programmer, Northking Technology Corporation, Beijing, China
- Development of the Business Operation System of Northking Technology Corporation (with the JSF framework)

Awards

SLT 2024 Best Paper Finalist (top 2% paper, 9/373 submitted, 9/170 accepted)
- For the paper “GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting”.
2024 AI 2000 Most Influential Scholar Honorable Mention
- Awarded by AMiner.org in recognition of outstanding and vibrant contributions in the field of Speech Recognition between 2014 and 2023.
ASRU 2023 Best Paper Finalist (top 3% paper, 12/435)
- For the paper “Improved long-form speech recognition by jointly modeling the primary and non-primary speakers”.
Annual Best Content Contribution Award, 2022
- Awarded by The Circle of Voiceprint
Top 100 case studies of the year, 2021
Distinguished Author of Year 2020
- Awarded by Publishing House of Electronics Industry (PHEI).
The Allen B. Dumont Prize, 2015
- This prize is awarded to a graduate student who has demonstrated high scholastic ability and has made a substantial contribution to their field.

Invited Talks & Tutorials

Tutorial at ASRU 2025
- “Scalable, personalized, and empathetic speech systems for everyone, everywhere” [Slides]
Keynote talk at Speech and Audio in the Northeast (SANE) 2024
- “Speaker diarization at Google: From modularized systems to LLMs” [YouTube] [Slides]
Invited talk at MIT CSAIL, 2024
- “Advances in Speaker Diarization at Google”
Tutorial at Odyssey 2022
- “Speaker Diarization: A Journey from Unsupervised to Supervised Approaches”
Summit of Top 100 Global Software Case Studies, 2021
- “Building the Product Ecosystem for Voice and Language Recognition” (声纹与语种识别的产品生态构建)

Publications

[Google Scholar]

Books

Quan Wang, “Voice Identity Techniques: From core algorithms to engineering practice” (Chinese), Publishing House of Electronics Industry (PHEI), September 2020. [GitHub] [JD] [TMall] [DangDang]

Journal Publications

Quan Wang, Ignacio Lopez Moreno, “Version Control of Speaker Recognition Systems”, Journal of Systems and Software, Volume 216, 2024. [link] [PDF] [software]
Quan Wang, Kim L. Boyer, “The active geometric shape model: A new robust deformable shape model and its applications”, Computer Vision and Image Understanding, Volume 116, Issue 12, December 2012, Pages 1178-1194, ISSN 1077-3142, doi:10.1016/j.cviu.2012.08.004. [link] [PDF] [slides] [software]
Quan Wang, Xinchi Zhang, Kim L. Boyer, “Occupancy distribution estimation for smart light delivery with perturbation-modulated light sensing”, Journal of Solid State Lighting 2014 1:17, ISSN 2196-1107, doi:10.1186/s40539-014-0017-2. [link] [PDF] [software]
Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling, “ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech”, Computer Speech & Language, Volume 64, doi:10.1016/j.csl.2020.101114. [link] [PDF]

Conference Publications

Dhruuv Agarwal, Harry Zhang, Yang Yu, Quan Wang, “State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization”, submitted. [PDF]
Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge, “LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting”, Interspeech, 2025. [PDF]
Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang, “GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples”, Interspeech, 2025. [PDF]
Pai Zhu, Jacob W. Bartel, Dhruuv Agarwal, Kurt Partridge, Hyun Jin Park, Quan Wang, “GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting”, IEEE Spoken Language Technology Workshop (SLT), 2024. [PDF] [Best Paper Finalist]
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang, “Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting”, SynData4GenAI Workshop, 2024. [PDF]
Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang, “Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model”, SynData4GenAI Workshop, 2024. [PDF]
Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang, “Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments”, SynData4GenAI Workshop, 2024. [PDF]
Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao, “DiarizationLM: Speaker Diarization Post-Processing with Large Language Models”, Interspeech, 2024. [PDF] [model] [demo]
Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang, “On the Success and Limitations of Auxiliary Network Based Word-Level End-to-End Neural Speaker Diarization”, Interspeech, 2024.
Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno, “Personalizing Keyword Spotting with Speaker Information”, 2023. [PDF]
Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang, “USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models”, 2023. [PDF]
Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang, “Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network”, 2023. [PDF]
Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia, “Improved long-form speech recognition by jointly modeling the primary and non-primary speakers”, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023. [PDF] [Best Paper Finalist]
Tom O’Malley, Shaojin Ding, Arun Narayanan, Quan Wang, Rajeev Rikhye, Qiao Liang, Yanzhang He, Ian McGraw, “Conditional Conformer: Improving Speaker Modulation for Single and Multi-User Speech Enhancement”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, Quan Wang, “Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. [PDF]
Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno, “Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. [PDF]
Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno, “Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering”, arXiv:2210.13690 [eess.AS], 2022. [PDF]
Tom O’Malley, Arun Narayanan, Quan Wang, “A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation”, Interspeech, 2022. [PDF]
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O’Malley, Ian McGraw, “Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition”, Interspeech, 2022. [PDF]
Jason Pelecanos, Quan Wang, Yiling Huang, Ignacio Lopez Moreno, “Parameter-Free Attentive Scoring for Speaker Verification”, Odyssey: The Speaker and Language Recognition Workshop, 2022. [PDF]
Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno, “Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech”, Odyssey: The Speaker and Language Recognition Workshop, 2022. [PDF] [model] [demo]
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw, “Closing the Gap between Single-User and Multi-User VoiceFilter-Lite”, Odyssey: The Speaker and Language Recognition Workshop, 2022. [PDF]
Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen, “CVSS Corpus and Massively Multilingual Speech-to-Speech Translation”, Conference on Language Resources and Evaluation (LREC), 2022. [PDF] [data] [Google AI Blog]
Wei Xia, Han Lu, Quan Wang, Anshuman Tripathi, Ignacio Lopez Moreno, Hasim Sak, “Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022. [PDF] [code]
Arun Narayanan, Chung-Cheng Chiu, Tom O’Malley, Quan Wang, Yanzhang He, “Cross-Attention Conformer for Context Modeling in Speech Enhancement for ASR”, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [PDF]
Tom O’Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard, “A Conformer-Based ASR Frontend For Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation”, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [PDF]
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw, “Multi-user VoiceFilter-Lite via Attentive Speaker Embedding”, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [PDF]
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng (Arden) Huang, Arun Narayanan, Ian McGraw, “Personalized Keyphrase Detection using Speaker and Environment Information”, Interspeech, 2021. [PDF]
Roza Chojnacka, Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno, “SpeakerStew: Scaling to Many Languages with a Triaged Multilingual Text-Dependent and Text-Independent Speaker Verification System”, Interspeech, 2021. [PDF] [model] [demo]
Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno, “Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition”, Interspeech, 2021. [PDF]
Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang, “Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech”, IEEE Spoken Language Technology Workshop (SLT), 2021. [PDF]
Shaojin Ding, Ye Jia, Ke Hu, Quan Wang, “Textual Echo Cancellation”, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [PDF]
Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein, “VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition”, Interspeech, 2020. [PDF] [website] [Google AI Blog]
Shaojin Ding, Quan Wang, Shuo-yiin Chang, Li Wan, Ignacio Lopez Moreno, “Personal VAD: Speaker-Conditioned Voice Activity Detection”, Odyssey: The Speaker and Language Recognition Workshop, 2020. [PDF]
Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno, “Tuplemax Loss for Language Identification”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. [PDF] [poster]
Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno, “VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking”, Interspeech, 2019. (ORAL) [PDF] [samples]
Aonan Zhang, Quan Wang, Zhenyao Zhu, John Paisley, Chong Wang, “Fully Supervised Speaker Diarization”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. [PDF] [code] [poster] [Google AI Blog]
Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas, “Sample Efficient Adaptive Text-to-Speech”, International Conference on Learning Representations (ICLR 2019). [PDF] [samples] [poster] [Google AI Blog]
Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu, “Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”, Advances in neural information processing systems (NeurIPS 2018). [PDF] [samples] [poster] [Google AI Blog]
Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno, “Speaker Diarization with LSTM”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. [PDF] [poster] [code] [wiki]
Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno, “Generalized End-to-End Loss for Speaker Verification”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. (ORAL) [PDF] [slides] [wiki]
F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan, “Attention-Based Models for Text-Dependent Speaker Verification”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. [PDF] [poster]
Alejandro Luebs, Bastiaan Kleijn, Felicia Lim, Florian Stimberg, Jan Skoglund, Quan Wang, Thomas Walters, “Wavenet Based Low-Rate Speech Coding”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. [PDF] [poster]
Quan Wang, Xinchi Zhang, Kim L. Boyer, “3D Scene Estimation with Perturbation-Modulated Light and Distributed Sensors”, 10th IEEE Workshop on Perception Beyond the Visible Spectrum (PBVS). (ORAL) [PDF]
Quan Wang, Yan Ou, A. Agung Julius, Kim L. Boyer and Min Jun Kim, “Tracking Tetrahymena Pyriformis Cells using Decision Trees”, 21st International Conference on Pattern Recognition (ICPR), Pages 1843-1847, 11-15 Nov. 2012. [PDF] [shotgun] [poster] [software]
Quan Wang, Dijia Wu, Le Lu, Meizhu Liu, Kim L. Boyer, and Shaohua Kevin Zhou, “Semantic Context Forests for Learning-Based Knee Cartilage Segmentation in 3D MR Images”, MICCAI 2013: Workshop on Medical Computer Vision. (ORAL) [PDF] [poster] [slides] [software]
Quan Wang, Xin Shen, Meng Wang, Kim L. Boyer, “Label Consistent Fisher Vectors for Supervised Feature Aggregation”, 22nd International Conference on Pattern Recognition (ICPR), 2014. [PDF] [poster] [software] [demo]
Quan Wang, Xinchi Zhang, Meng Wang, Kim L. Boyer, “Learning Room Occupancy Patterns from Sparsely Recovered Light Transport Models”, 22nd International Conference on Pattern Recognition (ICPR), 2014. (ORAL) [PDF]
Quan Wang, Kim L. Boyer, “Feature Learning by Multidimensional Scaling and its Applications in Object Recognition”, 26th SIBGRAPI Conference on Graphics, Patterns and Images (Sibgrapi). IEEE, 2013. (ORAL) [PDF] [slides] [software]
Tanveer Syeda-Mahmood, Quan Wang, Patrick McNeillie, David Beymer, Colin Compas, “Discriminating Normal and Abnormal Left Ventricular Shapes in Four-Chamber View 2D Echocardiography”, International Symposium on Biomedical Imaging (ISBI), 2014.
Quan Wang, Yu Wang, Zuoguan Wang, “Online Smart Face Morphing Engine with Prior Constraints and Local Geometry Preservation”, International Workshop on Multimodal pattern recognition of social signals in human computer interaction (MPRSS 2014). (ORAL) [PDF]
Xinchi Zhang, Quan Wang, Kim L. Boyer, “Illumination Adaptation with Rapid-Response Color Sensors”, SPIE Optical Engineering + Applications, 2014. (ORAL) [PDF]

Technical Reports and Theses

Gemini Team, “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context”, 2024. [PDF]
Jin Shi, Quan Wang, Yeming Fang, Gang Feng, Zhengying Chen, Jason Pelecanos, Ignacio Lopez Moreno, Andrea Chu, Pedro Moreno Mengibar, “Utterance Augmentation for Speaker Recognition”, Technical Disclosure Commons, Defensive Publications Series, 2020. [link] [PDF]
Quan Wang, Yiran Mao, “Learning Better Font Slicing Strategies from Data”, Technical Disclosure Commons, Defensive Publications Series, 2017. [link] [PDF] [wiki]
Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno, “Links: A High-Dimensional Online Clustering Method”, arXiv:1801.10123 [stat.ML], 2018. [PDF]
Quan Wang, “GMM-Based Hidden Markov Random Field for Color Image and 3D Volume Segmentation”, arXiv:1212.4527 [cs.CV], 2012. [PDF]
Quan Wang, “HMRF-EM-image: Implementation of the Hidden Markov Random Field Model and its Expectation-Maximization Algorithm”, arXiv:1207.3510 [cs.CV], 2012. [PDF]
Quan Wang, “Kernel Principal Component Analysis and its Applications in Face Recognition and Active Shape Models”, arXiv:1207.3538 [cs.CV], 2012. [PDF]
Quan Wang, “Exploiting Geometric and Spatial Constraints for Vision and Lighting Applications”, Rensselaer Polytechnic Institute Ph.D. dissertation, 2014.
Quan Wang, “Implementation and Study of Light-Field-Based 3D Object Retrieval System”, Tsinghua University Undergraduate Thesis, 2010. [PDF] [poster] [demo]

Acknowledged by

Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani, “Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations”, [cs.SD]. [demo] [PDF]
Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz, “Translatotron 2: Robust direct speech-to-speech translation”, arXiv preprint arXiv:2107.08661 [cs.CL]. [PDF]
Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey, “End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings”, arXiv:2105.02096 [cs.SD]. [PDF]
ASVspoof 2019: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan, 2019. [PDF]
Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu, “Direct speech-to-speech translation with a sequence-to-sequence model”, arXiv:1904.06037 [cs.CL]. [PDF]
Aonan Zhang, “Composing Deep Learning and Bayesian Nonparametric Methods”, Ph.D. Dissertation, 2019. [PDF]
Yu Wang, “A broadly applicable three-dimensional neuron reconstruction framework based on deformable models and software system with parallel GPU implementation”. Ph.D. Dissertation, 2011.

Patents

Quan Wang, Dijia Wu, Meizhu Liu, Le Lu, Kevin Shaohua Zhou, Automatic spatial context based multi-object segmentation in 3D images [PDF]
Quan Wang, David Beymer, Patrick McNeillie, Tanveer Syeda-Mahmood, Discriminating between normal and abnormal left ventricles in echocardiography [PDF]
Quan Wang, Xinchi Zhang, Kim L. Boyer, Occupancy sensing smart lighting systems [PDF]
Quan Wang, Thibaud Senechal, Daniel Makoto Willenson, Shuang Wu, Yue Liu, Shiv Naga Prasad Vitaladevuni, David Paul Ramos, Qingfeng Yu, Text detection using features associated with neighboring glyph pairs [PDF]
Quan Wang, Ignacio Lopez Moreno, Li Wan, Improving speaker verification across locations, languages, and/or dialects [PDF]
Quan Wang, Hasim Sak, Ignacio Lopez Moreno, Alan Sean Papir, Li Wan, Neural Networks for Speaker Verification [PDF]
Quan Wang, Yash Sheth, Ignacio Lopez Moreno, Li Wan, Speaker diarization using an end-to-end model [PDF]
Quan Wang, Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Patrick An Phu Nguyen, Synthesis of speech from text in a voice of a target speaker using neural networks [PDF]
Quan Wang, Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno, Text independent speaker recognition [PDF]
Quan Wang, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn, Targeted voice separation by speaker conditioned on spectrogram masking [PDF]
Quan Wang, Chong Wang, Aonan Zhang, Zhenyao Zhu, Fully Supervised Speaker Diarization [PDF]

Academic Service

Reviewing

Journals:

Conferences:

NeurIPS, 2025
Interspeech, 2025
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
IEEE Spoken Language Technology Workshop (SLT), 2021, 2022
International Joint Conference on Artificial Intelligence (IJCAI) 2019
VISAPP International Conference on Computer Vision Theory and Applications 2014
SIBGRAPI Conference on Graphics, Patterns, and Images 2013-2014

Other

Senior member - IEEE
Organizer - SANE 2025
Session chair - ASRU 2025: Automatic Speech Recognition
Session chair - Interspeech 2021: Language and Accent Recognition
Interviewee - Interspeech 2020 Tutorial: Neural Models for Speaker Diarization in the Context of Speech Recognition

Teaching and Mentoring

Online Courses

Udemy (English): Speaker Recognition - By Award Winning Textbook Author
Udemy (English): A Tutorial on Speaker Diarization
机器之心 (Chinese): 声纹识别：从理论到编程实战

Students

Yu-Neng Chuang, 2025 Google summer intern, Ph.D.
Wei Xia, 2021 Google summer intern & 2021 Google Student Researcher, Ph.D.
Shaojin Ding, 2019 & 2020 Google summer intern, Ph.D.
Aonan Zhang, 2018 Google summer intern & 2018 Google Student Researcher, Ph.D.
Hannah Muckenhirn, 2018 Google summer intern, Ph.D.
F A Rezaur Rahman Chowdhury, 2017 Google summer intern, Ph.D. (cohost)
Carlton Downey, 2017 Google summer intern, Ph.D.
Xinchi Zhang, 2013-2014 undergraduate student at Smart Lighting Engineering Research Center

Teaching Assistant

2011/01 – 2012/12, Teaching Assistant, Rensselaer Polytechnic Institute, Troy, NY, USA
- Spring 2011, Embedded Control [ENGR 2350], by Prof. Russell P. Kraft
- Spring 2011, Real-Time Applications in Control & Communications [ECSE 4760], by Prof. Russell P. Kraft
- Fall 2011, Introduction to Engineering Analysis [ENGR 1100], by Prof. Mark W. Olles
- Spring 2012, Electric Circuits [ECSE 2010], by Prof. Jeffrey Braunstein
- Spring 2012, Biological Image Analysis [ECSE 4960], by Dr. Jens Rittscher and Dr. Dirk Padfield
- Fall 2012, Modeling and Analysis of Uncertainty [ENGR 2600], by Prof. Charles J. Malmborg