Yihan Du 杜伊涵

Postdoctoral Researcher

ECE, UIUC

About me

I am a postdoc at UIUC, advised by Prof. R. Srikant (IEEE Fellow). My research focuses on machine learning, including reinforcement learning (RL), online learning (in particular, bandits), and multi-task learning. Recently, I have become interested in applications of RL and bandits to LLMs (e.g., RLHF and DPO), and in diffusion models for decision making.

Previously, I obtained my Ph.D. degree from IIIS, Tsinghua University (headed by Prof. Andrew Chi-Chih Yao) in June 2023, advised by Prof. Longbo Huang. I visited Cornell University in person in Fall 2022, working with Prof. Wen Sun, and was a research intern at MSR Asia from January to May 2020, mentored by Dr. Wei Chen (ACM/IEEE Fellow, Director of MSR Asia Theory Center). In my research, I collaborate with industry partners, including Nvidia and Microsoft.

I will join the ESD pillar at Singapore University of Technology and Design (SUTD) as a tenure-track assistant professor in August 2025. I am actively looking for Ph.D. students with full scholarships (starting Fall 2025 or Spring 2026), research interns, and visiting scholars. If you are interested in working with me, feel free to email me your CV along with a few sentences describing your background and availability.

I am a hands-on mentor. I have co-mentored two undergraduate students with my Ph.D. advisor, and both projects were published at top conferences (NeurIPS and ICLR), with the mentored student as first author.

Email: yihandu@illinois.edu; duyihan1996@gmail.com.

Download my CV here (last update: September 2024).

Interests

RL, online learning (in particular, bandits)
Application of RL and bandits in LLMs, e.g., RLHF and DPO
Diffusion models for decision making
Multi-task/Meta/Representation learning

Education

Ph.D. in Computer Science, September 2018 - June 2023
Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University

B.E. in Computer Science, September 2014 - June 2018
Xiamen University

Publications

Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant, “Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization,” International Conference on Machine Learning (ICML), 2024. [pdf] [arXiv]

Yihan Du, R. Srikant, Wei Chen, “Cascading Reinforcement Learning,” International Conference on Learning Representations (ICLR), 2024 (spotlight, top 5%). [pdf] [arXiv]

Yu Chen#, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang, “Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback,” International Conference on Learning Representations (ICLR), 2024 (#graduate student mentored with my Ph.D. advisor). [pdf] [arXiv]

Nuoya Xiong#, Yihan Du, Longbo Huang, “Provably Safe Reinforcement Learning with Step-wise Violation Constraints,” Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023 (#undergraduate student mentored with my Ph.D. advisor). [pdf] [arXiv]

Yihan Du, Longbo Huang, Wen Sun, “Multi-task Representation Learning for Pure Exploration in Linear Bandits,” International Conference on Machine Learning (ICML), 2023. [pdf] [arXiv]

Yihan Du, Siwei Wang, Longbo Huang, “Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path,” International Conference on Learning Representations (ICLR), 2023. [pdf] [arXiv]

Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang, “Collaborative Pure Exploration in Kernel Bandit,” International Conference on Learning Representations (ICLR), 2023. [pdf] [arXiv]

Yihan Du, Wei Chen, “Branching Reinforcement Learning,” International Conference on Machine Learning (ICML), 2022. [pdf] [arXiv]

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang, “Continuous Mean-Covariance Bandits,” Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2021. [pdf] [arXiv]

Yihan Du, Yuko Kuroki, Wei Chen, “Combinatorial Pure Exploration with Bottleneck Reward Function,” Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2021. [pdf] [arXiv]

Yihan Du, Siwei Wang, Longbo Huang, “A One-Size-Fits-All Solution to Conservative Bandit Problems,” Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021. [pdf] [arXiv]

Yihan Du*, Yuko Kuroki*, Wei Chen, “Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback,” Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021 (*equal contribution). [pdf] [arXiv]

Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao, “Combinatorial Pure Exploration for Dueling Bandit,” International Conference on Machine Learning (ICML), 2020 (authors in alphabetical order). [pdf] [arXiv]

Yihan Du, Siwei Wang, Longbo Huang, “Dueling Bandits: From Two-dueling to Multi-dueling,” Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020. [pdf] [arXiv]

Selected Awards

China Computer Federation (CCF) Agent and Multi-Agent System Doctoral Dissertation Award, by CCF Multi-Agent System Committee, June 2024 (the only recipient nationwide)

Tsinghua Outstanding Doctoral Dissertation Award, by Tsinghua University, June 2023 (the only recipient among CS graduates at IIIS, Tsinghua University)

Beijing Outstanding Graduate, by Beijing Municipal Education Commission, June 2023 (the only recipient among CS graduates at IIIS, Tsinghua University)

China National Scholarship for Ph.D. Students, by Ministry of Education of China, October 2022 (the only recipient among CS students at IIIS, Tsinghua University)

Toyota Scholarship, by Toyota and Tsinghua University, October 2021

Outstanding Graduate, by Xiamen University, June 2018

Invited Talks

“Why is RLHF Data-Efficient in Policy Optimization”

UC Riverside CS, February 2024
SUTD ESD, January 2024
NTU CCDS, January 2024
Colorado School of Mines, January 2024
NJIT CS, December 2024
China Computer Federation (CCF) Agent and Multi-Agent System Seminar, June 2024

“Risk-aware Online Decision Making”

TrustML Young Scientist Seminar, RIKEN AIP, May 2023
MLOPT Idea Seminar, UW-Madison, April 2023

“Combinatorial Pure Exploration for Dueling Bandit”

CCF Doctoral Forum in Theoretical Computer Science, June 2021

Academic Service & Activities

Reviewer
Conference: ICML, NeurIPS, ICLR, AAAI, AISTATS, UAI, RLC

Journal: Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Journal of Machine Learning Research (JMLR), Transactions on Networking (ToN), Transactions on Machine Learning Research (TMLR), Transactions on Network Science and Engineering (TNSE)

Technical Program Committee (TPC) Member
INFOCOM, WiOpt

Teaching Assistant
Stochastic Network Optimization (taught in English), graduate course at IIIS, Tsinghua University, Spring 2021
Introduction to Computer Science (taught in English), undergraduate course for Yao Class, Tsinghua University, Fall 2019

Contact

yihandu@illinois.edu; duyihan1996@gmail.com
Coordinated Science Laboratory, 1308 W Main Street, Urbana, IL 61801, United States