About Me
I’m a PhD candidate at Tsinghua University, advised by Jianyong Wang. I am also a Pre-Career Scholar at the Shanghai Innovation Institute. I have been a research intern at Microsoft Research Asia since Jul 2023, mentored by Li Dong.
My research interests span two main directions. The first focuses on the architecture and pre-training of LLMs, including attention mechanism design and scalable inference and generation for long-context modeling. The second centers on multimodal world models, with an emphasis on unified multimodal architectures and autoregressive video generation, as well as their applications in embodied intelligence and real-time video interaction.
Email: syt23@mails.tsinghua.edu.cn, sunyutao@sii.edu.cn
Links: [GitHub] [Google Scholar]
Selected Publications
Preprint
- Universal YOCO for Efficient Depth Scaling
Yutao Sun*, Li Dong*, Tianzhu Ye, Shaohan Huang, Jianyong Wang, Furu Wei.
arXiv:2604.01220, 2026.
[pdf]
- Geometric Autoencoder for Diffusion Models
Hangyu Liu, Jianyong Wang, Yutao Sun.
arXiv:2603.10365, 2026.
[pdf][code]
- Efficient attention mechanisms for large language models: A survey
Yutao Sun*, Zhenyu Li*, Yike Zhang*, Tengyu Pan*, Bowen Dong*, Yuyi Guo, Jianyong Wang.
arXiv:2507.19595, 2025.
- Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun*, Li Dong*, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei.
arXiv:2307.08621, 2023.
[pdf][code]
Conference
- Multimodal Latent Language Modeling with Next-Token Diffusion
Yutao Sun*, Hangbo Bao*, Wenhui Wang*, Zhiliang Peng*, Li Dong*, Shaohan Huang, Jianyong Wang, Furu Wei.
International Conference on Machine Learning (ICML), Spotlight, 2026.
[pdf][code]
- VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
Zhiliang Peng*, Jianwei Yu*, Wenhui Wang*, Yaoyao Chang*, Yutao Sun*, Li Dong*, Yi Zhu, Weijiang Xu, Hangbo Bao, Zehua Wang, Shaohan Huang, Yan Xia, Furu Wei.
International Conference on Learning Representations (ICLR), Oral, 2026.
[pdf][code]
- Differential Transformer
Tianzhu Ye*, Li Dong*, Yuqing Xia*, Yutao Sun*, Yi Zhu, Gao Huang, Furu Wei.
International Conference on Learning Representations (ICLR), Oral, 2025.
[pdf][code]
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
Yutao Sun*, Li Dong*, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei.
Neural Information Processing Systems (NeurIPS), Oral, 2024.
[pdf][code]
Education
- Ph.D., Tsinghua University (2023/08 ~ )
- Undergraduate, Tsinghua University (2018/08 ~ 2023/07)
  - Computer Science and Technology (2020/08 ~ 2023/07)
  - Mathematics and Physics (2018/08 ~ 2020/07)
- Taiyuan No.5 Middle School (2015/08 ~ 2018/07)
  - Participated in the Physics Olympiad and achieved nothing
Honors & Awards
- (11/2025) 84 Scholarship, Tsinghua University
- (09/2025) BYD Scholarship, Tsinghua University
- (06/2023) Outstanding Graduate & Thesis, Tsinghua University
- (09/2022) Tang Jun-Yuan Scholarship, Tsinghua University
- (09/2020) Academic & Social Work Excellence Award, Tsinghua University
Teaching Experience
- Teaching Assistant in Data Mining (2025 Fall, 2026 Fall)
- Teaching Assistant in Object-Oriented Programming (2024 Spring)
- Teaching Assistant in Software Engineering (2022 Spring, 2022 Fall, 2023 Fall)