About Me
I’m a second-year PhD student at Tsinghua University, advised by Jianyong Wang. I’m also a research intern at Microsoft Research Asia, mentored by Li Dong. My research interests include LLM backbones, long-sequence modeling and inference, and multimodal LLMs.
Email: syt23@mails.tsinghua.edu.cn
Links: [GitHub] [Twitter] [Google Scholar]
Education
- Ph.D. student, Tsinghua University (2023/08 ~ )
- Undergraduate student, Tsinghua University (2018/08 ~ 2023/07)
  - Computer Science and Technology (2020/08 ~ 2023/07)
  - Mathematics and Physics (2018/08 ~ 2020/07)
- Taiyuan No.5 Middle School (2015/08 ~ 2018/07)
  - Participated in the Physics Olympiad and achieved nothing
Selected Publications
Preprint
- Differential Transformer
Tianzhu Ye*, Li Dong*, Yuqing Xia*, Yutao Sun*, Yi Zhu, Gao Huang, Furu Wei.
arXiv:2410.05258, 2024.
[pdf][code]
- Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun*, Li Dong*, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei.
arXiv:2307.08621, 2023.
[pdf][code]
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples
Yaru Hao*, Yutao Sun*, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei.
arXiv:2212.06713, 2022.
[pdf][code]
Conference
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
Yutao Sun*, Li Dong*, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei.
Neural Information Processing Systems (NeurIPS), Oral, 2024.
[pdf][code]
- A Length-Extrapolatable Transformer
Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei.
Association for Computational Linguistics (ACL), Long paper, 2023.
[pdf][code]
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei.
Findings of Association for Computational Linguistics (Findings of ACL), Long paper, 2023.
[pdf]
Talks
- (08/2024) YOCO at Unify, SDU, and Danqi Chen’s group at Princeton
- (07/2023) RetNet at BAAI, DAMO Academy, and HanLab at MIT
Honors & Awards
- (06/2023) Outstanding Graduate & Thesis, Tsinghua University
- (09/2022) Tang Jun-Yuan Scholarship, Tsinghua University
- (09/2020) Academic & Social Work Excellence Award, Tsinghua University
Teaching Experience
- Teaching Assistant in Object-Oriented Programming (2024 Spring)
- Teaching Assistant in Software Engineering (2022 Spring, 2022 Fall, 2023 Fall)