About Me

I am a fourth-year undergraduate student at Zhejiang University, majoring in Computer Science & Technology and minoring in ACEE at the Chu Kochen Honors College. I am currently an intern at MiniMax, where I work on LLM pretraining and model architecture design.

My current research interests focus on designing novel model architectures, particularly efficient attention mechanisms, and on scaling large language models. Previously, I worked on interpretability and trustworthy machine learning, motivated by a desire to uncover the underlying “physics” of neural networks and to ensure their controllability and reliability. My research direction has shifted from DECONSTRUCTION to CONSTRUCTION, guided by the belief that “What I cannot create, I do not understand.” I believe the next paradigm in artificial intelligence will emerge not from merely refining existing frameworks, but from building fundamentally new ones.

My long-term goal is to develop efficient, scalable models with genuine agency.

News

  • [2026.02] Started my internship at MiniMax on the Pretraining Team.
  • [2026.01] One paper accepted at ICLR 2026.
  • [2025.07] Started a summer research program on Triton-based kernel optimization and efficient attention mechanisms, under the mentorship of Jingyang Yuan.
  • [2025.03] Started my research internship at UW–Madison, under the guidance of Prof. Sharon Yixuan Li.
  • [2024.09] One paper accepted at NeurIPS 2024.
  • [2024.07] Started my visit at Rutgers University.
  • [2024.05] One paper accepted at ICML 2024.
  • [2023.05] Started my remote internship at Shanghai Jiao Tong University.