About Me
I am a third-year undergraduate student at Zhejiang University, majoring in Computer Science and Technology. I am interested in deep learning, particularly in interpreting models and developing theoretical foundations for them. My long-term goal is to understand the mechanisms of neural networks, transforming deep learning from a “black box” approach into a rigorous science. I am also interested in topics such as alignment and trustworthiness in AI systems, with the aim of ensuring that AI technologies are beneficial to society.
Currently, I am focusing on mechanistic interpretability in large language models (LLMs), a rapidly growing area of research that seeks to reverse-engineer the internal operations of neural networks to uncover how they function. Unlike learning theory, which emphasizes formal and mathematical frameworks, mechanistic interpretability takes an approach closer to the “physics” of LLMs. I believe that both formal theoretical approaches and mechanistic discoveries, viewed through a physical lens, are essential for achieving a comprehensive understanding of neural networks.
I am very fortunate to be mentored by Prof. Hao Wang, who provides me with invaluable guidance and support. I am also grateful to Prof. Quanshi Zhang, who mentors me in the fascinating world of deep learning and interpretability and has offered me many opportunities to explore this field. Finally, I thank the many seniors and peers whose support and encouragement help me grow and improve continuously.
News
- [2024. 9] One Paper Accepted by NeurIPS 2024.
- [2024. 7] Started my visit at Rutgers University.
- [2024. 5] One Paper Accepted by ICML 2024.
- [2023. 5] Started my remote internship at Shanghai Jiao Tong University.
Selected Papers
- Layerwise Change of Knowledge in Neural Networks
Xu Cheng*, Lei Cheng*, Zhaoran Peng, Yang Xu, Tian Han, Quanshi Zhang.
Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR 235:8038-8059, 2024.
- Towards the Dynamics of a DNN Learning Symbolic Interactions
Qihan Ren*, Junpeng Zhang*, Yang Xu, Yue Xin, Dongrui Liu, Quanshi Zhang.
Neural Information Processing Systems (NeurIPS), 2024.
  - Originally second author for theoretical contributions; authorship adjusted after merging experimental paper’s first author.
- Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu, Yi Wang, Hao Wang.
arXiv preprint arXiv:2412.17626, 2024.