About Me

I am a third-year undergraduate student at Zhejiang University, majoring in Computer Science and Technology. I am interested in deep learning, particularly in interpreting models and developing theoretical foundations for them. My long-term goal is to understand the mechanisms of neural networks, transforming deep learning from a “black box” approach into a rigorous science. I am also interested in topics such as alignment and trustworthiness in AI systems, with the aim of ensuring that AI technologies are beneficial to society.

Currently, I am focusing on mechanistic interpretability in large language models (LLMs), a rapidly growing area of research that seeks to reverse-engineer the internal operations of neural networks to uncover how they function. Unlike learning theory, which emphasizes formal and mathematical frameworks, mechanistic interpretability takes an approach closer to the “physics” of LLMs. I believe that both formal theoretical approaches and mechanistic discoveries—viewed through a physical lens—are essential for achieving a comprehensive understanding of neural networks.

I am currently a research intern at the University of Wisconsin–Madison, where I have the great privilege of working with Prof. Sharon Yixuan Li, whose insightful guidance has been deeply inspiring. I am also very fortunate to be mentored by Prof. Hao Wang, who continues to provide me with invaluable support throughout my research journey. Prior to this, I had the great pleasure of working with Prof. Quanshi Zhang, who first introduced me to the fascinating world of deep learning and interpretability, and offered me many opportunities to explore this field. In addition, I am deeply grateful for the support and encouragement from many seniors and peers, who have helped me grow both academically and personally.

News

  • [2025. 3] Started my research internship at UW–Madison, under the guidance of Prof. Sharon Yixuan Li.
  • [2024. 9] One paper accepted to NeurIPS 2024.
  • [2024. 7] Started my visit at Rutgers University.
  • [2024. 5] One paper accepted to ICML 2024.
  • [2023. 5] Started my remote internship at Shanghai Jiao Tong University.

Selected Papers

  • Layerwise Change of Knowledge in Neural Networks [PDF]
    Xu Cheng*, Lei Cheng*, Zhaoran Peng, Yang Xu, Tian Han, Quanshi Zhang.
    Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR 235:8038-8059, 2024.

  • Towards the Dynamics of a DNN Learning Symbolic Interactions [PDF]
    Qihan Ren*, Junpeng Zhang*, Yang Xu, Yue Xin, Dongrui Liu, Quanshi Zhang.
    Neural Information Processing Systems (NeurIPS), 2024.
    • Originally second author, for theoretical contributions; the author order was adjusted when the first author of a merged experimental paper was added.

  • Tracking the Feature Dynamics in LLM Training: A Mechanistic Study [PDF]
    Yang Xu, Yi Wang, Hengguan Huang, Hao Wang.
    arXiv preprint arXiv:2412.17626, 2024.