About Me

I am a fourth-year undergraduate student at Zhejiang University, majoring in Computer Science & Technology and minoring in ACEE at Chu Kochen Honors College.

Recently, my research philosophy has undergone a significant evolution from Deconstruction to Construction. Previously, I focused on Interpretability and Trustworthy ML, driven by a desire to decipher the “physics” of neural networks and ensure their controllability.

Now, my research interests have shifted toward new model architectures (e.g., Sparse/Linear Attention, DeltaNet) and new learning paradigms (e.g., Continual Learning, Test-time Learning), grounded in two core beliefs:

  • Understanding through construction, not just deconstruction. As Richard Feynman famously said: “What I cannot create, I do not understand.”
  • The next paradigm won’t emerge from reverse-engineering Transformers alone. Relying solely on that path risks trapping our understanding in the local minima of existing frameworks.

While this represents a substantial shift in direction and I am still in an exploratory phase, I believe my previous research experience provides valuable priors for this new terrain. My long-term goal is to build models with true agency and efficiency.

Papers and Preprints

  • Layerwise Change of Knowledge in Neural Networks [PDF]
    Xu Cheng*, Lei Cheng*, Zhaoran Peng, Yang Xu, Tian Han, Quanshi Zhang.
    Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR 235:8038-8059, 2024.

  • Towards the Dynamics of a DNN Learning Symbolic Interactions [PDF]
    Qihan Ren*, Junpeng Zhang*, Yang Xu, Yue Xin, Dongrui Liu, Quanshi Zhang.
    Neural Information Processing Systems (NeurIPS), 2024.

  • Tracking the Feature Dynamics in LLM Training: A Mechanistic Study [PDF]
    Yang Xu, Yi Wang, Hengguan Huang, Hao Wang.
    arXiv preprint arXiv:2412.17626, 2024.

  • Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions [PDF]
    Yang Xu*, Xuanming Zhang*, Samuel Yeh, Jwala Dhamala, Ousmane Dia, Rahul Gupta, Sharon Li.
    arXiv preprint arXiv:2510.03999, 2025. (Accepted at the NeurIPS 2025 Workshop @ResponsibleFM; under review at ICLR 2026.)