Ph.D. Candidate, Electrical and Computer Engineering
University of California, Santa Cruz
Advisor: Jason K. Eshraghian
Previously, I worked on spiking neural networks, contributing to
snnTorch,
SpikingJelly, and building
SpikeGPT.
My research has since shifted to efficient sequence modeling architectures and how to scale them.
I received my Bachelor's degree from the University of Electronic Science and Technology of China (2023).
Find me on GitHub, Google Scholar, and X (Twitter).
Email: ridger@ucsc.edu
I am interested in building scalable and efficient sequence modeling architectures as an alternative to standard Transformers. On the architecture side, I have contributed to the development of linear attention and recurrent models that achieve Transformer-level quality at a fraction of the cost:
What I care about most is experiencing scaling firsthand. My personal scaling trajectory spans three orders of magnitude in compute:
For each of these runs, I watched every checkpoint from the first to the last, witnessing a model go from random initialization to intelligence. That is what I truly enjoy. The journey is the reward.
Please refer to the publications page for the full list.
This website is adapted from Tianyu Gao's design, which is in turn adapted from Gregory Gundersen's.