Zirui's Homepage
About me
I’m Zirui “Ray” Liu, an incoming assistant professor in the Department of Computer Science at the University of Minnesota (UMN). Previously, I graduated from the Department of Computer Science at Rice University, where I was co-advised by Dr. Xia “Ben” Hu and Prof. Vladimir Braverman.
My research broadly focuses on Efficient ML/MLSys, LLMs, and Graph ML. Recently, I have been most interested in Large Language Models and their applications. Below are some topics I am working on or exploring:
- Efficient LLMs: making LLMs more accessible through hardware-aware approaches, such as optimizing compression, architecture, implementation, and training/deployment strategies. I strongly believe that good system design requires a deep understanding of the workload, along with thorough knowledge of the capabilities and limitations of current LLMs.
- Fundamental development of LLMs: improving long-context, reasoning, retrieval, and memory abilities; designing experiments to understand LLMs.
- Extending Transformers beyond Text: exploring the interplay between Transformers and various domains, such as Graphs, Proteins, Genes, and Healthcare.
Feel free to reach out if you would like to collaborate on MLSys, LLM, or on-device ML research.
📧📧 Recruiting Spring/Fall 2025 Ph.D. students and interns: I am looking for self-motivated students with a strong background in HPC, NLP, and ML. Feel free to drop me a line at ziruiliu dot recruit at gmail dot com, along with your resume and transcripts, if you are interested.
News
- KVCache Compression Benchmark accepted to EMNLP 2024. If you want to know the research landscape of this area, take a look at the paper and code.
- Introduced a rare disease question-answering (ReDis-QA) dataset to assess chatbots' ability to diagnose rare diseases.
- 🔥🔥 Our KIVI largely inspired the KV cache quantization system in Hugging Face; code is available here. Our Self-Extend was highlighted during a Google I/O session; code is available here.
- Our KIVI, Self-Extend, and Compress-then-prompt were accepted to ICML 2024. Self-Extend was selected as a Spotlight (3.5%) at ICML 2024!
- Our memory-efficient LLM fine-tuning work was covered by Rice CS News.
- Three papers accepted to NeurIPS 2023.
- Two papers accepted to TMLR.
- Two papers accepted to ICML 2023.
- One paper accepted to MLSys 2023.
- Two papers accepted to NeurIPS 2022: DreamShard and GNN Benchmark (Benchmark Track).
Publications
Please refer to publications or Google Scholar.