Zirui's Homepage
 
About me
I’m Zirui “Ray” Liu, an assistant professor in the Department of Computer Science at UMN. Previously, I graduated from the Department of Computer Science at Rice University, where I worked with Prof. Xia Hu and Prof. Vladimir Braverman.
I am mostly interested in Large Language Models and their applications, focusing on enabling them to combine and process information from diverse sources and domains. For that reason, I deeply care about efficiency, reasoning, long-context ability, and understanding their inner workings. I also enjoy extending foundation models to other domains, exploring the interplay between different sources of data.
📧📧 Recruiting: I am always looking for PhD students and research interns with strong coding skills. Feel free to drop me a line at ziruiliu dot recruit at gmail dot com with your resume, transcripts, and a short description of why you’d like to work with me.
News
-  Four papers accepted at EMNLP 2025 (1 Oral, 3 Findings). Three papers accepted at NeurIPS 2025 (1 Oral, 2 Posters). Kudos to my students and collaborators. 
-  2025/7. We are organizing the VISION workshop at ICCV on industrial inspection. 
-  Gave a talk (Paper, Recording, Slides) at the ASAP seminar on the impact of numerical precision on LLM reasoning evaluation. 
-  Received an NSF CIRC planning grant, an NSF NAIRR Pilot award, UMN DSI internal funding, and Adobe gifts. Thanks to NSF, UMN DSI, and Adobe! 
-  One paper accepted at ICML 2025. Previously, in KIVI, we observed that the K cache has outlier channels while the V cache doesn’t. In this ICML 2025 paper, we show that this phenomenon is caused by RoPE. 
-  Giving a tutorial at AAAI 2025 on KV cache optimization. Slides can be found here 
-  One paper accepted at CVPR 2025 on structured pruning 
-  Two papers accepted at ICLR 2025: one on an LLM-based file system and one on zeroth-order fine-tuning of LLMs 
-  Our KV cache compression benchmark was accepted at EMNLP 2024. If you want to know the research landscape of this area, take a look at the paper and code. 
-  Introduced a rare disease question-answering dataset (ReDis-QA) to assess chatbots’ ability to diagnose rare diseases. 
-  🔥🔥 Our KIVI largely inspired the KV cache quantization system in Hugging Face. Code is available here. And our Self-Extend is used in llama.cpp, implemented in KerasNLP, and highlighted during a Google I/O session. Code is available here. 
-  Our KIVI, Self-Extend, and Compress-then-Prompt were accepted at ICML 2024. Self-Extend was selected as a Spotlight (3.5%) at ICML 2024! 
Publications
Please refer to my publications page or Google Scholar.