Zirui's Homepage
 
About me
I’m Zirui “Ray” Liu, an assistant professor in the Department of Computer Science at UMN. Previously, I graduated from the Department of Computer Science at Rice University, where I worked with Prof. Xia Hu and Prof. Vladimir Braverman.
I am mostly interested in Large Language Models and their applications, focusing on enabling them to combine and process information from diverse sources and domains. For that reason, I deeply care about efficiency, reasoning, long-context ability, and understanding their inner workings. I also enjoy extending foundation models to other domains, exploring the interplay between different sources of data.
📧📧 Recruiting: I am always looking for PhD students and research interns with strong coding skills. Feel free to drop me a line at ziruiliu dot recruit at gmail dot com with your resume, transcripts, and a short description of why you’d like to work with me.
News
-  Four papers accepted at EMNLP 2025 (1 Oral, 3 Findings). Three papers accepted at NeurIPS 2025 (1 Oral, 2 Posters). Kudos to my students and collaborators. 
-  2025/7. We are organizing the VISION workshop at ICCV on industrial inspection. 
-  Gave a talk (Paper, Recording, Slides) at the ASAP seminar on the impact of numerical precision on LLM reasoning evaluation. 
-  Received an NSF CIRC planning grant, an NSF NAIRR Pilot award, UMN DSI internal funding, and Adobe gifts. Thanks to NSF, UMN DSI, and Adobe! 
-  One paper accepted at ICML 2025. Previously, in KIVI, we observed that the K cache has outlier channels while the V cache doesn’t. In this ICML 2025 paper, we show that this phenomenon is caused by RoPE. 
-  Giving a tutorial at AAAI 2025 on KV cache optimization. Slides can be found here 
-  One paper accepted at CVPR 2025 on structured pruning 
-  Two papers accepted at ICLR 2025: one on an LLM-based file system and one on zeroth-order fine-tuning of LLMs 
-  Our KV cache compression benchmark was accepted at EMNLP 2024. If you want to know the research landscape of this area, take a look at the paper and code. 
-  Introduced a rare disease question-answering dataset (ReDis-QA) to assess chatbots’ ability to diagnose rare diseases. 
-  🔥🔥 Our KIVI largely inspired the KV cache quantization system in Hugging Face. Code is available here. And our Self-Extend is used in llama.cpp, implemented in KerasNLP, and highlighted during a Google I/O session. Code is available here. 
-  Our KIVI, Self-Extend, and Compress-then-Prompt were accepted at ICML 2024. Self-Extend was selected as a Spotlight (3.5%) at ICML 2024! 
Publications
Please refer to my publications page or Google Scholar.