SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding
Jehyeon Bang, Eunyeong Cho, Ranggi Hwang, Jinha Chung, and Minsoo Rhu
The 63rd ACM/ESDA/IEEE Design Automation Conference (DAC-63), Long Beach, CA, Jul. 2026
PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models
Eunyeong Cho, Jehyeon Bang, Ranggi Hwang, and Minsoo Rhu
The 32nd IEEE International Symposium on High-Performance Computer Architecture (HPCA-32), Sydney, NSW, Australia, Feb. 2026
Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving
Yunjae Lee*, Juntaek Lim*, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Lim, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, and Minsoo Rhu (*co-first authors)
The 52nd International Symposium on Computer Architecture (ISCA-52), Tokyo, Japan, Jun. 2025
Characterization and Analysis of Text-to-Image Diffusion Models
Eunyeong Cho, Jehyeon Bang, and Minsoo Rhu
IEEE Computer Architecture Letters, 2024
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, and Minsoo Rhu
The 57th IEEE/ACM International Symposium on Microarchitecture (MICRO-57), Austin, TX, Nov. 2024