Large Language Model Accelerator Comparison

For use in publications and presentations, please cite this data collection as follows:
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai. "Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective". Available: https://arxiv.org/pdf/2410.04466
[Update]: For neural network accelerator comparison, please refer to https://nicsefc.ee.tsinghua.edu.cn/project.html
[Update 2025.6]: Added more than 100 works and corrected the results for some ASICs using quantization methods.

Filter options:
    Fabric: CPU, GPU, FPGA, ASIC, PIM/NDP
    Method: Quantization, Sparsity, Fast Decoding, Operator Optimization, Heterogeneous Cooperation, Homogeneous Cooperation
    Year: 2022, 2023, 2024, 2025, 2026, 2027

* Peak performance is calculated from TDP, while real performance is calculated from the measured power.
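The distinction above can be sketched as a small calculation. This is a minimal illustration, not code from the data collection: the function name, the tokens-per-joule metric, and all numbers are hypothetical assumptions chosen only to show how "peak" (TDP-based) and "real" (measured-power-based) efficiency figures diverge for the same throughput.

```python
def energy_efficiency(throughput_tokens_per_s: float, power_w: float) -> float:
    """Energy efficiency in tokens per joule (tokens/s divided by watts)."""
    return throughput_tokens_per_s / power_w

# Hypothetical accelerator numbers (not taken from the table):
tdp_w = 300.0        # vendor-rated thermal design power
measured_w = 210.0   # power actually drawn during inference
throughput = 4200.0  # decoding throughput in tokens/s

peak_eff = energy_efficiency(throughput, tdp_w)       # "peak": uses TDP
real_eff = energy_efficiency(throughput, measured_w)  # "real": uses measured power
print(peak_eff, real_eff)  # -> 14.0 20.0
```

Because measured power is usually below TDP, the "real" efficiency figure is typically higher than the "peak" one for the same run, which is why the two are reported separately.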
Table columns: Title, Author, Affiliation, Fabric, Energy Efficiency, Precision, Year, URL

To submit your results or report any problems, contact: kimholee@sjtu.edu.cn
Authored by: Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Ding Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Hao Zhou, Yu Wang, Guohao Dai