Paper Reviews

1. HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

2. Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
