Faster Quantized Inference with XNNPACK

eetocs · September 23, 2022

XNNPACK tensorflow tflite

TFLite + XNNPACK now accelerates int8 operations as well (September 9, 2021)

https://blog.tensorflow.org/2021/09/faster-quantized-inference-with-xnnpack.html

Operators supported by quantized XNNPACK
- ADD
- CONV_2D (with fused RELU, RELU_N1_TO_1, RELU6)
- DEPTHWISE_CONV_2D
- DEQUANTIZE
- ELU
- FULLY_CONNECTED
- LOGISTIC
- MAX_POOL_2D
- MEAN
- MUL
- PAD
- QUANTIZE
- RESIZE_BILINEAR
- SUB
- TRANSPOSE_CONV

To use it, build TFLite with the xnn_enable_qs8=true option.
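
As a reminder of what the QUANTIZE and DEQUANTIZE operators in the list compute, here is a minimal pure-Python sketch of the affine int8 mapping, real = (q - zero_point) * scale. The function names and the scale and zero_point values are made up for illustration; this is not the XNNPACK kernel, and Python's round() breaks exact ties differently from typical runtime kernels.

```python
def quantize(values, scale, zero_point):
    """Map real values to int8: q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    quantized = []
    for x in values:
        q = round(x / scale) + zero_point
        quantized.append(max(-128, min(127, q)))  # saturate to the int8 range
    return quantized


def dequantize(qvalues, scale, zero_point):
    """Map int8 values back to real numbers: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


# Hypothetical quantization parameters, chosen only for this example.
SCALE = 0.5
ZERO_POINT = -3

q = quantize([0.0, 1.0, -2.5, 100.0], SCALE, ZERO_POINT)
restored = dequantize(q, SCALE, ZERO_POINT)
print(q, restored)
```

The last input saturates at 127, so it does not survive the round trip; that clamping is one source of the accuracy drop seen with int8 models.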

Model	ImageNet top-1	RPi 3B+ latency
EfficientNet-lite-b0-float32	75.1%	135.2ms
EfficientNet-lite-b0-int8	74.4%	82.7ms
MobileNet_v1_float32	71.0%	134.4ms
MobileNet_v1_int8	70.0%	77.0ms
MobileNet_v2_float32	71.8%	95.7ms
MobileNet_v2_int8	70.8%	70.5ms

For a small(?) accuracy drop of 0.7 to 1.0 points top-1, inference latency goes down by roughly 26% to 43%, depending on the model.
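
The comparison can be reproduced with a few lines of Python. The numbers are copied from the table; the dictionary layout and the compare helper are only for illustration.

```python
# (top-1 accuracy in %, latency in ms on the Raspberry Pi 3B+), copied from the table.
results = {
    "EfficientNet-lite-b0": {"float32": (75.1, 135.2), "int8": (74.4, 82.7)},
    "MobileNet_v1": {"float32": (71.0, 134.4), "int8": (70.0, 77.0)},
    "MobileNet_v2": {"float32": (71.8, 95.7), "int8": (70.8, 70.5)},
}


def compare(entry):
    """Return accuracy drop (points), latency reduction (%), and speedup factor."""
    acc_float, lat_float = entry["float32"]
    acc_int8, lat_int8 = entry["int8"]
    return {
        "accuracy_drop": round(acc_float - acc_int8, 1),
        "latency_reduction_pct": round(100 * (lat_float - lat_int8) / lat_float, 1),
        "speedup": round(lat_float / lat_int8, 2),
    }


summary = {name: compare(entry) for name, entry in results.items()}
for name, row in summary.items():
    print(name, row)
```

MobileNet_v1 gains the most (about 1.75x) and MobileNet_v2 the least (about 1.36x), so the benefit clearly depends on the architecture.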

For on-device AI, int8 quantization is now a must.
