Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:16<00:00, 4.08s/it]
Model load complete
Question: Hello, please introduce yourself briefly.
Starting answer generation
C:\Users\KHH.conda\envs\pruning\lib\site-packages\transformers\models\llama\modeling_llama.py:602: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Elapsed time: 6.15 s
Generated tokens: 50
Speed (TPS): 8.13 tokens/sec
AI answer:
Hello, please introduce yourself briefly. I am a 27-year-old woman from Hamburg. I am a social worker and work in a women’s shelter for women in distress. I have been in a relationship with my girlfriend for 2 years. I have been in a relationship with my
Clearing memory
Cleanup complete! VRAM cleared
Loading model (int4)
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:18<00:00, 4.75s/it]
Model load complete
Question: Hello, please introduce yourself briefly.
Starting answer generation
Elapsed time: 2.95 s
Generated tokens: 50
Speed (TPS): 16.94 tokens/sec
AI answer:
Hello, please introduce yourself briefly. I am an 18 year old girl from the Netherlands, and I have a big interest in music. I am a musician myself, and I love listening to music. I am very happy to be a part of this community.
What are your favorite