250916 [ Day 51 ] - OpenCV (7), PyTorch (1)

TaeHyun · September 16, 2025

PyTorch opencv study

TIL


Introduction

Today I wrapped up OpenCV and started the AI / machine learning part in earnest. PyTorch was the first topic.

YOLO

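YOLO (You Only Look Once) : a deep-learning-based object detection algorithm
YOLO is very fast; with a GPU it can process tens of frames per second, which makes it suitable for real-time object detection
Single neural network architecture (detection is done in one forward pass)
High accuracy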

Installing YOLO

Supported only up to Python 3.12 (at the time of writing)
pip install ultralytics, or with conda: conda install -c conda-forge pytorch torchvision, then conda install -c conda-forge ultralytics
from ultralytics import YOLO

Loading the model

conf= : confidence threshold; detections scoring below it are dropped

import cv2 as cv

model = YOLO("yolo11n.pt")
img = cv.imread("../images/person_dog.jpg")

# Run object detection
results = model.predict(img, conf=0.5)

# Draw the detection results on the image
annotated_frame = results[0].plot()

cv.imshow("img", annotated_frame)

cv.waitKey(0)
cv.destroyAllWindows()
cv.waitKey(1)
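As a rough illustration of what conf=0.5 does, the sketch below filters a list of made-up detections by score. The Detection tuple, its field names, and the sample values are hypothetical stand-ins for what a detector reports; this is not the ultralytics result format.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one detected object."""
    label: str
    score: float  # model confidence in [0, 1]


def filter_by_confidence(detections, conf=0.5):
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= conf]


# Made-up sample values, for illustration only
raw = [Detection("person", 0.91), Detection("dog", 0.62), Detection("cat", 0.18)]
kept = filter_by_confidence(raw, conf=0.5)
print([d.label for d in kept])  # ['person', 'dog']
```

Raising conf trades missed objects for fewer false positives; lowering it does the reverse.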

Applying to a video (detection runs on every frame, so playback can slow down)

cap = cv.VideoCapture("../videos/cars.mp4")
model = YOLO("yolo11n.pt")
fps = cap.get(cv.CAP_PROP_FPS) or 30  # fall back to 30 if the file reports 0 FPS

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        break

    result = model.predict(frame, conf=0.5)
    annotated_frame = result[0].plot()

    cv.imshow("cars", annotated_frame)

    if cv.waitKey(int(1000/fps)) == ord("q"):
        break

cap.release()
cv.destroyAllWindows()
cv.waitKey(1)
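Because running detection on every frame is the bottleneck, one common workaround is to detect only on every Nth frame and reuse the last result in between. The sketch below shows that idea in plain Python. The names detect_every_nth, stride, and fake_detect are hypothetical; in the loop above, detect would be something like lambda f: model.predict(f, conf=0.5).

```python
def detect_every_nth(frames, detect, stride=3):
    """Yield (frame, result), calling detect only on every stride-th frame.

    Frames in between reuse the most recent result.
    """
    last_result = None
    for index, frame in enumerate(frames):
        if index % stride == 0:
            last_result = detect(frame)
        yield frame, last_result


calls = []


def fake_detect(frame):
    # Stand-in for a real detector; records which frames were processed
    calls.append(frame)
    return f"boxes@{frame}"


pairs = list(detect_every_nth(range(7), fake_detect, stride=3))
print(calls)     # [0, 3, 6]
print(pairs[4])  # (4, 'boxes@3')
```

The trade-off is that the drawn boxes lag behind fast-moving objects on the skipped frames.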

Applying to a webcam feed

cap = cv.VideoCapture(0)
model = YOLO("yolo11n.pt")

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        break

    result = model.predict(frame, conf=0.5)
    annotated_frame = result[0].plot()

    cv.imshow("video", annotated_frame)

    if cv.waitKey(1) == ord("q"):
        break

cap.release()
cv.destroyAllWindows()
cv.waitKey(1)

Tesseract - OCR

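OCR (Optical Character Recognition)

A technique that analyzes an image and converts the characters or text in it into digital data
Reads text from documents, photos, signs, etc. so that it can be stored as an electronic document, searched, or edited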

Tesseract - OCR

A free, open-source optical character recognition engine
Used to extract text from images or scanned documents
brew install tesseract
brew install tesseract-lang
pip install pytesseract
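Set the path to the Tesseract executable in the folder where Tesseract OCR is installed (with import pytesseract as pyt)
- win : pyt.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
- mac : pyt.pytesseract.tesseract_cmd = r"/usr/local/bin/tesseract" (Homebrew on Apple Silicon installs to /opt/homebrew/bin/tesseract)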

import cv2 as cv
import pytesseract as pyt

img = cv.imread("../images/bill.png", cv.IMREAD_GRAYSCALE)

ret, binary = cv.threshold(img, -1, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)

text = pyt.image_to_string(binary, lang="kor+eng")

print(text)
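With cv.THRESH_OTSU the threshold is chosen automatically, so the value passed as the second argument is not used. As a sketch of the idea (not OpenCV's actual implementation), Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram. The function name otsu_threshold and the synthetic image are assumptions for illustration.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)        # weight of the class "pixel <= t"
    mu = np.cumsum(prob * levels)  # cumulative mean up to t
    mu_total = mu[-1]

    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / denom
    between[denom == 0] = 0.0      # one class is empty: no separation
    return int(np.argmax(between))


# Synthetic image: dark text-like pixels (50) on a bright background (200)
img = np.full((10, 10), 200, dtype=np.uint8)
img[2:5, 2:8] = 50

t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
print(t)
```

Any threshold between the two gray levels separates this image cleanly; on real scans the histogram is noisier, which is where the automatic choice helps.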

PyTorch

A deep learning framework that can build dynamic neural networks on the GPU (its tensor API resembles NumPy)
Released by Facebook in 2017 as a Python-based redevelopment of Torch
pip3 install torch torchvision
conda install pytorch torchvision -c pytorch

import torch
import torchvision

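Components

torch : main namespace; contains tensors and a wide range of functions
torch.autograd : library providing automatic differentiation
torch.nn : library providing data structures, layers, etc. for building neural networks
torch.multiprocessing : library providing parallel processing
torch.optim : parameter optimization algorithms such as SGD (Stochastic Gradient Descent)
torch.utils : utilities such as data handling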
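To see how these pieces fit together, here is a minimal sketch of a single training step on toy data. The model size, learning rate, and data (y = 2x) are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# torch.nn: a one-input, one-output linear layer
model = nn.Linear(1, 1)
# torch.optim: plain SGD over the layer's parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# torch: tensors holding toy data for y = 2x
x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2 * x

loss_before = nn.functional.mse_loss(model(x), y)

# torch.autograd: backward() fills .grad for every parameter
optimizer.zero_grad()
loss_before.backward()
optimizer.step()

loss_after = nn.functional.mse_loss(model(x), y)
print(loss_before.item(), loss_after.item())
```

The loss after the step should be smaller than before; repeating the last block in a loop is the core of a training loop.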

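Tensor

A multidimensional array that generalizes scalars, vectors, and matrices to any number of axes
A container for data, usually holding numeric values

Tensor initialization

Initialization : creating a tensor and filling it with specific values for the first time

Uninitialized tensor

empty() : creates a tensor without initializing its values, so it holds whatever was left in memory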

x = torch.empty(4, 2)

print(x)
# tensor([[8.4078e-45, 0.0000e+00],
#         [0.0000e+00, 0.0000e+00],
#         [1.1388e-38, 1.5574e-41],
#         [0.0000e+00, 0.0000e+00]])

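Randomly initialized tensors

rand() : tensor initialized with random values drawn uniformly from [0, 1)
randn() : tensor initialized with random values drawn from the standard normal distribution (mean 0, variance 1)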

x = torch.rand(4,2)
y = torch.randn(2,3)

print(x)
print(y)
# tensor([[0.4590, 0.9777],
#         [0.1977, 0.6261],
#         [0.0530, 0.9501],
#         [0.1548, 0.1872]])
# tensor([[ 1.4518, -0.5593,  0.6369],
#         [-0.1235,  0.2759,  0.1233]])
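The values above change on every run. A small sketch, assuming reproducible output is wanted: torch.manual_seed fixes the random sequence, and the checks confirm the ranges described for rand() and randn(). The seed value 42 is arbitrary.

```python
import torch

torch.manual_seed(42)
a = torch.rand(4, 2)

torch.manual_seed(42)  # same seed, same sequence
b = torch.rand(4, 2)

print(torch.equal(a, b))                       # True
print(bool((a >= 0).all() and (a < 1).all()))  # True

# randn: many samples should have mean near 0 and std near 1
z = torch.randn(100_000)
print(z.mean().item(), z.std().item())
```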

Tensor filled with zeros

x = torch.zeros(4,2, dtype=torch.long)

print(x)
# tensor([[0, 0],
#         [0, 0],
#         [0, 0],
#         [0, 0]])

Tensor filled with ones

x = torch.ones(2,4, dtype=torch.double)

print(x)
# tensor([[1., 1., 1., 1.],
#         [1., 1., 1., 1.]], dtype=torch.float64)

new_ones() : creates a tensor of ones that inherits the existing tensor's properties (data type, device)

y = x.new_ones(3,2)
print(y)
# tensor([[1., 1.],
#         [1., 1.],
#         [1., 1.]], dtype=torch.float64)

Initializing a tensor from user-supplied values

x = torch.tensor([1, 2.5])

print(x)
# tensor([1.0000, 2.5000])

Loading from NumPy

from_numpy()

import numpy as np

data = [1,2,3,4]
np_array = np.array(data)

x = torch.from_numpy(np_array)

print(x)
# tensor([1, 2, 3, 4])
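One detail worth knowing: from_numpy() does not copy the data, so the tensor and the array share memory, whereas torch.tensor() makes a copy. A short sketch of the difference (the variable names shared and copied are just for illustration):

```python
import numpy as np
import torch

np_array = np.array([1, 2, 3, 4])

shared = torch.from_numpy(np_array)  # shares memory with np_array
copied = torch.tensor(np_array)      # makes an independent copy

np_array[0] = 100

print(shared[0].item())  # 100
print(copied[0].item())  # 1
```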

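_like functions

Functions that create a tensor with the same properties (shape, data type, device) as an existing tensor but with different values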

x = torch.tensor([[1,2], [3,4]], dtype=torch.float64)
y = torch.zeros_like(x)

print(y)
# tensor([[0., 0.],
#         [0., 0.]], dtype=torch.float64)
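zeros_like has several siblings that follow the same pattern. A brief sketch of a few of them, including overriding the inherited dtype; the variable names are only illustrative.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)

ones = torch.ones_like(x)                        # same shape and dtype, filled with 1
noise = torch.rand_like(x)                       # same shape and dtype, uniform random values
as_int = torch.zeros_like(x, dtype=torch.int32)  # shape kept, dtype overridden
sevens = torch.full_like(x, 7)                   # same shape and dtype, filled with 7

print(ones.dtype, noise.shape, as_int.dtype)
```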

Tensor attributes

t = torch.rand(3,4)

print(t.size()) # shape
# torch.Size([3, 4])

print(t.shape) # shape
# torch.Size([3, 4])

print(t.dtype) # data type
# torch.float32

print(t.device) # device
# cpu

Data types

Creating a tensor of a specific data type

x = torch.FloatTensor([1,2,3])
y = torch.tensor([1,2,3], dtype=torch.float32)

print(x.dtype)
print(y.dtype)
# torch.float32
# torch.float32

Type casting (type conversion)

ft = torch.FloatTensor([1.1, 2.2, 3.3])
print(ft)
print(ft.short())
print(ft.int())
print(ft.long())
# tensor([1.1000, 2.2000, 3.3000])
# tensor([1, 2, 3], dtype=torch.int16)
# tensor([1, 2, 3], dtype=torch.int32)
# tensor([1, 2, 3])

it = torch.IntTensor([1,2,3])
print(it)
print(it.half())
print(it.float())
print(it.double())
# tensor([1, 2, 3], dtype=torch.int32)
# tensor([1., 2., 3.], dtype=torch.float16)
# tensor([1., 2., 3.])
# tensor([1., 2., 3.], dtype=torch.float64)
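Note that casting a float tensor to an integer type drops the fractional part rather than rounding, which matters for negative values. A small sketch with arbitrary sample values, also showing .to(dtype) as the general form of the shortcut methods above:

```python
import torch

ft = torch.tensor([-1.7, -0.2, 0.9, 2.6])

truncated = ft.int()              # drops the fractional part (toward zero)
rounded = ft.round().int()        # round first if the nearest integer is wanted
as_double = ft.to(torch.float64)  # .to(dtype) is the general form

print(truncated.tolist())  # [-1, 0, 0, 2]
print(rounded.tolist())    # [-2, 0, 1, 3]
print(as_double.dtype)     # torch.float64
```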

CUDA Tensor

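x = torch.randn(1)
tensor = x  # stays on the CPU when CUDA is unavailable

if torch.cuda.is_available():
    tensor = x.to("cuda")

tensor.device
# device(type='cuda', index=0) on a CUDA machine, device(type='cpu') otherwise

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
# cuda (cpu when no GPU is available)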

x = torch.ones(2,3, device=device)
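Building on the device variable, here is a sketch of the usual device-agnostic pattern: create or move every tensor to the same device, compute there, and move the result back to the CPU before handing it to NumPy. It runs unchanged on machines with or without a GPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)  # created directly on the device
b = torch.rand(2, 3).to(device)      # created on the CPU, then moved

c = a + b                            # both operands must be on the same device

# NumPy only understands CPU memory, so move back before converting
c_np = c.cpu().numpy()
print(c.device, c_np.shape)
```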

Representing multidimensional tensors

0D Tensor (Scalar)

A tensor holding a single number
Has no axes, and its shape is empty

t0 = torch.tensor(0)

print(t0.ndim)
print(t0.shape)
# 0
# torch.Size([])

1D Tensor(Vector)

A tensor similar to a list of values
Has one axis

t1 = torch.tensor([1,2,3])
print(t1.ndim)
print(t1.shape)
# 1
# torch.Size([3])

2D Tensor(Matrix)

Matrix-like shape with two axes
Mostly used for ordinary numeric and statistical datasets

t2 = torch.tensor([[1,2,3], [1,2,3], [1,2,3]])
print(t2.ndim)
print(t2.shape)
# 2
# torch.Size([3, 3])

3D Tensor

Cube-like shape with three axes
Typical for sequence data or time-series data that include a time axis
e.g. stock prices, disease outbreak data

t3 = torch.tensor([[[1,2,3], [1,2,3], [1,2,3]], 
                    [[1,2,3], [1,2,3], [1,2,3]], 
                    [[1,2,3], [1,2,3], [1,2,3]]])
print(t3.ndim)
print(t3.shape)
# 3
# torch.Size([3, 3, 3])

4D Tensor

Four axes
Color images are the typical example
Mostly structured as samples, height, width, color channels

t4 = torch.tensor(
                [[[[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]]],
                [[[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]]]]
)
print(t4.ndim)
print(t4.shape)
# 4
# torch.Size([2, 3, 3, 3])
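Images loaded with OpenCV come as (height, width, channels), but PyTorch layers such as nn.Conv2d expect batches as (samples, channels, height, width). A minimal sketch of the reordering with permute, using a hypothetical all-zero batch in place of real images:

```python
import torch

# Hypothetical batch: 2 images, 32 px high, 48 px wide, 3 color channels
batch_nhwc = torch.zeros(2, 32, 48, 3)

# Reorder axes to (samples, channels, height, width)
batch_nchw = batch_nhwc.permute(0, 3, 1, 2)

print(batch_nhwc.shape)  # torch.Size([2, 32, 48, 3])
print(batch_nchw.shape)  # torch.Size([2, 3, 32, 48])
```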

5D Tensor

Five axes
Video data is the typical example
Mostly structured as samples, frames, height, width, color channels

t5 = torch.tensor(
                [[[[[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]]],
                [[[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]]]],
                [[[[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]]],
                [[[1,2,3], [1,2,3], [1,2,3]], [[1,2,3], [1,2,3], [1,2,3]]]]]
)
print(t5.ndim)
print(t5.shape)
# 5
# torch.Size([2, 2, 2, 3, 3])
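In practice, higher-dimensional tensors are rarely typed out as nested lists; they are usually assembled from lower-dimensional ones. A sketch of building a 5D video batch from individual frames with stack and unsqueeze, where the frame count and sizes are arbitrary placeholders:

```python
import torch

# 8 hypothetical frames, each (channels, height, width)
frames = [torch.zeros(3, 4, 4) for _ in range(8)]

clip = torch.stack(frames)  # new leading axis: (frames, C, H, W)
batch = clip.unsqueeze(0)   # add a sample axis: (1, frames, C, H, W)

print(clip.shape)   # torch.Size([8, 3, 4, 4])
print(batch.ndim)   # 5
```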

Wrapping up

PyTorch has a lot in common with NumPy, so nothing has been particularly hard yet. If I keep reviewing NumPy and studying math, the upcoming machine learning part should go smoothly.

NOTION

MY NOTION (OpenCV. 05)
MY NOTION (PyTorch. 01)
