Object Detection and Tracking

oyoi · May 14, 2024

ECC_Artificial Intelligence with Python

Frame differencing

: Frame differencing is a technique that finds the moving parts of a video by analyzing the differences between consecutive frames captured from a live video stream.

import cv2 

# Compute the frame differences 
def frame_diff(prev_frame, cur_frame, next_frame): 
    
    # Difference between the current frame and the next frame 
    diff_frames_1 = cv2.absdiff(next_frame, cur_frame) 
    # Difference between the current frame and the previous frame 
    diff_frames_2 = cv2.absdiff(cur_frame, prev_frame) 
    
    return cv2.bitwise_and(diff_frames_1, diff_frames_2)
    
# Define a function to get the current frame from the webcam 
def get_frame(cap, scaling_factor): 
    
    # Read the current frame from the video capture object 
    _, frame = cap.read()
    # Resize the image 
    frame = cv2.resize(frame, None, fx=scaling_factor,  
            fy=scaling_factor, interpolation=cv2.INTER_AREA) 
    # Convert to grayscale 
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 
 
    return gray  
    
if __name__=='__main__': 
    
    # Define the video capture object 
    cap = cv2.VideoCapture(0)
    # Define the scaling factor for the images 
    scaling_factor = 0.5 
    
    # Grab the current frame 
    prev_frame = get_frame(cap, scaling_factor)  
    # Grab the next frame 
    cur_frame = get_frame(cap, scaling_factor)  
    # Grab the frame after that 
    next_frame = get_frame(cap, scaling_factor) 
    
    # Keep reading the frames from the webcam  
    # until the user hits the 'Esc' key 
    while True: 
        # Display the frame difference 
        cv2.imshow('Object Movement', frame_diff(prev_frame,  
                cur_frame, next_frame))
        # Update the variables 
        prev_frame = cur_frame 
        cur_frame = next_frame
        # Grab the next frame 
        next_frame = get_frame(cap, scaling_factor)
        # Check if the user hit the 'Esc' key 
        key = cv2.waitKey(10) 
        if key == 27: 
            break 
    # Release the video capture object and close all the windows 
    cap.release() 
    cv2.destroyAllWindows()
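
To see why the bitwise AND isolates the moving object, it can help to run the same logic on synthetic frames without a webcam. The sketch below is only an illustration: it assumes NumPy stand-ins for `cv2.absdiff` and `cv2.bitwise_and`, and the `make_frame` helper and the 5x8 frame size are hypothetical.

```python
import numpy as np

def abs_diff(a, b):
    # Widen to int16 so the subtraction cannot wrap around in uint8
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)

def frame_diff(prev_frame, cur_frame, next_frame):
    # Keep only the pixels that changed in both consecutive frame pairs
    return np.bitwise_and(abs_diff(next_frame, cur_frame),
                          abs_diff(cur_frame, prev_frame))

def make_frame(col):
    # Hypothetical 5x8 test frame: black background, one bright column
    frame = np.zeros((5, 8), dtype=np.uint8)
    frame[:, col] = 255
    return frame

# A bright column moving one pixel to the right per frame
prev_frame, cur_frame, next_frame = make_frame(2), make_frame(3), make_frame(4)

motion = frame_diff(prev_frame, cur_frame, next_frame)
moving_cols = np.flatnonzero(motion.any(axis=0)).tolist()
print(moving_cols)
```

Each difference image lights up two columns (where the object was and where it went). Only column 3, the object's position in the current frame, appears in both, so the script prints `[3]`.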

Tracking objects using colorspaces

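: Frame differencing is simple and useful, but it is very sensitive to noise and cannot track an object accurately. Accurate tracking requires knowing the object's properties, and a color space can supply them. The HSV color space is closer than RGB to how people perceive color, which makes it better suited to object tracking. This technique converts each captured frame (which OpenCV delivers in BGR channel order) to the HSV color space and tracks the object by thresholding on the range of its color.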

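import cv2 
import numpy as np

# Read a frame from the webcam and resize it, keeping the color channels 
# (the grayscale get_frame from the previous section would break BGR2HSV) 
def get_frame(cap, scaling_factor): 
    _, frame = cap.read() 
    return cv2.resize(frame, None, fx=scaling_factor,  
            fy=scaling_factor, interpolation=cv2.INTER_AREA) 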

if __name__=='__main__': 
    # Define the video capture object 
    cap = cv2.VideoCapture(0) 
    
    # Define the scaling factor for the images 
    scaling_factor = 0.5 
    
    # Keep reading the frames from the webcam  
    # until the user hits the 'Esc' key 
    while True: 
        # Grab the current frame 
        frame = get_frame(cap, scaling_factor) 
        # Convert the image to HSV colorspace 
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        
        # Define range of skin color in HSV 
        lower = np.array([0, 70, 60]) 
        upper = np.array([50, 150, 255])
        
        # Threshold the HSV image to get only skin color 
        mask = cv2.inRange(hsv, lower, upper) 
        
        # Bitwise-AND between the mask and original image 
        img_bitwise_and = cv2.bitwise_and(frame, frame, mask=mask)
        
        # Run median blurring  
        img_median_blurred = cv2.medianBlur(img_bitwise_and, 5) 
        
        # Display the input and output 
        cv2.imshow('Input', frame) 
        cv2.imshow('Output', img_median_blurred) 
        
        # Check if the user hit the 'Esc' key 
        c = cv2.waitKey(5)  
        if c == 27: 
            break 
            
    # Release the video capture object and close all the windows 
    cap.release() 
    cv2.destroyAllWindows()
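
The thresholding step can be checked by hand on a few pixels. The following is a minimal sketch, assuming a NumPy stand-in for `cv2.inRange` (the `in_range` helper and the three sample pixels are hypothetical) and the same `lower` and `upper` bounds as above.

```python
import numpy as np

def in_range(hsv, lower, upper):
    # 255 where every channel lies within [lower, upper], otherwise 0
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return inside.astype(np.uint8) * 255

lower = np.array([0, 70, 60])
upper = np.array([50, 150, 255])

# Hypothetical 1x3 HSV image (OpenCV scale: H in 0-179, S and V in 0-255)
hsv = np.array([[[20, 100, 200],    # inside the range on all channels
                 [120, 100, 200],   # hue too high
                 [20, 30, 200]]],   # saturation too low
               dtype=np.uint8)

mask = in_range(hsv, lower, upper)
print(mask.tolist())
```

Only the first pixel passes on all three channels, so the mask is `[[255, 0, 0]]`. A pixel is kept only if hue, saturation, and value are all in range, which is why loosening a single bound can change the mask noticeably.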

Object tracking using background subtraction

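: Background subtraction builds a model of the background in a given video and uses it to extract the moving objects. One of the technique's two key tasks is building that background model. Because the model has to be updated in real time, it must be handled somewhat differently from frame differencing: it needs an adaptive algorithm whose baseline keeps changing.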

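import cv2 
import numpy as np 

# Read a frame from the webcam and resize it, keeping the color channels 
# (the mask is ANDed with a 3-channel frame further down) 
def get_frame(cap, scaling_factor): 
    _, frame = cap.read() 
    return cv2.resize(frame, None, fx=scaling_factor,  
            fy=scaling_factor, interpolation=cv2.INTER_AREA) 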

if __name__=='__main__': 
    # Define the video capture object 
    cap = cv2.VideoCapture(0) 
    
    # Define the background subtractor object 
    bg_subtractor = cv2.createBackgroundSubtractorMOG2()
    
    # Define the number of previous frames to use to learn.  
    # This factor controls the learning rate of the algorithm.  
    # The learning rate refers to the rate at which your model  
    # will learn about the background. Higher value for  
    # 'history' indicates a slower learning rate. You can  
    # play with this parameter to see how it affects the output. 
    history = 100
    # Define the learning rate 
    learning_rate = 1.0/history 
    
    # Keep reading the frames from the webcam  
    # until the user hits the 'Esc' key 
    while True: 
        # Grab the current frame 
        frame = get_frame(cap, 0.5)
        
        # Compute the mask  
        mask = bg_subtractor.apply(frame, learningRate=learning_rate)
        
        # Convert the single-channel mask to a 3-channel BGR image 
        mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
        
        # Display the images 
        cv2.imshow('Input', frame) 
        cv2.imshow('Output', mask & frame)
        
        # Check if the user hit the 'Esc' key 
        c = cv2.waitKey(10) 
        if c == 27: 
            break 
            
    # Release the video capture object 
    cap.release() 
     
    # Close all the windows 
    cv2.destroyAllWindows()
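
The idea of an adaptive baseline can be illustrated with a much simpler model than MOG2. The sketch below is not what `cv2.createBackgroundSubtractorMOG2` does internally (that uses a mixture of Gaussians per pixel); it assumes a plain exponential running average, and the function names, the threshold of 30, and the 4x4 scene are hypothetical.

```python
import numpy as np

def update_background(background, frame, learning_rate):
    # Exponential running average: new frames slowly blend into the model
    return (1.0 - learning_rate) * background + learning_rate * frame

def foreground_mask(background, frame, threshold=30):
    # Pixels far from the background model are marked as foreground (255)
    diff = np.abs(frame.astype(np.float64) - background)
    return (diff > threshold).astype(np.uint8) * 255

history = 100
learning_rate = 1.0 / history

# Hypothetical static scene: uniform gray background
scene = np.full((4, 4), 100, dtype=np.uint8)
background = scene.astype(np.float64)

# An object appears at one pixel
frame = scene.copy()
frame[1, 2] = 220

mask = foreground_mask(background, frame)
background = update_background(background, frame, learning_rate)
```

With `history = 100`, the object's pixel moves the background model from 100 to only 101.2 after one frame, so a briefly visible object stays in the foreground mask. If the object stopped moving, it would gradually be absorbed into the background. A larger `history` slows that absorption.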

Building an interactive object tracker using the CAMShift algorithm

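: Color-space tracking has the constraint that the color range must be defined before tracking starts. CAMShift (Continuously Adaptive Mean Shift) is an algorithm that recognizes and tracks an object in real time. It is an adaptive refinement of the mean shift algorithm.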

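Let us look at mean shift first. The region of interest (ROI) is the area that contains the target, marked by a bounding box drawn around the object to track. Using the ROI's color histogram as the reference, the algorithm selects a set of feature points and computes their centroid. If the centroid stays at the geometric center of the ROI, the object has not moved. If the centroid differs from the center of the ROI, the object has moved, so the ROI must move as well: the bounding box is shifted until its center coincides with the new centroid. Running this on every frame tracks the object in real time.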
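
The window update described above can be sketched in a few lines of NumPy. This is a simplified illustration, not OpenCV's `cv2.meanShift`: it assumes the per-pixel weights (for example a histogram back-projection) are already available as a 2D array, and the `mean_shift` function, the 20x20 map, and the starting window are hypothetical.

```python
import numpy as np

def mean_shift(prob, window, max_iter=10):
    # window is (x, y, w, h); prob is a 2D map of per-pixel weights
    x, y, w, h = window
    for _ in range(max_iter):
        roi = prob[y:y + h, x:x + w]
        total = roi.sum()
        if total == 0:
            break  # nothing to track inside the window
        ys, xs = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
        # Centroid of the weights, relative to the window origin
        cx = (xs * roi).sum() / total
        cy = (ys * roi).sum() / total
        # Offset between the centroid and the window's geometric center
        dx = int(np.floor(cx - (w - 1) / 2.0 + 0.5))
        dy = int(np.floor(cy - (h - 1) / 2.0 + 0.5))
        if dx == 0 and dy == 0:
            break  # centroid and center coincide: converged
        # Move the window, keeping it inside the image
        x = min(max(x + dx, 0), prob.shape[1] - w)
        y = min(max(y + dy, 0), prob.shape[0] - h)
    return (x, y, w, h)

# Hypothetical weight map: a 5x5 blob at columns 10-14, rows 8-12
prob = np.zeros((20, 20))
prob[8:13, 10:15] = 1.0

# Start with a window that only partly overlaps the blob
window = mean_shift(prob, (7, 6, 5, 5))
print(window)
```

Starting from `(7, 6, 5, 5)`, the window moves to `(9, 7, 5, 5)` and then settles on `(10, 8, 5, 5)`, exactly covering the blob. The window size never changes in this sketch.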

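The drawback of mean shift is that it cannot handle an object whose size changes over time, because the bounding box keeps a fixed size. CAMShift addresses this by letting the size of the bounding box adapt to the size of the object.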

import cv2 
import numpy as np

# Define a class to handle object tracking related functionality 
class ObjectTracker(object): 
    def __init__(self, scaling_factor=0.5): 
        # Initialize the video capture object 
        self.cap = cv2.VideoCapture(0)
        # Capture the frame from the webcam 
        _, self.frame = self.cap.read()
        # Scaling factor for the captured frame 
        self.scaling_factor = scaling_factor
        # Resize the frame 
        self.frame = cv2.resize(self.frame, None,  
                fx=self.scaling_factor, fy=self.scaling_factor,  
                interpolation=cv2.INTER_AREA)
        # Create a window to display the frame 
        cv2.namedWindow('Object Tracker')
        # Set the mouse callback function to track the mouse 
        cv2.setMouseCallback('Object Tracker', self.mouse_event)
        # Initialize variable related to rectangular region selection 
        self.selection = None 
        # Initialize variable related to starting position  
        self.drag_start = None 
        # Initialize variable related to the state of tracking  
        self.tracking_state = 0
        
    # Define a method to track the mouse events 
    def mouse_event(self, event, x, y, flags, param): 
        # Convert x and y coordinates into 16-bit numpy integers 
        x, y = np.int16([x, y])
        
        # Check if a mouse button down event has occurred 
        if event == cv2.EVENT_LBUTTONDOWN: 
            self.drag_start = (x, y) 
            self.tracking_state = 0
        
        # Check if the user has started selecting the region 
        if self.drag_start: 
            if flags & cv2.EVENT_FLAG_LBUTTON: 
                # Extract the dimensions of the frame 
                h, w = self.frame.shape[:2] 
                # Get the initial position 
                xi, yi = self.drag_start 
                # Get the max and min values 
                x0, y0 = np.maximum(0, np.minimum([xi, yi], [x, y])) 
                x1, y1 = np.minimum([w, h], np.maximum([xi, yi], [x, y]))
                # Reset the selection variable 
                self.selection = None
                # Finalize the rectangular selection 
                if x1-x0 > 0 and y1-y0 > 0: 
                    self.selection = (x0, y0, x1, y1)
            else: 
                # If the selection is done, start tracking   
                self.drag_start = None 
                if self.selection is not None: 
                    self.tracking_state = 1 
    
    # Method to start tracking the object 
    def start_tracking(self): 
        # Iterate until the user presses the Esc key 
        while True: 
            # Capture the frame from webcam 
            _, self.frame = self.cap.read() 
            # Resize the input frame 
            self.frame = cv2.resize(self.frame, None,  
                    fx=self.scaling_factor, fy=self.scaling_factor,  
                    interpolation=cv2.INTER_AREA) 
            # Create a copy of the frame 
            vis = self.frame.copy() 
            # Convert the frame to HSV colorspace 
            hsv = cv2.cvtColor(self.frame, cv2.COLOR_BGR2HSV)
            # Create the mask based on predefined thresholds 
            mask = cv2.inRange(hsv, np.array((0., 60., 32.)),  
                        np.array((180., 255., 255.))) 
            
            # Check if the user has selected the region 
            if self.selection: 
                # Extract the coordinates of the selected rectangle 
                x0, y0, x1, y1 = self.selection 
                # Extract the tracking window 
                self.track_window = (int(x0), int(y0),  
                        int(x1 - x0), int(y1 - y0))
                # Extract the regions of interest  
                hsv_roi = hsv[y0:y1, x0:x1] 
                mask_roi = mask[y0:y1, x0:x1] 
                # Compute the histogram of the region of  
                # interest in the HSV image using the mask 
                hist = cv2.calcHist( [hsv_roi], [0], mask_roi,  
                        [16], [0, 180] )     
                # Normalize and reshape the histogram 
                cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX) 
                self.hist = hist.reshape(-1)
                # Extract the region of interest from the frame 
                vis_roi = vis[y0:y1, x0:x1]
                # Compute the image negative (for display only) 
                cv2.bitwise_not(vis_roi, vis_roi) 
                vis[mask == 0] = 0
             
            # Check if the system in the "tracking" mode 
            if self.tracking_state == 1: 
                # Reset the selection variable 
                self.selection = None 
                # Compute the histogram back projection 
                hsv_backproj = cv2.calcBackProject([hsv], [0],  
                        self.hist, [0, 180], 1)
                # Compute bitwise AND between histogram  
                # backprojection and the mask 
                hsv_backproj &= mask
                # Define termination criteria for the tracker 
                term_crit = (cv2.TERM_CRITERIA_EPS | 
                        cv2.TERM_CRITERIA_COUNT, 10, 1)
                # Apply CAMShift on 'hsv_backproj' 
                track_box, self.track_window = cv2.CamShift(hsv_backproj,  
                        self.track_window, term_crit) 
                # Draw an ellipse around the object 
                cv2.ellipse(vis, track_box, (0, 255, 0), 2) 
            # Show the output live video (also before tracking starts, 
            # so that the user can select a region with the mouse) 
            cv2.imshow('Object Tracker', vis)

            # Stop if the user hits the 'Esc' key 
            c = cv2.waitKey(5) 
            if c == 27: 
                break 
        # Release the video capture object and close all the windows 
        self.cap.release() 
        cv2.destroyAllWindows()
        
if __name__ == '__main__': 
    # Start the tracker 
    ObjectTracker().start_tracking()

Optical flow based tracking

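: Optical flow is a technique that tracks objects using feature points in the image. It watches the frames arriving from a live video, detects a set of feature points in the current frame, and computes a displacement vector for each of them. The movement of these feature points between consecutive frames is then drawn on screen. These displacement vectors are called motion vectors.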

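The Lucas-Kanade method is the best-known optical flow technique. It extracts feature points from the current frame and builds a 3×3 patch centered on each one, assuming that all points within a patch move in a similar direction. For each patch, it searches the previous frame for a matching region and picks the most similar one according to an error value. Once the best match is found, the path from the center of the current patch to the center of the matched patch in the previous frame is the motion vector. Motion vectors for the other patches are computed the same way.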
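
The patch-matching idea can be illustrated with a brute-force search. This is only a sketch of the matching step as described above, using a sum of squared differences as the error value. `cv2.calcOpticalFlowPyrLK` does not search exhaustively like this; it solves a gradient-based least-squares problem over an image pyramid. The `match_patch` function, the search radius, and the synthetic images are hypothetical.

```python
import numpy as np

def match_patch(prev_img, cur_img, x, y, half=1, search=4):
    # Patch of size (2*half+1) x (2*half+1) centered on (x, y) in the current frame
    patch = cur_img[y - half:y + half + 1, x - half:x + half + 1]
    best_err, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Candidate center of the same patch in the previous frame
            px, py = x - dx, y - dy
            if px - half < 0 or py - half < 0:
                continue  # candidate falls off the top or left edge
            cand = prev_img[py - half:py + half + 1, px - half:px + half + 1]
            if cand.shape != patch.shape:
                continue  # candidate falls off the bottom or right edge
            # Sum of squared differences as the matching error
            err = float(((patch - cand) ** 2).sum())
            if best_err is None or err < best_err:
                best_err, best_vec = err, (dx, dy)
    # (dx, dy) is the displacement from the previous frame to the current one
    return best_vec

# Hypothetical textured image, then the same content moved 2 px right and 1 px down
rng = np.random.default_rng(0)
prev_img = rng.random((20, 20))
cur_img = np.roll(prev_img, shift=(1, 2), axis=(0, 1))

motion_vector = match_patch(prev_img, cur_img, x=10, y=10)
print(motion_vector)
```

For this synthetic shift the recovered motion vector is `(2, 1)`. The exhaustive search costs one patch comparison per candidate offset, which is one reason practical trackers use the gradient-based formulation instead.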

import cv2 
import numpy as np 

# Define a function to track the object 
def start_tracking(): 
    # Initialize the video capture object 
    cap = cv2.VideoCapture(0) 

    # Define the scaling factor for the frames 
    scaling_factor = 0.5 
    
    # Number of frames to track 
    num_frames_to_track = 5  
 
    # Skipping factor 
    num_frames_jump = 2 
    
    # Initialize variables 
    tracking_paths = [] 
    frame_index = 0 
    
    # Define tracking parameters 
    tracking_params = dict(winSize  = (11, 11), maxLevel = 2, 
            criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,  
                10, 0.03)) 
                
    # Iterate until the user hits the 'Esc' key 
    while True: 
        # Capture the current frame 
        _, frame = cap.read() 
        # Resize the frame 
        frame = cv2.resize(frame, None, fx=scaling_factor,  
                fy=scaling_factor, interpolation=cv2.INTER_AREA) 
        # Convert to grayscale 
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 
        # Create a copy of the frame 
        output_img = frame.copy() 
        
        if len(tracking_paths) > 0: 
            # Get images 
            prev_img, current_img = prev_gray, frame_gray 
            # Organize the feature points 
            feature_points_0 = np.float32([tp[-1] for tp in 
                    tracking_paths]).reshape(-1, 1, 2) 
            # Compute optical flow 
            feature_points_1, _, _ = cv2.calcOpticalFlowPyrLK( 
                    prev_img, current_img, feature_points_0,  
                    None, **tracking_params) 
            # Compute reverse optical flow 
            feature_points_0_rev, _, _ = cv2.calcOpticalFlowPyrLK( 
                    current_img, prev_img, feature_points_1,  
                    None, **tracking_params) 
 
            # Compute the difference between forward and  
            # reverse optical flow 
            diff_feature_points = abs(feature_points_0 - 
                    feature_points_0_rev).reshape(-1, 2).max(-1) 
            # Extract the good points 
            good_points = diff_feature_points < 1 
            # Initialize variable 
            new_tracking_paths = [] 
            # Iterate through all the good feature points  
            for tp, (x, y), good_points_flag in zip(tracking_paths,  
                        feature_points_1.reshape(-1, 2), good_points): 
                
                # If the flag is not true, then continue 
                if not good_points_flag: 
                    continue 
                
                # Append the X and Y coordinates and check if 
                # its length greater than the threshold 
                tp.append((x, y)) 
                if len(tp) > num_frames_to_track: 
                    del tp[0] 
 
                new_tracking_paths.append(tp)
                # Draw a circle around the feature points 
                cv2.circle(output_img, (int(x), int(y)), 3, (0, 255, 0), -1) 
 
            # Update the tracking paths 
            tracking_paths = new_tracking_paths 
 
            # Draw lines 
            cv2.polylines(output_img, [np.int32(tp) for tp in 
                    tracking_paths], False, (0, 150, 0)) 

        # Go into this 'if' condition after skipping the  
        # right number of frames 
        if not frame_index % num_frames_jump: 
            # Create a mask and draw the circles 
            mask = np.zeros_like(frame_gray) 
            mask[:] = 255 
            for x, y in [np.int32(tp[-1]) for tp in tracking_paths]: 
                cv2.circle(mask, (x, y), 6, 0, -1) 
            
            # Compute good features to track
            feature_points = cv2.goodFeaturesToTrack(frame_gray,  
                    mask = mask, maxCorners = 500, qualityLevel = 0.3,  
                    minDistance = 7, blockSize = 7)  
            # Check if feature points exist. If so, append them 
            # to the tracking paths 
            if feature_points is not None: 
                for x, y in np.float32(feature_points).reshape(-1, 2): 
                    tracking_paths.append([(x, y)]) 
       
        # Update variables 
        frame_index += 1 
        prev_gray = frame_gray 
        
        # Display output 
        cv2.imshow('Optical Flow', output_img) 
        
        # Check if the user hit the 'Esc' key 
        c = cv2.waitKey(1) 
        if c == 27: 
            break 

    # Release the video capture object 
    cap.release() 

if __name__ == '__main__': 
    # Start the tracker 
    start_tracking() 
 
    # Close all the windows 
    cv2.destroyAllWindows()

Face detection and tracking

: Face detection means finding where faces are located in an input image.

Using Haar cascades for object detection

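: Haar cascades are used to extract faces from video. A Haar cascade detects objects by passing the image through a sequence of classifiers built on Haar features. A Haar feature is simply a sum and difference of patches across the image. By chaining simple classifiers over several stages, the Haar cascade technique raises the overall accuracy of the classification.
After the features are extracted, they are passed to the classifiers, which examine many rectangular sub-regions of the image and discard the regions that are not faces. The integral image technique can be used to speed this up.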
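
The speed-up from the integral image comes from turning any rectangle sum into four table lookups. The sketch below illustrates this with one two-rectangle Haar-like feature. It is a simplified illustration rather than the feature set OpenCV's cascades use, and the function names and the 4x4 patch are hypothetical.

```python
import numpy as np

def integral_image(img):
    # Pad with a zero row and column so ii[y, x] is the sum of img[:y, :x]
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of the w x h rectangle with top-left corner (x, y), from four lookups
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_two_rect(ii, x, y, w, h):
    # Two-rectangle feature: left half minus right half (w must be even)
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Hypothetical 4x4 patch: dark left half, bright right half (a vertical edge)
img = np.array([[10, 10, 200, 200]] * 4, dtype=np.uint8)

ii = integral_image(img)
feature = haar_two_rect(ii, 0, 0, 4, 4)
print(feature)
```

The left half sums to 80 and the right half to 1600, so the feature value is `-1520`: a large magnitude signals a strong vertical edge. Once the table is built, each feature costs the same handful of lookups regardless of the rectangle's size.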