A Survey on Artificial Intelligence in Posture Recognition

모시모시 · June 13, 2025

Paper


Introduces the latest methods of posture recognition and reviews the various techniques and algorithms of posture recognition from recent years

CNN (high accuracy in posture recognition)

Feature extraction (needs further research)

A) Posture Recognition Algorithm Categories

1) RGB image-based recognition algorithms utilize the contour features of the human body

  • HOG (Histogram of Oriented Gradients) captures the edges of the human body

2) Depth-based image algorithm

  • Uses the image's gray value to represent the target's spatial position and contour.

B) Existing posture recognition methods

1) Traditional ML method

  • Image segmentation algorithms.
  • Drawbacks
    • Hard to extract semantic information
    • Reduced accuracy and real-time performance

2) DNN(Deep Neural Network) method

  • Learns low-level feature information of the image with a DNN (strong adaptability, high recognition speed and accuracy)

C) Recent reviews selected from a PRISMA perspective

PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses)

  • Reviews 199 papers

(1) previous + (2) new database (new data)

(1)
149 or more reviews (criterion) --> total 188 (final output criterion)

(2)

  • Check paper quality (screening, duplicates, ineligible by automation tools, other reasons)
  • Number of papers excluded during screening
  • Number of papers re-included during screening
  • Assess for eligibility

Recognition tech.

1) Sensor based recognition

Common positions

  • lower limbs
  • waist
  • arm
  • neck
  • wrist
  • etc.

Number of sensors

  • single sensor
  • multi sensor

Install on user

  • wearable
    • Ex.) HAR (Human Activity Recognition)
    • Inertial (accelerometers and gyroscopes): detects speed and rotation to infer the user's activity. (e.g., an Apple Watch detects the wearer's movement speed to determine whether the person is walking, running, or doing some other activity, then shows it on the watch screen)
    • physiological
      • EEG (brain activity)
      • ECG (heart activity)
      • GSR (electrical conductance of the skin)
      • EMG (muscle response)
    • pressure
      • FSR (e.g., the company BeBop Sensors), https://www.youtube.com/@BeBopSensors
      • barometric (atmospheric pressure values and height changes)
      • textile-based capacitive (e.g., KTTA (Korea Textile Trade Association), https://www.youtube.com/watch?v=OhZOKpxWYnE)
      • vision wearable sensor (WVS) (camera object detection) (Apple Vision Pro (projects the user's eyes in real time), Google XR glasses (check the visible environment in real time for navigation))
      • flexible
  • fixed

Data Output

  • array time domain signal
  • image matrix data
  • vector data
  • strap-down matrix data
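As a toy illustration of how an array time-domain signal from a single wearable inertial sensor can be turned into an activity label (as in the Apple Watch example above), here is a minimal sketch; the thresholds, window contents, and labels are made-up assumptions, not from the survey:

```python
import math

def classify_activity(accel_samples, still_thresh=0.3, walk_thresh=1.5):
    """Toy HAR classifier: label an activity from the variance of the
    acceleration magnitude over a window (thresholds are illustrative)."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_samples]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    if var < still_thresh:
        return "still"
    if var < walk_thresh:
        return "walking"
    return "running"

# simulated windows of (x, y, z) samples: device at rest vs. oscillating
still = [(0.0, 0.0, 1.0)] * 50
shaky = [(0.0, 0.0, 4.0 * (i % 2)) for i in range(50)]
print(classify_activity(still))  # -> still
print(classify_activity(shaky))  # -> running
```

A real HAR pipeline would of course extract many more features (frequency-domain, correlations between axes) and feed them to a trained classifier.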

2) Vision Based Recognition

Problem: applying 2D images to 3D posture.
1) RGB image (2D) + depth map (1D) => 3D mapping (skeleton), posture recognition.

3) RF Based Recognition

When a wearable sensing device cannot be applied, radio-frequency-based technology is used

  • Fall or sleep monitoring systems using passive RFID tags
  • radar, WiFi, RFID

Traditional ML Based Approach

Human skeleton (human joints, facial features)

Feature Extraction (HOG)

  • Gamma compression of the color image reduces its dependence on lighting factors

  • 3 conditions

    • gamma == 1 ; output value = input value
    • gamma > 1 ; gray part of the image is reduced (becomes darker)
    • gamma < 1 ; gray part of the image is reduced (becomes brighter)
  • Computes horizontal and vertical gradients

  • local image region corresponding

  • Gradient intensity normalization to compress the influence of other factors

  • Overlapping regions are also reflected in the detection windows of the HOG features => the final feature vector makes classification of the object possible

  • Scale-Invariant Feature Transform (SIFT)

    • map images to local feature vector sets based on cv technology.
    • With this feature extraction, the negative effects of image rotation, scaling, brightness/illumination change, affine transformation, and noise mostly disappear.
      1) Focus the image scale space with a Gaussian kernel function: Laplacian of Gaussian (LoG), Difference of Gaussians (DoG); DoG is the more simplified version.
      2) Localization (find better feature points by removing low-priority feature points and unstable edges)
      3) Local gradient direction (rotation-invariance property);
      scale and position of keypoints.
      4) Feature description (points, pixels) reflected as a graph
  • Dynamic Time Warping (DTW)

    • Compares the similarity or distance between two arrays or time series of different lengths.
    • Used in speech recognition and posture recognition
    • Usually, time-series data is compared with the Euclidean distance at identical time indices, but the more the signals' movements differ at the same point in time, the harder it is to measure their mean (similarity).
    • To overcome this, the idea of DTW is to compare not only identical time points but also neighboring time points.

    (Algorithm principle)

    1. Two time series exist; lay them out as a matrix whose element (i, j) is the Euclidean distance between points i and j, then search for the optimal path. The warping path must be continuous (basic condition).
    • Boundary condition: the path must run from w_1 = (1, 1) to w_K = (m, n) (the start and end points of the matrix are connected)
    • Continuity: if w_k = (a, b) and w_{k-1} = (a', b'), then a - a' <= 1 and b - b' <= 1
    • Monotonicity: no path moves in the negative direction. (optimal route = minimum total warping distance)
  • Hu moment invariant (HMI)

    • translation, scale, rotation invariant
    • Uses the central moments, the variance of pixel values about the image center => translation invariant
    • Normalizing the central moments gives robustness => scale invariant
    • Total 7 invariant moments
      • Six absolute orthogonal invariants
      • One skew orthogonal invariant
  • Fourier descriptors (FD)

    • The coefficients identify the edge of the shape
  • nonparametric weighted feature extraction (NWFE)

    • Putting different weights on every sample to compute the “weighted means”
    • Defining new nonparametric between-class and within-class scatter matrices.
  • gray-level co-occurrence matrix (GLCM)

    • Counts how often pairs of gray-level values occur at neighboring pixel positions (offsets in x and y), checking the similarity of the two values.
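The three gamma-compression conditions listed under the HOG preprocessing above can be sketched in a few lines; this is a pure-Python illustration on pixel values normalized to [0, 1], not the survey's implementation:

```python
def gamma_compress(pixels, gamma):
    """Gamma compression on pixel values normalized to [0, 1]:
    gamma == 1 leaves values unchanged, gamma > 1 darkens the gray
    mid-tones, gamma < 1 brightens them."""
    return [p ** gamma for p in pixels]

row = [0.0, 0.25, 0.5, 0.75, 1.0]
print(gamma_compress(row, 1.0) == row)  # -> True (identity)
print(gamma_compress(row, 2.2)[2])      # mid-gray pushed toward black
print(gamma_compress(row, 0.45)[2])     # mid-gray pushed toward white
```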
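The DTW idea described above (compare neighboring time points, not only identical ones, under the boundary, continuity, and monotonicity conditions) is commonly implemented with dynamic programming; a minimal sketch, assuming the absolute difference as the local distance:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of possibly
    different lengths, honoring the boundary, continuity, and
    monotonicity conditions on the warping path."""
    m, n = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal total warping distance aligning a[:i] with b[:j]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance
            # continuity/monotonicity: only steps from the left,
            # below, or the diagonal neighbor are allowed
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]  # boundary: the path runs from (1, 1) to (m, n)

# a time-shifted copy of a signal: point-by-point Euclidean comparison
# would penalize the lag, but DTW aligns the peaks at zero cost
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(x, y))  # -> 0.0
```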

Feature Reduction

  • PCA (e.g., 3D => 2D data dimension reduction & feature reduction), LDA (Linear Discriminant Analysis, a linear dimensionality reduction)
  • LDA preserves the data information that makes classes easy to distinguish
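A minimal sketch of PCA-style feature reduction for 2-D points, projecting each point onto the first principal component via the closed-form 2x2 eigendecomposition; illustrative only, and it assumes the two features are correlated (sxy != 0):

```python
import math

def pca_project_2d(points):
    """Project 2-D points onto their first principal component,
    reducing each point to a single mean-centered score."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # entries of the 2x2 covariance matrix
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]] (quadratic formula)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # matching eigenvector (assumes correlated features, sxy != 0)
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [(x - mx) * vx + (y - my) * vy for x, y in points]

# points on the line y = 2x: all the variance lies along one axis,
# so reducing 2-D -> 1-D loses nothing
scores = pca_project_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
print([round(s, 3) for s in scores])  # -> [-3.354, -1.118, 1.118, 3.354]
```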

Classification

1) SVM (Support Vector Machine)

  • Finds the optimal hyperplane in the sample space that separates the categories (classification)
  • Only related to the support vectors
  • The complexity depends on the number of support vectors

2) GMM (Gaussian Mixture Model)

  • Gaussian prob. density functions to quantify the variable distribution
  • Prior probability of choosing the m-th Gaussian model and the average value of each component
  • EM algorithm
    • E-step: estimate the probability that each Gaussian component generated each data point
    • M-step: use these probabilities to update the model parameters
    • Check for convergence; if not converged, repeat all the previous steps until there is no significant change
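The EM loop above can be sketched for a two-component 1-D Gaussian mixture; to keep it short, this sketch assumes fixed unit variances and equal priors (assumptions beyond the text), so only the means are re-estimated:

```python
import math, random

def em_gmm_1d(data, init_means, iters=50):
    """EM for a two-component 1-D Gaussian mixture.  Sketch: variances
    are fixed at 1 and priors are equal, so only the means move."""
    def pdf(x, m):  # unit-variance Gaussian density
        return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
    m0, m1 = init_means
    for _ in range(iters):
        # E-step: responsibility of component 0 for every point
        r0 = [pdf(x, m0) / (pdf(x, m0) + pdf(x, m1)) for x in data]
        # M-step: responsibility-weighted means
        m0 = sum(r * x for r, x in zip(r0, data)) / sum(r0)
        m1 = sum((1 - r) * x for r, x in zip(r0, data)) / sum(1 - r for r in r0)
    return m0, m1

random.seed(0)
data = ([random.gauss(1.0, 1.0) for _ in range(200)]
        + [random.gauss(6.0, 1.0) for _ in range(200)])
m0, m1 = em_gmm_1d(data, (0.0, 8.0))
print(round(m0, 1), round(m1, 1))  # close to the true means 1.0 and 6.0
```

In the full algorithm the M-step would also update each component's variance and mixing prior from the same responsibilities.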

3) HMM (Hidden Markov Model)

  • ML model
  • The process of generating a random sequence of unobservable (hidden) states in a chain and observing a random sequence emitted from each state

Deep Neural Network Based Approach

1) Posture Estimation

  • Top Down approach

    • Detect each person first, then compute each person's posture from their parts
  • Bottom Up approach

    • Detect all parts of all people first, then use an algorithm to figure out which parts belong to which person.

2) CNN

  • Neurons are arranged in 3 dimensions (width, height, depth)

Convolutional layer

  • dimension reduction

  • feature extraction

    Pooling layer

  • compress the amount of data and parameters

  • improves identification efficiency and controls overfitting by reducing the amount of values handled by the model
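A minimal sketch of what the pooling layer does: 2x2 max pooling with stride 2 keeps only the strongest response in each window, halving each spatial dimension (even height and width assumed here):

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: compresses the amount of data
    passed to later layers (assumes even height and width)."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]
print(max_pool_2x2(fm))  # -> [[4, 2], [2, 8]]
```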

3) Improved CNN

  • Using the BN (Batch Normalization)

    • Inputs are forced back to a normal distribution with mean 0 and variance 1, so input values avoid the vanishing-gradient region.
  • CNN-LSTM

    • A solution to complex problems with large amounts of data.

      1) MSST-ResNet

    • Enables multi-scale feature learning

    • Thanks to ResNet's residual blocks, performance stays good even when the network is very deep.

    • Instead of direct learning, skip connections let the network learn only the essential information it needs to add, reducing what has to be learned.

    • A residual block uses the skip connection as an identity mapping, so the input passes through unchanged regardless of the function it enters.

      2) R-CNN

    • Creates ~2000 independent region proposals from the input image

    • Extracts a fixed-length feature vector with a CNN

    • Applies a category-specific linear SVM to each region and performs classification

      3) Stacked hourglass networks

    • successive pooling and upsampling steps (capture and integrate information at all image scales)

    • Bottom-up

    • Top-down

    • Filters no larger than 3 * 3

    4) MSPN (Multi-stage pose estimation network)

    • 2 independent information streams (Downsampling and Upsampling)
    • 1*1 convolution for feature aggregation (alleviates the information loss caused by the repeated sampling steps)
    • Extended residual design, a solution to the vanishing gradient
    • Different Gaussian kernel sizes used at different stages
    • Multi-scale supervision performs intermediate supervision at 4 different scales in each stage.

    5) CPM (Convolutional pose machine)

    • Combines the advantages of deep convolutional networks.
    • Image and context features are learned directly from the data represented by these networks
    • Advantages
      • No need to infer a graphical model; a solution for prediction tasks in computer vision
      • Solves the gradient-vanishing problem in the cascaded model training process

    6) HRNet (high-resolution network)

    • Most previous methods encode the input from high resolution down to a low-resolution representation through a convolutional network, then recover high resolution via upsampling or a decoder.
    • Connects sub-networks from high resolution to low resolution to maintain a high-resolution representation
    • Maintains parallel connections of the high-resolution network
    • Exchanges general and detailed information between the networks

4) Lightweight Network

  • The number of layers in models has gradually deepened.

    1) SSC (Spatial Separable Convolutions)

  • splitting or transforming the convolution kernel

  • performing convolution calculation separately

    ex.) 3 * 3 convolution
    Split into

  • 1 * 3 convolution core

  • 3 * 1 convolution core

    In total, 6 multiplications instead of 9 => the network runs faster
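The 9-to-6 multiplication saving above can be checked numerically: convolving with a separable 3x3 kernel gives the same output as applying its 3x1 and 1x3 factors in sequence. This sketch uses the Sobel-x kernel as an example of a separable kernel (an illustrative choice, not from the text):

```python
def conv_valid(img, ker):
    """'Valid' 2-D cross-correlation of a nested-list image and kernel."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(ker[a][b] * img[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# a separable 3x3 kernel: outer product of a column and a row
col, row = [1, 2, 1], [1, 0, -1]            # (Sobel-x, as an example)
k3x3 = [[c * r for r in row] for c in col]  # 9 multiplications per output

img = [[(5 * i + j) % 7 for j in range(6)] for i in range(6)]

full = conv_valid(img, k3x3)
# same result from the 3x1 core followed by the 1x3 core: 3 + 3 = 6 mults
split = conv_valid(conv_valid(img, [[v] for v in col]), [row])
print(full == split)  # -> True
```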

    2) DSC (Depthwise Separable Convolution)

    Depthwise convolution

  • Number of generated feature mapping channel == number of input channels

  • k * k convolution kernel, spatial dimension

    • Matrix can be divided by the depth of convolution kernel

    Pointwise Convolution

  • 1 * 1 convolution kernel, implement on every channel

  • 1 * 1 * L (L = M * N, where L is the # of upper-layer channels)
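A quick parameter-count comparison between a standard convolution and a depthwise separable one, using the kernel shapes described above (the channel counts 64 and 128 are illustrative assumptions):

```python
def standard_conv_params(k, m, n):
    """Weights in a standard k x k convolution: k*k*m weights per
    output channel, with n output channels."""
    return k * k * m * n

def dsc_params(k, m, n):
    """Depthwise separable convolution: one k x k kernel per input
    channel (depthwise), then a 1 x 1 pointwise convolution mixing
    the m input channels into n output channels."""
    return k * k * m + m * n

k, m, n = 3, 64, 128  # illustrative channel counts
print(standard_conv_params(k, m, n))  # -> 73728
print(dsc_params(k, m, n))            # -> 8768
```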

    3) FPN (Feature Pyramid Network)

  • Generate multi-layer feature maps in both texture features of shallow network and semantic features of deep network in extraction step

    3 parts
    1) Bottom-up path

    • achieve feature extraction
    • As layers stack up, spatial resolution drops (information is lost)
    • The semantic value of network layers increases and more is detected
      2) Top-down path
    • Use higher-resolution layer
      3) Lateral Connection

5) BN (Batch Normalization)

  • Improves the speed of model training
  • Improves the generalization performance of the network
    Each channel has independent scale and shift parameters (scalars)
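A per-channel sketch of the batch normalization step: normalize the batch toward mean 0 / variance 1, then apply the channel's learned scale and shift scalars (left at their defaults here; eps is the usual small constant guarding against division by zero):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization for a single channel: force the batch back
    toward mean 0 / variance 1, then apply the channel's own scale
    (gamma) and shift (beta) scalars."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta
            for x in batch]

out = batch_norm([10.0, 20.0, 30.0, 40.0])
print([round(x, 2) for x in out])  # roughly [-1.34, -0.45, 0.45, 1.34]
```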

6) DRN (Deep Residual Network)

  • With shortcut connections, instead of the learned feature H(x), only the residual F(x) (the change) is learned
  • When the residual is 0, the process quickly reduces to the shortcut (identity).

7) Dropout Technology

  • A fixed probability p is set to 0.5

    Training process

  • Randomly delete half of the hidden neurons in the network

  • Propagate the input forward and the loss result backward.

  • Restore the deleted neurons
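The training process above can be sketched with "inverted" dropout, a common variant that rescales the surviving activations by 1/(1-p) during training so that inference needs no change (the rescaling is an assumption beyond the text, which describes plain dropout):

```python
import random

def dropout_forward(activations, p=0.5, training=True):
    """Inverted dropout: during training, zero each hidden activation
    with probability p and rescale the survivors by 1/(1-p); at
    inference the layer passes activations through unchanged."""
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p)
            for a in activations]

random.seed(42)
hidden = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]
dropped = dropout_forward(hidden)               # ~half the units zeroed
kept = dropout_forward(hidden, training=False)  # unchanged at inference
print(dropped)
print(kept == hidden)  # -> True
```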

8) Advanced Activation Functions

  • Degenerate multiplicative functions into simple linear model

    RELU
    1) When the input of the ReLU function is positive, there is no gradient saturation

  • Since ReLU(x) = max(0, x), inputs greater than 0 are passed through linearly and inputs below 0 are output as 0, so no output is negative, which is a big advantage (removes the gradient vanishing problem)
    2) The linear relationship itself computes faster than the basic sigmoid or tanh functions

    Leaky ReLU (LReLU) accepts inputs less than 0.
    Parametric ReLU (PReLU): when the input is less than 0, the output is beta_i * x_i, where beta_i controls the slope of the negative semi-axis; if beta_i is 0, it is identical to the ReLU function.
    Randomized ReLU (RReLU): if the j-th input to the i-th channel is less than or equal to 0, a_ji is randomized, a_ji ~ U(l, u), i.e., drawn from a uniform distribution.
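The ReLU variants above, side by side, as a minimal sketch (the LReLU slope and the RReLU bounds l and u are illustrative values, not from the text):

```python
import random

def relu(x):
    """ReLU(x) = max(0, x): positive inputs pass through linearly."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """LReLU: a small fixed slope on the negative semi-axis."""
    return x if x > 0 else slope * x

def prelu(x, beta):
    """PReLU: beta is a learned slope for negative inputs;
    beta == 0 recovers plain ReLU."""
    return x if x > 0 else beta * x

def rrelu(x, lo=0.125, hi=1.0 / 3.0):
    """RReLU: for non-positive inputs the slope is drawn from U(lo, hi)."""
    return x if x > 0 else random.uniform(lo, hi) * x

print(relu(2.0), relu(-3.0))           # -> 2.0 0.0
print(leaky_relu(-3.0))                # small negative instead of 0
print(prelu(-3.0, 0.0) == relu(-3.0))  # -> True
```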

Advanced Neural Networks

1) Transfer Learning

  • Transfers trained model parameters to a new model for model training
  • Sharing the learned model parameters with the new model speeds up and optimizes the learning efficiency of the model.

2) Ensemble Learning

  • Constructs and combines multiple machine learners to carry out learning tasks
  • Solves integration problems

3) Graph Neural Networks (GNN)

  • A framework that uses deep learning to learn graph-structured data directly.
  • Excellent result in posture recognition tasks

Main Recognition Techniques

1) Sensor-based recognition

  • lower cost and simple to operate, but limited by the sensor device

2) Vision-based recognition

  • high accuracy and no problem of wearing devices, but recognition errors are heavily affected by light, background environment, and other factors

3) RF-based identification

  • sensitive to environmental changes
  • affected by the human body's absorption, reflection, and scattering of RF signals.

Human Posture Dimensions

1) 2D

  • The purpose: locate and identify the keypoints (joints) of the human body

2) 3D

  • Stable and easily interpretable images
  • 3D coordinate positions and angles of human joints

Occlusion, inadequate training data, and depth blur are still problems

Datasets (of the many 2D and 3D datasets in the tables, only the most recent dataset of each kind is listed)

1) Human-in-Events (HiEve): a 2D dataset created in 2020; video of multiple people with 14 keypoints extracted (49,820 frames)

2) MoVi: 3D, single person; a large single-person video dataset with 3D MoCap annotations; can provide SMPL parameters obtained through MoSh+

Current research directions

1) Pose machines

  • Image feature extraction with CNNs

2) CNN

  • optimization in recognition performance

3) Multi-person posture recognition in natural scenes

  • As other factors change, the importance of multi-person posture recognition is growing

4) Attention mechanism

  • attention regularization loss based on local feature identity to constrain attention weight

5) Data fusion

  • Improving the accuracy of posture recognition and the reliability of the system
