A Survey on Artificial Intelligence in Posture Recognition

모시모시 · June 13, 2025

Paper


Introduces the latest methods of posture recognition and reviews the various techniques and algorithms of posture recognition from recent years

CNN (high accuracy in posture recognition)

Feature extraction (needs further research)

A) Posture Recognition Algorithm Categories

1) RGB image-based recognition algorithms utilize the contour features of the human body

  • HOG (Histogram of Oriented Gradients) captures the edges of the human body

2) Depth-based image algorithm

  • Uses the image's gray value to represent the target's spatial position and contour.

B) Existing posture recognition methods

1) Traditional ML method

  • Image segmentation algorithms.
  • Drawbacks
    • Hard to extract semantic information
    • Reduced accuracy and real-time performance

2) DNN(Deep Neural Network) method

  • Learns low-level feature information of the image with a DNN (strong adaptability, high recognition speed and accuracy)

C) Recent reviews selected from a PRISMA perspective

PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses)

  • Reviews 199 papers

(1) previous + (2) new database (new data)

(1)
149 or more reviews (criterion) --> total 188 (final output criterion)

(2)

  • Check paper quality (screening, duplicates, ineligible by automation tools, other reasons)
  • Number of papers excluded during screening
  • Number of papers re-included during screening
  • Assess for eligibility

Recognition tech.

1) Sensor based recognition

Common positions

  • lower limbs
  • waist
  • arm
  • neck
  • wrist
  • etc.

Number of sensors

  • single sensor
  • multi sensor

Install on user

  • wearable
    • Ex.) HAR (Human Activity Recognition)
    • Inertial (accelerometers and gyroscopes): detects speed and rotation to infer the user's activity. (e.g., an Apple Watch detects the wearer's movement speed to determine whether the person is walking, running, or doing some other activity, then shows it on the watch screen)
    • physiological
      • EEG (brain activity)
      • ECG (heart activity)
      • GSR (electrical conductance of the skin)
      • EMG (muscle response)
    • pressure
      • FSR (e.g., the company BeBop Sensors), https://www.youtube.com/@BeBopSensors
      • barometric (atmospheric pressure values and height changes)
      • textile-based capacitive (e.g., KTTA (Korea Textile Trade Association), https://www.youtube.com/watch?v=OhZOKpxWYnE)
      • vision wearable sensor (WVS) (camera object detection) (Apple Vision Pro (projects the user's eyes in real time), Google XR glasses (check the visible environment in real time for navigation))
      • flexible
  • fixed

Data Output

  • array time domain signal
  • image matrix data
  • vector data
  • strap-down matrix data
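As a toy illustration of how an array time-domain signal from a single wearable inertial sensor can be turned into an activity label (as in the Apple Watch example above), here is a minimal sketch; the thresholds, window contents, and labels are made-up assumptions, not from the survey:

```python
import math

def classify_activity(accel_samples, still_thresh=0.3, walk_thresh=1.5):
    """Toy HAR classifier: label an activity from the variance of the
    acceleration magnitude over a window (thresholds are illustrative)."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_samples]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    if var < still_thresh:
        return "still"
    if var < walk_thresh:
        return "walking"
    return "running"

# simulated windows of (x, y, z) samples: device at rest vs. oscillating
still = [(0.0, 0.0, 1.0)] * 50
shaky = [(0.0, 0.0, 4.0 * (i % 2)) for i in range(50)]
print(classify_activity(still))  # -> still
print(classify_activity(shaky))  # -> running
```

A real HAR pipeline would of course extract many more features (frequency-domain, correlations between axes) and feed them to a trained classifier.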

2) Vision Based Recognition

Problem: applying 2D images to 3D posture.
1) RGB image (2D) + depth map (1D) => 3D mapping (skeleton), posture recognition.

3) RF Based Recognition

When a wearable sensing device cannot be applied, radio-frequency-based technology is used

  • Fall or sleep monitoring systems using passive RFID tags
  • radar, WiFi, RFID

Traditional ML Based Approach

Human skeleton (human joints, facial features)

Feature Extraction (HOG)

  • Gamma compression of the color image reduces its dependence on lighting factors

  • 3 conditions

    • gamma == 1 ; output value = input value
    • gamma > 1 ; gray part of the image is reduced (becomes darker)
    • gamma < 1 ; gray part of the image is reduced (becomes brighter)
  • Computes horizontal and vertical gradients

  • local image region corresponding

  • Gradient intensity normalization to compress the influence of other factors

  • Overlapping regions are also reflected in the detection windows of the HOG features => the final feature vector makes classification of the object possible

  • Scale-Invariant Feature Transform (SIFT)

    • map images to local feature vector sets based on cv technology.
    • With this feature extraction, the negative effects of image rotation, scaling, brightness/illumination change, affine transformation, and noise mostly disappear.
      1) Focus the image scale space with a Gaussian kernel function: Laplacian of Gaussian (LoG), Difference of Gaussians (DoG); DoG is the more simplified version.
      2) Localization (find better feature points by removing low-priority feature points and unstable edges)
      3) Local gradient direction (rotation-invariance property);
      scale and position of keypoints.
      4) Feature description (points, pixels) reflected as a graph
  • Dynamic Time Warping (DTW)

    • Compares the similarity or distance between two arrays or time series of different lengths.
    • Used in speech recognition and posture recognition
    • Usually, time-series data is compared with the Euclidean distance at identical time indices, but the more the signals' movements differ at the same point in time, the harder it is to measure their mean (similarity).
    • To overcome this, the idea of DTW is to compare not only identical time points but also neighboring time points.

    (Algorithm principle)

    1. Two time series exist; lay them out as a matrix whose element (i, j) is the Euclidean distance between points i and j, then search for the optimal path. The warping path must be continuous (basic condition).
    • Boundary condition: the path must run from w_1 = (1, 1) to w_K = (m, n) (the start and end points of the matrix are connected)
    • Continuity: if w_k = (a, b) and w_{k-1} = (a', b'), then a - a' <= 1 and b - b' <= 1
    • Monotonicity: no path moves in the negative direction. (optimal route = minimum total warping distance)
  • Hu moment invariant (HMI)

    • translation, scale, rotation invariant
    • Uses the central moments, the variance of pixel values about the image center => translation invariant
    • Normalizing the central moments gives robustness => scale invariant
    • Total 7 invariant moments
      • Six absolute orthogonal invariants
      • One skew orthogonal invariant
  • Fourier descriptors (FD)

    • The coefficients identify the edge of the shape
  • nonparametric weighted feature extraction (NWFE)

    • Putting different weights on every sample to compute the “weighted means”
    • Defining new nonparametric between-class and within-class scatter matrices.
  • gray-level co-occurrence matrix (GLCM)

    • Counts how often pairs of gray-level values occur at neighboring pixel positions (offsets in x and y), checking the similarity of the two values.
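The three gamma-compression conditions listed under the HOG preprocessing above can be sketched in a few lines; this is a pure-Python illustration on pixel values normalized to [0, 1], not the survey's implementation:

```python
def gamma_compress(pixels, gamma):
    """Gamma compression on pixel values normalized to [0, 1]:
    gamma == 1 leaves values unchanged, gamma > 1 darkens the gray
    mid-tones, gamma < 1 brightens them."""
    return [p ** gamma for p in pixels]

row = [0.0, 0.25, 0.5, 0.75, 1.0]
print(gamma_compress(row, 1.0) == row)  # -> True (identity)
print(gamma_compress(row, 2.2)[2])      # mid-gray pushed toward black
print(gamma_compress(row, 0.45)[2])     # mid-gray pushed toward white
```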
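The DTW idea described above (compare neighboring time points, not only identical ones, under the boundary, continuity, and monotonicity conditions) is commonly implemented with dynamic programming; a minimal sketch, assuming the absolute difference as the local distance:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of possibly
    different lengths, honoring the boundary, continuity, and
    monotonicity conditions on the warping path."""
    m, n = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal total warping distance aligning a[:i] with b[:j]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance
            # continuity/monotonicity: only steps from the left,
            # below, or the diagonal neighbor are allowed
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]  # boundary: the path runs from (1, 1) to (m, n)

# a time-shifted copy of a signal: point-by-point Euclidean comparison
# would penalize the lag, but DTW aligns the peaks at zero cost
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(x, y))  # -> 0.0
```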

Feature Reduction

  • PCA (e.g., 3D => 2D data dimension reduction & feature reduction), LDA (Linear Discriminant Analysis, a linear dimensionality reduction)
  • LDA preserves the data information that makes classes easy to distinguish
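A minimal sketch of PCA-style feature reduction for 2-D points, projecting each point onto the first principal component via the closed-form 2x2 eigendecomposition; illustrative only, and it assumes the two features are correlated (sxy != 0):

```python
import math

def pca_project_2d(points):
    """Project 2-D points onto their first principal component,
    reducing each point to a single mean-centered score."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # entries of the 2x2 covariance matrix
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]] (quadratic formula)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # matching eigenvector (assumes correlated features, sxy != 0)
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [(x - mx) * vx + (y - my) * vy for x, y in points]

# points on the line y = 2x: all the variance lies along one axis,
# so reducing 2-D -> 1-D loses nothing
scores = pca_project_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
print([round(s, 3) for s in scores])  # -> [-3.354, -1.118, 1.118, 3.354]
```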

Classification

1) SVM (Support Vector Machine)

  • Finds the optimal hyperplane in the sample space that separates the categories (classification)
  • Only related to the support vectors
  • The complexity depends on the number of support vectors

2) GMM (Gaussian Mixture Model)

  • Gaussian prob. density functions to quantify the variable distribution
  • Prior probability of choosing the m-th Gaussian model and the average value of each component
  • EM algorithm
    • E-step: estimate the probability that each Gaussian component generated each data point
    • M-step: use these probabilities to update the model parameters
    • Check for convergence; if not converged, repeat all the previous steps until there is no significant change
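The EM loop above can be sketched for a two-component 1-D Gaussian mixture; to keep it short, this sketch assumes fixed unit variances and equal priors (assumptions beyond the text), so only the means are re-estimated:

```python
import math, random

def em_gmm_1d(data, init_means, iters=50):
    """EM for a two-component 1-D Gaussian mixture.  Sketch: variances
    are fixed at 1 and priors are equal, so only the means move."""
    def pdf(x, m):  # unit-variance Gaussian density
        return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
    m0, m1 = init_means
    for _ in range(iters):
        # E-step: responsibility of component 0 for every point
        r0 = [pdf(x, m0) / (pdf(x, m0) + pdf(x, m1)) for x in data]
        # M-step: responsibility-weighted means
        m0 = sum(r * x for r, x in zip(r0, data)) / sum(r0)
        m1 = sum((1 - r) * x for r, x in zip(r0, data)) / sum(1 - r for r in r0)
    return m0, m1

random.seed(0)
data = ([random.gauss(1.0, 1.0) for _ in range(200)]
        + [random.gauss(6.0, 1.0) for _ in range(200)])
m0, m1 = em_gmm_1d(data, (0.0, 8.0))
print(round(m0, 1), round(m1, 1))  # close to the true means 1.0 and 6.0
```

In the full algorithm the M-step would also update each component's variance and mixing prior from the same responsibilities.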

3) HMM (Hidden Markov Model)

  • ML model
  • The process of generating a random sequence of unobservable (hidden) states in a chain and observing a random sequence emitted from each state

Deep Neural Network Based Approach

1) Posture Estimation

  • Top Down approach

    • Detect each person first, then compute each person's posture from their parts
  • Bottom Up approach

    • Detect all parts of all people first, then use an algorithm to figure out which parts belong to which person.

2) CNN

  • Neurons are arranged in 3 dimensions (width, height, depth)

Convolutional layer

  • dimension reduction

  • feature extraction

    Pooling layer

  • compress the amount of data and parameters

  • improves identification efficiency and controls overfitting by reducing the amount of values handled by the model
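A minimal sketch of what the pooling layer does: 2x2 max pooling with stride 2 keeps only the strongest response in each window, halving each spatial dimension (even height and width assumed here):

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: compresses the amount of data
    passed to later layers (assumes even height and width)."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]
print(max_pool_2x2(fm))  # -> [[4, 2], [2, 8]]
```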

3) Improved CNN

  • Using the BN (Batch Normalization)

    • Inputs are forced back to a normal distribution with mean 0 and variance 1, so input values avoid the vanishing-gradient region.
  • CNN-LSTM

    • A solution to complex problems with large amounts of data.

      1) MSST-ResNet

    • Enables multi-scale feature learning

    • Thanks to ResNet's residual blocks, performance stays good even when the network is very deep.

    • Instead of direct learning, skip connections let the network learn only the essential information it needs to add, reducing what has to be learned.

    • A residual block uses the skip connection as an identity mapping, so the input passes through unchanged regardless of the function it enters.

      2) R-CNN

    • Creates ~2000 independent region proposals from the input image

    • Extracts a fixed-length feature vector with a CNN

    • Applies a category-specific linear SVM to each region and performs classification

      3) Stacked hourglass networks

    • successive pooling and upsampling steps (capture and integrate information at all image scales)

    • Bottom-up

    • Top-down

    • Filters no larger than 3 * 3

    4) MSPN (Multi-stage pose estimation network)

    • 2 independent information streams (Downsampling and Upsampling)
    • 1*1 convolution for feature aggregation (alleviates the information loss caused by the repeated sampling steps)
    • Extended residual design, a solution to the vanishing gradient
    • Different Gaussian kernel sizes used at different stages
    • Multi-scale supervision performs intermediate supervision at 4 different scales in each stage.

    5) CPM (Convolutional pose machine)

    • Combines the advantages of deep convolutional networks.
    • Image and context features are learned directly from the data represented by these networks
    • Advantages
      • No need to infer a graphical model; a solution for prediction tasks in computer vision
      • Solves the gradient-vanishing problem in the cascaded model training process

    6) HRNet (high-resolution network)

    • Most previous methods encode the input from high resolution down to a low-resolution representation through a convolutional network, then recover high resolution via upsampling or a decoder.
    • Connects sub-networks from high resolution to low resolution to maintain a high-resolution representation
    • Maintains parallel connections of the high-resolution network
    • Exchanges general and detailed information between the networks

4) Lightweight Network

  • The number of layers in models has gradually deepened.

    1) SSC (Spatial Separable Convolutions)

  • splitting or transforming the convolution kernel

  • performing convolution calculation separately

    ex.) 3 * 3 convolution
    Split into

  • 1 * 3 convolution core

  • 3 * 1 convolution core

    In total, 6 multiplications instead of 9 => the network runs faster
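The 9-to-6 multiplication saving above can be checked numerically: convolving with a separable 3x3 kernel gives the same output as applying its 3x1 and 1x3 factors in sequence. This sketch uses the Sobel-x kernel as an example of a separable kernel (an illustrative choice, not from the text):

```python
def conv_valid(img, ker):
    """'Valid' 2-D cross-correlation of a nested-list image and kernel."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(ker[a][b] * img[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# a separable 3x3 kernel: outer product of a column and a row
col, row = [1, 2, 1], [1, 0, -1]            # (Sobel-x, as an example)
k3x3 = [[c * r for r in row] for c in col]  # 9 multiplications per output

img = [[(5 * i + j) % 7 for j in range(6)] for i in range(6)]

full = conv_valid(img, k3x3)
# same result from the 3x1 core followed by the 1x3 core: 3 + 3 = 6 mults
split = conv_valid(conv_valid(img, [[v] for v in col]), [row])
print(full == split)  # -> True
```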

    2) DSC (Depthwise Separable Convolution)

    Depthwise convolution

  • Number of generated feature mapping channel == number of input channels

  • k * k convolution kernel, spatial dimension

    • Matrix can be divided by the depth of convolution kernel

    Pointwise Convolution

  • 1 * 1 convolution kernel, implement on every channel

  • 1 * 1 * L (L = M * N, where L is the # of upper-layer channels)
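A quick parameter-count comparison between a standard convolution and a depthwise separable one, using the kernel shapes described above (the channel counts 64 and 128 are illustrative assumptions):

```python
def standard_conv_params(k, m, n):
    """Weights in a standard k x k convolution: k*k*m weights per
    output channel, with n output channels."""
    return k * k * m * n

def dsc_params(k, m, n):
    """Depthwise separable convolution: one k x k kernel per input
    channel (depthwise), then a 1 x 1 pointwise convolution mixing
    the m input channels into n output channels."""
    return k * k * m + m * n

k, m, n = 3, 64, 128  # illustrative channel counts
print(standard_conv_params(k, m, n))  # -> 73728
print(dsc_params(k, m, n))            # -> 8768
```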

    3) FPN (Feature Pyramid Network)

  • Generate multi-layer feature maps in both texture features of shallow network and semantic features of deep network in extraction step

    3 parts
    1) Bottom-up path

    • achieve feature extraction
    • As layers stack up, spatial resolution drops (information is lost)
    • The semantic value of network layers increases and more is detected
      2) Top-down path
    • Use higher-resolution layer
      3) Lateral Connection

5) BN (Batch Normalization)

  • Improves the speed of model training
  • Improves the generalization performance of the network
    Each channel has independent scale and shift parameters (scalars)
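A per-channel sketch of the batch normalization step: normalize the batch toward mean 0 / variance 1, then apply the channel's learned scale and shift scalars (left at their defaults here; eps is the usual small constant guarding against division by zero):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization for a single channel: force the batch back
    toward mean 0 / variance 1, then apply the channel's own scale
    (gamma) and shift (beta) scalars."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta
            for x in batch]

out = batch_norm([10.0, 20.0, 30.0, 40.0])
print([round(x, 2) for x in out])  # roughly [-1.34, -0.45, 0.45, 1.34]
```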

6) DRN (Deep Residual Network)

  • With shortcut connections, instead of the learned feature H(x), only the residual F(x) (the change) is learned
  • When the residual is 0, the process quickly reduces to the shortcut (identity).

7) Dropout Technology

  • A fixed probability p is set to 0.5

    Training process

  • Randomly delete half of the hidden neurons in the network

  • Propagate the input forward and the loss result backward.

  • Restore the deleted neurons
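The training process above can be sketched with "inverted" dropout, a common variant that rescales the surviving activations by 1/(1-p) during training so that inference needs no change (the rescaling is an assumption beyond the text, which describes plain dropout):

```python
import random

def dropout_forward(activations, p=0.5, training=True):
    """Inverted dropout: during training, zero each hidden activation
    with probability p and rescale the survivors by 1/(1-p); at
    inference the layer passes activations through unchanged."""
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p)
            for a in activations]

random.seed(42)
hidden = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]
dropped = dropout_forward(hidden)               # ~half the units zeroed
kept = dropout_forward(hidden, training=False)  # unchanged at inference
print(dropped)
print(kept == hidden)  # -> True
```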

8) Advanced Activation Functions

  • Degenerate multiplicative functions into simple linear model

    RELU
    1) When the input of the ReLU function is positive, there is no gradient saturation

  • Since ReLU(x) = max(0, x), inputs greater than 0 are passed through linearly and inputs below 0 are output as 0, so no output is negative, which is a big advantage (removes the gradient vanishing problem)
    2) The linear relationship itself computes faster than the basic sigmoid or tanh functions

    Leaky ReLU (LReLU) accepts inputs less than 0.
    Parametric ReLU (PReLU): when the input is less than 0, the output is beta_i * x_i, where beta_i controls the slope of the negative semi-axis; if beta_i is 0, it is identical to the ReLU function.
    Randomized ReLU (RReLU): if the j-th input to the i-th channel is less than or equal to 0, a_ji is randomized, a_ji ~ U(l, u), i.e., drawn from a uniform distribution.
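The ReLU variants above, side by side, as a minimal sketch (the LReLU slope and the RReLU bounds l and u are illustrative values, not from the text):

```python
import random

def relu(x):
    """ReLU(x) = max(0, x): positive inputs pass through linearly."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """LReLU: a small fixed slope on the negative semi-axis."""
    return x if x > 0 else slope * x

def prelu(x, beta):
    """PReLU: beta is a learned slope for negative inputs;
    beta == 0 recovers plain ReLU."""
    return x if x > 0 else beta * x

def rrelu(x, lo=0.125, hi=1.0 / 3.0):
    """RReLU: for non-positive inputs the slope is drawn from U(lo, hi)."""
    return x if x > 0 else random.uniform(lo, hi) * x

print(relu(2.0), relu(-3.0))           # -> 2.0 0.0
print(leaky_relu(-3.0))                # small negative instead of 0
print(prelu(-3.0, 0.0) == relu(-3.0))  # -> True
```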

Advanced Neural Networks

1) Transfer Learning

  • Transfers trained model parameters to a new model for model training
  • Sharing the learned model parameters with the new model speeds up and optimizes the learning efficiency of the model.

2) Ensemble Learning

  • Constructs and combines multiple machine learners to carry out learning tasks
  • Solves integration problems

3) Graph Neural Networks (GNN)

  • A framework that uses deep learning to learn graph-structured data directly.
  • Excellent result in posture recognition tasks

Main Recognition Techniques

1) Sensor-based recognition

  • lower cost and simple to operate, but limited by the sensor device

2) Vision-based recognition

  • high accuracy and no problem of wearing devices, but recognition errors are heavily affected by light, background environment, and other factors

3) RF-based identification

  • sensitive to environmental changes
  • affected by the human body's absorption, reflection, and scattering of RF signals.

Human Posture Dimensions

1) 2D

  • The purpose: locate and identify the keypoints (joints) of the human body

2) 3D

  • Stable and easily interpretable images
  • 3D coordinate positions and angles of human joints

Occlusion, inadequate training data, and depth blur are still problems

Datasets (of the many 2D and 3D datasets in the tables, only the most recent dataset of each kind is listed)

1) Human-in-Events (HiEve): a 2D dataset created in 2020; video of multiple people with 14 keypoints extracted (49,820 frames)

2) MoVi: 3D, single person; a large single-person video dataset with 3D MoCap annotations; can provide SMPL parameters obtained through MoSh+

Current research directions

1) Pose machines

  • Image feature extraction with CNNs

2) CNN

  • optimization in recognition performance

3) Multi-person posture recognition in natural scenes

  • As other factors change, the importance of multi-person posture recognition is growing

4) Attention mechanism

  • attention regularization loss based on local feature identity to constrain attention weight

5) Data fusion

  • Improving the accuracy of posture recognition and the reliability of the system
