[SIGGRAPH_2020] Local Motion Phases for Learning Multi-Contact Character Movements

eric9687 · April 15, 2022

Pose Estimation Paper Review


The system consists of a motion prediction network and a gating network.
gating network
- computes a set of expert weights
- learns how to dynamically combine the expert weights via blending coefficients to construct the motion prediction network.
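As a rough illustration of the blending step, the sketch below combines a few expert weight matrices using blending coefficients that sum to 1. The function names (`softmax`, `blend_experts`), the 2x2 matrix size, and the choice of softmax normalization are assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw gating scores into blending coefficients that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_experts(experts, coefficients):
    """Weighted sum of expert weight matrices (each a list of rows)."""
    rows, cols = len(experts[0]), len(experts[0][0])
    blended = [[0.0] * cols for _ in range(rows)]
    for expert, alpha in zip(experts, coefficients):
        for r in range(rows):
            for c in range(cols):
                blended[r][c] += alpha * expert[r][c]
    return blended


# Hypothetical example: three 2x2 expert weight matrices.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
# A gating network would produce these scores; here they are hard-coded.
coefficients = softmax([0.0, 0.0, 0.0])
prediction_weights = blend_experts(experts, coefficients)
```

The blended matrix then plays the role of the motion prediction network's weights for the current frame, so the effective network changes from frame to frame as the coefficients change.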
enhancements
- training with local motion phases
- a generative control model that takes as input the raw high-level user control commands and generates a sharp variety of control signals.
Local phases are computed separately for each bone.
Generative Control Model: produces a variety of movements from the user's control signals.
When several motions correspond to a single input, the averaged motion can be ambiguous. To address this, noise is added to the control signal that is fed to the generative model.
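A toy example may make the averaging problem concrete. Everything here is hypothetical: a 1-D "motion" with two equally valid responses to the same input, and a `sample_motion` helper that perturbs the control signal with Gaussian noise before choosing a response. It is only meant to show why the mean is a poor answer and how noise yields distinct outcomes.

```python
import random

# Hypothetical 1-D "motion": sidestep offset in meters. Both -1.0 (left) and
# +1.0 (right) are valid responses to the same control input.
valid_motions = [-1.0, 1.0]

# A deterministic regressor trained on both tends toward their mean,
# which matches neither valid motion.
mean_motion = sum(valid_motions) / len(valid_motions)


def sample_motion(control, rng, noise_scale=0.5):
    """Toy generative step: perturb the control signal, then pick a mode."""
    noisy_control = control + rng.gauss(0.0, noise_scale)
    return valid_motions[0] if noisy_control < 0.0 else valid_motions[1]


rng = random.Random(0)
samples = [sample_motion(0.0, rng) for _ in range(200)]
```

Every sample is one of the two valid motions, whereas `mean_motion` is 0.0, a motion that never appears in the data.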

Input X
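: P => gating network, V => generative controller, the rest => motion prediction network
- Character State (S): position, rotation, and velocity of the 26 bones
- Control Variables (V):
  - Root Trajectory: position, direction, and speed of the root trajectory
  - Interaction Vector: a pivot vector that can express the direction and height of the dribble, together with its derivative
  - Action Variables: represent Idle, Move, Control, and Hold.
- Conditioning Features (F):
  - Ball Movements: past positions and speeds of the ball, and the control weight.
  - Contact Information: contact states of the hands, feet, and ball.
- Opponent Information (R):
  - whether an opponent is within a 5 m radius
  - distances between the 26 joints of the two characters when within 5 m
  - position, direction, and velocity of the opponent's trajectory
- Local Motion Phases (P): phases of the 5 key bones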
Output Y
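- Character State (S): pose and velocity of the 26 bones in the next frame
- Future Control Variables (V): the user's control signal is also added
- Conditioning Features (F): ball movement and contact states in the next frame
- Local Motion Phase Updates: phases of the 5 key bones in the next frame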
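The routing of input groups to networks can be sketched as a small data structure. The `FrameInput` class, the `route` function, and the flat list-of-floats layout are hypothetical names and choices for illustration. The notes do not give the actual feature dimensions, so the sizes in the usage lines are arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameInput:
    """Input X for one frame, split into the groups listed in the notes."""
    character_state: List[float]        # S
    control_variables: List[float]      # V
    conditioning_features: List[float]  # F
    opponent_information: List[float]   # R
    local_phases: List[float]           # P


def route(x: FrameInput):
    """Split X by destination: P -> gating, V -> controller, rest -> prediction."""
    return {
        "gating_network": x.local_phases,
        "generative_controller": x.control_variables,
        "motion_prediction_network": (
            x.character_state
            + x.conditioning_features
            + x.opponent_information
        ),
    }


# Arbitrary placeholder sizes, chosen only to make the example runnable.
x = FrameInput([0.0] * 3, [0.0] * 2, [0.0] * 4, [0.0] * 1, [0.0] * 5)
routed = route(x)
```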

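Phase variables are defined from the start and end poses given by the foot contact pattern, and their value lies in the strong link they establish between time and movement.
They enable the neural network to produce high-quality motion.
- This is because each phase state clusters a subset of poses, actions, and movements.
- Autoregressive motion generators are prone to stalling, where movements fall out of sync with time or details are lost. Phases keep the animation moving forward in time.
The global phase used in earlier work lacks scalability and practicality on unstructured data.
Local phases give better results in terms of accuracy and detail.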
How local motion phases are computed from motion capture data
- the motion prediction network generates the phase for the next frame
- normalization
- low-pass filtering
- curve fitting (the cyan curve in the paper's figure)
- reduction
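The normalization, filtering, and fitting steps can be sketched on a synthetic contact signal. This is a minimal sketch under several assumptions: the input is a 0/1 contact signal, a centered moving average stands in for the low-pass filter, and the fitted curve is assumed to be a sinusoid `a*sin(f*t - s) + b` found by a brute-force grid search. All function names and parameter values are hypothetical.

```python
import math


def normalize(signal):
    """Zero-mean, unit-variance normalization over the whole window."""
    mean = sum(signal) / len(signal)
    var = sum((v - mean) ** 2 for v in signal) / len(signal)
    std = math.sqrt(var) or 1.0
    return [(v - mean) / std for v in signal]


def low_pass(signal, radius=2):
    """Centered moving average; a simple stand-in for a real low-pass filter."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def fit_sinusoid(signal, periods=range(10, 41), shift_steps=32):
    """Grid-search period and shift; amplitude a and offset b by least squares."""
    n = len(signal)
    mean_y = sum(signal) / n
    best = None
    for period in periods:
        f = 2.0 * math.pi / period
        for k in range(shift_steps):
            s = 2.0 * math.pi * k / shift_steps
            u = [math.sin(f * t - s) for t in range(n)]
            mean_u = sum(u) / n
            var_u = sum((x - mean_u) ** 2 for x in u)
            if var_u == 0.0:
                continue
            a = sum((x - mean_u) * (y - mean_y) for x, y in zip(u, signal)) / var_u
            if a < 0.0:
                continue  # the same fit with a > 0 exists at shift s + pi
            b = mean_y - a * mean_u
            err = sum((a * x + b - y) ** 2 for x, y in zip(u, signal))
            if best is None or err < best[0]:
                best = (err, period, f, s, a, b)
    return best


# Hypothetical contact signal: foot on the ground for 10 of every 20 frames.
contacts = [1.0 if (t % 20) < 10 else 0.0 for t in range(80)]
smoothed = low_pass(normalize(contacts))
_, period, f, s, a, b = fit_sinusoid(smoothed)
# Per-frame phase angle, wrapped to one cycle, read off the fitted sinusoid.
phases = [(f * t - s) % (2.0 * math.pi) for t in range(len(contacts))]
```

Running this once per bone, each on its own contact signal, would give one phase per bone, which is what makes the phases local rather than global.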

Generative Control Model: produces a realistic variety of movements from the user's control signals.

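An encoder-decoder network maps the control signal to a latent vector, and noise is added to create variety.
It is trained on motion capture data.
Components:
- Encoder: takes as input a smooth trajectory that blends the autoregressive control signal with the input signal, and converts it to a latent vector
- Decoder: converts the latent vector to a control signal
- Discriminator: distinguishes generated control signals from the ground truth
- Together, these components form a manifold.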
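The encode, add-noise, decode path can be sketched with tiny linear maps. The weights below are hand-picked placeholders rather than learned values, the 3-D control and 2-D latent sizes are arbitrary, and `generate_control` is a hypothetical name. The discriminator is only involved in training, so it is left out of this inference-time sketch.

```python
import random


def encode(control, weights):
    """Hypothetical linear encoder: control signal -> latent vector."""
    return [sum(w * c for w, c in zip(row, control)) for row in weights]


def decode(latent, weights):
    """Hypothetical linear decoder: latent vector -> control signal."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]


def generate_control(control, enc_w, dec_w, rng, noise_scale=0.1):
    """Encode, perturb the latent vector with Gaussian noise, then decode."""
    latent = encode(control, enc_w)
    noisy = [z + rng.gauss(0.0, noise_scale) for z in latent]
    return decode(noisy, dec_w)


# Placeholder weights; a trained model would learn these from motion capture data.
enc_w = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]    # 3-D control -> 2-D latent
dec_w = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # 2-D latent -> 3-D control

rng = random.Random(0)
user_control = [1.0, 0.0, 0.0]
variations = [generate_control(user_control, enc_w, dec_w, rng) for _ in range(3)]
clean = generate_control(user_control, enc_w, dec_w, rng, noise_scale=0.0)
```

With `noise_scale=0.0` the same input always decodes to the same control signal. With noise, each call lands on a slightly different point in the latent space and decodes to a different variation.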
