[Paper Review] Over-the-Air Deep Learning Based Radio Signal Classification - Part 1

이우준 · July 22, 2021

Cognitive Radio Deep Learning Modulation Neural Networks

Abstract

This paper studies the performance of deep learning based radio signal classification for radio communication signals. As a baseline, the authors use higher order moments combined with boosted gradient tree classification, and they compare the two approaches across a range of configurations and channel impairments. They also examine the effects of carrier frequency offset, symbol rate, and multipath fading in simulation, then carry out over-the-air measurements of radio classification performance using software radios and compare performance and training strategies between the simulated and over-the-air settings. The paper closes with a discussion of the remaining problems and the design considerations for using these techniques.

Introduction

Turning complex, high-data-rate streams of RF information into accurate class labels is a key component of many radio sensing and communication systems today. For a long time, radio signal classification and modulation recognition were done by hand-crafting feature extractors for specific signal types and properties, and then applying either analytically derived decision boundaries or boundaries learned statistically in low-dimensional feature spaces.

Meanwhile, deep learning (DL) has made it possible to train large neural networks with many parameters, so that features can be learned directly from raw, high-dimensional input data under high-level supervised objectives.

From this perspective, the trend in machine learning (ML) has moved steadily toward end-to-end feature learning, replacing simple, rigid analytic features with more accurate high degrees-of-freedom (DOF) models approximated from data. This trend has been demonstrated in vision, text processing, and voice, but it has not yet been widely applied to, or fully realized on, radio time series datasets.

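In this paper, the authors provide a dataset extended to additional radio signal types, along with a more realistic simulation of the wireless propagation environment. They also provide over-the-air (OTA) measurements of the new dataset (i.e., with real propagation effects), new methods that substantially outperform those first introduced for the signal classification task, and an in-depth analysis of the practical engineering design choices and system parameters that affect the accuracy and performance of a radio signal classifier.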

Background

Baseline Classification Approach

Statistical Modulation Features

For digital modulation schemes, higher order statistics and cyclo-stationary moments are among the most widely used features for detecting signals with strong periodic components, such as those created by the carrier structure, symbol timing, and symbol structure of a given modulation.

By incorporating precise knowledge of this structure, the expected values of the peaks in the auto-correlation function (ACF) and spectral correlation function (SCF) surfaces have been used successfully to provide robust classification of signals carrying unknown data.

For analog modulations, where symbol timing does not produce such artifacts, other statistical features are useful for signal classification.

To obtain the baseline features, the paper uses several compact higher order statistics (HOSs). These are computed from higher order moments (HOMs), defined as follows.

M_{pq} = E[x^{p-q} (x^*)^q]

From these HOMs we can derive a number of higher order cumulants (HOCs), which are known to be effective discriminators for many modulation types. HOCs are computed by combining HOMs, and the expression differs slightly for each one. As an example, the expression for the $C_{40}$ HOC is:

C_{40} = M_{40}-3M^2_{20}
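
The two formulas above can be tried out numerically. The following is a minimal NumPy sketch, not the paper's code: the function names `hom` and `c40` and the noise-free BPSK/QPSK test constellations are assumptions made for illustration, and the expectation $E[\cdot]$ is replaced by a sample mean.

```python
import numpy as np

def hom(x, p, q):
    """Sample estimate of M_pq = E[x^(p-q) * conj(x)^q]."""
    x = np.asarray(x, dtype=complex)
    return np.mean(x ** (p - q) * np.conj(x) ** q)

def c40(x):
    """Fourth-order cumulant C_40 = M_40 - 3 * M_20^2."""
    return hom(x, 4, 0) - 3 * hom(x, 2, 0) ** 2

# Illustrative, noise-free, unit-power constellations with every symbol
# appearing equally often (an assumption; real captures are noisy).
bpsk = np.tile([1.0, -1.0], 500)
qpsk = np.tile(np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4))), 250)

print("C40 (BPSK):", c40(bpsk))
print("C40 (QPSK):", c40(qpsk))
```

For these ideal constellations the estimate comes out at about -2 for BPSK and about -1 for QPSK, which is the kind of separation that makes the cumulant usable as a discriminator.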

In practice, quadratically scaled HOCs (e.g. $\sqrt{C_{40}}$) may be used to improve feature scaling. In addition, a number of analog features can be considered that capture other useful statistical behaviors. These include the mean, standard deviation, and kurtosis (peakedness) of the normalized centered amplitude, the instantaneous frequency, and the absolute normalized instantaneous frequency, among other quantities that proved useful in earlier work.
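
As a rough illustration of such analog features, the sketch below computes a few amplitude and instantaneous-frequency statistics from complex baseband samples. The exact definitions are assumptions following one common convention (amplitude divided by its mean and shifted by one, frequency from the unwrapped phase difference); the paper's own definitions may differ, and `analog_features` is a hypothetical helper name.

```python
import numpy as np

def analog_features(x, fs=1.0):
    """A few amplitude / instantaneous-frequency statistics (assumed definitions)."""
    x = np.asarray(x, dtype=complex)
    amp = np.abs(x)
    a_cn = amp / np.mean(amp) - 1.0            # normalized centered amplitude
    phase = np.unwrap(np.angle(x))             # continuous phase in radians
    f_inst = np.diff(phase) * fs / (2 * np.pi) # instantaneous frequency
    m2 = np.mean(a_cn ** 2)
    # Kurtosis as the ratio E[a^4] / E[a^2]^2; guarded for constant-envelope inputs.
    kurt = float(np.mean(a_cn ** 4) / m2 ** 2) if m2 > 1e-12 else 0.0
    return {
        "amp_std": float(np.std(a_cn)),
        "amp_kurtosis": kurt,
        "inst_freq_mean": float(np.mean(f_inst)),
        "inst_freq_std": float(np.std(f_inst)),
    }

# Usage: a pure complex tone at 0.1 cycles/sample has a flat envelope
# and a constant instantaneous frequency.
tone = np.exp(2j * np.pi * 0.1 * np.arange(1000))
feats = analog_features(tone)
print(feats)
```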

Decision Criterion

Several compact ML or analytic decision processes can be used to map the paper's baseline features to class labels.

Commonly used methods include support vector machines (SVMs), decision trees (DTrees), neural networks (NNs), and ensemble methods that combine classifiers to improve performance. Among ensemble methods, boosting, bagging, and gradient tree boosting are common; XGBoost in particular is an effective gradient tree boosting implementation that has been used to win numerous Kaggle data science competitions. The paper selects XGBoost as its feature classifier, reporting that it outperforms any single decision tree or SVM.

Radio Channel Models

Many compact stochastic models exist for propagation effects when modeling a wireless channel. The primary impairments seen in any wireless channel are the following.

Carrier frequency offset (CFO)
Carrier phase and frequency offset caused by differing local oscillators (LOs) and by motion (Doppler).
Symbol rate offset (SRO)
Symbol clock offset and time dilation caused by differing clock sources and by motion.
Delay Spread
Non-impulsive delay spread caused by delayed reflection, diffraction, and diffusion of emissions over multiple paths.
Thermal Noise
Additive white-noise impairment at the receiver caused by the sensitivity of the physical device.

Each of these effects can be modeled well, and each is present in some form in every wireless propagation medium.
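
To make the four impairments concrete, here is a toy NumPy sketch that applies each of them to complex baseband samples. It is not the channel simulator used in the paper: the function name `apply_channel`, its parameters, the fixed FIR taps standing in for multipath, and the linear-interpolation resampler are all simplifying assumptions.

```python
import numpy as np

def apply_channel(x, cfo=0.0, sro=0.0, taps=(1.0,), snr_db=None, rng=None):
    """Apply toy versions of the four impairments to complex samples x.

    cfo    : carrier frequency offset in cycles per sample
    sro    : relative sample-clock offset (1e-4 stretches time by 0.01%)
    taps   : FIR taps standing in for multipath delay spread
    snr_db : SNR of the added white Gaussian noise; None adds no noise
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    # Symbol rate offset: resample on a slightly stretched time grid.
    # np.interp holds the last sample for points past the end of the input.
    t = n * (1.0 + sro)
    y = np.interp(t, n, x.real) + 1j * np.interp(t, n, x.imag)
    # Delay spread: convolve with the taps, keeping the original length.
    y = np.convolve(y, np.asarray(taps, dtype=complex))[: len(x)]
    # Carrier frequency offset: phase rotation that grows linearly in time.
    y = y * np.exp(2j * np.pi * cfo * n)
    # Thermal noise: complex AWGN scaled to the requested SNR.
    if snr_db is not None:
        p_noise = np.mean(np.abs(y) ** 2) / 10 ** (snr_db / 10)
        noise = rng.normal(size=len(y)) + 1j * rng.normal(size=len(y))
        y = y + np.sqrt(p_noise / 2) * noise
    return y

# Usage with illustrative values: random QPSK symbols through a two-tap channel.
rng = np.random.default_rng(0)
symbols = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=1024) / np.sqrt(2)
rx = apply_channel(symbols, cfo=1e-3, sro=1e-4, taps=(1.0, 0.3 + 0.2j),
                   snr_db=10, rng=rng)
```

A more faithful simulation would draw the taps from a fading model and let the offsets drift over time; fixed values are used here only to keep the sketch short.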

Deep Learning Classification Approach

Today, DL relies on back-propagation with stochastic gradient descent (SGD) to optimize neural network models that have many parameters.

Consider a neural network made up of several layers. Suppose each layer maps an input $h_0$ to an output $h_1$, with weights $W$ and bias $b$; then the following holds. Here $W$ has dimension $|h_0| \times |h_1|$, $b$ has dimension $|h_1|$, the activation (ReLU) is applied element-wise over the $|h_1|$ outputs, and the layer parameters are denoted $\theta$.

h_1 = \max(0,h_0W+b)
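
The layer equation above maps directly onto a few lines of NumPy. This is only a sketch; the batch size, the layer widths, and the function name `dense_relu` are arbitrary choices for illustration.

```python
import numpy as np

def dense_relu(h0, W, b):
    """One fully connected layer with ReLU: h1 = max(0, h0 W + b)."""
    return np.maximum(0.0, h0 @ W + b)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(8, 16))       # batch of 8 inputs, |h_0| = 16
W = rng.normal(size=(16, 4)) * 0.1  # |h_0| x |h_1| weight matrix
b = np.zeros(4)                     # |h_1| bias vector
h1 = dense_relu(h0, W, b)
print(h1.shape)  # (8, 4)
```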

Because this is a classification task, categorical cross-entropy is used as the loss function ($\mathcal{L}$). $\mathcal{L}$ compares the class labels $y_i$ with the predicted class values $\hat{y}_i$ as follows, and the error gradients are obtained from it.

\mathcal{L}(y, \hat{y}) = - \frac{1}{N} \sum\limits_{i=0}^{N} [y_i \log(\hat{y}_i) + (1-y_i) \log (1-\hat{y}_i)]
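
A small NumPy sketch of this loss, written element-wise as in the formula and averaged over all entries. The clipping constant `eps` is an added assumption to keep the logarithms finite; it is not part of the formula.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Mean of -[y log(y_hat) + (1 - y) log(1 - y_hat)] over all entries."""
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

one_hot = [1, 0, 0]                              # true class is the first one
good = cross_entropy(one_hot, [0.9, 0.05, 0.05])  # confident and correct
bad = cross_entropy(one_hot, [0.1, 0.45, 0.45])   # mostly wrong
print(good, bad)
```

The loss shrinks toward zero as the predicted values approach the labels, so `good` is smaller than `bad`.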

By back-propagating the loss gradients, the layer weights $\theta$ of the network $f(x,\theta)$ are adjusted until convergence, with learning rate $\eta$:

\theta_{n+1} = \theta_{n} - \eta \frac{ \partial \mathcal{L} (y, f(x,\theta_n)) }{\partial \theta_n}
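
The update rule can be exercised on a toy problem. The sketch below is an illustration only: the one-layer sigmoid classifier, the synthetic data, the mini-batch size, and the learning rate are hypothetical stand-ins for the paper's much larger networks, and the gradient is written out by hand instead of being produced by back-propagation through several layers.

```python
import numpy as np

def sgd_step(theta, grad, eta):
    """theta_{n+1} = theta_n - eta * dL/dtheta_n."""
    return theta - eta * grad

def loss_and_grad(theta, x, y):
    """Cross-entropy of a one-layer sigmoid classifier and its gradient."""
    y_hat = 1.0 / (1.0 + np.exp(-(x @ theta)))
    y_hat = np.clip(y_hat, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    grad = x.T @ (y_hat - y) / len(y)
    return loss, grad

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(float)  # toy, linearly separable labels

theta = np.zeros(2)
for _ in range(100):
    batch = rng.choice(len(x), size=32, replace=False)  # random mini-batch
    _, grad = loss_and_grad(theta, x[batch], y[batch])
    theta = sgd_step(theta, grad, eta=0.5)

print("loss before:", loss_and_grad(np.zeros(2), x, y)[0])
print("loss after: ", loss_and_grad(theta, x, y)[0])
```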

Regularization is needed to reduce over-fitting to the training data; the paper uses batch normalization for the convolutional layers and Alpha Dropout for the fully connected (FC) layers.
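
For intuition, both regularizers can be sketched in NumPy. This is a simplified, training-mode illustration under stated assumptions, not the paper's implementation: `batch_norm` normalizes over the batch axis only and keeps no running statistics (convolutional layers would normalize per channel), and `alpha_dropout` follows the usual self-normalizing formulation, where dropped units are set to the SELU saturation value and an affine correction restores zero mean and unit variance for standard-normal inputs.

```python
import numpy as np

# SELU constants; ALPHA_PRIME is the value assigned to dropped units.
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805
ALPHA_PRIME = -SELU_LAMBDA * SELU_ALPHA

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def alpha_dropout(x, rate, rng):
    """Set dropped units to ALPHA_PRIME, then rescale to mean 0, variance 1."""
    q = 1.0 - rate                        # keep probability
    keep = rng.random(x.shape) < q
    a = (q + ALPHA_PRIME ** 2 * q * (1 - q)) ** -0.5
    b = -a * (1 - q) * ALPHA_PRIME
    return a * np.where(keep, x, ALPHA_PRIME) + b

rng = np.random.default_rng(0)
acts = rng.normal(size=200_000)           # stand-in for normalized activations
dropped = alpha_dropout(acts, rate=0.2, rng=rng)
print(dropped.mean(), dropped.var())      # both stay close to 0 and 1
```

Unlike ordinary dropout, which zeroes units and rescales the rest, this variant is designed to keep the activation statistics stable, which is why it is usually paired with SELU activations.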

Reference

O’Shea, Timothy James, Tamoghna Roy, and T. Charles Clancy. "Over-the-air deep learning based radio signal classification." IEEE Journal of Selected Topics in Signal Processing 12.1 (2018): 168-179.
