[Deep Learning Paper Reviews and Algorithm Study] #6 Taking the Models back to Music Practice

Jude's Sound Lab · June 29, 2022

Paper Review

Taking the Models back to Music Practice: Evaluating Generative Transcription Models built using Deep Learning

by Bob L. Sturm and Oded Ben-Tal

This is a paper about folk-rnn. I have excerpted only the parts I thought were important.

Abstract

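The authors set out to evaluate the model in five ways. 1) At the population level (?), they compare 30,000 generated transcriptions with 23,000 samples. 2) At the practice level, they assess the value of the output as music. 3) As a "nefarious tester", they use exacting testing to see where the model reaches its limits. 4) In the context of assisted music composition, they look at how well the model's generation can support composing. 5) They put the model to use in real-world music practice.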
Our work attempts to demonstrate new approaches to evaluating the application of machine learning methods to modelling and making music

1 Introduction

we apply deep learning to model high-level textual music transcriptions within a practice termed “session”, e.g., traditional dance music found in Ireland and the UK.
They approach this with two models, char-rnn and folk-rnn. Both char-rnn and folk-rnn are generative models, and so can be used to generate new transcriptions that reflect the conventions inherent to those in a training dataset. In other words, the models produce transcriptions that reflect the conventions built into the training dataset.
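
To make the difference between the two models concrete for myself, here is a minimal sketch. It assumes, as I understand it, that char-rnn treats every character of the text as a symbol, while folk-rnn works on whole transcription tokens. The example tune and both function names are made up for illustration and do not come from the paper.

```python
# Hypothetical transcription in a folk-rnn-like textual format (not from the paper).
TUNE = "M:4/4 K:Cmaj |: G 2 E G c 2 G E | D 2 F A d 2 A F :|"


def char_tokens(text):
    """char-rnn style (assumed): every character, spaces included, is one symbol."""
    return list(text)


def word_tokens(text):
    """folk-rnn style (assumed): each whitespace-separated token is one symbol."""
    return text.split()


if __name__ == "__main__":
    print(len(char_tokens(TUNE)), "character symbols")
    print(len(word_tokens(TUNE)), "token symbols")
    print(word_tokens(TUNE)[:3])
```

With token-level symbols, a metre such as M:4/4 is a single vocabulary item instead of five separate characters, which is why the later statistics in the paper are counted in tokens.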

we are interested in determining or delimiting what this model has actually learned, as well as its applicability to composing music. The focus seems to be on mapping out what the model itself is capable of and on evaluating its ability to compose.

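We emphasise that folk-rnn is not modelling music, but instead a highly reductive abstraction removed from what one perceives as music. In the end they seem to be wrestling with the question of what counts as music. If you copy the structural framework of music, does the result become music? I think it kind of does...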

We thus limit our interrogation of the model to how well it understands the use of or meaning behind the elements of its vocabulary, their arrangement into larger units, and formal operations such as counting, repetition and variation. My reading is that they are likening the model's process of analysis to the way language gives structure to phenomena. Man, this is densely written, like reading a philosophy book...

What they want to do, in three lines
1) determining what it is actually learning to do
2) determining how useful it is in music practice
3) how to make it more usable for music practice.

2 Previous Work in Music Modelling and Generation using Recurrent Neural Networks

Descriptions of evaluation in research applying recurrent neural networks to music modelling and generation

3 Evaluation of the folk-rnn Model

The paper says it evaluates folk-rnn in five ways.

3-1. Statistical Analysis of Outputs

Fig.1 compares the occurrence of specific metres, modes and number of tokens in the transcriptions

We see the model appears biased to generating transcriptions with common metre (4/4)
We see the model is a little biased to generating transcriptions specifying dorian mode, and less so the major mode
we see that the model is greatly biased to generating transcriptions that are 140-155 tokens long.
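
The kind of comparison behind Fig.1 can be sketched roughly as follows. This is only my own illustration, assuming each transcription is a whitespace-separated token string with one `M:` (metre) token and one `K:` (key and mode) token. The function names and the toy data are hypothetical, not the authors' code or dataset.

```python
from collections import Counter


def summarize(transcriptions):
    """Count metre tokens, key/mode tokens, and token lengths over a corpus."""
    metres, modes, lengths = Counter(), Counter(), []
    for text in transcriptions:
        tokens = text.split()
        for tok in tokens:
            if tok.startswith("M:"):
                metres[tok[2:]] += 1
            elif tok.startswith("K:"):
                modes[tok[2:]] += 1  # e.g. "Cmaj", "Cdor" (assumed format)
        lengths.append(len(tokens))
    return metres, modes, lengths


def proportions(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def bias(train_counter, gen_counter):
    """Generated proportion minus training proportion, per category."""
    train_p, gen_p = proportions(train_counter), proportions(gen_counter)
    keys = set(train_p) | set(gen_p)
    return {k: gen_p.get(k, 0.0) - train_p.get(k, 0.0) for k in keys}


if __name__ == "__main__":
    # Toy corpora, made up for illustration.
    training = ["M:4/4 K:Cmaj A B c |", "M:6/8 K:Cdor A B |"]
    generated = ["M:4/4 K:Cdor A B c d |", "M:4/4 K:Cdor A |"]
    train_metres, train_modes, _ = summarize(training)
    gen_metres, gen_modes, gen_lengths = summarize(generated)
    print(bias(train_metres, gen_metres))
    print(bias(train_modes, gen_modes))
    print(gen_lengths)
```

A positive value from `bias` means the category is over-represented in the generated set relative to the training set, which is the sense in which the paper calls the model "biased" toward 4/4 or toward a particular length range.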

Fig.2 shows the distribution of pitches in transcriptions of each mode

it is biased toward producing pitch tokens lower in pitch than B

Fig.3 shows the distribution of pitch classes (scale degrees) for transcriptions denoting each mode

As expected for this kind of tonal music, the root and fifth scale degrees are the most common pitch classes.
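
A scale-degree histogram like the one in Fig.3 could be computed along these lines. This is a simplified sketch under my own assumptions: every tune is written with C as the root, pitch tokens follow ABC-style spelling (optional accidental, a letter, optional octave marks), and degrees are counted by letter name only, so accidentals and key signatures are ignored. The names below are hypothetical.

```python
import re
from collections import Counter

# Assumed pitch-token shape: optional accidentals, one note letter, optional octave marks.
PITCH = re.compile(r"^[_=^]*([A-Ga-g])[,']*$")

# Diatonic degree by letter name, assuming the root is C.
DEGREE = {"C": 1, "D": 2, "E": 3, "F": 4, "G": 5, "A": 6, "B": 7}


def degree_histogram(transcription):
    """Count how often each scale degree (1-7) appears among the pitch tokens."""
    counts = Counter()
    for tok in transcription.split():
        match = PITCH.match(tok)
        if match:  # skips headers, durations, barlines, rests
            counts[DEGREE[match.group(1).upper()]] += 1
    return counts


if __name__ == "__main__":
    tune = "M:4/4 K:Cmaj C E G c 2 G, | =B d"  # made-up example
    hist = degree_histogram(tune)
    for degree in sorted(hist):
        print(degree, hist[degree])
```

Counting by letter name keeps the sketch short, but a faithful pitch-class count would also have to apply the accidentals implied by each mode.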

We now look at the variety of “measure token” sequences in the transcriptions as a means of assessing their forms, e.g., explicit repetition, phrase lengths, etc.

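The generated melodies are compared in terms of unique sequences, and the sequences that recur often in the training data clearly show up in the generative model's output as well.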
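
To get a feel for this measure-level comparison, here is a small sketch. It assumes that any token containing `|` is a barline and that `M:` and `K:` tokens are headers. The helper names and the toy tunes are mine, not the paper's.

```python
from collections import Counter


def measures(transcription):
    """Split the tune body into measures, each a tuple of tokens."""
    out, current = [], []
    for tok in transcription.split():
        if tok.startswith(("M:", "K:")):
            continue  # header tokens are not part of any measure
        if "|" in tok:  # assumed: every barline-like token contains "|"
            if current:
                out.append(tuple(current))
                current = []
        else:
            current.append(tok)
    if current:
        out.append(tuple(current))
    return out


def repetition_rate(transcription):
    """Fraction of measures that repeat an earlier measure exactly."""
    ms = measures(transcription)
    return 1 - len(set(ms)) / len(ms) if ms else 0.0


def shared_measures(training, generated):
    """Generated measures that also occur verbatim in the training set, with counts."""
    train_set = {m for t in training for m in measures(t)}
    gen_counts = Counter(m for t in generated for m in measures(t))
    return {m: c for m, c in gen_counts.items() if m in train_set}


if __name__ == "__main__":
    train_tune = "M:4/4 K:Cmaj |: C E G c | C E G c | D F A d :|"
    gen_tune = "M:4/4 K:Cmaj C E G c | G G G G |"
    print(measures(train_tune))
    print(repetition_rate(train_tune))
    print(shared_measures([train_tune], [gen_tune]))
```

`repetition_rate` only catches exact repeats inside one tune, so varied repetitions, which matter a lot in this style, would need a looser similarity measure.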

3-2. Musical Analysis of Outputs

This part analyzes randomly sampled outputs from a musical point of view.

Strengths
it has learned to some extent fundamental aspects of this kind of music
The model also appears able to produce basic cadences, though these do not always work.
The folk-rnn model also appears to have learned to some extent aspects of composing homophonic melodies

Weaknesses
The harmonic implications of melodic patterns tend to be poor, which leads to weak or otherwise flawed cadences.
The folk-rnn model appears able to manipulate short melodic patterns and is also able to generate a conventional form, but is not able to relate these aspects as they are in the training data transcriptions.
but does not show an understanding of metre with its strong and weak beats in relation to melody and harmony

3-3. Nefarious Testing

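So far we have seen that the model separates bars well and can build larger-scale structure by combining and repeating small patterns within them. The paper then proposes a way to test this ability further: we observe how the model behaves when we seed it with materials that are outside the conventions it has supposedly learned.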
