[paper review] Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Jude's Sound Lab · June 1, 2023

Paper Review

https://github.com/zharry29/drums-with-llm

Abstract

We present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by fine-tuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances.
Models that are not pre-trained (Transformer) show no such ability beyond naive repetition.
Evaluating generated music is a challenging task, and evaluating drum grooves, which have little precedent in the literature, is more so. Hence, we propose a tailored structural evaluation method.

Introduction

Why Drum?

First, the drum set is one of the most common and important instruments in many genres of music such as jazz, funk, blues, gospel, latin, pop, rock, metal, etc.
Second, the symbolic representation of the drum set is simpler than most pitched instruments, as each note does not have a pitch but corresponds to a hit on one drum.
Third, the performance of a drum set is typically endowed with more degrees of freedom with regard to the audience's aesthetics than many other instruments.

How does it proceed?

Fine-tune a state-of-the-art LLM, GPT3, on the Groove dataset.

The motivation is a structural analogy: music has notes, measures, and sections, while language has tokens, sentences, and paragraphs.

Groove dataset

from https://arxiv.org/abs/2301.01162

Google's Groove MIDI Dataset is the largest and highest-quality to date, containing 1,150 MIDI files and over 22,000 measures of drumming by 10 professional drummers.

The Groove dataset was originally proposed to study microtiming and expressive performance.

We re-purpose the dataset to study drum composition.

For simplicity, we only consider those in the time signature of 4/4.
We quantize all notes to a 16th-note grid.
An implication of such quantization is that deliberate off-grid playing, such as triplets or swing feel, is lost.
We discard the velocity information.
We reduce them to simply the basic articulation (head hit) of the hi-hat, crash cymbal, ride cymbal, bass drum, snare drum, and floor tom.
We truncate each MIDI file to only the first 16 measures; at 128 BPM, for example, this equates to 30 seconds.
We remove empty leading measures whose first quarter note is a rest, and ignore grooves with fewer than 8 measures.
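The quantization and truncation steps above can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `quantize_to_grid`, the `(drum, beat)` input format, and the drum labels are assumptions made for this example.

```python
import math

def quantize_to_grid(onsets, steps_per_beat=4, beats_per_measure=4, max_measures=16):
    """Snap (drum, onset_in_quarter_note_beats) pairs to a 16th-note grid.

    Hypothetical helper: velocity is never read, notes past the first
    `max_measures` measures are dropped, and duplicate hits that land on
    the same step for the same drum collapse into one.
    """
    max_steps = steps_per_beat * beats_per_measure * max_measures
    grid = set()
    for drum, beat in onsets:
        # Round to the nearest 16th-note step (round half up).
        step = math.floor(beat * steps_per_beat + 0.5)
        if 0 <= step < max_steps:
            grid.add((step, drum))
    return sorted(grid)

# Slightly off-grid hits are pulled onto the grid; a hit in measure 18 is cut.
notes = [("snare", 1.02), ("bass", 0.0), ("hihat", 0.26), ("hihat", 0.24), ("bass", 70.0)]
quantized = quantize_to_grid(notes)
```

A triplet onset at one third of a beat would land on step 1 here, which is exactly the loss of off-grid feel mentioned above.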

Experiment

Drum_roll

To help LLMs identify the boundary between measures, we add a newline of “SEP” between every 16 lines (a measure) and a newline of “END” after the final line.

6 drum voices (2^6 = 64 possible hit combinations per time step) × 64 time steps
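A minimal sketch of how such a text drum roll could be produced, with one line per 16th-note step, "SEP" between measures, and "END" after the final line. The characters "o" and "-", the drum order in `DRUMS`, and the function name are assumptions for illustration; the paper's exact symbols may differ.

```python
# Column order and hit/rest symbols are assumptions, not taken from the paper.
DRUMS = ["hihat", "crash", "ride", "bass", "snare", "tom"]
STEPS_PER_MEASURE = 16

def to_drum_roll_text(hits, n_measures):
    """Render a set of (step, drum) hits as text, one line per time step."""
    lines = []
    for step in range(n_measures * STEPS_PER_MEASURE):
        # Insert a separator line before the first step of every later measure.
        if step > 0 and step % STEPS_PER_MEASURE == 0:
            lines.append("SEP")
        lines.append("".join("o" if (step, d) in hits else "-" for d in DRUMS))
    lines.append("END")
    return "\n".join(lines)

hits = {(0, "bass"), (0, "hihat"), (4, "snare"), (16, "crash")}
text = to_drum_roll_text(hits, n_measures=2)
```

Each line has 6 characters, one per drum, so a single line can take any of the 2^6 hit combinations.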

Task

The model is given the first 2 measures and must complete the remaining 14 measures of the groove.
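One way to picture this task is as a prompt/completion split over the text drum roll. The sketch below assumes measures are separated by "SEP" lines and the groove ends with an "END" line; the function name and the exact placement of the separator at the end of the prompt are assumptions, not the authors' fine-tuning format.

```python
def split_prompt_completion(drum_roll_text, n_prompt_measures=2):
    """Split a SEP/END-delimited drum roll into (prompt, completion)."""
    measures, current = [], []
    for line in drum_roll_text.split("\n"):
        if line in ("SEP", "END"):
            measures.append(current)
            current = []
        else:
            current.append(line)
    if current:  # tolerate a missing END marker
        measures.append(current)

    def join(ms):
        return "\nSEP\n".join("\n".join(m) for m in ms)

    # The prompt ends with a separator so the model starts on a fresh measure.
    prompt = join(measures[:n_prompt_measures]) + "\nSEP\n"
    completion = join(measures[n_prompt_measures:]) + "\nEND"
    return prompt, completion

# Toy groove: 16 measures with 2 lines each, just to show the split.
toy = "\nSEP\n".join(f"a{i}\nb{i}" for i in range(16)) + "\nEND"
prompt, completion = split_prompt_completion(toy)
```

With this layout, concatenating the prompt and the completion gives back the original text, so nothing is lost at the boundary.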

Model

Two naive baseline models

A model that randomly chooses whether to play each note
A model that repeats the second given measure
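The two baselines are simple enough to sketch directly. Here a measure is a list of 16 steps, each a list of 6 zeros or ones; the hit probability `p=0.5`, the seed, and the function names are assumptions, since the review does not state how the random baseline is parameterized.

```python
import random

def random_baseline(n_measures=14, steps=16, n_drums=6, p=0.5, seed=0):
    """Independently decide for every drum at every step whether to play."""
    rng = random.Random(seed)
    return [
        [[int(rng.random() < p) for _ in range(n_drums)] for _ in range(steps)]
        for _ in range(n_measures)
    ]

def repeat_baseline(given_measures, n_measures=14):
    """Copy the last given measure (the second one) for every output measure."""
    last = given_measures[-1]
    return [[row[:] for row in last] for _ in range(n_measures)]

# Two given measures: an empty one and one with a bass hit on beat 1.
m1 = [[0] * 6 for _ in range(16)]
m2 = [[0] * 6 for _ in range(16)]
m2[0][3] = 1
repeated = repeat_baseline([m1, m2])
noise = random_baseline()
```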

Pre-trained GPT3 models

GPT3 Davinci with 175 billion parameters
GPT3 Ada model with 350 million parameters

Un-pre-trained GPT3 model

A model that is not pre-trained, namely a Transformer (Vaswani et al. 2017) with the same size as GPT3. While the training loss does converge, the model predicts the same sequence regardless of which 2 measures are provided, performing no better than the random baseline.

Evaluation

Objective Evaluation

Perplexity
Perplexity is a metric used to measure how well a language model predicts a sequence of words or a given text. It quantifies the level of uncertainty or confusion of the model when faced with predicting the next word. A lower perplexity value indicates that the language model is more accurate and confident in its predictions, while a higher perplexity value suggests more uncertainty and less accurate predictions.

from https://towardsdatascience.com/the-relationship-between-perplexity-and-entropy-in-nlp-f81888775ccc
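As a small worked example, perplexity can be computed as the exponential of the average negative log-probability the model assigns to each token. The sketch below uses the standard definition; the function name and the example log-probabilities are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that spreads probability evenly over 4 options at every step
# is exactly as uncertain as a 4-way guess.
uniform = perplexity([math.log(0.25)] * 10)

# A model that is certain of every token has the minimum perplexity.
certain = perplexity([0.0, 0.0, 0.0])
```

Here `uniform` comes out at about 4 and `certain` at 1, which matches the reading that lower perplexity means more confident predictions.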
Structural similarity

from Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation https://arxiv.org/abs/2210.10349
Pattern and Fill analysis

There exist one or more consistent patterns of some rhythmic idea and occasional change-ups known as fills.
The measures in a pattern are sufficiently similar, but ideally not identical.
The measures in a fill are sufficiently different from those in adjacent patterns.

To classify each measure as either a pattern or a fill, we take a sliding window of size 3 centered at some measure m_i and calculate the edit distances between this measure and its two neighbors.
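The sliding-window idea can be sketched as below. The edit distance is a standard Levenshtein distance over the sequence of time steps in a measure. The decision rule (label a measure a fill when it is at least `threshold` edits away from every neighbor) and the threshold value are assumptions for illustration only; the review does not give the paper's exact rule.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of step strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]

def classify_measures(measures, threshold=4):
    """Hypothetical rule: a fill differs strongly from both of its neighbors."""
    labels = []
    for i, m in enumerate(measures):
        neighbors = [measures[j] for j in (i - 1, i + 1) if 0 <= j < len(measures)]
        dists = [edit_distance(m, n) for n in neighbors]
        is_fill = bool(dists) and min(dists) >= threshold
        labels.append("fill" if is_fill else "pattern")
    return labels

# A basic groove measure and a snare-heavy measure standing in for a fill.
groove = ["o--o--", "------", "o-----", "------"] * 4
snare_run = ["----o-"] * 16
labels = classify_measures([groove, groove, groove, snare_run, groove, groove])
```

Under this rule the measures next to the fill still count as pattern, because each of them has at least one similar neighbor.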

We then calculate the average intra-distance between the two centroids, and the average inter-distance between each measure and the centroid it is assigned to.

Intra-centroid distance: between pattern and fill

Inter-centroid distance: between measures in a group of pattern or fill

from https://arxiv.org/abs/2301.01162

Subjective Evaluation

Concretely, all drum grooves produced via different means are shuffled and randomly presented to one of the authors, who has had years of training in drumming.

• Is the groove repetitive, meaning there is little or no variation among measures?
• Is the groove consistent, meaning there is some variation among measures but a steady rhythmic idea (specifically, the back-beat placement) can be followed?
• Is the groove chaotic, meaning there is either too much variation, or a lack of a clear rhythmic idea?
• Does the groove contain any reasonable drum fill?

For the smaller GPT3 Ada model, the observation holds to a larger extent, with more inconsistent grooves and fewer fills.

from https://arxiv.org/abs/2301.01162
