[paper review] Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Jude's Sound Lab · June 1, 2023

Paper Review


https://github.com/zharry29/drums-with-llm

Abstract

  • We present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances.

  • Models that are not pre-trained (Transformer) show no such ability beyond naive repetition.

  • Evaluating generated music is a challenging task; evaluating drum grooves is even harder, as there is little precedent in the literature. Hence, we propose a tailored structural evaluation method.

Introduction

Why Drum?

  • First, the drum set is one of the most common and important instruments in many genres of music such as jazz, funk, blues, gospel, latin, pop, rock, metal, etc.

  • Second, the symbolic representation of the drum set is simpler than most pitched
    instruments, as each note does not have a pitch but corresponds to a hit on one drum.

  • Third, the performance of a drum set is typically endowed with more degrees of freedom with regard to the audience’s aesthetics than many other instruments.

How does it proceed?

Finetune a state-of-the-art LLM, GPT3, on the Groove dataset.

  • Music has notes, measures, and sections, while language has tokens, sentences, and paragraphs.

Groove dataset


from https://arxiv.org/abs/2301.01162

Google’s Groove MIDI Dataset is the largest and highest-quality dataset of its kind to date, containing 1,150 MIDI files and over 22,000 measures of drumming by 10 professional drummers.

  • The Groove dataset was originally proposed to study microtiming and expressive performance

We re-purpose the dataset to study drum composition:

  • For simplicity, we only consider grooves in the 4/4 time signature.
  • We quantize all notes to a 16th-note grid.
  • An implication of such quantization is that deliberate off-grid playing, such as
    triplets or swing feel, is lost.
  • We discard the velocity information.
  • We keep only the basic articulation (head hit) of the hi-hat, crash cymbal, ride cymbal, bass drum, snare drum, and floor tom.
  • We truncate each MIDI file to only the first 16 measures; at 128 BPM, for example, this equates to 30 seconds.
  • We remove empty leading measures whose first quarter note is a rest, and ignore grooves with fewer than 8 measures.
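The quantization and truncation steps above can be sketched in a few lines. This is only an illustration, not the authors' preprocessing code: the input format (onsets as `(beat_position, drum)` pairs) and all function names are assumptions made for the example, and the removal of leading empty measures is not covered.

```python
# Illustrative sketch only; names and input format are hypothetical.
STEPS_PER_BEAT = 4       # a 16th-note grid has 4 steps per quarter note
BEATS_PER_MEASURE = 4    # only 4/4 grooves are considered
MAX_MEASURES = 16

def quantize(onsets_in_beats):
    """Snap (beat_position, drum) onsets to the nearest 16th-note step."""
    return sorted({(round(beat * STEPS_PER_BEAT), drum)
                   for beat, drum in onsets_in_beats})

def truncate(steps, max_measures=MAX_MEASURES):
    """Keep only the hits that fall inside the first `max_measures` measures."""
    limit = max_measures * BEATS_PER_MEASURE * STEPS_PER_BEAT
    return [(step, drum) for step, drum in steps if step < limit]

def duration_seconds(measures, bpm):
    """Length of `measures` bars of 4/4 at the given tempo."""
    return measures * BEATS_PER_MEASURE * 60 / bpm

# Slightly off-grid hits are pulled onto the grid.
print(quantize([(0.0, "bass"), (0.26, "hihat"), (1.02, "snare")]))
# Eighth-note triplets (beats 0, 1/3, 2/3) land on steps 0, 1, 3,
# so their even spacing is lost, as noted above.
print(quantize([(0.0, "hihat"), (1 / 3, "hihat"), (2 / 3, "hihat")]))
print(duration_seconds(16, 128))  # 16 measures * 4 beats * 60 / 128 = 30.0
```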

Experiment

Drum_roll

To help LLMs identify the boundary between measures, we add a line containing “SEP” after every 16 lines (one measure) and a line containing “END” after the final line.

6 drums (2^6 possible hit combinations per time step) × 64 time steps
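As a rough sketch of this text representation: the code below writes one line per 16th-note step, with "SEP" between measures and "END" at the end. The per-line encoding (one character per drum, "o" for a hit and "-" for a rest) and the drum order are assumptions made for illustration; the exact symbols used in the paper are not given in this review.

```python
# Illustrative sketch only; the "o"/"-" symbols and drum order are assumptions.
DRUMS = ["hihat", "crash", "ride", "bass", "snare", "tom"]
STEPS_PER_MEASURE = 16

def to_text(hits, num_measures):
    """Serialize a set of (step, drum) hits into drum-roll text."""
    lines = []
    for measure in range(num_measures):
        if measure > 0:
            lines.append("SEP")  # boundary between measures
        for s in range(STEPS_PER_MEASURE):
            step = measure * STEPS_PER_MEASURE + s
            lines.append("".join("o" if (step, d) in hits else "-" for d in DRUMS))
    lines.append("END")  # marks the end of the groove
    return "\n".join(lines)

example = to_text({(0, "hihat"), (0, "bass"), (4, "snare"), (16, "bass")}, 2)
print(example)
```

Each measure becomes 16 short lines, so a 16-measure groove is 256 step lines plus 15 "SEP" lines and one "END" line.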

Task

The model is given the first 2 measures and must complete the remaining 14 measures of the groove.

Model

Two naive baseline models

  1. A model that randomly chooses whether to play each note
  2. A model that repeats the second given measure
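The two baselines are simple enough to sketch directly. In this illustration a measure is a list of 16 strings, one per step, with one character per drum ("o" for a hit, "-" for a rest); that representation, the function names, and the hit probability `p_hit` are assumptions, since the review does not specify them.

```python
import random

# Illustrative sketch only; measure format and p_hit are assumptions.
NUM_DRUMS = 6
STEPS_PER_MEASURE = 16

def random_baseline(prompt, total_measures=16, p_hit=0.5, seed=0):
    """Continue the prompt by deciding each possible hit with a coin flip."""
    rng = random.Random(seed)
    groove = list(prompt)
    while len(groove) < total_measures:
        measure = ["".join("o" if rng.random() < p_hit else "-"
                           for _ in range(NUM_DRUMS))
                   for _ in range(STEPS_PER_MEASURE)]
        groove.append(measure)
    return groove

def repeat_baseline(prompt, total_measures=16):
    """Continue the prompt by copying its last (second) measure."""
    groove = list(prompt)
    while len(groove) < total_measures:
        groove.append(list(prompt[-1]))
    return groove

prompt = [["o--o--"] * 16, ["o---o-"] * 16]   # two given measures
print(len(repeat_baseline(prompt)))           # 16 measures in total
```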

Pre-trained GPT3 models

  1. GPT3 Davinci with 175 billion parameters
  2. GPT3 Ada model with 350 million parameters

GPT3 model without pre-training

  1. A GPT3 model without pre-training, namely a Transformer (Vaswani et al. 2017) of the same size as GPT3: while the training loss does converge, the model predicts the same sequence regardless of which 2 measures are provided, performing no better than the random baseline.

Evaluation

Objective Evaluation

  1. Perplexity
    Perplexity is a metric used to measure how well a language model predicts a sequence of words or a given text. It quantifies the level of uncertainty or confusion of the model when faced with predicting the next word. A lower perplexity value indicates that the language model is more accurate and confident in its predictions, while a higher perplexity value suggests more uncertainty and less accurate predictions.


    from https://towardsdatascience.com/the-relationship-between-perplexity-and-entropy-in-nlp-f81888775ccc

  2. Structural similarity



    from Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation https://arxiv.org/abs/2210.10349

  3. Pattern and Fill analysis

  • There exists one or more consistent patterns of some rhythmic idea and occasional change-ups known as fills.
  • The measures in a pattern are sufficiently similar, but ideally not identical.
  • The measures in a fill are sufficiently different from those in adjacent patterns.

To classify each measure as either a pattern or a fill, we take a sliding window of size 3 centered at some measure m_i and calculate the edit distances between this measure and its two neighbors.
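A minimal sketch of this sliding-window step follows. It assumes each measure is a sequence of 16 step tokens and that the edit distance is a standard Levenshtein distance over those tokens; the review does not state which edit distance the paper uses or how the first and last measures are handled, so both choices here are assumptions.

```python
# Illustrative sketch only; Levenshtein over step tokens is an assumption.
def edit_distance(a, b):
    """Levenshtein distance between two sequences of step tokens."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]

def neighbor_distances(measures):
    """For each interior measure m_i, return (D(m_{i-1}, m_i), D(m_i, m_{i+1}))."""
    return [(edit_distance(measures[i - 1], measures[i]),
             edit_distance(measures[i], measures[i + 1]))
            for i in range(1, len(measures) - 1)]

pattern = ["o--o--"] * 16
fill = ["----o-"] * 16
print(neighbor_distances([pattern, pattern, fill, pattern]))
```

A measure that sits inside a steady pattern has small distances to both neighbors, while a fill tends to be far from the measures around it.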


  • We then calculate the average intra-distance between the two centroids, and the average inter-distance between each measure and the centroid it is assigned to.

Intra-centroid distance: between pattern and fill

Inter-centroid distance: between measures in a group of pattern or fill



from https://arxiv.org/abs/2301.01162
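Returning to the first objective metric: perplexity is the exponential of the average negative log-likelihood per token, PPL = exp(-(1/N) Σ log p(x_i | x_<i)). The sketch below computes it from a list of per-token log-probabilities; where those log-probabilities come from (for example, a model's scoring output) is outside the scope of this example, and the function name is hypothetical.

```python
import math

# Illustrative sketch only; the input is a plain list of natural-log probabilities.
def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 1/4 to every token is, on average,
# as uncertain as a uniform choice among 4 options.
print(perplexity([math.log(0.25)] * 8))
```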

Subjective Evaluation

Concretely, all drum grooves produced via different means are shuffled and randomly presented to one of the authors who has had years of training in drumming.

• Is the groove repetitive, meaning there is little or no variation among measures?
• Is the groove consistent, meaning there is some variation among measures but a steady rhythmic idea (specifically, the back-beat placement) can be followed?
• Is the groove chaotic, meaning there is either too much variation, or a lack of a clear rhythmic idea?
• Does the groove contain any reasonable drum fill?

For the smaller GPT3 Ada model, the observation holds to a larger extent, with more inconsistent grooves and fewer fills.


from https://arxiv.org/abs/2301.01162
