[Symbolic-Music-Encoding]#5.5 paper review, TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching

Clay Ryu's sound lab · January 30, 2024

TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching

by Shangda Wu, Xiaobing Li, Feng Yu and Maosong Sun

Introduction

TunesFormer is a Transformer-based dual-decoder model that combines bar patching and control codes to efficiently generate expressive Irish music in ABC notation.


Figure from TunesFormer, Shangda Wu et al.

Contribution

  • As a dual-decoder model based on bar patching, TunesFormer significantly accelerates
    generation speed while maintaining the quality of the generated music.
  • TunesFormer enables users to generate melodies with diverse musical forms, providing
    flexibility and alignment with artistic vision through control codes.
  • To support future research, we release the Irish Massive ABC Notation (IrishMAN) dataset, an open-source collection of 216,284 Irish tunes in the ABC notation format.

Methodology

TunesFormer

Given L as the sequence length and P as the patch size, bar patching reduces the patch-level decoder complexity from $O(L^2)$ to $O(L^2/P^2)$. Meanwhile, the character-level decoder complexity becomes $O(LP)$, because each of the $L/P$ patches costs $O(P^2)$: $O(P^2) \cdot (L/P) = O(LP)$. Taking M and N as the parameter sizes of the patch-level and character-level decoders respectively, the computational cost shifts from $(M + N) \cdot L^2$ to $M \cdot (L^2/P^2) + N \cdot LP$.
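The cost expressions above can be evaluated directly to make the savings concrete. This is a rough sketch. It treats M and N only as multiplicative weights and ignores constant factors. The values used below (L = 1024, P = 32, M = N = 1) are hypothetical and are not settings from the paper.

```python
def attention_cost(L: int, P: int, M: float, N: float) -> tuple[float, float]:
    """Return (baseline, bar_patching) cost estimates from the formulas above.

    baseline:     (M + N) * L^2          -- both decoders attend over all L characters
    bar patching: M * (L/P)^2 + N * L*P  -- patch decoder over L/P patches, plus
                                            O(P^2) character attention in each of L/P patches
    """
    baseline = (M + N) * L ** 2
    patched = M * (L / P) ** 2 + N * (L / P) * P ** 2
    return baseline, patched


baseline, patched = attention_cost(L=1024, P=32, M=1, N=1)
print(baseline, patched, baseline / patched)
```

With these hypothetical values the baseline is 2,097,152 units, compared with 33,792 under bar patching. That is roughly 62 times fewer.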

Control Codes

The control-code idea follows CTRL: A Conditional Transformer Language Model for Controllable Generation, by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher.

  • S: number of sections - Sets the number of melody sections, from 1 to 8 (e.g., S:1 for a single-section melody and S:8 for a melody with eight sections). Sections are counted from the ABC symbols that mark section boundaries: [|, ||, |], |:, ::, and :|.
  • B: number of bars - Sets the number of bars within a section by counting the bar symbol |. The range is 1 to 32 (e.g., B:1 for a one-bar section and B:32 for a section with 32 bars).
  • E: edit distance similarity - Controls the similarity between the current section c and a previous section p. It is derived from the Levenshtein distance [16] lev(c, p) and measures how much the two sections differ:
    $$eds(c,p) = 1 - \frac{lev(c,p)}{\max(|c|,|p|)}$$
    where |c| and |p| are the string lengths of the two sections. The value is discretized into 11 levels, from 0 to 10 (e.g., E:0 for no similarity and E:10 for an exact match). The N-th section has N − 1 previous sections to compare with.
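Here is a sketch of how the E codes could be computed. The Levenshtein routine is the standard dynamic-programming version. The helper names (`eds`, `e_level`, `e_codes`) are hypothetical. Mapping similarity to the 11 levels by rounding `eds * 10` is also an assumption, because the text only says the value is discretized into 11 levels.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def eds(c: str, p: str) -> float:
    """Edit distance similarity: 1 - lev(c, p) / max(|c|, |p|)."""
    longest = max(len(c), len(p))
    if longest == 0:
        return 1.0  # assumption: two empty sections count as identical
    return 1 - levenshtein(c, p) / longest


def e_level(c: str, p: str) -> int:
    """Hypothetical discretization into levels 0..10 (Python's round() sends halves to even)."""
    return round(eds(c, p) * 10)


def e_codes(sections: list[str]) -> list[list[int]]:
    """For the N-th section, one E level against each of its N - 1 earlier sections."""
    return [[e_level(sections[n], sections[k]) for k in range(n)]
            for n in range(len(sections))]
```

For example, `e_codes(["abc", "abc", "xyz"])` gives `[[], [10], [0, 0]]`. The first section has nothing to compare with. The second matches the first exactly. The third shares nothing with either earlier section.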

Dataset

The dataset contains 216,284 Irish tunes in ABC notation, sourced from thesession.org and abcnotation.com.

To keep the formatting uniform, each tune is converted to XML and back to ABC using the scripts at https://wim.vree.org/svgParse/index.html.

Training code

Hugging Face Transformers

The code looks easy to port, and Hugging Face is a convenient tool for this.

data for feeding
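Below is a minimal sketch of how a tune body might be cut into bar patches before it is fed to the model. It assumes characters are stored as ASCII codes. The regular expression, `PATCH_SIZE = 8`, the padding id 0, and splitting over-long bars into several patches are all assumptions for illustration. None of them is taken from the author's data pipeline.

```python
import re

PATCH_SIZE = 8  # hypothetical; the real patch size is a model hyperparameter
PAD_ID = 0      # hypothetical padding id (ASCII NUL does not occur in ABC text)

# A bar = a run of non-"|" characters, then "|" plus any trailing "|", "]" or ":"
# (so "||", "|]", ":|" stay attached); a final unterminated bar is kept as well.
BAR_RE = re.compile(r"[^|]*\|[|\]:]*|[^|]+$")


def split_bars(body: str) -> list[str]:
    """Split an ABC tune body into bars, each keeping its closing barline."""
    return BAR_RE.findall(body)


def to_patches(body: str, patch_size: int = PATCH_SIZE) -> list[list[int]]:
    """One patch per bar: ASCII codes right-padded with PAD_ID.
    Bars longer than patch_size are cut into several patches (an assumption)."""
    patches = []
    for bar in split_bars(body):
        codes = [ord(ch) for ch in bar]
        for start in range(0, len(codes), patch_size):
            chunk = codes[start:start + patch_size]
            patches.append(chunk + [PAD_ID] * (patch_size - len(chunk)))
    return patches


print(split_bars("GABc|d2dB|c2cA:|"))
print(to_patches("GABc|d2dB|c2cA:|"))
```

`split_bars("GABc|d2dB|c2cA:|")` returns `["GABc|", "d2dB|", "c2cA:|"]`, and `to_patches` turns these into three 8-element patches. One limitation of this sketch is that a leading `|:` would become its own small patch.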


patch to embedding

This is the tricky part. The author one-hot encodes each character in a patch, flattens the patch into a single vector, and applies a linear layer on top of it.
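The one-hot + linear patch embedding described above can be sketched in plain Python. The sizes (`PATCH_SIZE = 4`, `HIDDEN = 3`), the 128-symbol ASCII vocabulary, and the random initialization are assumptions chosen for illustration. In the real model this step would be a framework layer that operates on batches.

```python
import random

VOCAB = 128      # assumption: characters are ASCII codes 0..127
PATCH_SIZE = 4   # hypothetical, kept tiny for readability
HIDDEN = 3       # hypothetical hidden size

rng = random.Random(0)
# Linear layer: weights HIDDEN x (PATCH_SIZE * VOCAB), plus a bias of size HIDDEN
W = [[rng.gauss(0, 0.02) for _ in range(PATCH_SIZE * VOCAB)] for _ in range(HIDDEN)]
b = [0.0] * HIDDEN


def one_hot_flat(patch: list[int]) -> list[float]:
    """One-hot encode each character, concatenated into one PATCH_SIZE*VOCAB vector."""
    vec = [0.0] * (PATCH_SIZE * VOCAB)
    for pos, code in enumerate(patch):
        vec[pos * VOCAB + code] = 1.0
    return vec


def embed_patch(patch: list[int]) -> list[float]:
    """Apply the linear layer to the flattened one-hot vector -> one HIDDEN-dim embedding."""
    x = one_hot_flat(patch)
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


patch = [ord(c) for c in "AB|"] + [0]  # three characters plus one padding slot
print(embed_patch(patch))
```

The padding slot is also one-hot encoded (as index 0), so every position contributes exactly one active input.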

scheduler

update step

Experiment

RWKV


Figure from Efficient Transformers: A Survey, Yi Tay et al.

Metric

Two objective metrics:

  • Efficiency: The number of tokens generated per second on an RTX 2080 Ti.
  • Controllability: Quantifying control precision by comparing edit distance between generated and actual control codes.

Comparative evaluation

Thirteen Irish musicians compared pairs of melodies. One melody in each pair was a tune from thesession.org with chord symbols. The other was a model-generated continuation of the initial two bars.

  • Engagement: Captivating to the ear, evokes emotional resonance, and maintains the listener’s interest.
  • Authenticity: Representing the distinctive characteristics of Irish traditional music.
  • Harmoniousness: Creating a natural flow that unifies melody and harmony into a
    cohesive and pleasing musical experience.
  • Playability: Well-suited for performance and offers a wide range of playing techniques.

Question

Validity of the data split

Making the embedding

The difference between using nn.Embedding and one-hot + linear
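One way to approach this question: multiplying a flattened one-hot vector by a weight matrix selects one column of the matrix per position. Applying one-hot + linear to a whole patch therefore equals summing lookups from a separate embedding table for each position, plus the linear layer's bias. The sketch below checks this numerically, using hypothetical sizes and random weights. A single shared nn.Embedding applied to each character behaves differently. It would give a character the same vector wherever it appears in the bar, so position information would need to be added some other way.

```python
import random

VOCAB, PATCH_SIZE, HIDDEN = 128, 4, 3  # hypothetical sizes
rng = random.Random(1)
W = [[rng.gauss(0, 0.02) for _ in range(PATCH_SIZE * VOCAB)] for _ in range(HIDDEN)]


def onehot_linear(patch: list[int]) -> list[float]:
    """Flattened one-hot vector times W (bias omitted): touches all PATCH_SIZE*VOCAB inputs."""
    x = [0.0] * (PATCH_SIZE * VOCAB)
    for pos, code in enumerate(patch):
        x[pos * VOCAB + code] = 1.0
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def embedding_sum(patch: list[int]) -> list[float]:
    """Same result via lookups: column (pos*VOCAB + code) of W acts as the embedding
    of character `code` at position `pos`, like one embedding table per position."""
    return [sum(row[pos * VOCAB + code] for pos, code in enumerate(patch)) for row in W]


patch = [65, 66, 124, 0]
print(onehot_linear(patch))
print(embedding_sum(patch))
```

Both functions produce the same vector. The lookup form avoids building the large, mostly-zero one-hot input.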
