from https://shaankhosla.substack.com/p/talking-tokenization
Since music songs are more structural (e.g., bar, position) and diverse (e.g., tempo, instrument, and pitch), encoding symbolic music is more complicated than natural language.
from Compound Word Transformer, arxiv
Simply adopting the pre-training techniques from NLP (e.g., BERT) to symbolic music only brings marginal gains.
Experiments demonstrate the advantages of MusicBERT on four music understanding tasks, including melody completion, accompaniment suggestion, genre classification, and style classification.
- due to the complicated encoding of symbolic music, the pre-training mechanism (e.g., the masking strategy like the masked language model in BERT) should be carefully designed to avoid information leakage in pre-training.
- as pre-training relies on large-scale corpora, the lack of large-scale symbolic music corpora limits the potential of pre-training for music understanding.
from MusicBert
Column: mask all token classes in an Octuple token.
Of the masked elements, 80% are replaced with [MASK], 10% are replaced with a random element, and 10% remain unchanged, following common practice.
from MusicBert
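As a rough illustration of the 80/10/10 rule applied to whole Octuple tokens, the sketch below treats each token as a tuple of eight integer element ids. The function name `mask_octuples`, the `MASK_ID` placeholder, the per-class vocabulary sizes, and the 15% default selection rate are all assumptions made for this example; none of them are taken from the MusicBERT code.

```python
import random

MASK_ID = -1  # hypothetical id standing in for [MASK]
# Hypothetical vocabulary size for each of the 8 element classes in an Octuple token.
VOCAB_SIZES = [8, 128, 129, 256, 128, 128, 32, 49]


def mask_octuples(tokens, mask_prob=0.15, rng=None):
    """Return (masked_tokens, selected_positions) for a list of 8-tuples.

    All eight elements of a selected token are treated together.
    """
    rng = rng or random.Random()
    masked, selected = [], []
    for pos, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            masked.append(tuple(token))  # not selected: keep as-is
            continue
        selected.append(pos)
        roll = rng.random()
        if roll < 0.8:
            # 80%: replace all eight elements with [MASK]
            masked.append((MASK_ID,) * len(token))
        elif roll < 0.9:
            # 10%: replace each element with a random id from its own class
            masked.append(tuple(rng.randrange(size) for size in VOCAB_SIZES))
        else:
            # 10%: leave the token unchanged (it is still a prediction target)
            masked.append(tuple(token))
    return masked, selected
```

The positions in `selected` are the ones a model would be asked to reconstruct, including those left unchanged.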
from PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music arxiv
Melody completion (Liang et al., 2020) is to find the most matched consecutive phrase in a given set of candidates for a given melodic phrase.
There are 1,793,760 data pairs in the training set and 198,665 data groups in the test set in this task (Liang et al., 2020).
from MusicBert
This is a sequence-level (period-level) evaluation task, with a concrete application of PiRhDy-melody. There are 1,784,844 pairs in the training dataset, and 199,270 pairs in the testing dataset.
from PiRhDy
Accompaniment suggestion (Liang et al., 2020) is to find the most related accompaniment phrase in a given set of harmonic phrase candidates for a given melodic phrase.
Genre classification and style classification (Ferraro and Lemström, 2018) are multi-label classification tasks. Following Ferraro and Lemström (2018), we use the TOP-MAGD dataset for genre classification and the MASD dataset for style classification. TOP-MAGD contains 22,535 annotated files of 13 genres, and MASD contains 17,785 files of 25 styles.
ground truth : [Song A, Song B, Song C]
prediction : [Song B, Song D, Song A]
Mean Average Precision (MAP)
Compute the average precision (AP) for each query (or user), then take the mean of those values over all queries.
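A minimal sketch of AP and MAP on the ground truth and prediction lists above. It assumes one common convention: AP sums the precision at each rank where a hit occurs and divides by the number of ground-truth items. Other conventions divide by the number of hits or by min(k, number of relevant items) and give different values. The function names are illustrative, and the predicted list is assumed to contain no duplicates.

```python
def average_precision(ground_truth, predicted):
    """AP for one query: mean of precision@rank over ranks where a hit occurs,
    normalised by the number of ground-truth items (an assumed convention)."""
    relevant = set(ground_truth)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """MAP: mean of AP over a list of (ground_truth, predicted) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(gt, pred) for gt, pred in queries) / len(queries)


ground_truth = ["Song A", "Song B", "Song C"]
prediction = ["Song B", "Song D", "Song A"]
# Hits at rank 1 (precision 1/1) and rank 3 (precision 2/3); 3 relevant items.
print(round(average_precision(ground_truth, prediction), 4))  # 0.5556
```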
Let's say k = 2.
We look at the predicted list up to position k and check whether any of those recommendations appear in the ground truth.
Here the top two positions hold [Song B, Song D], and Song B is in the ground truth, so there is a hit within the top two positions: HITS@2 is True (a "hit").
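The check described above fits in a few lines. This is a sketch with an illustrative function name, `hits_at_k`, returning a boolean for a single query; averaging that boolean over many queries would give a HITS@k rate.

```python
def hits_at_k(ground_truth, predicted, k):
    """True if any of the top-k predictions appears in the ground truth."""
    relevant = set(ground_truth)
    return any(item in relevant for item in predicted[:k])


ground_truth = ["Song A", "Song B", "Song C"]
prediction = ["Song B", "Song D", "Song A"]
print(hits_at_k(ground_truth, prediction, k=2))  # True
```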
Looking at our data:
Song 1: Actual genres [Rock, Pop], Predicted genres [Rock]
Song 2: Actual genres [Pop], Predicted genres [Pop]
Song 3: Actual genres [Jazz], Predicted genres [Jazz, Rock]
Song 4: Actual genres [Rock, Jazz], Predicted genres [Rock]
Song 5: Actual genres [Pop], Predicted genres [Pop]
We should count the True Positives (TP), False Positives (FP), and False Negatives (FN) as follows:
True Positives (TP): genres the model predicted that are in a song's actual genres. From the data above, TP = 5 (one per song).
False Positives (FP): extra genres the model predicted that are not in a song's actual genres. From the data above, FP = 1 (Rock for Song 3).
False Negatives (FN): actual genres the model failed to predict. From the data above, FN = 2 (Pop for Song 1, Jazz for Song 4).
Next we calculate Precision and Recall:
Precision = TP / (TP + FP) = 5 / (5 + 1) ≈ 0.83
Recall = TP / (TP + FN) = 5 / (5 + 2) ≈ 0.71
Finally we plug these into our F1 score formula:
F1_micro
= 2 × (precision × recall) / (precision + recall)
= 2 × (0.83 × 0.71) / (0.83 + 0.71)
≈ 0.77
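The same calculation can be checked in code. The sketch below pools TP, FP, and FN over all five songs before computing precision and recall, which is what makes the score "micro" averaged. Each song is represented as a pair of sets (actual, predicted), and the function names are illustrative.

```python
songs = [
    ({"Rock", "Pop"}, {"Rock"}),      # Song 1
    ({"Pop"}, {"Pop"}),               # Song 2
    ({"Jazz"}, {"Jazz", "Rock"}),     # Song 3
    ({"Rock", "Jazz"}, {"Rock"}),     # Song 4
    ({"Pop"}, {"Pop"}),               # Song 5
]


def micro_counts(pairs):
    """Pool TP, FP, FN over all (actual, predicted) genre-set pairs."""
    tp = fp = fn = 0
    for actual, predicted in pairs:
        tp += len(actual & predicted)   # predicted and correct
        fp += len(predicted - actual)   # predicted but wrong
        fn += len(actual - predicted)   # correct but not predicted
    return tp, fp, fn


def micro_f1(pairs):
    """Return (precision, recall, f1) computed from the pooled counts."""
    tp, fp, fn = micro_counts(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


print(micro_counts(songs))                      # (5, 1, 2)
print([round(v, 2) for v in micro_f1(songs)])   # [0.83, 0.71, 0.77]
```

Computed without intermediate rounding, F1_micro is 2·TP / (2·TP + FP + FN) = 10 / 13 ≈ 0.769.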
We conduct experiments on MusicBERT_small with a maximum sequence length of 250 due to the huge training cost of MusicBERT_base.
from MusicBert