Audio Tokens Part 14: Back to Square 0.01
For reasons that may be obvious from the previous post, I needed to take a day off. Way too much time spent on a Python append/extend mixup for my taste.
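For anyone who hasn't been bitten by that one, here's the general shape of the bug (a made-up snippet, not the actual ModelTrainer code):

```python
# Made-up illustration of an append/extend mixup -- not the actual ModelTrainer code.
batch_preds = [0.1, 0.9, 0.3]

nested = []
nested.append(batch_preds)   # the whole batch becomes one element: [[0.1, 0.9, 0.3]]

flat = []
flat.extend(batch_preds)     # the individual values are added: [0.1, 0.9, 0.3]
```

One accumulates a list of lists, the other a flat list, and everything downstream quietly gets the wrong shape.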
That bug has been in ModelTrainer since the initial commit. So all the metrics from the past few weeks were invalid. Now that it's fixed, the real, unadulterated results from the current setup are really, really, really not good.
I’m going to see if I have anything resembling progress with what I have so far. Moving back to the balanced training set only because it’s likely a bit easier to get some traction with. Commit e73d3d8. BASELINE IT!
Now THAT is what you call a baseline. Nowhere to go but up.
But first, the metrics are now a little wonky for a different reason: a quirk of AudioSet is that some classes don't actually have any members. In the ontology file they're flagged via a “restrictions” list field with possible values of “abstract” and “blacklist”. But the metrics code assumes all classes have positive examples, and evaluation behavior is kind of undefined for classes that don't have any. So it's time to remove those classes from the list of 631 classes we've been using so far.
(“We”? Who’s that? Me and the squirrels here in the Botanical Garden? They’re not really contributing much.)
Updating AudiosetMetadataProcessor to filter those out.
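Roughly what that filter looks like, assuming the published ontology.json layout (this is a sketch, not the actual AudiosetMetadataProcessor code):

```python
import json

# Sketch of the filtering step, assuming the standard AudioSet ontology.json layout.
# Not the actual AudiosetMetadataProcessor implementation.
with open("ontology.json") as f:
    ontology = json.load(f)

# Keep only classes with an empty "restrictions" list,
# i.e. drop the "abstract" and "blacklist" classes.
usable = [c for c in ontology if not c.get("restrictions")]
usable_ids = {c["id"] for c in usable}
print(f"{len(usable)} usable classes out of {len(ontology)}")
```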
Also need to change the mAP calculation in general, because with a non-stratified 90/10 train/val split of the (small, ~20K) AudioSet balanced_train dataset, some of the 543 remaining classes won't be present in training or validation, which will trigger warnings and make the mAP a little wonky.
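Something like this for the new calculation, leaning on sklearn's per-class average precision and simply skipping classes with zero positives in the validation targets (a sketch of the idea, not the exact code in the repo):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def map_over_present_classes(targets: np.ndarray, preds: np.ndarray) -> float:
    """Mean average precision over only the classes with at least one positive.

    targets: (n_samples, n_classes) binary matrix; preds: matching score matrix.
    Sketch of the idea, not the exact implementation in the repo.
    """
    present = np.flatnonzero(targets.sum(axis=0) > 0)
    if present.size == 0:
        return 0.0
    aps = [average_precision_score(targets[:, i], preds[:, i]) for i in present]
    return float(np.mean(aps))
```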
And let’s get rid of those other metrics for now. I’ll bring them back later if I need them. Honestly, I haven’t even been logging them to wandb so far.
[…time passes]
With a new mAP calculation that excludes classes with no positive examples, compared to the previous mAP calculation (commit 4e149fc):
Just need about 100 more metric calculation adjustments like that and we’ll be ready to publish. But that looks like a reasonable bump from changing the metric to add the exclusion. Can’t test for a label if it’s not there.
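(Made-up numbers, just to show the mechanics: if 50 of the 543 classes have no positives in the val split and each of those gets scored as 0, a real average AP of 0.10 over the other 493 classes shows up as roughly 0.09 with them included and 0.10 with them excluded.)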
Huh.