Audio Tokens Part 16: Sanity Checks for Everyone!
I’ve been on a bit of a tear the past few days (currently at commit 3550615). I split the Dataset/DataLoader processing out into its own classes, moved the metrics calculation into its own class, and did a bit of cleanup refactoring. All of that was so I could start sending the raw STFT vectors straight into the models as embeddings, and also add a new dirt-simple baseline model.
The point of all this is to figure out whether the consistently terrible val mAP, which shows up on every variation of model and hyperparameters, means this idea just doesn’t work as-is, or whether another bug in the preprocessing pipeline is mucking things up.
It turns out the baseline has almost the same result curves as the LSTM and BERT models. Not quite the same, but close. And every val mAP ends up at 0.05 or below. I want to track this down before I wrap this up.
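Before blaming the models, it helps to know what mAP pure guessing gets. Below is a minimal stdlib sketch (a stand-in, not the project’s metrics class) of macro mAP built from non-interpolated per-class average precision, which should agree with scikit-learn’s `average_precision_score` when no scores are tied. With random scores, each class’s AP comes out close to that class’s positive rate, so chance-level mAP sits near the mean label prevalence. The label matrix in the demo is synthetic, with an assumed ~5% positive rate per class.

```python
import random

def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean of precision@k at each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """Macro mAP over classes (columns); classes with no positives are skipped."""
    aps = []
    for c in range(len(label_matrix[0])):
        col_labels = [row[c] for row in label_matrix]
        if not any(col_labels):
            continue
        col_scores = [row[c] for row in score_matrix]
        aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)

def chance_map(label_matrix, seed=0):
    """mAP of uniformly random scores: the floor any real model should clear."""
    rng = random.Random(seed)
    scores = [[rng.random() for _ in row] for row in label_matrix]
    return mean_average_precision(scores, label_matrix)

if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic multi-label targets: 2000 clips, 50 classes, ~5% positives each.
    labels = [[1 if rng.random() < 0.05 else 0 for _ in range(50)] for _ in range(2000)]
    print("chance mAP:", round(chance_map(labels), 3))
    print("perfect mAP:", mean_average_precision(labels, labels))
```

If the real val mAP lands in the same neighborhood as `chance_map(val_labels)`, the model is barely beating a shuffle, which is a hint to look at the data and labels and not just the architecture.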
Today’s plan:
- Look at a few more generated spectrograms.
  - Do they look sane? Continue.
  - Do they look insane? Fix the spectrograms!
- Try the spectrograms with a standard vanilla CNN of the kind known to work well on spectrograms.
  - Do the results improve significantly? Then the spectrograms are fine: end this round of the project and move on, because the idea doesn’t work as-is.
  - Do the results not improve significantly? Keep going.
- Look at the Audioset string-label to numeric-label conversion.
  - Good? Continue.
  - Wrong? Fix it!
- Look at the assignment of labels to ytids.
  - Good? Continue.
  - Wrong? Fix it!
- Check the outputs of the model. See if anything off-by-one in there is messing with things.
  - Good? Continue.
  - Wrong? Fix it!
- If everything above checks out, maybe I’m just not using enough data. Maybe the balanced training set just isn’t enough for this many categories. Shovel 25-50% of the Audioset training data into the pipeline.
  - Does it work better? Maybe that’s the issue.
  - Does it have the same bad results? Throw MacBook out of window, wrap this up, and try a different project.
I should be able to get through everything except the last item today.
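Eyeballing spectrograms catches gross problems; a synthetic signal with known content catches quieter ones, like a transposed time/frequency axis or a wrong bin-to-Hz mapping. Here’s a minimal stdlib sketch, assuming a plain linear-frequency STFT with a Hann window (in practice you’d swap in the pipeline’s real STFT call; the sample rate, `n_fft`, and hop here are arbitrary choices). It builds a tone that jumps from 1000 Hz to 2500 Hz halfway through, then checks that the loudest bin in the first frame sits at `round(f * n_fft / sample_rate)` for the first tone, and the last frame’s loudest bin does the same for the second.

```python
import cmath
import math

def stft_magnitude(signal, n_fft, hop):
    """Naive STFT for testing: Hann-windowed frames, magnitude of bins 0..n_fft//2.

    Returns a time-major list: one list of n_fft//2 + 1 magnitudes per frame.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    # exp(-2*pi*i*k*n/N) depends only on (k*n) mod N, so precompute those twiddles.
    twiddles = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        frame = [
            abs(sum(chunk[n] * twiddles[(k * n) % n_fft] for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(frame)
    return frames

def expected_bin(freq_hz, sample_rate, n_fft):
    """Index of the STFT bin closest to freq_hz (bin spacing is sample_rate / n_fft)."""
    return round(freq_hz * n_fft / sample_rate)

def peak_bins(frames):
    """Argmax frequency bin for each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in frames]

def two_tone(f1, f2, sample_rate, n_samples):
    """Sine at f1 for the first half of the signal, f2 for the second half."""
    half = n_samples // 2
    return [
        math.sin(2 * math.pi * (f1 if n < half else f2) * n / sample_rate)
        for n in range(n_samples)
    ]

if __name__ == "__main__":
    sr, n_fft, hop = 8000, 256, 128
    frames = stft_magnitude(two_tone(1000, 2500, sr, 2048), n_fft, hop)
    peaks = peak_bins(frames)
    print("frames x bins:", len(frames), "x", len(frames[0]))
    print("first/last peak bin:", peaks[0], peaks[-1])
    print("expected:", expected_bin(1000, sr, n_fft), expected_bin(2500, sr, n_fft))
```

If the peaks come out swapped between the first and last frames, or land at a consistently scaled-off bin, the axes or the bin-to-frequency mapping are suspect. The same check works on whatever frames actually get handed to the model as embeddings.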
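For the label checks, here’s a minimal stdlib sketch, assuming the standard AudioSet CSV layouts: a class index file with `index,mid,display_name` columns, and segment files whose rows look like `YTID, start_seconds, end_seconds, "mid1,mid2"` with `#`-prefixed header lines. The function names, the tiny inline CSVs, and the placeholder mids are hypothetical, made up for illustration rather than taken from the project. It checks that indices run exactly 0..N-1, that each ytid’s multi-hot target decodes back to its own labels, and that decoding a model output uses the same width and mapping, which is where off-by-one bugs like to hide.

```python
import csv
import io

def load_label_index(csv_text):
    """Parse an index,mid,display_name CSV into {mid: index}, rejecting duplicate mids."""
    mid_to_idx = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mid = row["mid"]
        if mid in mid_to_idx:
            raise ValueError(f"duplicate mid: {mid}")
        mid_to_idx[mid] = int(row["index"])
    return mid_to_idx

def check_label_index(mid_to_idx):
    """Indices must be exactly 0..N-1: no gaps, no repeats, no 1-based start."""
    if sorted(mid_to_idx.values()) != list(range(len(mid_to_idx))):
        raise ValueError("label indices are not a contiguous 0..N-1 range")

def load_segments(csv_text):
    """Parse YTID, start, end, "mid1,mid2,..." rows into {ytid: [mids]}."""
    segments = {}
    # skipinitialspace lets the quoted label field follow ", " and still parse as one field.
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue  # skip header/comment lines
        ytid, _start, _end, mids = row
        segments[ytid] = mids.split(",")
    return segments

def multi_hot(mids, mid_to_idx):
    """Target vector with a 1 at each label's index (KeyError means an unmapped label)."""
    vec = [0] * len(mid_to_idx)
    for mid in mids:
        vec[mid_to_idx[mid]] = 1
    return vec

def decode_top_k(scores, mid_to_idx, k):
    """Map the k highest-scoring output positions back to mids."""
    if len(scores) != len(mid_to_idx):
        raise ValueError(f"output width {len(scores)} != {len(mid_to_idx)} classes")
    idx_to_mid = {i: m for m, i in mid_to_idx.items()}
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [idx_to_mid[i] for i in top]

def check_targets(segments, mid_to_idx):
    """Each ytid's target, fed back through the decoder, must return its own labels."""
    for ytid, mids in segments.items():
        target = multi_hot(mids, mid_to_idx)
        decoded = decode_top_k(target, mid_to_idx, len(set(mids)))
        if sorted(decoded) != sorted(set(mids)):
            raise ValueError(f"label round-trip failed for {ytid}")

# Hypothetical sample data; the mids are placeholders, not real AudioSet ids.
LABELS_CSV = "index,mid,display_name\n0,/m/speech,Speech\n1,/m/music,Music\n2,/m/dog,Dog\n"
SEGMENTS_CSV = (
    "# YTID, start_seconds, end_seconds, positive_labels\n"
    'abc123, 30.000, 40.000, "/m/speech,/m/music"\n'
    'xyz789, 0.000, 10.000, "/m/dog"\n'
)

if __name__ == "__main__":
    mid_to_idx = load_label_index(LABELS_CSV)
    check_label_index(mid_to_idx)
    check_targets(load_segments(SEGMENTS_CSV), mid_to_idx)
    print("label index and targets look consistent")
```

Pointing these checks at the real class index and segment files (and at the exact mapping the training loop uses) covers the label-conversion, ytid-assignment, and output-decoding items in one pass.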