Audio Tokens Part 10: Ruling Out the Obvious, Leaving Only the Obvious
I spent most of yesterday scouring the code for train/dev data leakage, without much luck. I also fixed up the 1D pre-processing convolution I thought I had gotten working before, so it actually works now. But back to the weird thing about those high metrics.
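The preprocessing code itself isn’t in this post, so here’s only a rough sketch of the general shape of a 1D convolutional front-end, with made-up channel sizes and a (batch, channels, time) input:

import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    # Hypothetical 1D conv pre-processing block; the channel sizes are invented.
    def __init__(self, in_channels=64, out_channels=128, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, padding=kernel_size // 2)
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (batch, in_channels, time) -> (batch, out_channels, time)
        return self.act(self.conv(x))

frontend = ConvFrontEnd()
out = frontend(torch.randn(8, 64, 500))  # -> torch.Size([8, 128, 500])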
I also got most of the code ready for using more of the AudioSet training data, which mostly meant batching things instead of keeping the entire active training dataset in memory.
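The loader code isn’t shown here, but the idea is roughly this: stream examples from shards on disk instead of holding the full set in memory. A minimal sketch, assuming a hypothetical .npz shard layout and key names:

import glob
import numpy as np
import torch
from torch.utils.data import IterableDataset, DataLoader

class AudioSetShards(IterableDataset):
    # Streams (features, labels) pairs one shard at a time; layout and keys are assumptions.
    def __init__(self, shard_paths):
        self.shard_paths = shard_paths

    def __iter__(self):
        for path in self.shard_paths:
            shard = np.load(path)
            for x, y in zip(shard["features"], shard["labels"]):
                yield torch.from_numpy(x), torch.from_numpy(y)

loader = DataLoader(AudioSetShards(sorted(glob.glob("shards/*.npz"))), batch_size=64)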
Then I ran what I assumed was the same code again, the code that got the 0.41 val mAP. Here’s the new one against the previous:
It’s still way better than before, around 0.3 val mAP, but that’s significantly different from 0.41. Three things come to mind here:
- The code changed more than I think it did during my refactoring. I don’t think this one is true, since the basic workflow is the same.
- The training set is small enough to produce big variations in the results. This one seems likely.
- The random numbers in cluster creation and training are causing some of those variations, too. Also seems possible.
Number 3 is the quickest fix, with set_seed():
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic (can cost a little speed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
Going to run that everywhere with the actual seed in the config file.
Let’s try running this with a few different seeds. I’m going to start noting the commit hash on these runs, which I probably should have been doing before now. This one: e4fd3a1. ONLY changing the random seeds for these runs (42, 666, 2727), not making a new commit for each run.
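The runner itself isn’t in the post; conceptually the sweep is just this, assuming a hypothetical train_and_eval() entry point that returns val mAP (in practice the seed comes from the config file):

results = {}
for seed in (42, 666, 2727):
    set_seed(seed)                    # same set_seed() as above
    results[seed] = train_and_eval()  # hypothetical: trains a model and returns val mAP

for seed, val_map in results.items():
    print(f"seed={seed}: val mAP={val_map:.3f}")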
Question answered. Seed changes don’t alter the results here in any serious way, even with a small training set.
Our candidates for the mysterious change are now code changes and training set size. To test the training set size, I’ll generate a new small train/val split. If I get the same results, then the issue is probably a code change. Generating train/val split with a different random seed.
(Of course, I shouldn’t do this too often, since all of my train/val sets are coming from the same larger training set. But the current train/val sets are about 20K examples out of 2MM in the training set, so I think leakage should be minimal.)
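The split script isn’t included in the post; as a sketch, assuming the full training set is indexed in a CSV (file names here are invented) and using scikit-learn for the split:

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical index of the full ~2MM-example training set
index = pd.read_csv("audioset_train_index.csv")

# Draw a fresh ~20K-example working subset with a *different* random seed,
# then carve it into train/val.
subset = index.sample(n=20_000, random_state=123)
train_df, val_df = train_test_split(subset, test_size=0.2, random_state=123)

train_df.to_csv("train_split_v2.csv", index=False)
val_df.to_csv("val_split_v2.csv", index=False)

A plain random split ignores AudioSet’s label imbalance, but for a quick sanity check like this that’s probably fine.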
Here’s the new set vs the old set:
Changing the training set gives essentially the same results.
The difference between this 0.30 val mAP and the 0.41 val mAP from two days ago must be in the code or config.
[…time passes]
Checked out commit a98862b, and the 0.41 is back. Now to figure out what the difference is between that code and the current code (4db3dbf). I’m still not sure what triggered the 0.15 -> 0.41 jump; it came from something before that commit.