Audio Tokens Part 4: More Tokens!
Well, more types of tokens, anyway. Instead of a limited vocabulary of 50 tokens, let’s try something a little more interesting. Like 1000. [time passes] Ran it, here’s an excerpt:
2024-09-05 18:16:23,258 - INFO - Epoch 9
2024-09-05 18:16:23,258 - INFO - Train Loss: 0.0210, Train F1 (macro): 0.4990, Train F1 (micro): 0.9962, Train Hamming Loss: 0.0038, Train mAP: 0.0824
2024-09-05 18:16:23,258 - INFO - Val Loss: 0.0209, Val F1 (macro): 0.4991, Val F1 (micro): 0.9962, Val Hamming Loss: 0.0038, Val mAP: 0.0826
Training: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1022/1022 [07:40<00:00, 2.22it/s, loss=0.0169]
Validating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 146/146 [00:20<00:00, 7.22it/s]
2024-09-05 18:24:33,739 - INFO - Epoch 10
2024-09-05 18:24:33,739 - INFO - Train Loss: 0.0210, Train F1 (macro): 0.4990, Train F1 (micro): 0.9962, Train Hamming Loss: 0.0038, Train mAP: 0.0810
2024-09-05 18:24:33,739 - INFO - Val Loss: 0.0209, Val F1 (macro): 0.4991, Val F1 (micro): 0.9962, Val Hamming Loss: 0.0038, Val mAP: 0.0893
That’s…terrible. Let’s try bumping the learning rate from 5e-5 to 1e-3 just to show that we’re serious.
2024-09-05 21:13:08,753 - INFO - Epoch 20
2024-09-05 21:13:08,753 - INFO - Train Loss: 0.0210, Train F1 (macro): 0.4990, Train F1 (micro): 0.9962, Train Hamming Loss: 0.0038, Train mAP: 0.0821
2024-09-05 21:13:08,753 - INFO - Val Loss: 0.0210, Val F1 (macro): 0.4991, Val F1 (micro): 0.9962, Val Hamming Loss: 0.0038, Val mAP: 0.0740
That’s even worse. I don’t like you either, AudioSet.
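For anyone curious what "1000 tokens instead of 50" can look like in practice, here is a minimal, hypothetical sketch of one common approach: fitting a k-means codebook over per-frame audio features and using the cluster index as the token id. The feature dimension, frame count, and use of MiniBatchKMeans are all assumptions for illustration, not the actual pipeline from this series.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical sketch: quantize per-frame audio features (e.g. mel-spectrogram
# frames) into a 1000-entry codebook; each frame becomes one token id.
VOCAB_SIZE = 1000
N_FEATURES = 64  # assumed feature dimension per frame

rng = np.random.default_rng(0)
# Stand-in for real audio features: 20,000 frames of 64-dim vectors.
frames = rng.standard_normal((20_000, N_FEATURES)).astype(np.float32)

# Fit the codebook; MiniBatchKMeans keeps this fast even for large frame counts.
codebook = MiniBatchKMeans(n_clusters=VOCAB_SIZE, batch_size=1024,
                           random_state=0).fit(frames)

# Map every frame to its nearest codebook entry: token ids in [0, VOCAB_SIZE).
tokens = codebook.predict(frames)
print(tokens.shape, int(tokens.min()), int(tokens.max()))
```

With 1000 clusters instead of 50, each token carries finer-grained acoustic detail, at the cost of a sparser, harder-to-learn vocabulary.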