[Update 19 Sep: Funny thing, this actually didn’t work out as planned. I forget to turn tokenization on for these examples. So the below change looks to be a random blip of some sort, not from adding convolution. Going to try to really add it later. Also, the comment below about running the same data over and over again accidentally was sort of true.]
Before switching to major, focus-shifting architecture changes here I figured I’d try something simpler first. From the last update, I’m taking the first option: run a basic 1D convolution on the STFT time-slices to try to make them a bit more frequency-invariant. Eight kernels, kernels size 3, and just concatting them together to create the new input for the K-Means clustering. The previous input, recall, was just the entire STFT time-slice vector as-is.
Read more...