Audio Tokens Part 18: The Wrap-Up
Updated 2025-04-22 with a brief intro for context.

This project explored whether short-time audio features (STFT slices) could be clustered into symbolic “tokens” and modeled using sequence architectures like BERT. It didn’t work out the way I’d hoped, but I figured out a lot about where this kind of approach breaks down, and what might be worth trying next. (Also, I spent a few days chasing phantom performance gains thanks to a classic extend() vs append() bug.)

⸻

...
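For readers who haven't hit the extend() vs append() mix-up before, here is a minimal, generic illustration of how the two list methods differ. It is not the actual bug from this project (the intro doesn't show that code), and the names `batches`, `collect_with_append`, and `collect_with_extend` are made up for the example.

```python
# Hypothetical per-batch outputs; illustrative only, not taken from the project.
batches = [[0, 1, 1], [1, 0]]


def collect_with_append(batches):
    """append() adds each batch as ONE element, so the result is nested."""
    out = []
    for batch in batches:
        out.append(batch)
    return out


def collect_with_extend(batches):
    """extend() adds every item of each batch, so the result is flat."""
    out = []
    for batch in batches:
        out.extend(batch)
    return out


nested = collect_with_append(batches)
flat = collect_with_extend(batches)

print(nested)  # [[0, 1, 1], [1, 0]]
print(flat)    # [0, 1, 1, 1, 0]
print(len(nested), len(flat))  # 2 5
```

Both versions run without raising anything, which is probably why this kind of mistake can go unnoticed for a while: the only symptom is a list with the wrong shape or length, and whatever is computed from it downstream. A cheap guard is to assert on the expected length right after collecting.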