I’ve been fascinated with the MBARI hydrophone recording archive for a while now. It’s from a hydrophone installation in Monterey Bay that’s been recording almost 24/7 since 2015. It’s partly what inspired me to make AudioLoop: all that data deserves to be turned into useful labeled datasets.
I’ve been curious to see how well Perch 2.0 does at extracting useful embeddings from these recordings. A Google DeepMind paper claims that “despite having almost no marine training data,” Perch 2.0 performs well at marine species classification.
I thought I’d try that with some raw¹ MBARI recordings. I used an older Google/NOAA humpback detector model and pulled out 5-second clips from a particular MBARI 24-hour audio file (December 21, 2016 in this case), then bucketed them by the model’s reported humpback-detection confidence.
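The bucketing step can be sketched roughly like this. It's a minimal illustration, not the actual pipeline: it assumes the detector yields one confidence score in [0, 1] per consecutive 5-second clip, and the middle bucket boundary (0.5) and all names here are hypothetical. Only the >90% and <5% buckets come from the post.

```python
from collections import defaultdict

def bucket_for(score):
    """Map a detector confidence in [0, 1] to a bucket name.

    The <5% and >90% buckets mirror the ones used in this post; the
    split of the middle range at 0.5 is an assumption for illustration.
    """
    if score < 0.05:
        return "lt_05"      # likely not-humpback
    if score > 0.90:
        return "gt_90"      # likely humpback
    return "mid_low" if score < 0.5 else "mid_high"

def bucket_clips(scores, clip_seconds=5.0):
    """Group clips by confidence bucket.

    `scores` holds one confidence per consecutive clip (assumed layout).
    Returns {bucket_name: [(clip_start_seconds, score), ...]}.
    """
    buckets = defaultdict(list)
    for i, score in enumerate(scores):
        buckets[bucket_for(score)].append((i * clip_seconds, score))
    return dict(buckets)

# Made-up scores, standing in for real detector output.
example_scores = [0.01, 0.97, 0.40, 0.03, 0.93, 0.75]
buckets = bucket_clips(example_scores)
```

Keeping the clip start time next to each score makes it easy to go back to the 24-hour file and cut out the matching audio later.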
Then I fed some clips into Perch 2.0 to create embeddings. As a first-pass sanity check, I stuck to the >90% (likely humpback) and <5% (likely not-humpback) buckets: I took ten clips from each, sent them through Perch, and made a PCA plot of the resulting embeddings.
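For anyone who wants to reproduce the projection step, here is a minimal PCA sketch using plain NumPy. The embeddings below are random stand-ins, not real Perch output, and the dimension of 64 is a placeholder rather than Perch's actual embedding size. The Perch inference call itself is left out on purpose.

```python
import numpy as np

def pca_2d(embeddings):
    """Project rows of `embeddings` onto their first two principal components.

    Returns (coords, explained), where coords has shape (n_clips, 2) and
    explained is the fraction of total variance captured by each component.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    X_centered = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    coords = X_centered @ Vt[:2].T
    explained = (S[:2] ** 2) / np.sum(S ** 2)
    return coords, explained

# Synthetic stand-ins for ten ">90%" and ten "<5%" clip embeddings.
rng = np.random.default_rng(0)
dim = 64  # placeholder dimension, NOT Perch's real embedding size
humpback = rng.normal(loc=1.0, scale=0.3, size=(10, dim))
not_humpback = rng.normal(loc=-1.0, scale=0.3, size=(10, dim))

embeddings = np.vstack([humpback, not_humpback])
coords, explained = pca_2d(embeddings)
```

The first ten rows of `coords` correspond to the humpback bucket and the last ten to the not-humpback bucket, so they can be scattered in two colors with any plotting library.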
And we get this:

The plot shows that, even given MBARI clips with a high noise floor, Perch seems to separate whale and not-whale pretty well². You can hear the clips below.³
What’s more interesting is that the non-humpback clips seem to be split into two clusters, and that they aren’t just PCA phantom clusters that fall apart on closer inspection. Clips 3-8 are only background noise (the ocean is noisy), but the slightly-separated group of clips (1, 2, 9, 10) all have dolphin vocalizations! Clips 1 and 2 have prominent vocalizations, 9 and 10 much quieter ones.
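One way to check that a split like this isn't only a projection artifact is to compare distances in the full embedding space rather than in the 2-D plot. The sketch below is an illustration of that idea, not something taken from the original analysis: the embeddings are synthetic stand-ins, the dimension is a placeholder, and the helper names are hypothetical. The row indices follow the clip numbering above (clips 3-8 noise-only; clips 1, 2, 9, 10 with dolphins).

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between rows of `a` and rows of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def within_mean(x):
    """Mean cosine distance between distinct rows of one group."""
    d = cosine_distance_matrix(x, x)
    off_diagonal = ~np.eye(len(x), dtype=bool)  # skip self-pairs
    return float(d[off_diagonal].mean())

def between_mean(x, y):
    """Mean cosine distance between rows of two different groups."""
    return float(cosine_distance_matrix(x, y).mean())

# Zero-based rows for the ten not-humpback clips.
noise_rows = [2, 3, 4, 5, 6, 7]   # clips 3-8
dolphin_rows = [0, 1, 8, 9]       # clips 1, 2, 9, 10

# Synthetic stand-in embeddings (placeholder dimension, not real Perch output).
rng = np.random.default_rng(0)
dim = 64
noise_center = np.zeros(dim)
noise_center[0] = 1.0
dolphin_center = np.zeros(dim)
dolphin_center[1] = 1.0
not_humpback = np.empty((10, dim))
not_humpback[noise_rows] = noise_center + 0.02 * rng.normal(size=(6, dim))
not_humpback[dolphin_rows] = dolphin_center + 0.02 * rng.normal(size=(4, dim))

noise_only = not_humpback[noise_rows]
dolphin = not_humpback[dolphin_rows]
within_noise = within_mean(noise_only)
within_dolphin = within_mean(dolphin)
between = between_mean(noise_only, dolphin)
```

If the between-group distance is clearly larger than both within-group distances, the separation exists in the original space and not only in the PCA view. With four and six clips this is a sanity check at best, not a statistical test.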
So just using raw, unpreprocessed, noisy clips from the MBARI hydrophone, Perch 2.0 appears to separate noise-only, humpback, and dolphin sounds into their own embedding clusters.
Clips
¹ The only preprocessing applied was removing the DC offset in the originals.
² It’s important to note that this doesn’t check Perch against ground-truth whale call presence, only against labels inferred from the results of the earlier humpback detector model. For this very limited set of clips, you can verify the ground truth yourself by listening to them. At the very least, Perch’s embeddings are expressing some of the same inherent audio features as the humpback-specific model.
³ These clips have been scaled up to a reasonable listening level.
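The two small audio adjustments mentioned in these notes (DC offset removal and scaling for listening) could look something like the sketch below. It assumes clips are already loaded as floating-point NumPy arrays, and it uses simple peak normalization for the scaling, which is an assumption: the post doesn't say how the listening level was chosen. The function names and the 0.5 target peak are hypothetical.

```python
import numpy as np

def remove_dc_offset(samples):
    """Subtract the mean so the waveform is centered on zero."""
    x = np.asarray(samples, dtype=np.float64)
    return x - x.mean()

def scale_to_peak(samples, peak=0.5):
    """Scale so the largest absolute sample equals `peak` (assumed method)."""
    x = np.asarray(samples, dtype=np.float64)
    max_abs = np.max(np.abs(x))
    if max_abs == 0.0:
        return x.copy()  # silent clip: nothing to scale
    return x * (peak / max_abs)

# Synthetic one-second clip: a very quiet 100 Hz tone riding on a DC offset.
sample_rate = 16000  # placeholder rate for the example
t = np.arange(sample_rate) / sample_rate
clip = 0.2 + 0.001 * np.sin(2 * np.pi * 100 * t)

centered = remove_dc_offset(clip)     # what would go to the model
listenable = scale_to_peak(centered)  # louder copy for playback only
```

Keeping the scaled copy separate means the audio that reaches the model stays as close to raw as the first note describes.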