In the last post, it started to look like Perch 2.0 doesn’t reveal much whale/non-whale structure in its embeddings, at least for MBARI hydrophone clips that are less definitely humpback.

But a quick listen to some of the positive clips raises an important point: humpbacks make a lot of different types of sounds.1 It’s possible that the embeddings are tied more to the qualities of the individual calls than to the presence or absence of a call at all.

So it makes sense to listen by ear to a few nearest-neighbor groups and see whether they share any common audible characteristics.

I’m still using the older humpback-specific detector model to classify clips into various confidence-level buckets (0-10% for strongly no-whale, 90-100% for strongly whale, and so on).
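As a rough illustration of that bucketing, here is a minimal sketch that maps a detector score to a 10%-wide bucket label and counts clips per bucket. The function name `confidence_bucket`, the assumption that scores arrive as floats in [0, 1], and the handling of boundary values are all mine, not taken from the detector itself.

```python
from collections import Counter


def confidence_bucket(score: float, width: int = 10) -> str:
    """Map a score in [0, 1] to a label such as '0-10%' or '90-100%'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    n_buckets = 100 // width
    # A score of exactly 1.0 goes in the top bucket rather than one past it.
    index = min(int(score * 100) // width, n_buckets - 1)
    low = index * width
    return f"{low}-{low + width}%"


def bucket_counts(scores):
    """Count how many scores fall into each bucket (hypothetical helper)."""
    return Counter(confidence_bucket(s) for s in scores)


if __name__ == "__main__":
    # Made-up scores, purely to show the shape of the output.
    print(bucket_counts([0.03, 0.97, 0.55, 1.0]))
```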

For this listening I’ll take some obviously (both to the detector and me) humpback-containing audio clips with different kinds of humpback sounds, and compare them to their nearest neighbors in the Perch embeddings.

For the last tests, I used 200 clips as the mini-corpus for producing similarity scores, split evenly between likely-positive and likely-negative. For this one, I’m going to use significantly more, since I’m looking for finer-grained distinctions. I’ve pulled out ~9000 clips from December 21, 2016, spread fairly evenly across buckets.


A quick digression: Let’s look at the actual clip distribution across buckets.

[Figure: bar chart of clip distribution across confidence buckets]

That’s pretty U-shaped: on this day, the detector is often quite confident about whether a whale sound is present or absent.


Here’s a pretty distinct clip to start with. It’s a sort of bloop:2

7695

And the five closest clip embeddings by cosine similarity:

7697(0.95)
7693(0.93)
7696(0.88)
7759(0.88)
6107(0.86)
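For reference, a nearest-neighbor lookup like the one above can be sketched in a few lines of NumPy. This is a minimal version under my own assumptions: the embeddings are already stacked as rows of a 2-D array, and `top_k_neighbors` is a name invented for this example, not part of Perch.

```python
import numpy as np


def top_k_neighbors(embeddings, query_index, k=5):
    """Return [(row_index, cosine_similarity), ...] for the k closest rows."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # L2-normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_index]
    # Exclude the query clip itself from its own neighbor list.
    sims[query_index] = -np.inf
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


if __name__ == "__main__":
    # Tiny made-up vectors, just to exercise the function.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print(top_k_neighbors(toy, query_index=0, k=2))
```

Row indices here are positions in the array, so mapping them back to clip numbers would need a separate lookup.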

Critically here, the first three of the five are within 60 seconds of the original clip,3 which doesn’t make this a terribly useful comparison as-is. Those clips could very well be the same whale making the same sounds over that particular minute.
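One way to discount those same-minute neighbors automatically, rather than by eye, would be to drop any neighbor whose start time falls within a window of the query clip. This is not what I do in this post; it is a sketch that assumes a hypothetical `start_times` mapping from clip ID to start time in seconds, which would have to be parsed from the filenames first.

```python
def drop_temporal_neighbors(neighbors, start_times, query_id, window_s=60.0):
    """Keep neighbors that start more than window_s seconds from the query.

    neighbors:   list of (clip_id, similarity) pairs, best first.
    start_times: hypothetical dict of clip_id -> start time in seconds.
    """
    query_time = start_times[query_id]
    return [
        (clip_id, sim)
        for clip_id, sim in neighbors
        if abs(start_times[clip_id] - query_time) > window_s
    ]


if __name__ == "__main__":
    # Made-up IDs and times, only to show the filtering behavior.
    times = {"q": 1000.0, "a": 1020.0, "b": 980.0, "c": 5000.0}
    ranked = [("a", 0.95), ("b", 0.93), ("c", 0.88)]
    print(drop_temporal_neighbors(ranked, times, "q"))
```

The 60-second default just mirrors the gap noted above; a longer window might be needed if a single whale repeats the same sound for several minutes.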

So let’s get the next ten closest embeddings:

7939(0.84)
6108(0.84)
8271(0.83)
7694(0.83)
8200(0.82)
7803(0.82)
6239(0.80)
7695(0.79)
7881(0.79)
8091(0.79)

That’s a more varied set of clips, from different times of the day.

And every single one of them has a bloop or two.

Perch embeddings seem to be mapping distinct bloop clips close together in the embedding space.


  1. It doesn’t appear that the field has really settled on a taxonomy for whale vocalizations yet. I’ll just try to describe them as I go. ↩︎

  2. It’s still notable that in many years of 24/7 MBARI hydrophone recordings, very few clips have signals as clear as those from December 21, 2016. It’s quite the day: a bit of an open-hydrophone night at the cafe tonight. ↩︎

  3. The first clue is that the clip numbers are almost in sequence. The second, more definitive clue is that the filenames have the times in them, although the audio embedding mechanism I’m currently using here doesn’t display them, so you’d have to look at the HTML source to know that. ↩︎