Perch and MBARI Clips: Low Confidence Neighbors

Time to continue looking into whether Perch 2.0’s embeddings make it easy to distinguish between humpback-present and humpback-absent clips in raw MBARI hydrophone recordings.

Last time, I looked at two clips with high-confidence scores from Google’s humpback detector model. Then I compared their Perch-computed embeddings to their nearest neighbors using a ~9000-clip sample stratified across detector-score buckets.

For the high-confidence humpback-positive clip, the Perch embedding’s 15 nearest neighbors and the detector scores seemed to be in line: the neighbors were almost all in the detector’s highest-confidence bucket.

For the high-confidence humpback-negative clip, the 15 nearest neighbors had detector scores that varied much more: between 0% and 50%. On manual listening, two of those neighbors turned out to be faint positives. They had low detector-model scores, yet their embeddings still sat very close to the known negative test clip. This suggested that classifying faint positives could be difficult with either the detector model or Perch embeddings.

It’s time to try the same experiment, but with lower-confidence positive and negative clips (70-90% and 10-30% detector model scores) and see if their embedding neighborhoods have the same characteristics.
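For reference, the neighbor lookup behind these lists can be sketched as follows. This is a minimal, dependency-free sketch and not the actual code used for these posts: the function names and the dict-of-vectors layout are hypothetical, and with ~9000 real embeddings you would likely vectorize the similarity computation with NumPy instead.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_id, embeddings, k=15):
    """Return the k clips most similar to the query as (clip_id, similarity).

    `embeddings` is a hypothetical dict mapping clip id -> embedding vector.
    The query clip itself is excluded from the results.
    """
    query = embeddings[query_id]
    scored = (
        (clip_id, cosine_similarity(query, vec))
        for clip_id, vec in embeddings.items()
        if clip_id != query_id
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D vectors standing in for real Perch embeddings.
toy = {
    "query": [1.0, 0.0],
    "close": [0.9, 0.1],
    "far": [0.0, 1.0],
}
print(nearest_neighbors("query", toy, k=2))
```

The default of `k=15` matches the neighborhood size used in the lists in this post.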

First up, a lower-confidence positive clip (peak detector score: 0.802):

7273

Sounds like a humpback, but definitely not as high-signal as the high-confidence clips, as expected.

Now the 15 nearest neighbors, with cosine similarity and detector-score bucket:

6819 (0.93) (70-90%)

7094 (0.93) (70-90%)

7297 (0.92) (70-90%)

3163 (0.92) (10-30%)

8805 (0.92) (>=90%)

6964 (0.92) (70-90%)

8895 (0.92) (>=90%)

7283 (0.92) (70-90%)

8920 (0.92) (>=90%)

6910 (0.92) (70-90%)

6679 (0.92) (70-90%)

5543 (0.92) (50-70%)

6721 (0.92) (70-90%)

7127 (0.92) (70-90%)

7110 (0.91) (70-90%)

There’s more variety in the detector-score buckets than we had with the high-confidence clip. Ten of the fifteen neighbors share the query clip’s 70-90% bucket, three are in the >=90% bucket, one is in 50-70%, and one is in 10-30%. Since the neighbors also sound humpback-positive, Perch is still retrieving humpback-positive neighbors for a humpback-positive query clip. The tougher question is whether Perch is grouping these embeddings by humpback presence alone, or by the larger acoustic scene.
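The bucket breakdown for this neighborhood can be double-checked with a quick tally. The snippet below is just a sketch: the tuples are copied from the neighbor list for clip 7273 earlier in this post, and the variable names are my own.

```python
from collections import Counter

# (clip id, cosine similarity, detector-score bucket) for the 15 nearest
# neighbors of the lower-confidence positive clip 7273.
neighbors = [
    (6819, 0.93, "70-90%"),
    (7094, 0.93, "70-90%"),
    (7297, 0.92, "70-90%"),
    (3163, 0.92, "10-30%"),
    (8805, 0.92, ">=90%"),
    (6964, 0.92, "70-90%"),
    (8895, 0.92, ">=90%"),
    (7283, 0.92, "70-90%"),
    (8920, 0.92, ">=90%"),
    (6910, 0.92, "70-90%"),
    (6679, 0.92, "70-90%"),
    (5543, 0.92, "50-70%"),
    (6721, 0.92, "70-90%"),
    (7127, 0.92, "70-90%"),
    (7110, 0.91, "70-90%"),
]

# Count how many neighbors fall into each detector-score bucket.
bucket_counts = Counter(bucket for _, _, bucket in neighbors)
for bucket, count in bucket_counts.most_common():
    print(f"{bucket}: {count}")
```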

Also of note: the one neighbor in the 10-30% bucket (3163) sounds pretty clearly like a positive. That looks like a miss by the humpback detector, yet Perch places it in the same neighborhood as all the other positive clips.

Moving to a lower-confidence negative clip (peak detector score: 0.200):

3376

This is an interesting clip because it has some sort of faint unidentifiable sound that might be a humpback.

And the 15 nearest neighbors:

1892 (0.96) (5-10%)

4111 (0.96) (30-50%)

2860 (0.96) (10-30%)

896 (0.96) (<=5%)

3524 (0.96) (10-30%)

963 (0.96) (<=5%)

910 (0.96) (<=5%)

2080 (0.96) (5-10%)

974 (0.96) (<=5%)

908 (0.96) (<=5%)

982 (0.96) (<=5%)

933 (0.96) (<=5%)

3222 (0.96) (10-30%)

956 (0.96) (<=5%)

3535 (0.96) (10-30%)

Note that all fifteen neighbors sit at a cosine similarity of 0.96, closer than any neighbor of the positive clip (0.91 to 0.93). This is a dense neighborhood. To my ears the clips are mostly noise, with the occasional faint, evocative, yet unidentifiable sound thrown in. The humpback detector is pretty confident about many of them: ten of the fifteen fall in the <=5% or 5-10% buckets, so it is more confident about the lack of a humpback in those clips than it is about the original clip.
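To put rough numbers on "dense" and on the detector's confidence, here is a small sketch using only the values listed in this post. Mean neighbor similarity is just one simple proxy for density that I am assuming here, and the bucket ordering and variable names are mine.

```python
from statistics import mean

# Detector-score buckets, ordered from lowest to highest score.
BUCKET_ORDER = ["<=5%", "5-10%", "10-30%", "30-50%", "50-70%", "70-90%", ">=90%"]

# Cosine similarities of the 15 neighbors of the positive clip (7273).
positive_sims = [0.93, 0.93, 0.92, 0.92, 0.92, 0.92, 0.92, 0.92,
                 0.92, 0.92, 0.92, 0.92, 0.92, 0.92, 0.91]

# Cosine similarities and buckets of the 15 neighbors of the negative clip (3376).
negative_sims = [0.96] * 15
negative_buckets = ["5-10%", "30-50%", "10-30%", "<=5%", "10-30%",
                    "<=5%", "<=5%", "5-10%", "<=5%", "<=5%",
                    "<=5%", "<=5%", "10-30%", "<=5%", "10-30%"]

# The query clip's peak score of 0.200 puts it in the 10-30% bucket.
query_rank = BUCKET_ORDER.index("10-30%")

# Neighbors whose bucket is strictly lower than the query clip's bucket.
more_confident = sum(
    1 for bucket in negative_buckets if BUCKET_ORDER.index(bucket) < query_rank
)

print(f"mean similarity, positive neighborhood: {mean(positive_sims):.3f}")
print(f"mean similarity, negative neighborhood: {mean(negative_sims):.3f}")
print(f"neighbors in a lower bucket than the query: {more_confident} of 15")
```

Comparing whole buckets rather than raw scores is a coarse check, since only the bucket of each neighbor is listed here.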

For this lower-confidence negative, Perch seems to be putting the clip in a dense neighborhood of similar noise-like clips. All of those clips seem fair to put into the negative category, so the neighborhood placement generally agrees with the detector’s low scores.

It’s about time to wrap up this set of experiments. In the next post I’ll do just that.