Continuing on from the last post about Perch 2.0 embeddings from MBARI raw hydrophone data, it seems important to note that the PCA plot in that post only looked at the extremes: the clips where the humpback detector model expressed strong positive confidence (>90%) or extreme negative confidence (<5%).

So let’s look at what happens when we take slightly less obvious examples. Let’s compare 70-90% vs 10-30% confidence clips. We’ll use the same day, December 21, 2016, because it has a lot of high-confidence examples.1
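As a rough illustration of the selection step, here is a minimal sketch of picking clips by confidence band. The `select_band` helper, the random stand-in scores, and the random sampling are all assumptions for illustration; the real scores would come from running the humpback detector over the day's audio.

```python
import random

def select_band(scores, low, high, k, seed=0):
    """Return up to k clip indices whose detector score lies in [low, high)."""
    candidates = [i for i, s in enumerate(scores) if low <= s < high]
    rng = random.Random(seed)
    rng.shuffle(candidates)  # sample rather than take the first k in time order
    return sorted(candidates[:k])

# Hypothetical per-clip detector scores, standing in for real model output.
score_rng = random.Random(42)
scores = [score_rng.random() for _ in range(500)]

likely_whale = select_band(scores, 0.70, 0.90, k=10)
likely_not_whale = select_band(scores, 0.10, 0.30, k=10)
print(likely_whale)
print(likely_not_whale)
```

Sampling at random within each band, rather than taking the first ten matches, avoids pulling every clip from the same few minutes of the recording.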

The clips are embedded below.

Here’s the PCA plot of 10 clips the detector tagged as likely whale and 10 it tagged as likely not-whale:

PCA plot of Perch embeddings from MBARI data
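For reference, a 2-D PCA projection like this one can be sketched with plain NumPy. This is not necessarily how the plot above was produced; the `pca_2d` helper is hypothetical, and the random matrix is a placeholder for the real Perch embeddings (its width is arbitrary, not a claim about Perch's output size).

```python
import numpy as np

def pca_2d(embeddings):
    """Project each row onto the first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    # Fraction of total variance captured by each of the two plotted axes.
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return coords, explained

# Placeholder data: 20 clips, arbitrary embedding width.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 128))

coords, explained = pca_2d(embeddings)
print(coords.shape)
print(explained)
```

The `explained` fractions are worth checking alongside the plot: if the two axes capture only a small share of the variance, most of the embedding's structure is invisible in the 2-D view.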

This is a bit more interesting. Either a) the humpback detector is getting some classifications wrong, b) Perch embeddings can’t easily distinguish these less obvious clips, or c) the PCA axes of variance-maximization don’t correspond well to what we’re looking for here.

Let’s start with the first, and manually verify the humpback detector’s classification:

10-30% likely humpback (“correct” here means not-humpback):

  1. correct. Clearly dolphin.
  2. correct. More dolphin.
  3. correct. Background noise
  4. incorrect. Sounds like a humpback at 0:02
  5. correct. Indistinguishable noise
  6. correct. Again, indistinguishable, although I think I might hear a humpback off in the distance
  7. tossup. The last second of this clip sounds humpback-like.
  8. incorrect. There are a couple faint humpback bloops there
  9. correct. Noise
  10. correct. There’s a tone, but it doesn’t sound whale-like

70-90% likely humpback (“correct” here means humpback):

  1. correct. That’s a whale or two
  2. correct. Humpback
  3. correct. This is a close one, but the last second qualifies it
  4. correct. Humpback
  5. correct. Humpback
  6. correct. Humpback roar
  7. correct. Faint, but there
  8. correct. Fainter, but there
  9. correct. Humpback
  10. correct. Humpback call in the first half-second

Let’s call it 17.5/20, counting the tossup as half. That’s not as clean a result as with the 10/90 clips in the last post. But here’s the plot with the manual label corrections applied:
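The tally can be reproduced from the two lists above. This sketch assumes a tossup counts as half a point, and that clips 1-10 are the 10-30% band and clips 11-20 the 70-90% band (inferred from the clip numbers used in the groupings further down).

```python
# Manual verdicts transcribed from the two lists above.
low_band = ["correct", "correct", "correct", "incorrect", "correct",
            "correct", "tossup", "incorrect", "correct", "correct"]
high_band = ["correct"] * 10

points = {"correct": 1.0, "tossup": 0.5, "incorrect": 0.0}
score = sum(points[v] for v in low_band + high_band)
print(f"{score}/{len(low_band) + len(high_band)}")  # prints 17.5/20

# Detector tags: clips 1-10 not-whale (False), clips 11-20 whale (True).
detector = {clip: clip > 10 for clip in range(1, 21)}
verdicts = dict(zip(range(1, 21), low_band + high_band))

# Flip only the clips judged plainly incorrect; the tossup keeps its tag.
corrected = {
    clip: (not tag) if verdicts[clip] == "incorrect" else tag
    for clip, tag in detector.items()
}
whale_clips = sorted(clip for clip, is_whale in corrected.items() if is_whale)
print(whale_clips)
```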

Corrected PCA plot of Perch embeddings from MBARI data

Based on this new plot, and with knowledge of the ground truth as determined by my listening2, we can see a few different groups:

  1. The dolphins (1,2)
  2. The definitive whale clips (11,12,14,15,16,19)
  3. The less-obvious whale clips (4,8,13,17,18,20)3
  4. The not-whale clips (3,5,6,7,9,10)

But (3) and (4) really aren’t easily distinguishable in this representation of the Perch embeddings without ground-truth knowledge.

It seems that once we correct for the humpback detector model’s misclassifications, the less-obvious positive and negative examples don’t separate cleanly in this view of the embedding space, though they do show tendencies.

At this point, I can either try a larger dataset and see whether clearly detectable clusters resolve, or try a visualization other than PCA.

If the embedding clusters still aren’t visibly separable with other methods, we might have to consider that the older, more specific humpback-detector model separates the positive and negative classes more cleanly than a simple Perch embedding visualization can.
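One check that sidesteps the 2-D projection entirely is to ask whether the corrected labels are separable in the full embedding space. The following is a minimal sketch of a leave-one-out nearest-centroid test, not something run in this post; the helper name is hypothetical and the data is synthetic, made deliberately easy to separate so the example is self-checking.

```python
import numpy as np

def loo_nearest_centroid_accuracy(embeddings, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    labels = np.asarray(labels)
    hits = 0
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i  # hold out clip i
        train_x, train_y = embeddings[keep], labels[keep]
        classes = np.unique(train_y)
        centroids = np.stack(
            [train_x[train_y == c].mean(axis=0) for c in classes]
        )
        dists = np.linalg.norm(centroids - embeddings[i], axis=1)
        hits += int(classes[np.argmin(dists)] == labels[i])
    return hits / len(labels)

# Synthetic stand-in: two well-separated groups of 10 "clips" each.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
embeddings = rng.normal(size=(20, 32)) + 5.0 * labels[:, None]

accuracy = loo_nearest_centroid_accuracy(embeddings, labels)
print(accuracy)
```

With real embeddings, an accuracy well above chance would point toward explanation (c), that the information is there but the PCA axes hide it. An accuracy near 0.5 would point toward (b). With only 20 clips, either result would be suggestive at best.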

More attempts to see what Perch is capable of in the next post.

Clips

Clips 1 to 20 are embedded here. Clips 1 to 10 are the 10-30% confidence clips and clips 11 to 20 are the 70-90% confidence clips.


  1. The MBARI example humpback detector notebook uses this date, I would assume, because there are a lot of fantastically strong humpback calls that day. Most days that I’ve listened to don’t have anything near the strength of that day’s clips. Running the humpback detector model on June 21, 2016, for example, doesn’t find any 90% confidence clips (and only a few >70% ones). December 21 is simply full of them. ↩︎

  2. Which, to be fair, is less an assessment of ground truth and more an assessment of my particular whale-call sensing capabilities. If you disagree about some of my decisions, let me know! ↩︎

  3. (Although I’d say that 16 and 19 are not actually that close-call.) ↩︎