I’ve been working for a while on AudioLoop, so I figure it needs a post.

AudioLoop is a tool to help researchers create labels for large unlabeled audio datasets. It’s meant to indirectly address a problem I see in bioacoustics: the shortage of large labeled datasets. Hopefully AudioLoop can help others create more of them.

The problem is that labeling can be a long, drawn-out, draining manual process. Having to listen to a million clips and click “yes” or “no” for each one is challenging even for the most determined, not to mention expensive. And many bioacoustics datasets are highly imbalanced: events of interest can be quite rare. Listening to 100 clips of noise for every one clip of interest isn’t a great way to spend time or resources.

AudioLoop is a general pipeline for human-in-the-loop active learning. Instead of manually labeling every example, you train a model to automatically label the easy-to-classify clips and present only the hard-to-classify ones to humans. The hope is that by repeatedly retraining the model and focusing human attention only on the uncertain examples, much less manual labeling will be required.
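To make the idea concrete, here is a minimal sketch of the selection step in one round of such a loop. It is not AudioLoop's actual code: the function name `split_by_confidence`, the clip names, and the `low`/`high` thresholds are all made up for illustration, and it assumes the model outputs a single probability per clip that the event of interest is present.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    probabilities: Dict[str, float],
    low: float = 0.1,   # hypothetical threshold: at or below this, auto-label "no"
    high: float = 0.9,  # hypothetical threshold: at or above this, auto-label "yes"
) -> Tuple[Dict[str, bool], List[str]]:
    """Split clips into auto-labeled ones and ones that need human review.

    `probabilities` maps a clip id to the model's predicted probability
    that the clip contains the event of interest.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")

    auto_labels: Dict[str, bool] = {}
    needs_review: List[str] = []
    for clip_id, p in probabilities.items():
        if p >= high:
            auto_labels[clip_id] = True    # confident positive
        elif p <= low:
            auto_labels[clip_id] = False   # confident negative
        else:
            needs_review.append(clip_id)   # uncertain: send to a human

    # Show the most uncertain clips first (probability closest to 0.5).
    needs_review.sort(key=lambda cid: abs(probabilities[cid] - 0.5))
    return auto_labels, needs_review


if __name__ == "__main__":
    # Made-up model outputs for four clips.
    preds = {
        "clip_a.wav": 0.98,
        "clip_b.wav": 0.03,
        "clip_c.wav": 0.55,
        "clip_d.wav": 0.30,
    }
    auto, review = split_by_confidence(preds)
    print("auto-labeled:", auto)
    print("for human review:", review)
```

In a full loop, the human labels for the reviewed clips would be added to the training set, the model retrained, and the split repeated on whatever is still unlabeled. Thresholding a single probability is only one way to define "uncertain"; other criteria such as prediction entropy would slot into the same place.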

The goal is to allow a researcher to import any existing dataset with a simple configuration file edit, and start up the loop with minimal setup.

It also includes a basic prototype of a browser-based interface for labeling the uncertain examples.

My current work is on adding support for transfer learning using pre-trained audio models. Once that’s good to go, I plan on running a benchmark comparing the active learning loop to random clip selection to see how much labeling effort this system actually saves. I expect AudioLoop to significantly reduce the amount of manual labeling needed, but that still needs to be measured.

There’s a whole lot more detail in the AudioLoop repo, so go check it out, and if it sounds interesting, send me your thoughts!