Helping machines distinguish instruments

Facebook AI’s Demucs teaches AI to hear in a more human-like way

Maika Möbus

Demucs is a new research project by Facebook AI. It is designed to separate musical tracks into individual instruments or vocals, much as a human listener can pick out specific instruments, while avoiding the weaknesses of existing approaches. In the long run, Demucs could be applied to other AI tasks as well.

Music source separation is a tricky task for machines, although humans find it comparatively easy to distinguish the vocals, bass, or drums. To help with this task, Facebook AI research scientist Alexandre Defossez has developed Demucs (deep extractor for music sources).

SEE ALSO: Deep learning in 3D with Facebook AI’s new tool PyTorch3D

As described by the famous “cocktail party effect”, humans have the ability to home in on a single conversation in a loud environment. For machines, however, this task of sound source separation poses real difficulties. Let’s see how AI tools manage it and what sets Demucs apart.

Spectrograms vs. waveforms

Most commonly, as Defossez points out, AI separates music sources by analyzing spectrograms. While this method is well suited to instruments that resonate on a single frequency, spectrogram-based methods have their weaknesses. For example, saxophone and guitar frequencies may cancel each other out.
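This cancellation problem is easy to demonstrate with a toy example (a minimal sketch using only the Python standard library, with a single DFT bin standing in for a full spectrogram — not code from the Demucs project). Two synthetic “instruments” share the same frequency but have opposite phase; each is clearly visible on its own, yet their mixture shows almost no energy at that frequency, so a method that only masks spectrogram magnitudes cannot recover either source:

```python
import math
import cmath

def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT bin of a real-valued signal."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(signal)))

N = 64  # samples per analysis frame
k = 8   # shared frequency: 8 cycles per frame

# Two "instruments" at the same frequency, 180 degrees out of phase.
source_a = [math.sin(2 * math.pi * k * t / N) for t in range(N)]
source_b = [math.sin(2 * math.pi * k * t / N + math.pi) for t in range(N)]
mixture = [a + b for a, b in zip(source_a, source_b)]

print(round(dft_magnitude(source_a, k), 2))  # → 32.0 (each source alone is visible)
print(round(dft_magnitude(mixture, k), 2))   # → 0.0  (the energy cancels in the mix)
```

A waveform model like Demucs works on the raw samples instead, so it is not limited to redistributing whatever magnitude happens to survive in the mixture’s spectrogram.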

This is where Demucs comes into play—an AI-based waveform model that is designed to work in a similar way to how computer vision detects patterns in images. “It detects patterns in the waveforms and then adds higher-scale structure,” as Defossez explains. Or in other words: “Demucs can re-create the audio that it thinks is there but got lost in the mix.”

Defossez based Demucs on Wave-U-Net, an earlier AI-powered waveform model, and then went on to fine-tune his model. It now not only outperforms Wave-U-Net but, according to Defossez, is also “way beyond” state-of-the-art spectrogram-based methods.

In the future, technology like Demucs may improve the ability of AI assistants to hear voice commands in loud environments. It could also be used in hearing aids or noise-canceling headphones.

SEE ALSO: Using AI for managing images and videos at scale

If you’d like to experiment with Demucs, you can find further info in the research paper and download the code from GitHub.

See the Facebook AI blog post for further details and sound samples.

Maika Möbus
Maika Möbus has been an editor for Software & Support Media since January 2019. She studied Sociology at Goethe University Frankfurt and Johannes Gutenberg University Mainz.
