Audio Source Separation with Demucs

Seanghay Yath, Mon May 29 2023

Setup

I'll be using yt-dlp to download audio from YouTube and ffmpeg for audio transcoding, so make sure both are installed before following along. Check each project's documentation for installation instructions.
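Before going further, a quick way to confirm both tools are reachable is to look them up on your PATH. This is a minimal standard-library sketch; `missing_tools` is a hypothetical helper name, not part of yt-dlp or ffmpeg. It only checks that the executables can be found, not which versions they are.

```python
import shutil

# Command-line tools this walkthrough relies on.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("yt-dlp and ffmpeg are both available.")
```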

You might also want to run this in a virtual environment or a Conda environment.

# Note: `audiosep` is the env name
conda create -n audiosep python==3.10.11
conda activate audiosep

Install Demucs 4 using pip

python -m pip install demucs==4.0.0
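To double-check that pip installed a 4.x release into the active environment, you can query the package metadata. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Demucs.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("demucs")
    if version is None:
        print("demucs is not installed in this environment")
    elif not version.startswith("4."):
        print(f"Expected Demucs 4.x, found {version}")
    else:
        print(f"Demucs {version} is installed")
```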

Download the Music

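We're using yt-dlp to download the audio from YouTube and convert it to a .wav file. Once it's finished, we'll have an audio file called music.wav in the current directory.

Demucs can also decode other formats through ffmpeg, but here we convert to .wav (PCM) up front. Note that naming the output music.wav does not convert anything by itself; the -x (--extract-audio) and --audio-format wav options tell yt-dlp to extract the audio and convert it to WAV with ffmpeg.

yt-dlp -f bestaudio/best -x --audio-format wav --output "music.%(ext)s" "https://www.youtube.com/watch?v=jQaGTqR68xQ"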
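To check that music.wav really is a PCM WAV file, and not WebM or M4A data that only carries a .wav name, you can try opening it with Python's built-in `wave` module, which only reads PCM WAV. This is a minimal sketch; `describe_pcm_wav` is a hypothetical helper name.

```python
import wave


def describe_pcm_wav(path):
    """Return (channels, sample_rate, sample_width_bytes) for a PCM .wav file.

    The standard `wave` module only understands PCM WAV, so a file that is
    really in another container format raises wave.Error.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth()


if __name__ == "__main__":
    try:
        channels, rate, width = describe_pcm_wav("music.wav")
        print(f"PCM WAV: {channels} channel(s), {rate} Hz, {8 * width}-bit")
    except FileNotFoundError:
        print("music.wav was not found in the current directory")
    except (wave.Error, EOFError) as err:
        print(f"music.wav is not a readable PCM WAV file: {err}")
```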

Separation

We're using the model called htdemucs_ft (Hybrid Transformer Demucs, fine-tuned), a fine-tuned version of the default htdemucs model. It can give slightly better separation than htdemucs, at the cost of a longer run time.

python -m demucs.separate \
  -n htdemucs_ft \
  --two-stems vocals \
  --out output/ \
  music.wav
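If you'd rather drive the separation from a Python script, one option is to build the same command as the shell example above and hand it to `subprocess`. This is a sketch under that assumption; `demucs_command` is a hypothetical helper, and it only uses the flags shown above (-n, --two-stems, --out).

```python
import sys


def demucs_command(track, model="htdemucs_ft", two_stems="vocals", out="output"):
    """Build the `python -m demucs.separate` call used in this post.

    Pass two_stems=None to drop --two-stems and get all four stems.
    """
    # sys.executable runs Demucs with the same interpreter (and environment).
    cmd = [sys.executable, "-m", "demucs.separate", "-n", model]
    if two_stems is not None:
        cmd += ["--two-stems", two_stems]
    cmd += ["--out", out, track]
    return cmd
```

Running it is then a matter of `subprocess.run(demucs_command("music.wav"), check=True)`, which raises `CalledProcessError` if Demucs exits with an error.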

--two-stems vocals is used because we only want to split the audio into vocals and everything else (no_vocals). Omit the flag to separate all four stems that Demucs supports: drums, bass, vocals, and other.
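The choice of --two-stems changes which files you get. The sketch below lists the paths to expect, assuming Demucs's default layout of `<out>/<model>/<track name>/<stem>.wav` (the layout used by the ffmpeg commands in this post); `expected_stems` is a hypothetical helper name.

```python
from pathlib import Path

# Stems produced by the four-source Hybrid Transformer Demucs models.
FOUR_STEMS = ("drums", "bass", "other", "vocals")


def expected_stems(track, model="htdemucs_ft", two_stems="vocals", out="output"):
    """Return the .wav paths Demucs is expected to write for `track`."""
    folder = Path(out) / model / Path(track).stem
    if two_stems is not None:
        # Two-stem mode writes the chosen stem plus "no_<stem>" for the rest.
        names = (two_stems, f"no_{two_stems}")
    else:
        names = FOUR_STEMS
    return [folder / f"{name}.wav" for name in names]
```

For example, `[p.exists() for p in expected_stems("music.wav")]` tells you whether both vocals.wav and no_vocals.wav were written.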

Once the script finishes, we should have an ./output/ folder containing the separated audio files under output/htdemucs_ft/music/. Those files are in .wav format, so you might want to transcode them to .mp3 using ffmpeg:

ffmpeg -y -i ./output/htdemucs_ft/music/no_vocals.wav no_vocals.mp3
ffmpeg -y -i ./output/htdemucs_ft/music/vocals.wav vocals.mp3
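If you separate all four stems, typing one ffmpeg command per file gets tedious. Here is a small sketch that builds the same `ffmpeg -y -i <stem>.wav <stem>.mp3` command for every .wav in the stem folder and runs them in turn; `mp3_commands` and `transcode_all` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def mp3_commands(stem_dir):
    """Build one ffmpeg command per .wav in `stem_dir`, writing <stem>.mp3
    into the current directory, like the two commands above."""
    return [
        ["ffmpeg", "-y", "-i", str(wav), f"{wav.stem}.mp3"]
        for wav in sorted(Path(stem_dir).glob("*.wav"))
    ]


def transcode_all(stem_dir="output/htdemucs_ft/music"):
    """Run each ffmpeg command; check=True raises CalledProcessError on failure."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not on PATH")
    for cmd in mp3_commands(stem_dir):
        subprocess.run(cmd, check=True)
```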

Preview

Original Audio

Vocals Only

Non-Vocals Only (Karaoke Version)