
Convert Voice A to Voice B using librosa

Posted on 2020-11-27 09:23:34

I am new to librosa and voice/sound analysis. I have searched for this question on SO and Google but did not find an understandable answer.

Suppose there are two voices, A and B. I want to convert voice A to voice B.

Given both voices, is it possible to do something to A to make it sound like B?

Questioner
Naroju
Jon Nordby 2020-11-28 17:03:39

This kind of task is sometimes called "style transfer": the content (the spoken words) stays the same, while the expression changes via the style (prosody, how the words are spoken). Some keywords to search for are Voice Style Transfer, Speech Style Transfer, Audio Style Transfer, Voice Translation, Voice Cloning, Prosody Transfer. Here is an explanation of some of the approaches, from Kyle Kastner, a practitioner in the field.

Good speech style transfer is quite a hard task, and there have been many research papers on it in recent years. Many speech style transfer systems using neural networks are adaptations of Text to Speech (TTS) / speech synthesis models, such as Tacotron, Tacotron 2 or WaveNet.

There are many open-source implementations of neural speech style transfer papers on GitHub, but many of them require considerable setup to use (downloading datasets, models, formatting inputs etc). One of the most popular alternatives is Real Time Voice Cloning, which is supposed to be able to clone a voice with 5 seconds of audio. Another example is https://sforaidl.github.io/Neural-Voice-Cloning-With-Few-Samples/
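
To illustrate why this needs more than plain signal processing, here is a minimal sketch of the crude baseline that librosa alone offers: shifting A's median pitch to match B's. This is not voice conversion and is not taken from any of the projects above. The function names `semitone_offset` and `shift_a_toward_b` and the C2 to C6 pitch search range are assumptions made for illustration.

```python
import math


def semitone_offset(f0_source_hz, f0_target_hz):
    """Semitones to shift the source so its pitch matches the target."""
    return 12.0 * math.log2(f0_target_hz / f0_source_hz)


def shift_a_toward_b(path_a, path_b):
    """Crude baseline: move voice A's median pitch to voice B's median pitch.

    Only the pitch changes; A's timbre and prosody are left untouched.
    Assumes both recordings contain at least some voiced speech.
    """
    # Imported here so semitone_offset() stays usable without these packages.
    import numpy as np
    import librosa

    # sr=None keeps each file's native sampling rate.
    y_a, sr_a = librosa.load(path_a, sr=None)
    y_b, sr_b = librosa.load(path_b, sr=None)

    # Assumed search range for speech pitch; adjust for the speakers at hand.
    fmin = librosa.note_to_hz("C2")
    fmax = librosa.note_to_hz("C6")

    # pyin returns NaN for unvoiced frames, hence nanmedian below.
    f0_a, _, _ = librosa.pyin(y_a, fmin=fmin, fmax=fmax, sr=sr_a)
    f0_b, _, _ = librosa.pyin(y_b, fmin=fmin, fmax=fmax, sr=sr_b)

    n_steps = semitone_offset(np.nanmedian(f0_a), np.nanmedian(f0_b))
    y_shifted = librosa.effects.pitch_shift(y_a, sr=sr_a, n_steps=n_steps)
    return y_shifted, sr_a
```

The result will still sound like speaker A, just higher or lower, because timbre and speaking style are unchanged. That gap is what the neural approaches above try to close.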