VALL-E: After ChatGPT, here’s a new artificial intelligence capable of reproducing your voice after listening to it for 3 seconds
A team of researchers from Microsoft has unveiled VALL-E, a new artificial intelligence (AI) capable of synthesizing your voice from a three-second recording. The model is not yet accessible to the general public, but it already raises questions about the ethics and dangers of the project.
VALL-E is the latest addition to a growing family of AIs. Among them are DALL-E 2, which can instantly create images on any theme and in any imaginable style, and ChatGPT, which can generate texts on request, such as cover letters, high school essays, or screenplays.
But VALL-E goes even further. ‘We found that VALL-E can preserve the emotions of the person’s voice as well as the acoustic environment of the recording,’ states the paper released by the research team that designed the AI. Your text can be read out in a fearful or joyful tone, and it will sound more or less clear depending on the conditions under which your sample audio was recorded.
A risk to the safety of users?
To achieve this, VALL-E was trained on 60,000 hours of English speech from LibriLight, Meta’s audio library, ‘training data hundreds of times larger than that of existing systems’.
However, this innovation raises many questions about the dangers it could pose. The software ‘could create potential risks of misuse, such as spoofing voice identification or impersonating someone else,’ acknowledges the team of researchers behind the AI.
The designers counter, however, that they conducted their work ‘under the assumption that the user agrees to be the target speaker in speech synthesis.’ ‘If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice, as well as a synthesized speech detection model.’
The VALL-E demo, which lets you hear how the AI performs on a range of examples, is available in English on GitHub.