Deep Voice – The Empire Grows Steadily Less Secure

Baidu AI Can Clone Your Voice in Seconds

From the post:

Baidu’s research arm announced yesterday that its 2017 text-to-speech (TTS) system Deep Voice has learned how to imitate a person’s voice using a mere three seconds of voice sample data.

The technique, known as voice cloning, could be used to personalize virtual assistants such as Apple’s Siri, Google Assistant, and Amazon Alexa, as well as Baidu’s Mandarin virtual assistant platform DuerOS, which supports 50 million devices in China with human-machine conversational interfaces.

In healthcare, voice cloning has helped patients who lost their voices by building a duplicate. Voice cloning may even find traction in the entertainment industry and in social media as a tool for satirists.

Baidu researchers implemented two approaches: speaker adaptation and speaker encoding. Both deliver good performance with minimal audio input data, and can be integrated into a multi-speaker generative model in the Deep Voice system with speaker embeddings without degrading quality.
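
To make the difference between the two approaches concrete, here is a toy sketch in Python. It is not Baidu's code and bears no resemblance to the real Deep Voice model: a fixed linear map (W) stands in for the multi-speaker generative model, two-number lists stand in for speaker embeddings, and the encoder weights (ENCODER) are hard-coded stand-ins for a trained network. All names are hypothetical. Adaptation, as sketched here, fits a new embedding to the few samples by gradient descent while the model stays frozen; encoding predicts the embedding in a single pass.

```python
# Toy illustration only. A frozen linear map plays the role of the
# multi-speaker generative model; all names and numbers are hypothetical.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def synthesize(embedding):
    """Map a 2-value speaker embedding to a 3-value 'acoustic feature' vector."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in W]


def adapt_embedding(samples, steps=500, lr=0.1):
    """Speaker adaptation (sketch): fit a new embedding to a few samples
    by gradient descent on squared error, keeping the model weights frozen."""
    emb = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        out = synthesize(emb)
        for target in samples:
            for i, row in enumerate(W):
                err = out[i] - target[i]
                for j in range(2):
                    grad[j] += 2 * err * row[j] / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


# Stand-in for a trained encoder network: here simply the pseudo-inverse of W.
ENCODER = [[2 / 3, -1 / 3, 1 / 3], [-1 / 3, 2 / 3, 1 / 3]]


def encode_speaker(samples):
    """Speaker encoding (sketch): predict the embedding in one forward pass
    from the average of the samples, with no per-speaker optimization."""
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
    return [sum(w * m for w, m in zip(row, mean)) for row in ENCODER]


if __name__ == "__main__":
    true_embedding = [0.5, -1.0]  # the "unknown" speaker we want to clone
    clips = [synthesize(true_embedding) for _ in range(3)]
    print("adapted:", adapt_embedding(clips))
    print("encoded:", encode_speaker(clips))
```

In this toy setting both routes recover roughly the same embedding. The trade-off the sketch is meant to show is that adaptation spends an optimization loop on each new speaker, while encoding needs only one pass through an encoder that was prepared in advance.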

See the post for links to three-second voice clips and other details.


The recent breakthroughs in synthesizing human voices have also raised concerns. AI could potentially downgrade voice identity in real life or with security systems. For example, voice technology could be used maliciously against a public figure by creating false statements in their voice. A BBC reporter’s test with his twin brother also demonstrated the capacity for voice mimicking to fool voiceprint security systems.

That’s a concern? 😉

I think cloned voices of battlefield military commanders, cloned voices of politicians with sex partners, or “known” voices badgering help desk staff into giving up access to a utility plant or other systems are the real “concerns.” Or “encouragements,” depending on your interests in such systems.
