OpenAI Refining Voice Cloning with Voice Engine



The voice cloning industry is growing every day, and even more rapidly of late as AI models get better at making synthetic voices sound realistic. Now OpenAI is finally making its debut with Voice Engine, and it is keen on entering the industry responsibly.

OpenAI's Voice Engine lets users record a 15-second voice sample, from which it generates a copy of the voice. Its applications span many areas of the voice industry, including audiobooks, podcasts, voiceovers, and virtual assistants. However, OpenAI hasn't announced when Voice Engine will be publicly available, as it is taking time to ensure the tool is as safe as it can be.


Interestingly, Voice Engine's AI model has been around for a while now. It has powered the "read aloud" feature in the AI chatbot ChatGPT, and even that was already impressive. Where its training data comes from, however, isn't so clear; OpenAI would only say that the model was trained on a mix of public and licensed data.

Training data is a crucial type of information for AI providers. Most keep it confidential, as it is a competitive advantage. It is also a potential source of IP-related issues, which further discourages providers from saying much about it. OpenAI is already facing allegations of IP law violations for training its models on copyrighted content without attributing it to the creators or compensating them, so it would rather stay discreet about its training data.

In practice, it is difficult to create useful AI without real-world samples, including copyrighted content, so OpenAI argues that fair use of such works should be allowed as long as it serves the development of the models during training.


Voice Engine's training isn't based on user data, however. “We take a small audio sample and text and generate realistic speech that matches the original speaker,” said Jeff Harris, a member of the product staff at OpenAI. “The audio that’s used is dropped after the request is complete.” In other words, Voice Engine analyses the 15-second voice sample and the text to be read, then generates a voice that matches the sample, all on the fly as the request is made.
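The flow Harris describes can be sketched as a simple function. To be clear, Voice Engine's API is not public, so every name and step below is invented for illustration; the stubs only stand in for the real speaker-analysis and synthesis models.

```python
# Hypothetical sketch of the request flow described above.
# analyze() and synthesize() are stand-ins, not real OpenAI APIs.

def analyze(sample: bytes) -> dict:
    # Stand-in for the model's speaker-analysis step on the ~15s clip.
    return {"speaker_id": hash(sample) % 1000}

def synthesize(profile: dict, text: str) -> bytes:
    # Stand-in for generating speech in the matched voice.
    return f"[voice {profile['speaker_id']}] {text}".encode()

def clone_and_speak(sample_audio: bytes, text: str) -> bytes:
    profile = analyze(sample_audio)
    speech = synthesize(profile, text)
    # Mirrors "the audio is dropped after the request is complete":
    # neither the sample nor the derived profile is retained.
    del sample_audio, profile
    return speech

speech = clone_and_speak(b"...15 seconds of reference audio...", "Hello there.")
```

The key design point the sketch captures is that the voice sample is an ephemeral input to each request, not material that is stored or used for training.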

Competing technologies such as ElevenLabs, Replica Studios, Papercup, and Respeecher already exist, but unlike many of them, Voice Engine offers no controls to adjust the pitch, cadence, or tone of a voice. No fine-tuning knobs, either. You give it a 15-second sample, and it generates a voice for the request. One interesting thing it does, though, is carry the expressiveness of the sample over into the synthetic voice: if you sound excited in the sample, the generated voice will sound just as excited.

There are concerns about what will become of creators in the voice industries and how this tool will affect them, considering how good these models have become at replacing most of them. Some platforms already deploy AI cloning models to create content. To give these creators some benefit, voice actors and the like are asked to sign over rights to the use of their voices by these models so that clients can use the synthetic versions.

Some AI providers try to strike a balance amid the controversy over the ethical use of copyrighted works. Replica Studios, for instance, struck a deal with SAG-AFTRA (Screen Actors Guild - American Federation of Television and Radio Artists) to create and licence copies of the media artist union members’ voices, while ElevenLabs hosts a marketplace for synthetic voices that lets users create a voice, verify it, and share it publicly. OpenAI is taking a different approach.

OpenAI will establish no such labour union deals or marketplaces, at least not in the near term, and requires only that users obtain “explicit consent” from the people whose voices are cloned, make “clear disclosures” indicating which voices are AI-generated, and agree not to use the voices of minors, deceased people, or political figures in their generations. Source


What we have seen with deepfakes in recent times, and what these AI models may make possible in the future, continues to raise concerns about the ethical and responsible use of AI. OpenAI is implementing some measures to prevent misuse of Voice Engine.

For now, Voice Engine will only be available to a very small group, around 10 developers. OpenAI is prioritising use cases that are “low risk” and “socially beneficial,” Harris says, like those in healthcare and accessibility, in addition to experimenting with “responsible” synthetic media.

Watermarks are embedded in the voice clones generated with Voice Engine. These inaudible identifiers let OpenAI determine whether a voice clip was created by Voice Engine and who generated it. There is no promise that they can't be worked around, but they are described as "tamper resistant," at least.
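To make the idea of an inaudible identifier concrete, here is a toy watermark that hides bits in the least significant bit of 16-bit audio samples. This is purely illustrative: OpenAI has not disclosed how Voice Engine's watermark works, and a real "tamper resistant" scheme would be far more robust than this sketch.

```python
# Toy LSB watermark on 16-bit PCM samples (illustration only; not
# how Voice Engine's undisclosed watermark actually works).

def embed(samples, payload_bits):
    """Hide payload_bits in the least significant bits of the samples."""
    out = list(samples)
    for i, bit in enumerate(payload_bits):
        # Flipping the LSB changes amplitude by at most 1/32768: inaudible.
        out[i] = (out[i] & ~1) | bit
    return out

def extract(samples, n_bits):
    """Read the hidden identifier back out of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]

audio = [1000, -2047, 512, 77, 3000, -15, 8, 9]   # fake PCM samples
identifier = [1, 0, 1, 1]                          # provenance payload
marked = embed(audio, identifier)
recovered = extract(marked, 4)
```

Note that this toy scheme is trivially destroyed by re-encoding or adding noise, which is exactly why the article's caveat matters: real provenance watermarks aim to survive such tampering, but none can promise it absolutely.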

As an example of Voice Engine's performance, OpenAI used this voice sample to generate three audio clips: clip 1, clip 2, and clip 3. The difference between the original clip and the generated ones isn't apparent, and unsuspecting listeners are unlikely to be able to tell them apart.


OpenAI states that there will be HD and non-HD voices, but a spokesperson at OpenAI also says that there really isn't a difference between the two. They are priced differently, however, with HD costing twice as much as non-HD.

Until OpenAI releases Voice Engine to the public, it is focusing on safety issues as it develops the AI voice cloning model. “What’s going to keep pushing us forward in terms of the actual voice-matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered, and the mitigations that we have in place,” Harris said. “We don’t want people to be confused between artificial voices and actual human voices.”


By the way, you can earn from your content on Hive via InLeo while truly owning your account. If you're new, sign up in a few minutes by clicking here!


References

Images 1, 2, 3, 4


Interested in more?

Meet the Humane AI Pin: Voice, Gesture, AI – No Screens Needed!

The Link: Bridging Minds and Machines with Neuralink's Brain Chip

AI-coustics: Revolutionizing Audio Clarity with Generative AI Technology

Posted Using InLeo Alpha




It's cool to see that AI is becoming more responsible. Placing priority on users' safety is a great step.


Totally, man. It's a really good thing that AI vendors are careful and responsible with their products.


Congratulations @olujay! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s):

You distributed more than 27000 upvotes.
Your next target is to reach 28000 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out our last posts:

Hive Power Up Month Challenge - March 2024 Winners List
Be ready for the May edition of the Hive Power Up Month!

Olujay has come again with his complex AI post 🤦. The title alone was paining my brain, haha, and then I read further and kept meeting terminologies from the AI world. What can I say? AI keeps waxing stronger, striving to do even the unimaginable, and I hope all this advancement will be safe for us humans.
I came via #dreemport
#dreemerforlife


What else will this AI not do? Maybe the next thing we'll hear is that AI can now record our breathing, lol.

Good thing I already have a nice voice, already insured, so I won't be needing it, hehehe.

#dreemerforlife


I think by the time that happens, it will not be news to us. 😁


This is why Elon Musk raised a debate about creating an anti-AI committee to ensure AI is used in a manner that doesn't infringe on the rights of its users.

There is a popular scamming technique used abroad called voice pinching: you are sent a voice note that sounds identical to someone you know, but it is just another person playing pranks on you.

Voice pinching is not perfect today because the tools can't perfectly imitate a human voice. Now imagine this kind of tool in the arsenal of those people. Digital conversations will no longer be safe, because anyone with Voice Engine will be capable of imitating anyone's voice.


I am not aware of that committee Elon Musk is putting up, but I know that many AI vendors keep developing their guardrails to keep the use and development of AI responsible.

Yes, man, AI voice cloning is here. One time, someone even used it to break into their own bank account. It's crazy, the things people are doing with AI now.


Hmmm, it seems the advancement of technology, which is a product of human intelligence, will eventually be the end of humans 😔


Of course, we will be our own undoing. I hope that never happens, though.


From what I'm seeing, it is not a matter of if it will happen; it is a matter of when. 😅


Well, it is already happening. Abuse of AI is the undoing.
