
Who can steal our voice and how can we protect it?

Until recently, a voice was a unique attribute of a particular person. Today it can be “stolen” and used illegally for a range of purposes, from entertainment or enhancing the sound of content to committing crimes in cyberspace.

Voice and its significance

A voice is a unique personal characteristic with distinctive features that set its owner apart from others. Timbre, complemented by particular overtones, plays a huge role in how a voice sounds and is perceived. A voice’s value and recognizability depend on the combination of a person’s physiological traits and the techniques used to develop it. Many people have gained recognition and fame solely because of their voice.

Over time, the value of a person’s voice has grown, drawing it into the general process of commodification (i.e., the transformation of a particular good into a commodity). Unsurprisingly, many celebrities (and not just singers) have taken care to ensure their voices are protected. For example, MrBeast, a blogger with an audience of many millions, forbade a dubbing studio from creating a Russian-language voice for him, since the rights to it belong to his company. Many artists, including Russian ones, insure their voices: Nikolai Baskov did so in 2002, and the insured value of his voice was so high that it reportedly doubled that of Luciano Pavarotti.

The status of the voice in law

A person’s voice can be protected under several different legal frameworks. But how effective that protection really is today is an open question, especially given the evolution of deepfake technology.

The voice as an intangible good of every person

Intangible goods belong to everyone from birth and are protected by law. The list is set out in Article 150 of the Civil Code of the Russian Federation. Although the voice is not named there directly, in essence it belongs to this category. Civil legislation directly regulates only the protection of a citizen’s image and describes its essence. Courts can, of course, apply these norms by analogy in voice-theft disputes, but in practice this proves largely ineffective, especially given the development of deepfakes.

How does it work? A voice deepfake is produced by deep synthesis: an AI model is trained on recordings of a person’s voice. Samples of famous people (actors, singers, announcers, and so on) are readily available from open sources, and the synthesized output is almost impossible to distinguish from the original. The ease of obtaining a synthesized voice and using it for arbitrary purposes damages not only celebrities’ earnings but sometimes their reputation (when used in criminal schemes, for example).
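To see how low the technical barrier is, here is a minimal sketch of voice cloning with the open-source Coqui TTS library; the model identifier is its publicly documented XTTS v2 checkpoint, while the file paths and text are illustrative placeholders, not taken from any real case.

# A minimal sketch of zero-shot voice cloning with Coqui TTS
# (pip install TTS). Paths and text are illustrative placeholders.
from TTS.api import TTS

# Load a multilingual voice-cloning model (XTTS v2).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A few seconds of speech, e.g. scraped from a public video,
# serve as the reference sample for the cloned voice.
tts.tts_to_file(
    text="Any text the operator wants the voice to say.",
    speaker_wav="reference_sample.wav",  # the "stolen" voice sample
    language="en",
    file_path="cloned_output.wav",
)

A short reference recording is all such a model needs, which is precisely why openly published interviews and videos are sufficient raw material for a convincing fake.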

Case

In case No. 02-3580/2024, a voice-over actress sued Tinkoff Bank to protect her honor and dignity. She had recorded a large body of text to train the bank’s voice assistant, which, according to the actress, was the only use of her voice permitted under the contract. She later learned that her voice was being used to voice pornographic ads, was publicly available on the bank’s website for synthesis, and was being sold on other sites.

The court ruled against the actress: under the contract she had consented to anonymous use of the performance, to its publication, and to amendments, additions, and other revisions, including by third parties.

In fairness, the ability to alter a recording and use it in modified form is a necessary part of post-production (editing, noise removal, and other quality improvements).

This and similar situations prompted action. The Union of Announcers of Russia approached deputies of the State Duma of the Russian Federation with an initiative to regulate the use of synthesized voices, highlighting the problem of uncontrolled synthesis of announcers’ voices as well as outright cases of voice theft.

And what about abroad?

Notably, in a number of foreign jurisdictions the voice is treated on a par with a citizen’s image, name, and other non-property goods, and receives equal legal protection.

In the United States, such goods are protected under the right of publicity (recognized by statute in 25 states) or the right to privacy. Where earlier disputes concerned the theft of a voice as such (for instance, third parties recording a performance), courts now also see claims over unlawful synthesis using AI and deepfake technologies.

Case

Kyland Young filed a class action against NeoCortext, Inc. (“NeoCortext”) for violating the right of publicity under California law. The suit alleged misuse of the personal characteristics, including the voices, of actors, musicians, and other famous personalities to sell subscriptions to the deepfake application Reface, which lets users “try on” the face and voice of their idols.

In some countries, a person’s voice is protected at the constitutional level as a personal intangible good. This is the case, for example, in the constitutions of Peru (Article 2(7)) and Ecuador (Article 66(18)).

The voice as biometric personal data

A voice is a biological characteristic that serves as a means of personal identification, and as a general rule it qualifies as biometric personal data; special legislation says so directly. However, the voice cannot always be protected on this ground. What matters is the purpose for which the physiological data, in the form of a voice, was collected: biometric processing exists to identify a specific person. If identification is not the purpose (the voice is attributed to others or altered), the rules of these special laws on improper processing of personal data are unlikely to help.

Take a deepfake: audio synthesized from an original sample and then processed with AI. It is hard to speak of identification as the purpose here. Nor is the synthesized voice itself biometric data, since it is artificially created and not directly tied to the physiological characteristics of a particular person. A deepfake does not capture the subject’s biometric features; it merely imitates them.
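For contrast, the sketch below shows what biometric use of a voice looks like in code: comparing speaker embeddings (“voiceprints”) to decide whether two recordings come from the same person. It assumes the open-source resemblyzer library; the file names and the 0.75 similarity threshold are illustrative assumptions, not a standard.

# A minimal sketch of biometric speaker verification with resemblyzer
# (pip install resemblyzer). File names and threshold are illustrative.
from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Turn each recording into a fixed-size voiceprint embedding.
enrolled = encoder.embed_utterance(preprocess_wav(Path("enrolled_sample.wav")))
incoming = encoder.embed_utterance(preprocess_wav(Path("incoming_call.wav")))

# Embeddings are L2-normalized, so a dot product gives cosine similarity.
similarity = float(np.dot(enrolled, incoming))

# Identification is the defining purpose of biometric processing:
# the system decides whether both voices belong to the same person.
if similarity > 0.75:  # illustrative threshold
    print("Same speaker: identity confirmed")
else:
    print("Different speaker")

The contrast makes the legal point concrete: a verification system measures a real person’s speech against a stored sample, whereas a deepfake generator outputs new audio that merely imitates it, which is why data-protection rules built around identification fit it poorly.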

The voice as an object of related rights

Related rights also offer limited protection against deepfakes. A person holds the right to a performance and its inviolability, as well as the right to a phonogram (a recording of a performance). In my opinion, however, these tools do not always deliver a satisfactory result. When AI is trained on various materials, no changes are made to the existing object: the original recording is not modified. The neural network merely learns from different materials, reading the voice in its various renditions, including its color and emotionality. When it synthesizes a voice, the AI in effect creates a new object, an independent recording used for other purposes.

It follows that a violation of related rights can properly be claimed only where specific changes are made to the original, for example to the artist’s voice in a particular recording or to a soundtrack.

Public-law violations committed with voice deepfakes

Stealing a voice and creating a synthesized copy with deepfake technology is a problem not only of civil law but also of public law, since many cybercrimes are committed by stealing an identity through samples of a person’s voice. The most striking example is phone fraud. Criminals first try to capture a person’s voice by any available trick, for example calling on behalf of some service and recording the conversation, then use the modified audio to target the victim’s relatives and colleagues. Hence the common advice not to talk to suspected fraudsters at length: sometimes their whole goal is to record as many words from the victim as possible.

Fraud schemes involving audio deepfakes are varied, ranging from calls supposedly from relatives asking for help to fake instructions from company executives or partners. The total harm caused to citizens by such violations in 2023 is estimated at 19 billion rubles.

The theft and faking of politicians’ voices could be hugely damaging: fraudsters use deepfakes to spread fake news and to announce state decisions that were never made. A striking example came in 2018, when former US President Barack Obama appeared to speak disparagingly about then-President Donald Trump. That deepfake, incidentally, was created deliberately as a warning, to encourage people not to believe everything they see and to check information more carefully.

Current trends in voice protection and the fight against voice deepfakes

To protect the voice, a group of State Duma deputies has put forward a specific initiative (bill No. 718834-8). They propose amending the Civil Code of the Russian Federation with provisions protecting the voice, by analogy with the image, as an object of personal non-property rights. Under the proposal, a synthesized voice model would also be protected: creating and using one would require the citizen’s prior consent (with certain exceptions). If materials featuring a citizen’s voice are then found published without consent, the citizen would have defined remedies: seizure of the physical media and destruction of such materials by court order.

Anti-deepfake laws are being adopted in other countries as well. In the US Congress, for example, a bill has been introduced to punish the use of AI to generate materials from photos, videos, and people’s voices. Its provisions effectively presume each person’s property-like right to their personal non-property goods, including the voice; only the person to whom they belong may transfer them. Copying and counterfeiting would carry fines of up to 50 thousand dollars.

Conclusion

AI technologies are developing at breakneck speed and legal regulation simply cannot keep up, so many problematic issues remain open. Still, legislative initiatives in this area can significantly help in the fight against voice deepfakes, reduce cases of voice “theft” and unauthorized processing, and limit third parties’ ability to profit from someone else’s voice.

Each of us can make our voice harder to steal: limit the audio and video content you publish on social networks, avoid phone conversations with suspicious callers, and study carefully any contract that grants the right to use or alter a voice recording. To avoid falling victim to deepfake scammers, always treat requests from relatives and colleagues to transfer money critically, especially when the other side insists the matter is urgent.

By Anastasia Kochetova, IT lawyer, RTM Group
