
Voice recognition plus artificial intelligence

Voice assistance is only the tip of the iceberg: speech recognition technologies are in fact used far more widely. By adding artificial intelligence, we got programs that not only understand what a person says but also respond. So far, however, we cannot say these assistants are highly intelligent: they either answer from written scripts or are continuously trained on neural networks, as many popular mass-market assistants do, such as Apple Siri, Google Assistant, Microsoft Cortana, or the Russian-speaking Alice and Marusya from Yandex and VK Group.
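The "written scripts" approach mentioned above can be sketched in a few lines: the assistant simply looks a reply up from hand-written rules, with no learning involved. All keywords and phrases here are invented for illustration, not any vendor's API.

```python
# Minimal sketch of a script-based assistant: canned replies keyed on
# keywords found in the user's utterance. Everything here is illustrative.
SCRIPTS = {
    "weather": "It is sunny today.",
    "time": "It is ten o'clock.",
}

def scripted_reply(utterance: str) -> str:
    """Return a canned reply if a known keyword appears, else a fallback."""
    text = utterance.lower()
    for keyword, reply in SCRIPTS.items():
        if keyword in text:
            return reply
    return "Sorry, I did not understand that."
```

The neural-network approach replaces this lookup with a trained model, but the interface, utterance in and reply out, stays the same.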

Nevertheless, their capabilities grow year by year, so voice assistants, which first gained popularity in the mass market, are beginning to penetrate B2B.

How will technology develop?

This year, according to analysts, marks a significant turning point for the industry: voice assistants should step beyond the narrow circle of gadgets such as smart speakers and smartphones. According to Gartner forecasts, by the end of 2021 companies will spend $3.5 billion on virtual personal assistants, and by 2025 more than 50% of knowledge workers will use such assistants regularly (in 2019 it was only 2%).

There are already widespread cases of virtual assistants in retail, for example on the websites of online stores, and in the support services of banks and service companies. However, this is only the beginning. Assistants have the potential to develop in smart transport systems, urban infrastructure management, industry, medicine, and education. And the wider their capabilities, the faster their spheres of use will expand.

Top technology trends

How will voice assistants develop, and is innovation even possible in this market? After all, speech recognition, natural language processing, and AI, including machine learning, are nothing new in IT and have existed for a long time. What can change and push the market forward?

Personalization of experience

Voice assistants should become more personalized, and this is not just about addressing the user by name. Many of them can already recognize a voice, determine who is speaking, and act according to that person's preferences. This is how voice assistants integrated with smart home systems work: they can set the desired lighting brightness or adjust the air conditioner for a particular family member who has previously saved his settings. This is the simplest scenario.
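That simplest scenario amounts to a lookup keyed on the identified speaker. A minimal sketch, with invented profile names and settings:

```python
# Sketch of voice-based personalization: once the speaker is identified,
# the assistant applies that person's saved smart-home settings.
# All names and values are invented for illustration.
PROFILES = {
    "anna": {"brightness": 80, "ac_temp_c": 22},
    "boris": {"brightness": 40, "ac_temp_c": 25},
}

DEFAULTS = {"brightness": 60, "ac_temp_c": 23}

def settings_for(speaker_id: str) -> dict:
    """Return saved settings for a recognized speaker, else household defaults."""
    return PROFILES.get(speaker_id, DEFAULTS)
```

The hard part in practice is not the lookup but reliable speaker identification, which is what the voice-recognition layer provides.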

In the future, machine learning should enable assistants not only to learn the user's language but also to read his emotions, building a communication strategy that takes context into account. In business, this means assistants will choose an individual manner of service for everyone they communicate with: not only picking a greeting style between "Hello" and "Good afternoon," but offering goods and services in ways tailored to specific marketing strategies. The ability to be tailor-made for the user will be valuable in education and medicine, where personality plays a large role in communication, even if that personality is virtual.

Integration with other systems

The value of a voice assistant increases when it can receive data not only from its vendor's own knowledge base or a weather service, but also from business systems. In the mass segment this is already being implemented: assistants synchronize with system notifications, personal calendars, and various schedulers to remind you of scheduled meetings, the need to feed the cat, or to take medicine. Going further, Samsung has already put a voice assistant into a refrigerator, and Google has released Google Assistant Connect, which lets third-party vendors embed the assistant in their devices.

In business, integration opportunities are wider, since the depth of data is much greater. For example, if assistants learn to read the history of a brand's interaction with a client from CRM systems, they will be able to take this information into account during communication. The opportunities for integration are almost limitless.
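The CRM example above can be sketched as an assistant consulting a client record before choosing its greeting. The record structure and client IDs are invented for illustration; a real integration would call the CRM's API instead of a dictionary.

```python
# Hedged sketch: an assistant shapes its greeting from a mock CRM record,
# so the interaction history influences the conversation from the start.
CRM = {
    "client-42": {"name": "Ivan", "open_ticket": True},
    "client-43": {"name": "Olga", "open_ticket": False},
}

def greet(client_id: str) -> str:
    """Pick a greeting based on what the CRM knows about the caller."""
    record = CRM.get(client_id)
    if record is None:
        return "Hello! How can I help you?"
    if record["open_ticket"]:
        return f"Hello, {record['name']}! Are you calling about your open request?"
    return f"Hello, {record['name']}! How can I help you today?"
```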

Developing Advertising Opportunities

This is another way to commercialize voice assistants. According to Juniper Research, in 2022 users will spend $19 billion on gadgets with voice capabilities. That is a huge audience with which brands can interact in the context of a specific request. Isn't it logical, when a person asks an assistant to call a taxi, to offer him a particular service at that moment? For now, all advertising integrations in voice assistants are experiments, but in fact this is a market with billion-dollar potential that has yet to be developed.

Data Protection

This item is important because it worries end users. If a voice assistant knows everything about you, from your name to your training schedule and contact list, where is the guarantee that the information will not leak? High-profile scandals involving smart speakers have already happened. Manufacturers are working on this: Amazon, for example, has published several comprehensive documents on how voice recording works in Echo speakers and how user data is stored. Such protocols are being developed, but legal regulation in this sphere has yet to mature, especially for B2B applications of assistants, where the entire history of working with user data, particularly personal data, must be transparent and lawful.

Integration of voice assistants and IVR

Corporate IVR systems already know what voice robots are, but so far their capabilities are quite primitive. In call centers, robots can guide the calling client through the voice menu, transfer them to the right department, or provide simple information such as a mobile account or credit card balance. The deeper voice assistants integrate with IVR, the closer the day when a virtual personality on the other end of the line can genuinely help no worse than a live employee. Such assistants will not irritate customers and will noticeably increase the speed of service.
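The "primitive" routing described above is essentially keyword matching on a recognized transcript. A minimal sketch, with invented keywords and department names:

```python
# Sketch of keyword-based IVR routing: a recognized phrase maps a caller
# to a department, falling back to a live agent when nothing matches.
# Keywords and department names are invented for illustration.
ROUTES = {
    "balance": "billing",
    "card": "billing",
    "internet": "tech_support",
    "operator": "human_agent",
}

def route_call(transcript: str) -> str:
    """Return the department for the first recognized keyword."""
    for word in transcript.lower().split():
        word = word.strip(".,?!")
        if word in ROUTES:
            return ROUTES[word]
    return "human_agent"  # unrecognized requests go to a live employee
```

A smarter assistant would replace the keyword table with intent classification, but the routing contract stays the same.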

Voice cloning

Voice cloning technology makes it possible to simulate realistic human speech. Deep learning helps machines copy not just people's phrases but their manner and emotional coloring. In real life we do not speak in a single tone with identical pauses between words: living speech is rich in shades and theatrical pauses; it can be slow, fast, lively, thoughtful, and so on. Computers have to adopt all of this, and the faster they learn to do so, the easier it will be for them to adapt. More human-like assistants will win more sympathy and trust from users, allowing the technology to overcome psychological barriers.

By the way, no assistant has yet passed the Turing test, which the English scientist Alan Turing devised to evaluate machine intelligence: to pass, a robot must behave indistinguishably from a living person. The chairman of Alphabet's board of directors earlier stated that Google Duplex passes the Turing test when scheduling meetings, but he stressed that this happens only under certain conditions.

Voice Input Development

Initially, neural networks were trained on the voices of white men and women. But in real life, people of different nationalities, cultures, professions, and lifestyles can speak quite differently from the ideal that voice assistants are used to. So far, such speech features confuse robots: even in a simple situation when a person speaks with an accent, is ill, or wears braces, pronunciation changes.

This is fertile ground for improving neural networks and their ability to understand real people and situations. They have to learn to distinguish dialects, separate a voice from background noise, and solve many other problems. One of them is learning to start a conversation based on the situation, without a direct request: that is, not waiting for the command "Okay, Google," but offering help first when it is needed.
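The gap between today's wake-word activation and the proactive behavior described above can be illustrated with a sketch of the current flow: nothing is treated as a command until an explicit trigger phrase is heard. Wake words here are examples, not any product's actual trigger list.

```python
# Sketch of wake-word gating: an utterance is ignored unless it starts
# with a trigger phrase; proactive assistants would drop this requirement.
from typing import Optional

WAKE_WORDS = ("okay google", "hey assistant")

def extract_command(utterance: str) -> Optional[str]:
    """Return the command after a wake word, or None if not addressed."""
    text = utterance.lower().strip()
    for wake in WAKE_WORDS:
        if text.startswith(wake):
            return text[len(wake):].strip(" ,")
    return None  # not addressed to the assistant; stay silent
```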

Cloud connection

Since voice assistants work on the basis of datasets, they require certain computing resources to access and search their knowledge base, and the bigger and more accessible those resources are, the faster the assistant responds to a specific situation. A logical development would be to base these technologies on cloud infrastructure in the SaaS model. It does not matter whether this is the public cloud of the provider that created the assistant or the private cloud of a company that wants to maximize availability across its network of branches or stores. Running assistants from the cloud will simplify their deployment and make the technology simpler and more accessible.

Visualizing Assistants

Let's agree: talking to a square piece of plastic has become familiar, but it is still not very cozy. In the future, virtual assistants should acquire a human appearance wherever visualization is justified. This can mean rendering a character on an LCD display, or more complex implementations, up to holograms. Holograms are still considered science fiction, but such technologies already exist.

While creating the virtual assistant Agatha, we found that there are still few cases of assistant visualization, and we considered it a promising niche. Our main idea was to make the assistant more real and human. From a process point of view, we did nothing new in combining speech recognition and synthesis. In creating Agatha's visual image, which now lives as a 3D model on screen, we were assisted by specialists in game animation. The character itself was purchased from an animation studio and refined. Agatha has lively facial expressions: she blinks, gestures, and moves. Even on standby she looks alive, shifting from foot to foot or brushing a speck off her skirt. We have not yet worked on the background; that could become a separate story. The character has received a very positive response from the audience.

What will happen next? Technology development will lead companies to find answers to the question of how best to use voice to interact with their customers. New use cases will appear and inspire others by example. Progress does not stand still, and in the future, voice technologies with visualization have every chance of becoming the main interface of the digital world.

By Nikolai Shalaev, Loftice Business Development Director
