Data labeling will become a game

The owners of NativeOS, developer of a computer-vision product known as “the antivirus for brands,” are launching a new startup. The plan is to create Little Big Data, a gamified platform for labeling big data that is more accurate than existing systems. Invest Foresight has spoken with Ivan Soshchik, a co-founder of the company, at a meetup at the Microsoft technology center.

Vitaly Sotnikov, a co-founder of Little Big Data, launched the American startup NativeOS in 2017. The solution prevents ads from appearing in a negative content environment – containing mentions of alcohol, violence, drugs, religion, disaster, pornography, politics, or gambling. The NativeOS platform targets the problem of brand safety, or protecting brands in the digital space, using deep semantic structures and several neural networks; it ensures that brand messages are displayed exclusively in safe places, reducing the advertisers’ reputational losses. This is what the antivirus for brands is about. Microsoft, NVidia, Publicis Groupe and others cooperate with NativeOS; the details of their cooperation are not disclosed. NativeOS even trained the neural network to recognize all types of drugs to effectively avoid unsafe context for one of its clients (the name is not disclosed).

NativeOS needs a lot of labeled data (tagged photos, images, and other content) to train its neural networks. Initially, the startup ordered ready-made datasets from well-known vendors such as Yandex.Toloka, DBrain and others, but the quality of their labeling was not good enough.

“About 20-25% of their data were no good at all,” Vitaly Sotnikov says. “And the quality of datasets is critical for our work. So we started thinking about why that happened, and we realized that most data labeling services are not interested in their employees attaining their full potential: the work is boring and monotonous, the pay is low, and so is the motivation. In addition, people quickly get tired because of the monotony, lose concentration, and begin to make errors.”

To avoid all this, the entrepreneurs decided, the focus should be not on working people who want everything here and now, but on students: young people who are only taking their first steps in their careers and are open to new things. Incidentally, there are few IT programs at Russian universities where future IT specialists can reach the required professional level, for example by progressing all the way from front-end and UX to AI and machine learning.

Little Big Data is ready to offer students an interesting job in data labeling. All they have to do is register on the platform and start labeling data. To avoid errors, the work period is restricted to four hours per day, with an obligatory break every two hours. In the concentration it demands and in its monotony, a data labeler’s work is similar to that of an air traffic controller, who monitors and directs the movement of aircraft, accepts responsibility for passengers’ lives, and typically takes a break every two hours to stay in good shape.

To make the data labeling process less tiresome for employees, the Little Big Data platform provides an opportunity to receive additional education and develop the required skills. The startup is collecting analytics on what future IT specialists are willing to learn. In addition, the new big data processing platform has been gamified. For instance, a labeler can play a character and boost skills by fighting enemies. These enemies are the data that must be labeled. The data is packed in boxes; when facing an enemy, the labeler has to label the data in the box. These gaming techniques turn the tiresome work into a game, a challenge. For their work, students are paid in an internal currency (Little Big Points), which they can spend on education. They are also paid in fiat currency at market rates. With good speed and quality of work, a student can earn about RUR 3,000 ($42) in four hours.

Students will be able to use the labeled data to train their own machine learning models, unless the employer objects.

The work of the labelers is double-checked by other employees to confirm the quality. There are no plans to use bots to check the work, at least so far. Bots may eventually replace the administrators who advise labelers on the technical specifications, such as how a particular object in an image should be labeled.

Little Big Data is currently at the minimum viable product (MVP) stage; in May, the company will begin recruiting employees, and the platform will be fully launched by the end of the year. The startup already has a client: SKIDATA, the world market leader in access and revenue management. For the renewed IKEA restaurants in shopping malls, Little Big Data will label food, which will allow for service without cashiers. Unlike the self-service checkouts in supermarkets, this process will rely on computer vision: a machine will read what a person has on the tray and draw up the bill.

As of today, the platform has managed to reduce the error rate to one percent. The startup currently employs 15 labelers, of whom five are from Russia, including students from Peter the Great St. Petersburg Polytechnic University. Along with images, Little Big Data will soon start processing text and audio files as well.

By Natalia Kuznetsova
