Technology from Harry Potter has become reality. A single picture or photo of a person is now enough to create a full-fledged video of them. Machine learning researchers from Skolkovo and the Samsung AI Center in Moscow have published their work on such a system, along with a number of videos of celebrities and works of art that have been given a new life.
The text of the scientific work can be read here. It is quite interesting, with a lot of formulas, but the idea is simple: the system is guided by "landmarks", the key points of a face, such as the nose, the two eyes, the two eyebrows and the jawline. From these it instantly captures the pose and expression of a face. It can then transfer everything else (the color and texture of the face, a mustache, stubble and so on) onto any other video of a person, adapting the old face to new situations.
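As a rough illustration of this split between landmarks and "everything else", here is a minimal Python sketch. The names `Appearance`, `Frame` and `reenact`, as well as the landmark names and coordinates, are hypothetical and are not taken from the paper; a real system renders pixels with a neural network rather than pairing up dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]          # (x, y) in image coordinates
Landmarks = Dict[str, Point]         # e.g. {"nose": (0.5, 0.55), ...}


@dataclass(frozen=True)
class Appearance:
    """Identity traits read from the single source picture (hypothetical fields)."""
    skin_tone: str
    facial_hair: str


@dataclass(frozen=True)
class Frame:
    """One frame: where the face parts are, plus what the face looks like."""
    landmarks: Landmarks
    appearance: Appearance


def reenact(source: Frame, driving: List[Landmarks]) -> List[Frame]:
    """Keep the source appearance, take pose and expression from the driving video."""
    return [Frame(landmarks=lm, appearance=source.appearance) for lm in driving]


# A single portrait of the person we want to animate.
portrait = Frame(
    landmarks={"nose": (0.50, 0.55), "left_eye": (0.40, 0.40), "right_eye": (0.60, 0.40)},
    appearance=Appearance(skin_tone="pale", facial_hair="none"),
)

# Landmarks extracted from two frames of another person's video (made-up values).
driving_video = [
    {"nose": (0.50, 0.55), "left_eye": (0.40, 0.40), "right_eye": (0.60, 0.40)},
    {"nose": (0.52, 0.56), "left_eye": (0.42, 0.41), "right_eye": (0.62, 0.39)},
]

video = reenact(portrait, driving_video)
```

Each generated frame carries the portrait's appearance and the driving video's landmarks, which is the division of labor the article describes.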
Of course, this only works on portraits so far. The model needs a picture of a single person whose face is turned towards the viewer, so that at least both eyes are visible. The system can then do anything with that face and transfer any facial expression onto it. It is enough to give it a suitable video (of another person with the head in approximately the same position).
AI had already learned to make deepfakes earlier, and Internet users thoroughly mocked celebrities by inserting their faces into porn and making memes with Nicolas Cage. But to do this, they had to train the algorithms on megabytes (or, better, gigabytes) of data and find as many images and videos of a celebrity's face as possible in order to get more or less decent results. The creator of Deepfakes himself said that it takes 8-12 hours to produce one short clip. The new system generates the result instantly, and as input it needs only one picture.
With the previous system, we would never have been able to see a living Mona Lisa, because we have only one view of her. Now, with landmark-based algorithms, this becomes possible. The result is not perfect, but it is already close.
The Moscow researchers' work also uses a generative adversarial network. Two models compete with each other: one, the generator, tries to deceive its opponent and convince it that the video it creates is real, while the other, the critic, tries to catch the fake. This is how a certain level of realism is achieved: an image of a human face is not released unless the critic model is more than 90% sure of its authenticity. As the authors say in their work, the images are governed by tens of millions of parameters, but thanks to this setup the work goes very quickly.
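The contest between the two models can be sketched with the textbook adversarial losses. This is a minimal illustration only: the function names are hypothetical, the log-based losses are the standard formulation rather than necessarily the one the authors use, and the 0.9 threshold simply mirrors the 90% figure quoted above.

```python
import math


def critic_loss(p_real: float, p_fake: float) -> float:
    """Loss for the critic, given the probability of 'real' it assigned to
    a real frame (p_real) and to a generated frame (p_fake).
    It is low when p_real is near 1 and p_fake is near 0."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake: float) -> float:
    """Loss for the generator: low when the critic is fooled,
    i.e. when it scores a generated frame close to 1."""
    return -math.log(p_fake)


def passes_critic(p_fake: float, threshold: float = 0.9) -> bool:
    """Release gate as described in the article (assumed threshold):
    keep a generated frame only if the critic is more than 90% sure it is real."""
    return p_fake > threshold


# Critic scores for three generated frames (made-up numbers).
scores = [0.35, 0.91, 0.97]
released = [s for s in scores if passes_critic(s)]
```

The two losses pull in opposite directions on `p_fake`: the critic is rewarded for pushing it towards 0, the generator for pushing it towards 1.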
If there are several images, the result improves. Again, the easiest subjects to work with are celebrities, who have already been photographed from every possible angle. Achieving "ideal realism" takes 32 pictures. In that case, the low-resolution photos generated by the AI are indistinguishable from real photos of the person. Untrained people can no longer spot the fake at this stage; perhaps only experts, or close relatives of the "test subjects" in these images, still have a chance.
If there is only one photo or picture, the result is not always the best. Artifacts are easy to spot in the video when the head is in motion. The researchers themselves say that their weakest point is the gaze. The model relies on the landmarks of the face and does not always understand how and where a person should be looking.