As a digital art pavillion we ought to open minds to all possiblities of digital art. This is one of that kind of AI's uses that will full magazines and future enterteiment products.
EMO Technology and coming similars, now declared solely for research purposes, will soon be available as a service. Looking at the downside, the quantity of unreal events (including scams) will exponentially increase. An all-fake system! On the positive side, sticking to the humanitarian aspect, one could preserve people's memory, one by one, just by having an audio sample and an average image. Of course, they ARE NOT them, but rather their simulacra. After all, primitive statues had the same task: to transmit and maintain the memory of ancestors. I know it's a romantic perspective that clashes with today's cynicism and its necrophilia. But we live in times where lateral and divergent thinking isn't enough; a strong countercurrent is needed...
The negative scenario, I admit, scares everyone. Harari spoke about it with satisfaction years ago in "Lessons for the 21st Century," and I found him a bit too enthusiastic compared to the analyses in his previous book, "Homo Deus." I must agree with him. If things continue like this, there will be an increasingly profound rift between the real and the virtual. Distrust will grow, and it wouldn't be strange to reach what Isaac Asimov defined 50 years ago as "the Frankenstein complex" when he talked in his books about robots and the self-imposed absolute prohibition of human society to make robots (read advanced technology products) as similar to humans as possible. If you've seen the movie "Bicentennial Man," you have an idea of Asimov's concepts. But honestly, can we afford to give up AI or an AI that helps dementia patients face the rest of their lives with greater relief? It feels like "Matrix," I admit, and I feel a sense of unease myself, but obscurantism terrifies me.
__________________________
La tecnologia EMO e similaria, ora dichiarata solo per finalità di ricerca, sarà a breve disponibile come servizio. A guardare il lato negativo la quantità di eventi non reali (truffe comprese) aumenterà esponenzialmente. Un all fake system! Cercando il positivo, e rimango sul piano umanitario, si potrà conservare la memoria delle genti, una per una, basta avere un campione audio e una immagine media. Certo NON SONO loro, ma un loro simulacro. In fondo le staute primitivamente avevano il medesimo compito: tramandare e mantenere il ricordo degli antenati. Lo so è una prospettiva romantica che stride con il cinismo odierno e la sua necrofilia. Ma viviamo tempi in cui il pensiero laterale e divergente non basta, occorre un controcorrente vigoroso.
Lo scenario negativo, ammetto, spaventa tutti. Harari ne parlava compiaciuto anni fa in Lezioni per il XXI secolo e lo trovai un po' esaltato, rispetto alle analisi del suo precedente libro, Homo Deus. Devo dargli ragione. Se le cose continuano cosi ci sarà una scissura sempre piu profonda fra reale e virtuale. La diffidenza aumenterà e non sarebbe strano arrivare a quello che 50 anni fa Isaac Asimov defini "il complesso di Frankenstein" parlando nei suoi libri di robot, del divieto assoluto autoimposto della società umana a rendere i robot ( leggi prodotto di avanzata tecnologia) quanto piu simili agli uomini. Se hai visto il film L'Uomo Bicentenario hai idea dei concetti di Asimov. MA in tutta onestà possiamo permetterci di rinunciare alla AI o a una AI che aiuti i malati di demenze ad affrontare il resto della loro vita con un sollievo maggiore? Sa di matrix, lo ammetto e auto avverto un senso di inquietitudine, ma l'oscurantismo mi terrorizza.
Luigi Starace
We proposed EMO, an expressive audio-driven portrait-video generation framework. Input a single reference image and the vocal audio, e.g. talking and singing, our method can generate vocal avatar videos with expressive facial expressions, and various head poses, meanwhile, we can generate videos with any duration depending on the length of input video.
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Linrui Tian, Qi Wang, Bang Zhang, Liefeng Bo
Institute for Intelligent Computing, Alibaba Group
Overview of the proposed method. Our framework is mainly constituted with two stages. In the initial stage, termed Frames Encoding, the ReferenceNet is deployed to extract features from the reference image and motion frames. Subsequently, during the Diffusion Process stage, a pretrained audio encoder processes the audio embedding. The facial region mask is integrated with multi-frame noise to govern the generation of facial imagery. This is followed by the employment of the Backbone Network to facilitate the denoising operation. Within the Backbone Network, two forms of attention mechanisms are applied: Reference-Attention and Audio-Attention. These mechanisms are essential for preserving the character's identity and modulating the character's movements, respectively. Additionally, Temporal Modules are utilized to manipulate the temporal dimension, and adjust the velocity of motion.
Post a Comment
0 Comments