The Rise of Hunyuan Video Deepfakes

Due to the nature of some of the material discussed here, this article will contain fewer reference links and illustrations than usual.

Something noteworthy is currently happening in the AI synthesis community, though its significance may take a while to become clear. Hobbyists are training generative AI video models to reproduce the likenesses of people, using video-based LoRAs on Tencent's recently released open-source Hunyuan Video framework.*

Click to play. Diverse results from Hunyuan-based LoRA customizations freely available at the Civit community. By training low-rank adaptation models (LoRAs), issues with temporal stability, which have plagued AI video generation for two years, are significantly reduced. Sources: civit.ai

In the video shown above, the likenesses of actresses Natalie Portman, Christina Hendricks and Scarlett Johansson, together with tech leader Elon Musk, have been trained into relatively small add-on files for the Hunyuan generative video system, which can be installed without content filters (such as NSFW filters) on a user's computer.

The creator of the Christina Hendricks LoRA shown above states that only 16 images from the Mad Men TV show were needed to develop the model (which is a mere 307MB download); multiple posts from the Stable Diffusion community at Reddit and Discord confirm that LoRAs of this kind do not generally require large amounts of training data, or long training times.
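The small download size follows directly from the low-rank adaptation idea: instead of storing a full fine-tuned copy of the model's weights, a LoRA stores only a pair of narrow matrices per adapted layer, whose product approximates the weight update. A minimal NumPy sketch of the arithmetic (illustrative dimensions only, not Hunyuan Video's actual layer sizes):

```python
import numpy as np

# Hypothetical layer: a 4096x4096 projection matrix (illustrative size only).
d = 4096
rank = 16  # LoRA ranks are typically small, e.g. 8-64

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32)   # frozen base weight

# The LoRA file stores only these two low-rank factors.
A = rng.standard_normal((rank, d)).astype(np.float32) * 0.01
B = np.zeros((d, rank), dtype=np.float32)            # B starts at zero, so the
                                                     # adapter initially does nothing

# At inference, the effective weight is the base plus the low-rank update.
alpha = 1.0
W_adapted = W + alpha * (B @ A)

full_params = W.size             # parameters in the full matrix
lora_params = A.size + B.size    # parameters the LoRA actually stores
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {full_params / lora_params:.0f}x smaller")
# → full: 16,777,216  lora: 131,072  ratio: 128x smaller
```

The same ratio applied across every adapted layer is why a usable identity model fits in a few hundred megabytes rather than the tens of gigabytes of the full 13-billion-parameter network.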

Click to play. Arnold Schwarzenegger is brought to life in a Hunyuan video LoRA that can be downloaded at Civit. See https://www.youtube.com/watch?v=1D7B9g9rY68 for further Arnie examples, from AI enthusiast Bob Doyle.

Hunyuan LoRAs can be trained on either static images or videos, though training on videos requires greater hardware resources and longer training times.

The Hunyuan Video model features 13 billion parameters, exceeding Sora's 12 billion, and far exceeding the less-capable Hunyuan-DiT model released to open source in summer of 2024, which has only 1.5 billion parameters.

As was the case two and a half years ago with Stable Diffusion and LoRA (see examples of Stable Diffusion 1.5's 'native' celebrities here), the foundation model in question has a far more limited understanding of celebrity personalities, compared to the level of fidelity that can be obtained through 'ID-injected' LoRA implementations.

Effectively, a customized, personality-focused LoRA gets a ‘free ride’ on the significant synthesis capabilities of the base Hunyuan model, offering a notably more effective human synthesis than can be obtained either by 2017-era autoencoder deepfakes or by attempting to add movement to static images via systems such as the feted LivePortrait.

All the LoRAs depicted here can be downloaded freely from the highly popular Civit community, while the far more numerous older custom-made 'static-image' LoRAs can also potentially create 'seed' images for the video creation process (i.e., image-to-video, a pending release for Hunyuan Video, though workarounds are possible for the moment).

Click to play. Above, samples from a ‘static’ Flux LoRA; below, examples from a Hunyuan video LoRA featuring musician Taylor Swift. Both of these LoRAs are freely available at the Civit community.

As I write, the Civit website offers 128 search results for ‘Hunyuan’*. Nearly all of these are in some way NSFW models; 22 depict celebrities; 18 are designed to facilitate the generation of hardcore pornography; and only seven of them depict men rather than women.

So What’s New?

Due to the evolving nature of the term deepfake, and limited public understanding of the (quite severe) limitations of AI human video synthesis frameworks to date, the significance of the Hunyuan LoRA is not easy to understand for a person casually following the generative AI scene. Let’s review some of the key differences between Hunyuan LoRAs and prior approaches to identity-based AI video generation.

1: Unfettered Local Installation

The most important aspect of Hunyuan Video is the fact that it can be downloaded locally, and that it puts a very powerful and uncensored AI video generation system in the hands of the casual user, as well as the VFX community (to the extent that licenses may allow across geographical regions).

The last time this happened was the open-source release of the Stability.ai Stable Diffusion model in the summer of 2022. At that time, OpenAI's DALL-E 2 had captured the public imagination, though DALL-E 2 was a paid service with notable restrictions (which grew over time).

When Stable Diffusion became available, and Low-Rank Adaptation then made it possible to generate images of the identity of any person (celebrity or not), the huge locus of developer and consumer interest helped Stable Diffusion to eclipse the popularity of DALL-E 2; though the latter was a more capable system out-of-the-box, its censorship routines were seen as onerous by many of its users, and customization was not possible.

Arguably, the same scenario now applies between Sora and Hunyuan – or, more accurately, between Sora-grade proprietary generative video systems, and open source rivals, of which Hunyuan is the first – but probably not the last (here, consider that Flux would eventually gain significant ground on Stable Diffusion).

Users who wish to create Hunyuan LoRA output, but who lack sufficiently powerful hardware, can, as ever, offload the GPU aspect of training to online compute services such as RunPod. This is not the same as creating AI videos at platforms such as Kaiber or Kling, since there is no semantic or image-based filtering (censoring) entailed in renting an online GPU to support an otherwise local workflow.

2: No Need for 'Host' Videos and High Effort

When deepfakes burst onto the scene at the end of 2017, the anonymously-posted code would evolve into the mainstream forks DeepFaceLab and FaceSwap (as well as the DeepFaceLive real-time deepfaking system).

This method required the painstaking curation of thousands of face images for each identity to be swapped; the less effort put into this stage, the less effective the model would be. Additionally, training times varied between 2-14 days, depending on available hardware, stressing even capable systems in the long run.

When the model was finally ready, it could only impose faces into existing video, and usually needed a 'target' (i.e., real) identity that was close in appearance to the superimposed identity.

More recently, ROOP, LivePortrait and numerous similar frameworks have provided comparable functionality with far less effort, and often with superior results – but with no capacity to generate accurate full-body deepfakes – or any element other than faces.

Examples of ROOP Unleashed and LivePortrait (inset lower left), from Bob Doyle's content stream at YouTube. Sources: https://www.youtube.com/watch?v=i39xeYPBAAM and https://www.youtube.com/watch?v=QGatEItg2Ns

By contrast, Hunyuan LoRAs (and the similar systems that will inevitably follow) allow for unfettered creation of entire worlds, including full-body simulation of the user-trained LoRA identity.

3: Massively Improved Temporal Consistency

Temporal consistency has been the Holy Grail of diffusion video for several years now. The use of a LoRA, together with apposite prompts, gives a Hunyuan video generation a constant identity reference to adhere to. In theory (these are early days), one could train multiple LoRAs of a particular identity, each wearing specific clothing.

Under these auspices, the clothing too is less likely to 'mutate' throughout the course of a video generation (since the generative system bases the next frame on a very limited window of prior frames).

(Alternatively, as with image-based LoRA systems, one can simply apply multiple LoRAs, such as identity + costume LoRAs, to a single video generation)
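In established image-domain LoRA practice, stacking adapters like this is simply an additive combination of their low-rank updates, each scaled by its own strength; assuming Hunyuan LoRAs compose the same way, the arithmetic is (toy dimensions, hypothetical helper name):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r1, r2 = 256, 8, 4

W = rng.standard_normal((d, d))  # frozen base weight of one layer

# Two independently trained adapters, e.g. an identity LoRA and a costume LoRA
# (toy dimensions; real adapters span many layers of the network).
B1, A1 = rng.standard_normal((d, r1)), rng.standard_normal((r1, d))
B2, A2 = rng.standard_normal((d, r2)), rng.standard_normal((r2, d))

def apply_loras(W, adapters):
    """Merge any number of (B, A, strength) adapters into the base weight."""
    W_out = W.copy()
    for B, A, strength in adapters:
        W_out += strength * (B @ A)
    return W_out

# Each adapter gets its own strength dial, as in image-based LoRA workflows.
W_combined = apply_loras(W, [(B1, A1, 0.8), (B2, A2, 0.6)])
```

Because the updates are additive, adapters trained on different concepts can be mixed and re-weighted at generation time without retraining, though in practice overlapping concepts can still interfere with one another.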

4: Access to the 'Human Experiment'

As I recently observed, the proprietary and FAANG-level generative AI sector now appears to be so wary of potential criticism regarding the human synthesis capabilities of its projects, that actual people rarely appear in project pages for major announcements and releases. Instead, related publicity literature increasingly tends to show 'cute' and otherwise 'non-threatening' subjects in synthesized results.

With the arrival of Hunyuan LoRAs, for the first time, the community has an opportunity to push the boundaries of LDM-based human video synthesis in a highly capable (rather than marginal) system, and to fully explore the subject that most interests the majority of us – people.

Implications

Since a search for 'Hunyuan' at the Civit community mostly shows celebrity LoRAs and 'hardcore' LoRAs, the central implication of the arrival of Hunyuan LoRAs is that they will be used to create AI pornographic (or otherwise defamatory) videos of real people – celebrities and unknowns alike.

For compliance purposes, the hobbyists who create Hunyuan LoRAs and who experiment with them on diverse Discord servers are careful to prohibit examples of real people from being posted. The reality is that even image-based deepfakes are now severely weaponized; and the prospect of adding truly realistic videos into the mix may finally justify the heightened fears that have been recurrent in the media over the last seven years, and which have prompted new regulations.

The Driving Force

As ever, porn remains the driving force for technology. Whatever our opinion of such usage, this relentless engine of impetus drives advances in the state-of-the-art that can ultimately benefit more mainstream adoption.

In this case, it is possible that the price will be higher than usual, since the open-sourcing of hyper-realistic video creation has obvious implications for criminal, political and ethical misuse.

One Reddit group (which I will not name here) dedicated to AI generation of NSFW video content has an associated, open Discord server where users are refining ComfyUI workflows for Hunyuan-based video porn generation. Daily, users post examples of NSFW clips – many of which can reasonably be termed 'extreme', or at the very least straining the restrictions stated in forum rules.

This community also maintains a substantial and well-developed GitHub repository featuring tools that can download and process pornographic videos, in order to provide training data for new models.

Since the most popular LoRA trainer, Kohya-ss, now supports Hunyuan LoRA training, the barriers to entry for unbounded generative video training are lowering daily, along with the hardware requirements for Hunyuan training and video generation.

The crucial aspect of dedicated training schemes for porn-based AI (rather than identity-based models, such as celebrities) is that a standard foundation model like Hunyuan is not specifically trained on NSFW output, and may therefore either perform poorly when asked to generate NSFW content, or fail to disentangle learned concepts and associations in a performative or convincing manner.

By developing fine-tuned NSFW foundation models and LoRAs, it will become increasingly possible to project trained identities into a dedicated 'porn' video domain; after all, this is only the video version of something that has already occurred for still images over the last two and a half years.

VFX

The huge boost in temporal consistency that Hunyuan Video LoRAs offer is an obvious boon to the AI visual effects industry, which leans very heavily on adapting open-source software.

Although a Hunyuan Video LoRA strategy generates a complete body and surroundings, VFX corporations have virtually definitely begun to experiment with isolating the temporally-consistent human faces that may be obtained by this methodology, to be able to superimpose or combine faces into real-world supply footage.

Like the hobbyist community, VFX companies must wait for Hunyuan Video's image-to-video and video-to-video functionality, which is potentially the most useful bridge to LoRA-driven, ID-based 'deepfake' content; or else improvise, and use the interim to probe the outer capabilities of the framework, of potential adaptations, and even of proprietary in-house forks of Hunyuan Video.

Although the license phrases for Hunyuan Video technically enable the depiction of actual people as long as permission is given, they prohibit its use within the EU, United Kingdom, and in South Korea. On the ‘stays in Vegas’ precept, this doesn’t essentially imply that Hunyuan Video is not going to be utilized in these areas; nevertheless, the prospect of exterior information audits, to implement a rising rules round generative AI, might make such illicit utilization dangerous.

One other potentially ambiguous area of the license terms states:

‘If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.’

This clause is clearly aimed at the multitude of companies that are likely to 'middleman' Hunyuan Video for a relatively tech-illiterate body of users, and who will be required to cut Tencent into the action, above a certain ceiling of users.

Whether the broad phrasing could also cover indirect usage (i.e., via the provision of Hunyuan-enabled visual effects output in popular movies and TV) may require clarification.

Conclusion

Since deepfake video has existed for a long time, it would be easy to underestimate the significance of Hunyuan Video LoRA as an approach to identity synthesis, and deepfaking; and to assume that the developments currently manifesting at the Civit community, and at related Discords and subreddits, represent a mere incremental nudge towards truly controllable human video synthesis.

More likely is that the current efforts represent only a fraction of Hunyuan Video's potential to create completely convincing full-body and full-environment deepfakes; once the image-to-video component is released (rumored to be arriving this month), a far more granular level of generative power will become available to both the hobbyist and professional communities.

When Stability.ai released Stable Diffusion in 2022, many observers could not determine why the company would simply give away what was, at the time, such a valuable and powerful generative system. With Hunyuan Video, the profit motive is built directly into the license – albeit that it may prove difficult for Tencent to determine when a company triggers the profit-sharing scheme.

In any case, the result is the same as it was in 2022: dedicated development communities have formed immediately and with intense fervor around the release. Some of the roads that these efforts will take in the next 12 months are surely set to prompt new headlines.

 

* Up to 136 by the time of publication.

First published Tuesday, January 7, 2025
