Chinese tech giant quietly unveils advanced AI model amid battle over TikTok
"This is probably the most impressive model I've seen," one AI expert said.
In the rapidly expanding field of artificial intelligence, the Chinese tech giant behind TikTok this week quietly unveiled an advanced AI model for generating video that vaults the company ahead of its U.S. competition and raises new concerns about the threat of deepfake videos.
ByteDance's OmniHuman-1 model is able to create realistic videos of humans talking and moving naturally from a single still image, according to a paper published by researchers with the tech company.
Experts who spoke to ABC News warned that the technology -- if made available for public use -- could lead to new abuses and magnify the longstanding national-security concerns about Beijing-based ByteDance.
"If you only need one image, then all of the sudden, it's much easier to find a way to target someone," Henry Ajder, a world-leading expert on generative AI, told ABC News. "Previously, you might have needed hundreds of images, if not thousands, to create compelling, really interesting videos to train them on."
After training the model on over 18,700 hours of human videos, ByteDance researchers boasted that the technology is "unprecedented" in "accuracy and personalization," with users able to create "extremely realistic human videos" that significantly outperform existing methods. Based on a single still image, users can create content that lacks the telltale signs of artificial generation -- such as issues depicting hand movements or lip syncing -- and can potentially evade AI-detection tools, according to Ajder.
"This is probably the most impressive model I've seen to combine all of these different multimodal activities," Ajder said. "The ability to generate custom voice audio to match the video is notable and then, of course, there's just the fidelity of the actual video outputs themselves. I mean, they're incredibly realistic. They're incredibly impressive."
ByteDance declined ABC News' request for comment, and its research paper offered limited details about the source of the videos used to train the model.
A ByteDance representative told Forbes that the tool, if publicly deployed, would include safeguards against harmful and misleading content. Last year, TikTok announced that the platform would automatically label AI-generated content and generally work to improve AI literacy.
Among the videos released in the research paper, OmniHuman was used to transform a still image of Albert Einstein's portrait into a video of the theoretical physicist delivering a lecture. Other artificially generated videos depicted speakers delivering TED Talks and musicians playing piano while singing. According to the research paper, the model can generate realistic video at any aspect ratio based on a single image and audio clip.
While the release of the model marks a new advancement in the rapidly growing field of artificial intelligence, it also raises the stakes of the harms that can stem from it, including deepfakes used to influence elections or produce non-consensual pornography, experts said.
According to John Cohen, an ABC News contributor and former head of intelligence at the Department of Homeland Security, the ability to create higher quality videos using AI could lead to "dramatic expansion" of the threats stemming from the content.
"The United States is in the midst of a dynamic and dangerous threat environment that in large part is fueled by online content that is purposely placed there by foreign intelligence services, terrorist groups, criminal organizations and domestic violence groups for the purposes of inspiring and informing criminal and oftentimes violent activities," Cohen said, warning that technology like OmniHuman could allow bad actors to create deepfakes "more effectively, more efficiently and more cheaply."
Ahead of the 2024 election, Russian actors used artificial intelligence to sow discord among voters, including by disseminating propaganda videos about immigration, crime, and the ongoing war in Ukraine, according to a recent report from the Brookings Institution, a nonpartisan research group.
While state and local authorities were able to correct much of the disinformation in real time, the advancing technology has had sprawling implications abroad. In Bangladesh -- a Muslim-majority country -- AI was used to create a scandalous fake image of a politician in a bikini, and in Moldova, similar technology was used to create a fake video of the country's pro-Western president supporting a political party aligned with Russia.
Before last year's New Hampshire primary, AI was used to create a phone call impersonating the voice of President Joe Biden encouraging recipients of the call to "save your vote" for the November general election, rather than participate in the critical early primary. The New Hampshire attorney general's office described the calls as "an unlawful attempt to disrupt the New Hampshire Presidential Primary Election and to suppress New Hampshire voters."
While OmniHuman has not been released for public use, Ajder predicted that the tool could soon be rolled out across ByteDance's platforms, including TikTok. The prospect adds to the complex dilemma the United States faces, as companies like ByteDance are required to support and cooperate with operations by China's military and intelligence services, according to Cohen.
ByteDance's technological success comes as the U.S. has invested record amounts of money to advance AI technology. President Donald Trump -- who named a so-called "AI czar" to his administration -- last month announced a $500 billion private-sector AI investment between the companies OpenAI, SoftBank and Oracle.
"The challenge is that our federal government has for years been too slow to react to this threat environment," Cohen said. "Until we do that, we're going to be behind the eight ball in dealing with these emerging threats."
ABC News' Kerem Inal and Chris Looft contributed to this report.