The Shift
Anthropic Claims Its New A.I. Model, Mythos, Is a Cybersecurity ‘Reckoning’
The company said on Tuesday that it was holding back on releasing the new technology but was working with 40 companies to explore how it could prevent cyberattacks.
By Kevin Roose
Reporting from San Francisco
April 7, 2026
Anthropic, the artificial intelligence company that recently fought the Pentagon over the use of its technology, has built a new A.I. model that it claims is too powerful to be released to the public.
Instead, Anthropic said on Tuesday, it will make the new model — known as Claude Mythos Preview — available to a consortium of more than 40 technology companies, including Apple, Amazon and Microsoft, which will use the model to find and patch security vulnerabilities in critical software programs.
Anthropic said it had no plans to release its new technology more widely, but was announcing the new model’s capabilities in one area in particular — identifying security vulnerabilities in software — in an effort to sound the alarm over what the company believes will be a new, scarier era of A.I. threats.
“The goal is both to raise awareness and to give good actors a head start on the process of securing open-source and private infrastructure and code,” Jared Kaplan, Anthropic’s chief science officer, said in an interview.
The coalition, known as Project Glasswing, will include some of Anthropic’s competitors in A.I., such as Google, as well as hardware providers like Cisco and Broadcom, and organizations that maintain critical open-source software, such as the Linux Foundation. Anthropic is committing up to $100 million in Claude usage credits to the effort.
Logan Graham, the head of an Anthropic team that tests new models for dangerous capabilities, called the new model “the starting point for what we think will be an industry change point, or reckoning, with what needs to happen now.”
Anthropic occupies an unusual position in today’s A.I. landscape. It is racing to build increasingly powerful A.I. systems, and making billions of dollars selling access to those systems, while also drawing attention to the risks its technology poses. The Pentagon deemed the company a supply-chain risk this year for demanding certain limits on how its technology could be used. A federal judge later stopped the designation from going into effect.
Anthropic has not released much new information about the model, which was code-named “Capybara” during development. But after some details were inadvertently leaked last month, the company acknowledged that it considered it a “step change” in A.I. capabilities, with improved performance in areas like coding and cybersecurity research.
The company’s decision to hold back Claude Mythos Preview and give access only to partners, out of concern for how it might be misused, has some precedent. In 2019, OpenAI announced it had built a new model, GPT-2, but was not releasing the full version right away. The company claimed that its text-generation capabilities could be used to automate the mass production of propaganda or misinformation. (It later released the model, after conducting additional safety testing on it.) Many of the leaders of the GPT-2 project later left OpenAI to start Anthropic.
This time, Anthropic is making a different, more urgent claim. The company’s executives say Claude Mythos Preview is already capable of carrying out autonomous security research, including scanning for and exploiting so-called zero-day vulnerabilities in critical software programs — flaws that are unknown even to the software’s developer. Such attacks, the company says, can often be triggered by amateurs using simple prompts. The company claims that the new model has already identified “thousands” of bugs and vulnerabilities in popular software programs, including every major operating system and browser.
One of the vulnerabilities Claude found, the company said, was a 27-year-old bug in OpenBSD, an open-source operating system that was designed to be difficult to hack. Many internet routers and secure firewalls incorporate OpenBSD’s technology. Another was a longstanding issue in a piece of popular video software that automated testing tools had scanned five million times without finding any problems.
“This model is good at finding vulnerabilities that would be well understood and findable by security researchers,” Mr. Graham said. “At the same time, it has found vulnerabilities, and in some cases crafted exploits, sophisticated enough that they were both missed by literally decades of security researchers, as well as all the automated tools designed to find them.”
Anthropic announced on Monday that its projected annual revenue had more than tripled in 2026, to more than $30 billion from $9 billion. The growth has come largely because of the popularity of Anthropic’s Claude as a tool for programming.
Anthropic has focused on making Claude good at completing lengthy coding tasks, in hopes of making it more useful to professional programmers and amateur “vibecoders.” But an A.I. system designed to be good at coding is also good at spotting the flaws in code — running automated scans for bugs and vulnerabilities that can allow hackers to take control of users’ machines, expose sensitive user information or wreak other havoc.
The cybersecurity industry has been bracing for years for what more capable A.I. models could do to critical tech infrastructure. Until recently, only expert human researchers with access to specialized tools were capable of finding the most severe security vulnerabilities. Now, the fear is that a powerful A.I. model could discover them on its own.
“Imagine a horde of agents methodically cataloging every weakness in your technology infrastructure, constantly,” Nikesh Arora, the chief executive of Palo Alto Networks, wrote in a blog post last week.
Mr. Graham said one of the unanswered questions about Claude Mythos Preview, and about future models capable of doing similar things, was whether most or all of the world’s critical software would need to be patched or rewritten as a result.
“There are a lot of really critical systems around the world, whether it’s physical infrastructure or things that protect your personal data, that are running on old versions of code,” Mr. Graham said. “If these previously were mostly secure because it took a lot of human effort to attack them, does that paradigm of security even work anymore?”
It is wise to take claims about unreleased model capabilities from A.I. companies with a grain of salt. In this case, though, cybersecurity researchers who have been given access to Claude Mythos Preview have characterized the model as a significant cybersecurity risk.
Elia Zaitsev, the chief technology officer of CrowdStrike, a cybersecurity firm with access to the new model through Project Glasswing, said in a statement accompanying Anthropic’s announcement that the model “demonstrates what is now possible for defenders at scale, and adversaries will inevitably look to exploit the same capabilities.”
“What once took months now happens in minutes with A.I.,” Mr. Zaitsev said.
Project Glasswing takes its name from the glasswing butterfly, which uses its transparent wings to hide in plain sight, Mr. Kaplan said. Similarly, he said, many of today’s most critical software programs contain bugs and vulnerabilities that have existed in the open for years but were buried in such complex technical systems that no human ever found them.
According to Mr. Kaplan, the cybersecurity capabilities of Claude Mythos Preview are not the result of special training. Rather, they are just one of many areas in which the model outperforms its predecessors. He predicted that similar cybersecurity capabilities would soon appear in other models. As that happens, he said, the arms race between hackers and the companies racing to defend their systems will only escalate.
“As the slogan goes, this is the least capable model we’ll have access to in the future,” he said.
Kevin Roose is a Times technology columnist and a host of the podcast “Hard Fork.”