📊 Full opportunity report: Engineering Is Automated. Research Is the Residual. on ThorstenMeyerAI.com — validation score, market gap, and execution plan.
TL;DR
AI has achieved near-complete automation in core engineering tasks, according to recent benchmarks. Research, however, remains less automated, though evidence suggests it may also be rapidly approaching automation. This shift could accelerate AI development significantly.
Recent benchmark data indicates that AI systems now automate the majority of core engineering tasks in AI research, with the creators of one benchmark, CORE-Bench, calling it ‘solved.’ Research activities remain less automated but are progressing quickly, which suggests a potential near-term shift in AI development dynamics.
Several independent lines of evidence (CORE-Bench, MLE-Bench, and advances in kernel design) suggest that AI systems are approaching saturation on core engineering skills relevant to AI research. CORE-Bench, which measures research reproduction, reached 95.5% in December 2025, and its creators called it ‘solved.’ MLE-Bench, which evaluates performance on Kaggle competitions, reached 64.4% in February 2026, approaching mid-tier human performance.
These benchmarks indicate that AI can now reproduce research outputs and perform competitively in complex engineering tasks, dramatically lowering the cost and time of research replication and development. This progress is supported by a steady trajectory of innovations in kernel design, including automated GPU kernel generation and optimization, which are transitioning into production use.
However, Jack Clark’s analysis in Import AI #455 notes that while engineering automation appears mature, the automation of research (its conceptual, theoretical, and creative aspects) remains less certain. The structural question Clark poses is whether research itself is just large-scale engineering. If it is, automation could accelerate further than expected.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Any single benchmark could be noise, but six moving together trace a curve. It is the same cascade pattern seen across the broader Clark series, visible here in the R&D-skill domain.
As an affiliate, we earn on qualifying purchases.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
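The optimistic "shots on goal" reading can be made concrete with a small sketch. It assumes each exploration attempt is independent and has the same small chance of producing a discovery; both the independence assumption and the per-attempt probability used below are hypothetical and do not come from Clark's data. Under the pessimistic reading, the per-attempt probability for genuinely creative insight would be effectively zero, and more volume would not help.

```python
def p_at_least_one(p_per_attempt: float, attempts: int) -> float:
    """Probability of at least one discovery across independent attempts."""
    if not 0.0 <= p_per_attempt <= 1.0:
        raise ValueError("p_per_attempt must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p_per_attempt) ** attempts


# Hypothetical per-attempt success rate, chosen only for illustration.
HYPOTHETICAL_P = 0.001

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} attempts -> {p_at_least_one(HYPOTHETICAL_P, n):.4f}")
```

With any nonzero per-attempt rate, the chance of at least one discovery climbs toward 1 as volume grows, which is why the two readings diverge so sharply on the residual.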
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. It is thinner on five strategic dimensions that matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Productivity multiplier years
Recursive loop operational
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
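The asymmetry argument can be sketched as a two-by-two cost table. The cost values below are hypothetical, unitless placeholders chosen only to show the shape of the reasoning: building capacity costs something in either world, while waiting is free if the moat holds and expensive if it closes.

```python
# Hypothetical, unitless costs: illustrative only, not taken from the article.
COSTS = {
    "build_now": {"moat_holds": 1.0, "moat_closes": 1.0},
    "wait": {"moat_holds": 0.0, "moat_closes": 10.0},
}


def expected_cost(action: str, p_closes: float) -> float:
    """Expected cost of an action given the probability that the moat closes."""
    row = COSTS[action]
    return (1.0 - p_closes) * row["moat_holds"] + p_closes * row["moat_closes"]


def break_even_probability() -> float:
    """Probability of the moat closing at which both actions cost the same."""
    build, wait = COSTS["build_now"], COSTS["wait"]
    extra_if_holds = build["moat_holds"] - wait["moat_holds"]     # built needlessly
    extra_if_closes = wait["moat_closes"] - build["moat_closes"]  # waited wrongly
    return extra_if_holds / (extra_if_holds + extra_if_closes)


print(f"break-even probability: {break_even_probability():.2f}")
for p in (0.05, 0.25, 0.50):
    print(p, expected_cost("build_now", p), expected_cost("wait", p))
```

Under these placeholder numbers, building is the cheaper bet whenever the chance of the moat closing exceeds the break-even value. The larger the cost of waiting wrongly relative to the cost of building needlessly, the lower that threshold falls.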
IN INDUSTRY
IN ACADEMIA
POLICYMAKERS
INVESTORS
EVERYONE ELSE
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications of Near-Complete Engineering Automation
The automation of core engineering tasks in AI research could lead to a faster pace of AI development, reducing reliance on human effort for experimental reproduction, infrastructure optimization, and some aspects of research design. This shift may shorten timelines for breakthroughs and influence the landscape of AI innovation, potentially making research cycles more efficient.
Questions remain about the role of human intuition and creativity in research. If research becomes largely automated, the focus may shift toward idea generation and theory development, which are less clearly automated at present.
Recent Benchmarks and Progress in AI Engineering Tasks
Over roughly the past 18 months, multiple benchmarks have tracked AI progress in core research skills. CORE-Bench, which measures research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025, with its creators calling it ‘solved.’ MLE-Bench, which assesses performance on Kaggle competitions, rose from 16.9% to 64.4% by February 2026, approaching the level of mid-tier human practitioners.
Advances in kernel design, such as automated GPU kernel creation and optimization, have been documented in research papers and industry applications, signaling that these engineering capabilities are moving from experimental to production-ready. Taken together, these developments suggest that the engineering component of AI research is nearing full automation, while the research component remains less certain but is progressing rapidly.
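The arithmetic behind these figures is simple enough to check directly. The sketch below uses only the start and end scores quoted above; treating 100% as the ceiling for both benchmarks is an assumption made here for illustration, and the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkMove:
    name: str
    start_pct: float
    end_pct: float

    @property
    def point_gain(self) -> float:
        """Improvement in percentage points."""
        return self.end_pct - self.start_pct

    @property
    def headroom_closed(self) -> float:
        """Share of the remaining gap to 100% closed (assumes a 100% ceiling)."""
        return self.point_gain / (100.0 - self.start_pct)


# Scores as quoted in the text above.
MOVES = [
    BenchmarkMove("CORE-Bench", 21.5, 95.5),
    BenchmarkMove("MLE-Bench", 16.9, 64.4),
]

for m in MOVES:
    print(f"{m.name}: +{m.point_gain:.1f} points, "
          f"{m.headroom_closed:.0%} of headroom closed")
```

On this view CORE-Bench has closed most of its available headroom, while MLE-Bench has closed a bit more than half, which fits the description of one benchmark as ‘solved’ and the other as approaching mid-tier human performance.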
“Reproducing research papers at 95.5% reliability indicates that the primary challenge has shifted from capability to strategic decision-making regarding reproduction.”
— Thorsten Meyer, researcher
Unresolved Questions on Research Automation Potential
While engineering tasks appear largely automatable, the extent to which creative, theoretical, and conceptual research can be automated remains uncertain. Clark notes that some aspects of research may be inherently distinct from engineering, and whether AI can fully automate them is an open question. If research is fundamentally a form of large-scale engineering, automation could accelerate further, but concrete evidence for that view is lacking.
Next Steps in Monitoring AI Capability Growth
Researchers and industry observers will continue to track benchmark progress, especially in areas related to research theory and creative problem-solving. Industry applications of automated kernel design and infrastructure optimization are expected to expand, potentially leading to a new phase where AI-driven research becomes standard practice. Additionally, discussions around the ethical and strategic implications of fully automated research are likely to intensify.
Key Questions
What does the automation of engineering tasks mean for AI research?
It indicates that AI can now handle the technical, infrastructural, and reproducibility aspects of research, significantly reducing the time and cost involved in these phases.
Is research itself fully automatable now?
Not yet. Progress suggests research may be approaching that point, but its creative and conceptual aspects remain less automatable, and this is an ongoing area of investigation.
How might this shift impact human researchers?
It could reduce the time spent on routine engineering tasks, allowing researchers to focus more on idea development, theory, and strategic planning.
What are the risks of fully automating research?
Potential risks include over-reliance on AI for scientific discovery, loss of human oversight, and ethical concerns about the direction and control of AI-driven research.
Source: ThorstenMeyerAI.com