Anthropic released Opus 4.7 as the first publicly available model with built-in Glasswing-era cyber safeguards — automatic detection and blocking of prohibited cybersecurity uses.
Cyber capabilities are deliberately reduced compared to Mythos Preview, the first time Anthropic has applied differential capability reduction as a safety measure.
Strong software engineering improvements let users hand off complex long-running coding tasks with confidence; same pricing at $5/$25 per million tokens.
OpenAI launched GPT-Rosalind, its first model purpose-built for a specific domain — biology, drug discovery, and translational medicine including protein engineering and genomics.
Named after Rosalind Franklin, it can query databases, read the latest papers, use scientific tools, and suggest experiments as part of multi-step research workflows.
Research preview available to qualified customers; a free Life Sciences Codex plugin connects to 50+ scientific tools and data sources.
Alibaba's Qwen team released a sparse MoE vision-language model with 35B total and just 3B active parameters, achieving 73.4% on SWE-bench Verified and 92.7 on AIME 2026.
It matches Claude Sonnet 4.5 on vision tasks and scores 86.0 on GPQA Diamond (graduate-level scientific reasoning) — competitive with models many times its active size.
Runs locally as a 20.9GB file under Apache 2.0, making it one of the most capable open-weight models available for local deployment.
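The 35B-total/3B-active split means only a few experts run per token, which is why the model is so cheap to serve relative to its parameter count. A minimal sketch of generic top-k expert routing — not Qwen's actual implementation; all names and sizes here are illustrative:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route a token vector x through only its top-k experts.

    x:        (d,) token representation
    gate_w:   (d, n_experts) router weights
    experts:  list of callables, each mapping (d,) -> (d,)

    Only top_k experts execute, so active compute is a small
    fraction of total parameters -- the idea behind a
    35B-total / 3B-active model.
    """
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 4 experts of dimension 8, only 2 run per token.
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d))) for _ in range(n)]
out = moe_forward(rng.normal(size=d), rng.normal(size=(d, n)), experts)
print(out.shape)  # (8,)
```

The gating decision is what makes sparsity work: total capacity scales with the number of experts, while per-token cost scales only with `top_k`.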
UMD and NVIDIA released Audio Flamingo Next (AF-Next), a fully open audio-language model that handles speech, sound, and music understanding in a single model.
Its 30-minute audio context window and time-grounded reasoning allow it to answer questions like 'what is being said at 14:32?' rather than just transcribing.
Part of the Audio Flamingo series alongside Music Flamingo — the first fully open model in this space with long-form temporal reasoning.
The UK AI Security Institute independently evaluated Claude Mythos Preview and found it succeeds 73% of the time on expert-level CTF challenges that no model could complete before April 2025.
In multi-stage cyber range simulations, Mythos executed attacks on vulnerable networks and exploited vulnerabilities autonomously — tasks that would take human professionals days of work.
The AISI has tracked AI cyber capabilities since 2023 and calls Mythos a step change over previous frontier models — the strongest third-party validation of Anthropic's own Glasswing findings.
A peer-reviewed Nature paper finds that student models inherit behavioural traits — including misalignment — from teacher models through semantically unrelated data, even when developers filter for it.
The phenomenon, called subliminal learning, is proven theoretically and demonstrated empirically: distillation encodes traits into statistical structure invisible to content-based filters.
Originally from Anthropic's alignment team (July 2025), now Nature-published — directly relevant to supply-chain safety and the growing use of distillation across the industry.
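The mechanism is easiest to see in the linear case the theory covers: a student distilled on a teacher's outputs absorbs the teacher's weight structure, even though no individual training example "contains" the trait a content filter would look for. A toy linear-regression illustration of that idea — not the paper's actual experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
teacher_w = rng.normal(size=d)           # includes a hidden "trait" direction

# Teacher labels semantically unrelated data (random vectors here).
X = rng.normal(size=(2000, d))
y = X @ teacher_w                         # teacher's soft outputs

# Student distilled by least squares on the teacher's outputs alone.
student_w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The student recovers the teacher's full weight vector -- trait included --
# from data that looks neutral example by example.
cos = student_w @ teacher_w / (np.linalg.norm(student_w) * np.linalg.norm(teacher_w))
print(round(cos, 3))  # 1.0
```

The point of the sketch: the trait lives in the statistical structure of the teacher's outputs, not in any single datum, which is why content-based filtering cannot remove it.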
OpenAI announced its own cyber defense initiative on April 16, mirroring Anthropic's Glasswing approach with a trusted access program for defensive security use cases.
The announcement signals that frontier labs are converging on a controlled-access model for their most capable cybersecurity tools.
Published alongside GPT-Rosalind, it reinforces OpenAI's push into high-stakes professional domains with restricted access tiers.
Anthropic officially launched Claude Design on April 17 — a prompt-to-prototype tool that generates UI mockups, presentations, and visuals with Canva export, PPTX, and PDF support.
Figma stock immediately nosedived despite Anthropic positioning it as complementary; Figma commands 80–90% of the UI/UX design market and the market read the signals clearly.
MCP integrations are planned for third-party tool connections — consistent with Anthropic's pattern of entering software verticals with Claude-native products.
OpenAI's major Codex update introduces background computer use — multiple agents can click, type, and navigate apps on your Mac in parallel without interfering with your own work.
An in-app browser with page annotation, inline image generation via gpt-image-1.5, and 90+ new plugins including JIRA, CircleCI, GitLab, and remote devbox SSH make Codex a full agentic IDE.
Used by 3 million developers weekly, this update is OpenAI's most direct answer yet to Claude Code's growing developer base.
SWE-bench Verified performance jumped from 60% to near 100% in a single year; enterprise AI adoption hit 88% and 4 in 5 university students now use generative AI.
China has erased the US lead in frontier model production — the two countries are now neck-and-neck — while public trust in AI oversight and transparency hit new lows.
The US still outspends every other country on AI but is struggling to attract top talent; the full 400-page report is freely available at the link.
Anthropic published results showing LLMs can perform scalable oversight — a weak model standing in for humans to supervise a stronger one, approximating the challenge of overseeing superhuman AI.
Results show promising weak-to-strong supervision progress, though frontier models aren't yet ready to replace human alignment scientists.
A landmark result for AI safety research: automated alignment researchers could compress the timeline for solving oversight before models become uncontrollable.
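The weak-to-strong setup can be sketched in miniature: a noisy "weak supervisor" labels data, and a student fit to those labels ends up more accurate than its supervisor, because the supervisor's errors average out. A toy linear version, illustrative only and not Anthropic's actual protocol:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10, 5000
w_true = rng.normal(size=d)

X = rng.normal(size=(n, d))
y_true = np.sign(X @ w_true)

# Weak supervisor: correct only ~80% of the time (a stand-in for humans
# supervising a task beyond their reliability).
flip = rng.random(n) < 0.2
y_weak = np.where(flip, -y_true, y_true)

# Strong student: fit a linear probe to the weak labels.
w_student, *_ = np.linalg.lstsq(X, y_weak, rcond=None)
y_student = np.sign(X @ w_student)

weak_acc = (y_weak == y_true).mean()
student_acc = (y_student == y_true).mean()
print(weak_acc, student_acc)  # student beats its supervisor
```

Because the supervisor's mistakes are unsystematic, fitting through them recovers the true decision boundary — the hopeful dynamic that weak-to-strong research tries to establish for much harder alignment-relevant tasks.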
Amazon's AGI Lab published its end-to-end framework for training computer-use agents with RL, built around four layers: data (synthetic web gyms), reasoning, algorithms, and infrastructure.
An untrained RL agent on the open web would click random buttons, delete data, or buy $5,000 items — web gyms need realism, explorability, and diverse task hydration to produce useful agents.
A rare honest engineering deep-dive from inside a major lab on what it actually takes to scale computer-use RL to production.
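At its most skeletal, a web gym is just an environment that serves freshly sampled tasks and scores the agent's clicks, with a penalty structure that makes random clicking unprofitable. A hypothetical sketch of such an interface — the class, reward values, and method names below are invented for illustration, not Amazon's API:

```python
import random

class WebGym:
    """Minimal sketch of a synthetic web-gym environment.

    A real gym renders realistic, explorable pages; here each 'page'
    is just a set of clickable element ids, one of which completes
    the current task.
    """
    def __init__(self, n_elements=10, seed=0):
        self.rng = random.Random(seed)
        self.n_elements = n_elements

    def reset(self):
        # Sample a fresh goal each episode so the agent sees diverse
        # tasks instead of memorising a single page.
        self.goal = self.rng.randrange(self.n_elements)
        return {"elements": list(range(self.n_elements)), "goal": self.goal}

    def step(self, action):
        done = action == self.goal
        reward = 1.0 if done else -0.1   # penalise aimless clicking
        return reward, done

# An untrained agent is a random clicker: mostly penalties, rarely the goal.
env = WebGym()
obs = env.reset()
reward, done = env.step(random.Random(1).choice(obs["elements"]))
print(reward, done)
```

The three requirements named above map directly onto this interface: realism lives in what `reset` serves, explorability in the action space `step` accepts, and task diversity in how goals are sampled.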
Researchers at UW Madison and Stanford introduced T² scaling laws, a framework that jointly optimises model size, training data volume, and test-time inference samples.
Key finding: it is compute-optimal to train substantially smaller models on vastly more data than Chinchilla rules prescribe, then use saved compute for repeated inference sampling.
In practice, compact overtrained models can match frontier models on complex reasoning tasks at a fraction of the per-query cost — a blueprint for enterprise AI teams.
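The trade-off can be roughed out with the standard FLOP heuristics (training ≈ 6·N·D, inference ≈ 2·N per generated token); the model sizes, token counts, and sample counts below are illustrative assumptions, not the paper's numbers:

```python
# Back-of-envelope comparison of a Chinchilla-style large model versus a
# smaller, overtrained model that spends saved compute on repeated sampling.
def train_flops(n_params, data_tokens):
    return 6 * n_params * data_tokens          # ~6*N*D training heuristic

def query_flops(n_params, samples, tokens_per_sample=1000):
    return 2 * n_params * samples * tokens_per_sample  # ~2*N per token

# Chinchilla-style: 70B params on 1.4T tokens, one answer per query.
# T^2-style: 7B params overtrained on 14T tokens, 4 sampled answers per query.
print(f"train  70B: {train_flops(70e9, 1.4e12):.2e}   7B: {train_flops(7e9, 14e12):.2e}")
print(f"query  70B: {query_flops(70e9, 1):.2e}   7B x4: {query_flops(7e9, 4):.2e}")
```

With these assumed numbers, the two regimes cost the same to train, but four samples from the 7B model come in at well under half the per-query FLOPs of one sample from the 70B model — the kind of arithmetic behind the "fraction of the per-query cost" claim.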