All Issues
Apr 12 - Apr 18, 2026

AI Weekly: Stanford AI Index Report 2026 is out; Anthropic releases Claude design and Opus 4.7

Models & Releases

5 stories

GPT-Rosalind: OpenAI's First Domain-Specific Model for Life Sciences

  • OpenAI launched GPT-Rosalind, its first model purpose-built for a specific domain — biology, drug discovery, and translational medicine including protein engineering and genomics.
  • Named after Rosalind Franklin, it can query databases, read the latest papers, use scientific tools, and suggest experiments as part of multi-step research workflows.
  • Research preview available to qualified customers; a free Life Sciences Codex plugin connects to 50+ scientific tools and data sources.

Qwen3.6-35B-A3B: 73.4% SWE-Bench at 3B Active Params, Apache 2.0

  • Alibaba's Qwen team released a sparse MoE vision-language model with 35B total and just 3B active parameters, achieving 73.4% on SWE-bench Verified and 92.7% on AIME 2026.
  • It matches Claude Sonnet 4.5 on vision tasks and scores 86.0 on GPQA Diamond (graduate-level scientific reasoning) — competitive with models many times its active size.
  • Runs locally as a 20.9GB file under Apache 2.0, making it one of the most capable open-weight models available for local deployment.

MiniMax M2.7: 230B MoE with 200K Context, Runnable on 128GB RAM

  • MiniMax released M2.7, a 230B parameter MoE model with 10B active params and a 200K token context window — successor to the popular M2.5.
  • Unsloth's Dynamic 4-bit GGUF quantization compresses the model from 457GB to 108GB, making it runnable on a single 128GB RAM machine.
  • Trending on r/LocalLLaMA this week, with active community benchmarking and discussion of fixes.
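The quoted sizes are easy to sanity-check with back-of-envelope arithmetic. A rough sketch, assuming bf16 (2 bytes/param) before quantization and roughly 3.75 effective bits/param after a dynamic 4-bit scheme (the per-weight byte counts are assumptions, not Unsloth's exact recipe):

```python
# Rough sanity check of the MiniMax M2.7 memory figures.
# Assumed, not from the release notes: bf16 weights (2 bytes/param)
# before quantization, and ~3.75 effective bits/param after a dynamic
# 4-bit scheme (scales and some higher-precision layers included).

TOTAL_PARAMS = 230e9  # 230B total parameters (MoE)

bf16_gb = TOTAL_PARAMS * 2 / 1e9          # ~460 GB, near the quoted 457 GB
q4_gb = TOTAL_PARAMS * (3.75 / 8) / 1e9   # ~108 GB, matching the quoted size

print(f"bf16: ~{bf16_gb:.0f} GB, dynamic 4-bit: ~{q4_gb:.0f} GB")
```

Note that only total parameters matter for disk and RAM footprint; the 10B active parameters matter for per-token compute, which is why a 230B model is viable on a single workstation at all.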

Audio Flamingo Next: Open Audio Model with 30-Minute Context Window

  • UMD and NVIDIA released Audio Flamingo Next (AF-Next), a fully open audio-language model that handles speech, sound, and music understanding in a single model.
  • Its 30-minute audio context window and time-grounded reasoning allow it to answer questions like 'what is being said at 14:32?' rather than just transcribing.
  • Part of the Audio Flamingo series alongside Music Flamingo — the first fully open model in this space with long-form temporal reasoning.

Policy & Ethics

3 stories

Nature: LLMs Silently Inherit Misalignment Through Unrelated Training Data

  • A peer-reviewed Nature paper finds that student models inherit behavioural traits — including misalignment — from teacher models through semantically unrelated data, even when developers filter for it.
  • The phenomenon, called subliminal learning, is proven theoretically and demonstrated empirically: distillation encodes traits into statistical structure invisible to content-based filters.
  • Originally from Anthropic's alignment team (July 2025), now Nature-published — directly relevant to supply-chain safety and the growing use of distillation across the industry.

OpenAI Launches Cyber Defense Ecosystem and Trusted Access Program

  • OpenAI announced its own cyber defense initiative on April 16, mirroring Anthropic's Glasswing approach with a trusted access program for defensive security use cases.
  • The announcement signals that frontier labs are converging on a controlled-access model for their most capable cybersecurity tools.
  • Published alongside GPT-Rosalind, it reinforces OpenAI's push into high-stakes professional domains with restricted access tiers.

Products & Hardware

2 stories

Codex for (Almost) Everything: Background Computer Use and 90+ Plugins

  • OpenAI's major Codex update introduces background computer use — multiple agents can click, type, and navigate apps on your Mac in parallel without interfering with your own work.
  • An in-app browser with page annotation, inline image generation via gpt-image-1.5, and 90+ new plugins including JIRA, CircleCI, GitLab, and remote devbox SSH make Codex a full agentic IDE.
  • Used by 3 million developers weekly, this update is OpenAI's most direct answer yet to Claude Code's growing developer base.

Research & Resources

4 stories

Anthropic: LLMs Can Now Do AI Safety Research Autonomously

  • Anthropic published results showing LLMs can run scalable-oversight experiments themselves: a weak model stands in for humans supervising a stronger model, approximating the challenge of overseeing superhuman AI.
  • Results show promising weak-to-strong supervision progress, though frontier models aren't yet ready to replace human alignment scientists.
  • A landmark result for AI safety research: automated alignment researchers could compress the timeline for solving oversight before models become uncontrollable.

Amazon AGI Lab's Practical RL Recipe for Computer-Use Agents

  • Amazon's AGI Lab published their end-to-end framework for training computer-use agents with RL, built around four layers: data (synthetic web gyms), reasoning, algorithms, and infrastructure.
  • An untrained RL agent on the open web would click random buttons, delete data, or buy $5,000 items — web gyms need realism, explorability, and diverse task hydration to produce useful agents.
  • A rare honest engineering deep-dive from inside a major lab on what it actually takes to scale computer-use RL to production.
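The "web gym" idea can be sketched as a tiny gym-style environment with a sparse, verified-completion reward. Everything below is illustrative (Amazon's framework is not public API): a toy one-page task where the agent must type into a search box and then click the button, and where a random untrained policy mostly earns nothing.

```python
# Toy synthetic "web gym" in the style the post describes: the agent
# observes page elements, emits a UI action, and is rewarded only on
# verified task completion. All names here are hypothetical.
import random
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "click" or "type"
    target: str    # element id
    text: str = ""

class ToyWebGym:
    """One-page task: type a query into 'search_box', then click 'search_btn'."""

    def __init__(self):
        self.typed = False
        self.done = False

    def reset(self):
        self.typed = self.done = False
        return {"elements": ["search_box", "search_btn"]}

    def step(self, action: Action):
        if action.kind == "type" and action.target == "search_box":
            self.typed = True
        elif action.kind == "click" and action.target == "search_btn" and self.typed:
            self.done = True
        reward = 1.0 if self.done else 0.0  # sparse, verified-completion reward
        return {"elements": ["search_box", "search_btn"]}, reward, self.done

# An untrained (random) policy flails, illustrating why gyms need
# realism and diverse tasks before an agent touches the real web.
env = ToyWebGym()
obs = env.reset()
for _ in range(10):
    a = Action(random.choice(["click", "type"]),
               random.choice(obs["elements"]), "llm news")
    obs, reward, done = env.step(a)
    if done:
        break
```

The design point the post makes is visible even in this toy: because reward comes only from verified completion, the environment (not a learned judge) defines success, which is what makes synthetic gyms safe to explore in.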

Train-to-Test Scaling Laws: Smaller Models Beat Frontier Pricing

  • Researchers at UW Madison and Stanford introduced T² scaling laws, a framework that jointly optimises model size, training data volume, and test-time inference samples.
  • Key finding: it is compute-optimal to train substantially smaller models on vastly more data than Chinchilla rules prescribe, then use saved compute for repeated inference sampling.
  • In practice, compact overtrained models can match frontier models on complex reasoning tasks at a fraction of the per-query cost — a blueprint for enterprise AI teams.
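The trade-off can be sketched with the standard compute approximations C_train ≈ 6·N·D and ~2·N FLOPs per inference token. All concrete numbers below are illustrative, not from the paper:

```python
# Illustrative compute accounting for train-to-test scaling, using the
# common approximations C_train ~ 6*N*D and C_infer ~ 2*N per token.
# The specific model sizes and token counts are hypothetical.

def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

def infer_flops(n_params, tokens_per_query, samples):
    return 2 * n_params * tokens_per_query * samples

# A Chinchilla-style model: 70B params on 1.4T tokens (20 tokens/param)
big = train_flops(70e9, 1.4e12)

# An overtrained small model: 7B params on the same 1.4T tokens,
# i.e. 10x more tokens per parameter than the Chinchilla prescription
small = train_flops(7e9, 1.4e12)

# Training savings can fund repeated inference sampling over the
# model's whole deployment lifetime
saved = big - small
samples_funded = saved / infer_flops(7e9, tokens_per_query=2000, samples=1)
print(f"saved {saved:.2e} FLOPs ~ {samples_funded:.1e} extra 2k-token samples")
```

The point of the framework is that this lifetime inference budget is part of the optimisation, not an afterthought: once per-query sampling is priced in, the compute-optimal model is smaller and more overtrained than training-only scaling laws suggest.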