Google released Gemma 4 12B (Apache 2.0), the first mid-sized Gemma with native audio input; it uses no separate vision or audio encoders and instead projects raw signals directly into token space.
Benchmark performance sits near the 26B MoE at less than half the memory; it runs locally on 16GB VRAM or unified memory via llama.cpp, MLX, Ollama, and LM Studio.
Google also launched the official Gemma Skills Repository, an agent skill library for building with the Gemma family, now at 150M+ total downloads.
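As a rough check on the 16GB claim for Gemma 4 12B, the weights-only footprint of a 12B-parameter model can be estimated from the bits stored per weight. This is a back-of-envelope sketch, not Google's published figure: the precision levels below (16-, 8-, and 4-bit) are assumptions chosen for illustration, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed precisions for illustration; the source does not name a quantization.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(12, bits):.1f} GB")
```

Under these assumptions, only the 8-bit and 4-bit cases leave the weights below 16 GB, which suggests that local runs on that budget would typically rely on a quantized build.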
MiniMax released M3, an open-weight model with frontier coding and agentic capability, built on MSA (MiniMax Sparse Attention) and supporting a context window of up to 1M with a minimum of 512K.
The model ships with native multimodality and is immediately available on Ollama and SiliconFlow; the release was this week’s top post on r/LocalLLaMA.
M3 positions MiniMax as a serious open-weight competitor at the high end, directly targeting DeepSeek V4 and Qwen 3 in the open-weight coding agent category.
Microsoft AI launched seven MAI models trained from scratch on clean licensed data with zero distillation, co-designed with Maia 200 silicon for a 1.4x efficiency boost.
The headline capability is Frontier Tuning — RL in real-world environments using organizational workflow traces: MAI tuned for Excel matches GPT-5.4 at 10x lower cost, and MAI tuned for McKinsey achieved the highest win rate of any tested model at roughly 10x lower cost.
Microsoft and Mayo Clinic are also co-creating a frontier healthcare AI model trained on de-identified clinical data; the model will be owned by Mayo Clinic, deployed internally first, and then made available to other health systems via Azure Foundry.
JetBrains open-sourced Mellum2 (Apache 2.0) — a 12B total / 2.5B active MoE ‘focal model’ designed for high-frequency, low-latency tasks in multi-model agent systems such as routing, RAG summarization, and planning steps.
Inference time is cut to less than half of comparable models while remaining competitive on code generation, math, and reasoning benchmarks; it handles both natural language and code, a major evolution from the original Mellum (code completion only, Apr 2025).
The technical report is at arXiv:2605.31268, and the model is available on HuggingFace for private and local deployment.
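To see why a 12B total / 2.5B active MoE such as Mellum2 suits high-frequency calls, per-token compute can be compared using the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter. This is an illustrative sketch only: the rule of thumb ignores attention and routing costs, and the dense 12B baseline is a hypothetical comparison point, not a model named in the report.

```python
def forward_flops_per_token(active_params_billion: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params_billion * 1e9


moe_flops = forward_flops_per_token(2.5)    # 2.5B active parameters per token
dense_flops = forward_flops_per_token(12)   # hypothetical dense 12B baseline

print(f"MoE:   {moe_flops / 1e9:.0f} GFLOPs per token")
print(f"Dense: {dense_flops / 1e9:.0f} GFLOPs per token")
print(f"Ratio: {dense_flops / moe_flops:.1f}x")
```

Memory still scales with the total parameter count, so the saving here is in compute per token rather than in the size of the download.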
Anthropic, PBC confidentially submitted Form S-1 to the SEC for a proposed IPO of common stock, described as potentially one of the largest AI listings ever attempted.
The filing comes weeks after the Series H at a $965B valuation with $47B ARR, signalling the company is moving quickly to public markets while its financial position is at a historic high.
No pricing or timeline has been disclosed; the confidential S-1 process gives Anthropic flexibility to gauge institutional appetite before a public filing.
Meta unveiled an enterprise AI agent aimed at automating day-to-day business operations, entering the enterprise AI race directly against Microsoft Copilot and Salesforce Agentforce.
The agent is positioned to handle multi-step operational workflows across departments, extending Meta’s AI reach well beyond its consumer social platforms.
OpenAI’s frontier models and Codex are now generally available on AWS in both Commercial and GovCloud regions, removing procurement, security, and governance friction for enterprise customers.
Codex (5M+ weekly users) is now natively inside AWS developer environments via Amazon Bedrock. Daybreak, OpenAI’s cyber/security suite covering threat modeling, patch validation, and dependency risk analysis, is also coming to AWS.
A 269-page bipartisan House draft from Reps. Obernolte and Trahan — dubbed ‘The Great American Artificial Intelligence Act’ — proposes a 3-year federal preemption of all state AI model development laws, including New York and California safety protocol requirements.
Frontier AI developers (OpenAI, Anthropic, Google DeepMind, xAI) would be required to implement catastrophic risk mitigation plans, providing a federal floor while blocking the emerging patchwork of state regimes.
The bill arrives days after Trump signed an EO for voluntary federal agency reviews of frontier models, and shortly after Connecticut’s SB5 passed the state Senate 32-4 in May, illustrating the accelerating tension between state and federal AI governance.
Anthropic expanded Project Glasswing — the joint industry initiative using Claude Mythos to find and fix critical software vulnerabilities — to approximately 150 new organizations across 15+ countries, adding Claude Security for codebase scanning and patch suggestions.
The global expansion signals a shift from a US-centric coalition to a multinational defensive security infrastructure, with critical infrastructure operators now participating across Europe, Asia-Pacific, and Latin America.
Announced at COMPUTEX 2026 by Jensen Huang, NVIDIA Cosmos 3 is an open physical AI world foundation model offering four capabilities: vision-language reasoning for real-time alerts and logistics, World Action Models (WAMs) for robot policy learning, physics-grounded world simulation for closed-loop evaluation, and synthetic video data generation from text/image/video/audio/action inputs.
The open ecosystem includes Cosmos Curator (data curation), Cosmos Evaluator (scoring generative outputs), and open post-training/inference frameworks; optimised for NVIDIA RTX PRO 6000 Blackwell and GB200.
Cosmos 3 targets robotics, autonomous vehicles, and industrial vision — providing the synthetic training data pipeline that physical AI teams previously had to build themselves.
At GTC Taipei, NVIDIA announced Nemotron 3 Ultra — a 550B MoE model with 5x faster inference and 30% lower cost vs open frontier peers, post-trained for LangChain, OpenHands, OpenClaw, Hermes Agent, and OpenCode; available on HuggingFace, OpenRouter, and NIM.
NemoClaw is an open blueprint framework connecting Nemotron models to enterprise harnesses, already deployed at Cadence (autonomous chip design/verification), Dassault Systèmes, Siemens Fuse EDA, Synopsys, and Foxconn (Nurabot clinical AI + MoMClaw factory ops).
OpenShell is a secure agent runtime with policy and privacy controls, partnering with Microsoft (Windows security primitives), Canonical (Ubuntu snaps), Red Hat, SAP (Joule Studio), and ServiceNow (Project Arc); CUDA-X libraries are available as agent skills in the Claude Code plug-in marketplace and Hermes Skills Hub.
OpenAI expanded Codex with six role-specific plugins (analysts, marketers, sales, designers, investors), Codex Sites (deploy hosted internal apps from a prompt), and Annotations (in-place editing of documents).
The expansion broadens Codex beyond software engineering to every knowledge-worker role, following the GA launch on AWS announced the same week and building on 5M+ weekly active users.
Codex Sites in particular creates a new category — a non-technical user can deploy a functional internal web app by describing it, with no code written and no infrastructure managed.
OpenAI updated GPT-Rosalind with stronger agentic coding, drug-discovery, and genomics performance, plus new plugins for evidence retrieval and bioinformatics workflows.
The update builds on the original GPT-Rosalind launch covered in the April 18 edition, which introduced the model as OpenAI’s first domain-specific offering for life sciences; this release adds operational tooling for researchers and clinical workflows.
OpenAI launched Dreaming V3, a compute-efficient background memory synthesis system that replaces ChatGPT’s saved memories — memories auto-update as time passes, correct stale context, and are reviewable via a memory summary page.
The system achieves approximately 5x lower compute cost than the previous approach; it is rolling out to Plus/Pro users in the US now, with rollout to Free users over the coming weeks.
Dreaming V3 shifts ChatGPT memory from an explicit user-managed list to a continuously maintained model of each user — a significant architectural change in how personalisation is delivered at scale.
arXiv:2605.30621 disentangles two distinct capabilities in self-evolving LLM agents. The first, harness-updating (writing improved prompts/skills/memory), is flat across capability tiers: a 9B Qwen model produces updates yielding the same gains as Claude Opus 4.6.
The second, harness-benefit, is non-monotonic: weak models gain little from updates, mid-tier models benefit most, and strong models gain less than mid-tier ones, so capability budget is better spent on the task-solving agent than on the evolver.