May 25 - May 31, 2026

AI Weekly: Anthropic Hits $965B, Opus 4.8 Rewrites Agent Benchmarks

Models & Releases

3 stories

StepFun 3.7 Flash: 196B MoE Beats Flash Tier, Runs on 128GB RAM

  • StepFun dropped Step 3.7 Flash, a 196B total / 11B active MoE with a built-in 1.8B ViT for vision. It scores 56.26% on SWE-Bench Pro, ahead of DeepSeek V4 Flash (55.6%) and Gemini 3.5 Flash.
  • The model runs fully locally on 128GB RAM, making it one of the few flash-tier competitors accessible to prosumer hardware; it also scores 92.82% F1 on DeepSearchQA, approaching GPT-5.5’s 93.98%.
  • Available now on OpenRouter and NVIDIA NIM for cloud inference; the local option challenges the assumption that frontier-competitive MoE models require datacenter hardware.
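
As a back-of-the-envelope check on the 128GB figure, the sketch below estimates weight storage for 196B parameters at a few precisions. The bit widths are assumptions chosen for illustration, since the item does not say which quantization the local build uses, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a 196B-parameter model.
# Assumption: the bit widths below are illustrative; the source does not
# state which precision the 128GB local configuration uses.

TOTAL_PARAMS = 196e9   # total parameters (from the item above)
RAM_BUDGET_GB = 128    # local RAM figure (from the item above)


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit in the budget (no KV cache or overhead)."""
    return weight_gb(params, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_gb(TOTAL_PARAMS, bits)
        verdict = "fits" if fits(TOTAL_PARAMS, bits, RAM_BUDGET_GB) else "does not fit"
        print(f"{bits:>2}-bit: {size:.0f} GB of weights, {verdict} in {RAM_BUDGET_GB} GB")
```

Under these assumptions, only the roughly 4-bit case (about 98 GB of weights) leaves headroom within 128 GB. In a standard MoE setup the 11B active parameters reduce per-token compute, not the memory needed to keep all experts resident.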

Gemini Omni Flash: Any-Modality Output Now in APIs

  • Google officially launched Gemini Omni Flash at I/O 2026 (May 19), with API access rolling out to developers and enterprises in the coming weeks. The model generates outputs in any modality from any input, starting with video, then image and text.
  • It combines Gemini intelligence with Veo generative media technology, and is available in the Gemini app, Google Flow, and YouTube Shorts; Gemini 3.5 Flash also launched at 4× the speed of comparable frontier models at under half the cost.
  • The launch follows the Google I/O 2026 preview covered in the May 24 edition, representing the shift from leaked demo to developer-accessible product.

People & Business

5 stories

Mistral Announces Physics AI Stack for Airbus, BMW, ASML

  • At the AI Now Summit (May 28), Mistral unveiled a physics AI stack for industrial engineering built on its May 22 acquisition of Emmi; partners include Airbus (across commercial, helicopter, defence, and space divisions), BMW Group (crash simulation Large Industry Model), and ASML (semiconductor design optimisation).
  • Mistral also announced Les Ulis — a 10MW inference data center in Essonne, France opening Q3 2026, its first owned compute infrastructure — reinforcing its sovereign AI and full data-control positioning.
  • The industrial pivot and owned compute represent a significant strategic evolution for Mistral, complementing Vibe’s consumer/developer rebranding announced the same day.

Jensen Huang and Hassabis Push Back on AI Layoff Narrative

  • NVIDIA CEO Jensen Huang called CEOs blaming layoffs on AI ‘irresponsible’ and ‘lazy,’ telling Channel NewsAsia ‘AI has just arrived, how is it possible they’re already losing jobs?’ His remarks echo Google DeepMind CEO Demis Hassabis, who called AI-driven developer layoffs a ‘lack of imagination’ from employers.
  • Analysis suggests the real driver for many companies is capital reallocation to GPU budgets rather than genuine AI displacement; cited evidence includes 80% of companies that cut jobs for AI seeing zero improvement in returns, Klarna firing 700 then quietly rehiring, and 29% of employees at AI-layoff companies actively sabotaging adoption.
  • The narrative connects directly to the Meta 8,000 and Coinbase 14% layoff stories from prior editions, where AI was explicitly cited as justification even as broader strategic pivots drove the decisions.

Hassabis: AGI in 4 Years, 'Foothills of the Singularity'

  • Google DeepMind CEO Demis Hassabis said during and after Google I/O 2026 that humanity stands in the ‘foothills of the singularity,’ with AGI potentially arriving within 4 years or sooner. He cited growing confidence that the industry has found the right technical path.
  • Hassabis also stated the industry has only a few years left to prepare for AGI, framing the current moment as a critical window for safety, governance, and societal readiness work.
  • The statement is the most specific AGI timeline claim from a major lab CEO in 2026, and was reported in detail by both Axios and MIT Technology Review.

Meta Launches Paid Tiers for Instagram, Facebook, WhatsApp

  • Meta officially launched Plus subscriptions globally — Instagram Plus ($3.99/mo), Facebook Plus ($3.99/mo), WhatsApp Plus ($2.99/mo) — alongside AI plans under the ‘Meta One’ brand: Meta One Plus ($7.99/mo) and Meta One Premium ($19.99/mo) with deeper compute/reasoning and expanded image/video generation.
  • AI plan testing starts next month in Singapore, Guatemala, and Bolivia; the proprietary model powering the AI tier is Muse Spark, from Meta Superintelligence Labs and Alexandr Wang — first launched in April.
  • The monetization push marks a structural shift for Meta’s consumer apps, which have been ad-only for two decades, and signals that frontier AI features will be paywalled even on mass-market platforms.

Policy & Ethics

2 stories

OpenAI Rosalind Biodefense Program Opens to Developers and US Gov

  • OpenAI opened its Rosalind Biodefense program — trusted developers can now apply to build biodefense and pandemic preparedness applications using GPT-Rosalind, OpenAI’s frontier reasoning model for life sciences; access is also expanding to US government and allied partners.
  • Launch partners include Fourth Eon Biosecurity (DNA synthesis screening), Lawrence Livermore National Laboratory (medical countermeasures), Johns Hopkins APL (protein-engineering platform), and CEPI (100 Days Mission, Ebola preparedness).
  • The program is global and open to academic, nonprofit, and government organizations as well as mission-driven companies. GPT-Rosalind was first introduced as a research model in the April 18 edition; this launch marks its transition to a structured access and partner program.

Products & Hardware

2 stories

OpenAI Codex Agents Self-Improve on Tax Filing, Cut Time 60%

  • OpenAI published an engineering deep-dive showing how Codex agents can be built to self-improve on tax filing tasks — a feedback loop where agents evaluate their outputs against test cases and iteratively refine their code, cutting filing time by 60%.
  • The system demonstrates a concrete vertical use-case for agentic coding beyond software development, with self-evaluation driving improvement rather than human-in-the-loop correction at each step.
  • The post extends the Codex agentic expansion thread that began with Codex mobile/desktop and continues with domain-specific deployments across legal, finance, and compliance.
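
The evaluate-and-refine loop described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: every name here is hypothetical, the tax rule is made up for the demo, and `scripted_refiner` stands in for an agent that would rewrite its code after seeing which test cases failed.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[float, float]        # (income, expected_tax)
Candidate = Callable[[float], float]  # a candidate tax function
Refiner = Callable[[Candidate, List[TestCase]], Candidate]


def failures(candidate: Candidate, tests: List[TestCase]) -> List[TestCase]:
    """Return the test cases the candidate gets wrong."""
    return [(income, want) for income, want in tests
            if abs(candidate(income) - want) > 1e-6]


def self_improve(initial: Candidate, refine: Refiner, tests: List[TestCase],
                 max_rounds: int = 5) -> Tuple[Optional[Candidate], int]:
    """Evaluate, then refine on failure, for at most `max_rounds` evaluations.

    Returns (passing candidate or None, number of evaluation rounds used).
    """
    candidate = initial
    for round_no in range(1, max_rounds + 1):
        failed = failures(candidate, tests)
        if not failed:
            return candidate, round_no
        candidate = refine(candidate, failed)
    return None, max_rounds


def scripted_refiner(revisions: List[Candidate]) -> Refiner:
    """Hypothetical stand-in for an agent: replays a fixed list of revisions."""
    queue = list(revisions)

    def refine(candidate: Candidate, failed: List[TestCase]) -> Candidate:
        return queue.pop(0) if queue else candidate

    return refine


# Made-up rule for the demo: 10% tax on income above a 10,000 allowance.
def v1(income: float) -> float:
    return income * 0.10                    # bug: ignores the allowance


def v2(income: float) -> float:
    return (income - 10_000) * 0.10         # bug: negative tax below allowance


def v3(income: float) -> float:
    return max(income - 10_000, 0) * 0.10   # matches the demo rule


TESTS: List[TestCase] = [(5_000, 0.0), (10_000, 0.0), (30_000, 2_000.0)]

if __name__ == "__main__":
    best, rounds = self_improve(v1, scripted_refiner([v2, v3]), TESTS)
    print(f"passing candidate found: {best is not None}, rounds used: {rounds}")
```

The test cases act as the only judge here, which mirrors the item's point that self-evaluation, rather than a human check at each step, drives the improvement.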

Research & Resources

3 stories

NVIDIA LocateAnything: 10x Faster Visual Grounding via Parallel Box Decoding

  • NVIDIA Research introduced LocateAnything with Parallel Box Decoding (PBD), which decodes the full bounding box (x1,y1,x2,y2) as a single atomic step rather than token-by-token — delivering 12.7 boxes/sec on H100, 10× faster than Qwen3-VL and 2.5× faster than Rex-Omni.
  • The model achieves SOTA on ScreenSpot-Pro GUI grounding (60.3 mean F1), DocLayNet (76.8), and LVIS dense detection. It was trained on LocateAnything-Data, which comprises 138M samples, 785M boxes, and 12M images covering detection, GUI grounding, OCR, layout, and referring comprehension.
  • Three inference modes (Fast/Slow/Hybrid) make it suitable for robotics, on-device GUI agents, and document understanding; the architecture is built on a Moon-ViT + Qwen2.5 backbone.
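
To make the contrast between token-by-token and single-step box decoding concrete, here is a toy count of sequential decoder steps. It is a sketch under stated assumptions, not NVIDIA's implementation: the function names are hypothetical, and the baseline assumes each coordinate costs a fixed number of autoregressive steps.

```python
# Toy step-count model for box decoding. Hypothetical helper names; this
# does not reproduce LocateAnything's architecture or its reported speedups.

COORDS_PER_BOX = 4  # (x1, y1, x2, y2), as in the item above


def sequential_steps(num_boxes: int, tokens_per_coord: int = 1) -> int:
    """Decoder steps when every coordinate token is generated one at a time.

    Assumption: each coordinate costs `tokens_per_coord` autoregressive steps.
    """
    return num_boxes * COORDS_PER_BOX * tokens_per_coord


def parallel_steps(num_boxes: int) -> int:
    """Decoder steps when each full box is emitted in a single step."""
    return num_boxes


if __name__ == "__main__":
    n = 50
    print(f"{n} boxes, token-by-token: {sequential_steps(n)} steps")
    print(f"{n} boxes, one step per box: {parallel_steps(n)} steps")
```

Step counts alone do not determine throughput; the measured figures quoted above also depend on the models and hardware being compared.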

PrismML Bonsai Image 4B: 1-bit Diffusion Runs in Browser via WebGPU

  • PrismML released Bonsai Image 4B in binary and ternary variants — 1-bit/ternary text-to-image diffusion transformers that weigh only ~3GB versus FLUX.2 Klein 4B’s ~16GB, making local deployment dramatically more accessible.
  • The model runs 100% in-browser via WebGPU with a live demo at huggingface.co/spaces/webml-community/bonsai-image-webgpu, no install required.
  • Apache 2.0 licensed and trending on r/LocalLLaMA, Bonsai Image represents the latest push in the extreme local inference thread — bringing frontier-quality generation to commodity hardware and even browsers.