May 11 - May 17, 2026

AI Weekly: Full-Duplex AI Arrives, Alexa Swallows Rufus

Models & Releases

3 stories

Google Gemini Omni: AI Video Generation Leaked Before I/O

  • A leak ahead of Google I/O 2026 (May 19) revealed Gemini Omni, a conversational AI video generation system that can generate, remix, and edit video through a chat interface.
  • Built on Veo 3 technology with deep Gemini integration, the tool signals Google’s push to unify its video and conversational AI stacks before the annual developer keynote.
  • The timing puts Gemini Omni in direct competition with OpenAI’s Sora 2 and similar tools from Runway and Kling, with Google under pressure to demonstrate multimodal parity at I/O.

Sakana's 7B RL Conductor Beats GPT-5 by Orchestrating It

  • Sakana AI’s RL Conductor is a 7B model trained via reinforcement learning on just 960 examples (2× H100s) to dynamically route and orchestrate GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and open-source models.
  • It outperforms each individual frontier model on reasoning and coding benchmarks while consuming far fewer tokens than competing orchestration frameworks.
  • Commercialized via the Fugu platform, RL Conductor suggests a new paradigm in which small, cheaply trained orchestrator models can outperform the expensive frontier models they manage when those models are queried directly.

People & Business

1 story

Policy & Ethics

3 stories

Anthropic Maps Two Scenarios for Global AI Leadership by 2028

  • Anthropic’s policy team published a research paper on May 14 exploring two possible global AI trajectories by 2028: one where the US leads responsibly with coordinated governance, and one where a fragmented race leads to geopolitical instability.
  • The paper lands as Connecticut’s SB5 frontier model regulation bill (covered in our May 3 edition) advances through the state legislature, amid ongoing debate over federal AI policy frameworks.
  • Anthropic positions the paper as a call for proactive policy investment rather than reactive regulation, emphasizing that the 2025–2026 window is critical for shaping governance norms.

ChatGPT Now Reads Sensitive Context to Reduce Over-Refusals

  • OpenAI on May 14 improved ChatGPT’s ability to recognize nuanced context in sensitive conversations, reducing over-refusals where the model previously declined legitimate requests.
  • The update follows months of community criticism that safety guardrails were too aggressive in medical, legal, and personal-advice contexts where detailed information is genuinely helpful.
  • OpenAI framed the change as a calibration rather than a safety rollback, saying the model now better distinguishes harmful intent from genuine need. The shift echoes the broader alignment work covered in our May 10 edition’s ‘Teaching Claude Why’ story.

Products & Hardware

5 stories

ChatGPT Personal Finance: Connect 12,000+ Bank Accounts

  • OpenAI launched personal finance tools in preview on May 15 for ChatGPT Pro subscribers in the US, allowing users to connect accounts from 12,000+ financial institutions for AI-powered spending analysis and financial planning.
  • The feature requires the $100/month Pro tier for now; OpenAI plans a phased rollout to Plus ($20/month) and lower tiers later.
  • The move puts OpenAI in direct competition with Mint successors, YNAB, and fintech AI startups, and it raises data-handling questions that give fresh weight to OpenAI’s advanced account security rollout, covered in our May 3 edition.

Codex Goes Mobile and Gets a Windows Sandbox

  • OpenAI expanded Codex to mobile and additional desktop environments on May 14, making the coding agent accessible beyond the web interface and advancing the Coding Agent Wars thread we have been tracking.
  • Separately, OpenAI published an engineering deep-dive on May 13 explaining how it built an isolated sandbox that lets Codex execute code safely on Windows.
  • Together these releases significantly broaden Codex’s reach just as competitors Cursor and Claude Code accelerate their own mobile and cross-platform roadmaps.

Amazon Merges Rufus and Alexa+ Into One Shopping Agent

  • Amazon on May 13 unified the Rufus e-commerce chatbot and Alexa+ into a single persistent shopping agent called ‘Alexa for Shopping,’ capable of tracking prices, remembering preferences, and acting on behalf of users across Echo devices, Amazon.com, and the app.
  • The consolidation signals Amazon’s recognition that maintaining two parallel AI assistants in the shopping vertical created user confusion and split engineering investment.
  • The move mirrors the broader platform consolidation trend visible in Claude Managed Agents (Apr 11) and ChatGPT Workspace Agents (Apr 25), where conversational and task-execution AI are merging into unified agent surfaces.

Open Design: Open-Source Self-Hostable Alternative to Claude Design

  • nexu-io released Open Design (Apache 2.0) as a local-first, self-hostable alternative to Anthropic’s Claude Design (launched Apr 17). It supports 16 coding-agent CLIs, auto-detected on PATH, including Claude Code, Codex, Cursor, Gemini CLI, Devin, Qwen, and Kimi.
  • It ships with 31 skills covering web prototypes, mobile apps, decks, and dashboards, plus 72 brand-grade design systems (Linear, Stripe, Vercel, Airbnb, Notion, Apple, Anthropic, Cursor) and a bring-your-own-key (BYOK) proxy for any OpenAI-compatible endpoint, including Ollama and LM Studio.
  • Open Design can import Claude Design export ZIPs, includes an MCP server for Claude Code, Codex, and Cursor integration, and deploys via Docker or Vercel, making it a viable self-hosted escape hatch for teams unwilling to lock into Anthropic’s hosted offering.

Research & Resources

3 stories

Game Boy Color Runs a Real Transformer LLM Without a PC

  • A developer ran Karpathy’s TinyStories-260K (INT8, fixed-point) on a stock 1998 Game Boy Color with EZ Flash Jr and a microSD card, storing the KV cache in cartridge SRAM and using the D-pad as a keyboard for prompts.
  • Output is extremely slow and mostly gibberish, but transformer prefill and autoregressive generation run entirely on original GBC hardware. The developer built the project using Codex as a coding assistant.
  • The project joins a growing tradition of extreme-hardware LLM demos, alongside the LLM on a 1998 iMac G3 covered in our Apr 11 edition and this week’s Intel Optane Kimi K2.5 build below.

Intel Optane Build Runs 1-Trillion-Param Kimi K2.5 at 4 tok/s

  • An enthusiast built a Xeon system with 768 GB of Intel Optane Persistent Memory (DCPMM) that runs Kimi K2.5 (1T parameters, Q2_K_XL quantization) at approximately 4 tokens per second via hybrid GPU/CPU llama.cpp inference, with Optane PMem bought cheaply on the secondary market standing in for DRAM as the main memory tier.
  • Intel discontinued Optane in 2022, making this a community treasure-hunt story: surplus enterprise DCPMM modules offer a memory capacity per dollar that DRAM can’t match, and capacity is what it takes to hold a trillion-parameter model in memory.
  • Paired with the Game Boy Color transformer above, this week’s r/LocalLLaMA highlights span the full range of local-inference ambition, from 1998 8-bit hardware to trillion-parameter quantized giants.