Thinking Machines Lab (ex-OpenAI) released TML-Interaction-Small, a 276B MoE (12B active) model natively built for real-time full-duplex interaction with a 200ms micro-turn architecture that continuously interleaves audio, video, and text without turn boundaries.
It outperforms GPT Realtime-2.0 and Gemini Flash Live on FD-bench while remaining competitive on intelligence benchmarks, and introduces new capabilities like time-awareness (speak at user-specified times) and visual proactivity (count pushups, respond to visual cues mid-stream).
It is the first model to score meaningfully on the TimeSpeak, CueSpeak, and RepCount benchmarks, where all existing models score near zero; a research preview is coming.
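To make the 200ms micro-turn idea concrete, here is a minimal sketch of slicing a continuous input stream into fixed 200 ms windows, each carrying whatever audio and video arrived in that slice. Only the 200 ms figure comes from the announcement; the sample rate, frame rate, and all names below are illustrative assumptions, not details of TML-Interaction-Small.

```python
from dataclasses import dataclass

MICRO_TURN_MS = 200  # from the announcement; everything else here is assumed


@dataclass
class MicroTurn:
    index: int
    start_ms: int
    audio_samples: int  # audio samples that fall inside this window
    video_frames: int   # video frames that fall inside this window


def micro_turns(duration_ms: int, audio_hz: int = 16_000, video_fps: int = 5):
    """Split a stream of `duration_ms` into consecutive 200 ms micro-turns."""
    turns = []
    for i, start in enumerate(range(0, duration_ms, MICRO_TURN_MS)):
        # The final window may be shorter than 200 ms.
        length = min(MICRO_TURN_MS, duration_ms - start)
        turns.append(MicroTurn(
            index=i,
            start_ms=start,
            audio_samples=audio_hz * length // 1000,
            video_frames=video_fps * length // 1000,
        ))
    return turns


one_second = micro_turns(1000)
print(len(one_second), one_second[0])
```

Under these assumed rates, one second of input becomes five micro-turns, so the model has five opportunities per second to listen, speak, or stay silent rather than waiting for a turn boundary.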
A leak ahead of Google I/O 2026 (May 19) revealed Gemini Omni, a conversational AI video generation system that can generate, remix, and edit video through a chat interface.
Built on Veo 3 technology with deep Gemini integration, the tool signals Google’s push to unify its video and conversational AI stacks before the annual developer keynote.
The timing puts Gemini Omni in direct competition with OpenAI’s Sora 2 and similar tools from Runway and Kling, with Google under pressure to demonstrate multimodal parity at I/O.
Sakana AI’s RL Conductor is a 7B model trained via reinforcement learning on just 960 examples (2× H100s) to dynamically route and orchestrate GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and open-source models.
It outperforms each individual frontier model on reasoning and coding benchmarks while consuming far fewer tokens than competing orchestration frameworks.
Commercialized via the Fugu platform, RL Conductor suggests a new paradigm where small orchestrator models trained cheaply can outperform expensive direct inference from the models they manage.
Coursera and Udemy completed their merger on May 11, creating one of the world’s largest skills platforms with 290M learners, 18,000 enterprise customers, 95,000 content creators, and 315,000+ courses.
The deal is framed explicitly around AI-era workforce transformation — the combined entity aims to move from a content catalog to a ‘skills delivery platform’ connecting learning to real-world job outcomes.
OpenAI disclosed on May 13 that a supply chain attack targeted TanStack, a widely used JavaScript library family, via a malicious npm package, and detailed how it detected and responded to the threat.
The incident is notable for involving AI coding tool infrastructure; TanStack is commonly used in projects built with and scaffolded by Codex, Cursor, and similar agents.
Anthropic’s policy team published a research paper on May 14 exploring two possible global AI trajectories by 2028: one where the US leads responsibly with coordinated governance, and one where a fragmented race leads to geopolitical instability.
Anthropic positions the paper as a call for proactive policy investment rather than reactive regulation, emphasizing that the 2025–2026 window is critical for shaping governance norms.
OpenAI on May 14 improved ChatGPT’s ability to recognize nuanced context in sensitive conversations, reducing over-refusals where the model previously declined legitimate requests.
The update follows months of community criticism that safety guardrails were too aggressive in medical, legal, and personal-advice contexts where detailed information is genuinely helpful.
OpenAI framed the change as a calibration rather than a safety rollback, saying the model now better distinguishes harmful intent from genuine need — part of the broader alignment work covered in our May 10 edition’s ‘Teaching Claude Why’ story.
Anthropic launched Claude Platform on AWS as GA on May 11, bringing full native Claude API features — Managed Agents, Skills, MCP connector, Files API, Code Execution, web search, prompt caching, batch processing — accessible via AWS IAM auth and CloudTrail audit logging.
The key distinction from Amazon Bedrock is who operates the service: Anthropic runs Claude Platform and processes data outside the AWS boundary, which brings full feature parity and day-one access to new models, whereas Bedrock keeps AWS as the data processor for customers with stricter residency requirements. See also the related OpenAI-on-Bedrock announcement from our May 3 edition.
Available in most AWS commercial regions with billing via a single AWS invoice that retires against existing AWS commitments — a significant procurement simplification for enterprise customers already standardized on AWS.
OpenAI launched personal finance tools in preview on May 15 for ChatGPT Pro subscribers in the US, allowing users to connect accounts from 12,000+ financial institutions for AI-powered spending analysis and financial planning.
The feature requires the $100/month Pro tier for now, with a phased rollout planned before expanding to Plus ($20/month) and lower tiers.
OpenAI expanded Codex to mobile and additional desktop environments on May 14, making the coding agent accessible beyond the web interface; it is the latest entry in our Coding Agent Wars thread.
Together these releases significantly broaden Codex’s reach just as competitors Cursor and Claude Code accelerate their own mobile and cross-platform roadmaps.
Amazon on May 13 unified the Rufus e-commerce chatbot and Alexa+ into a single persistent shopping agent called ‘Alexa for Shopping,’ capable of tracking prices, remembering preferences, and acting on behalf of users across Echo devices, Amazon.com, and the app.
The consolidation signals Amazon’s recognition that maintaining two parallel AI assistants in the shopping vertical created user confusion and split engineering investment.
nexu-io released Open Design (Apache 2.0) as a local-first, self-hostable alternative to Anthropic’s Claude Design (launched Apr 17), supporting 16 coding agent CLIs auto-detected on PATH including Claude Code, Codex, Cursor, Gemini CLI, Devin, Qwen, and Kimi.
It ships with 31 skills covering web prototypes, mobile apps, decks, and dashboards, plus 72 brand-grade design systems (Linear, Stripe, Vercel, Airbnb, Notion, Apple, Anthropic, Cursor) and a BYOK proxy for any OpenAI-compatible endpoint including Ollama and LM Studio.
Open Design can import Claude Design export ZIPs and includes an MCP server for Claude Code/Codex/Cursor integration — Docker and Vercel deployable — making it a viable self-hosted escape hatch for teams unwilling to lock into Anthropic’s hosted offering.
Microsoft researchers released the DELEGATE-52 benchmark spanning 52 professional domains and found that frontier AI models consistently corrupt documents and lose content across extended multi-step tasks — only Python programming met a readiness threshold after 20 delegated interactions.
Agentic systems with tool access actually performed worse than base models in many domains, suggesting that tool-calling scaffolding introduces new failure modes at scale.
The findings challenge vendor claims about agentic readiness and connect to the Symphony/Codex orchestration work from our May 3 edition — implying that orchestration frameworks need better state preservation mechanisms before real delegation is viable.
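One way to see why long delegation chains are fragile is simple compounding: if each interaction leaves a document intact with some probability, the chance of it surviving 20 interactions shrinks geometrically. The sketch below assumes independent steps, and the per-step figures are illustrative, not DELEGATE-52 results.

```python
def survival_probability(per_step_fidelity: float, steps: int = 20) -> float:
    """Chance a document survives `steps` independent edits uncorrupted."""
    if not 0.0 <= per_step_fidelity <= 1.0:
        raise ValueError("per_step_fidelity must be in [0, 1]")
    return per_step_fidelity ** steps


# Hypothetical per-step fidelities, chosen only to show the trend.
for p in (0.99, 0.97, 0.95):
    print(f"per-step {p:.2f} -> after 20 steps {survival_probability(p):.3f}")
```

Even a step that is right 99% of the time leaves roughly one document in five damaged after 20 rounds under this simple model, which is why state preservation matters more as task length grows.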
A developer ran Karpathy’s TinyStories-260K (INT8, fixed-point) on a stock 1998 Game Boy Color with EZ Flash Jr and a microSD card, storing the KV cache in cartridge SRAM and using the D-pad as a keyboard for prompts.
Output is extremely slow and mostly gibberish, but transformer prefill and autoregressive generation run entirely on original GBC hardware — built using Codex as a coding assistant.
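For readers curious what "INT8, fixed-point" means in practice, the sketch below shows an integer-only dot product with a power-of-two scale, the kind of arithmetic that suits a CPU with no floating-point unit. It is an illustration under assumed choices (symmetric quantization, 6 fractional bits, hypothetical function names), not the developer's actual code.

```python
def quantize(values, frac_bits=6):
    """Map floats to int8 using a fixed-point scale of 2**frac_bits."""
    scale = 1 << frac_bits
    # Clamp to the signed 8-bit range after scaling and rounding.
    return [max(-128, min(127, round(v * scale))) for v in values]


def fixed_dot(a_q, b_q, frac_bits=6):
    """Integer-only dot product of two int8 vectors.

    Each product carries 2 * frac_bits fractional bits, so a single
    right shift brings the accumulated sum back to frac_bits.
    """
    acc = 0  # stands in for a wider integer accumulator on real hardware
    for x, y in zip(a_q, b_q):
        acc += x * y
    return acc >> frac_bits


def dequantize(q, frac_bits=6):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


a = [0.5, -0.25, 1.0]
b = [1.0, 0.5, -0.5]
result = fixed_dot(quantize(a), quantize(b))
print(result, dequantize(result))
```

The float conversions here exist only to set up and check the example; the inner loop uses nothing but integer multiplies, adds, and a shift.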
An enthusiast built a Xeon + 768GB Intel Optane Persistent Memory (DCPMM) system that runs Kimi K2.5 (1T parameters, Q2_K_XL quantization) at approximately 4 tokens per second via hybrid GPU/CPU llama.cpp inference — with Optane PMem sourced cheaply on the secondary market acting as a DRAM tier.
Intel discontinued Optane in 2022, making this a community treasure-hunt engineering story: surplus enterprise DCPMM modules offer memory capacity that DRAM can’t match at comparable cost for massive-model inference.
Paired with the Game Boy Color transformer above, this week’s r/LocalLLaMA highlights the full spectrum of local inference ambition — from 1998 8-bit hardware to trillion-parameter quantized giants.
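As a back-of-envelope check on the Optane build, the weight footprint of a quantized model is roughly parameters times bits per weight, divided by eight. The parameter count and 768GB capacity come from the item above; the bits-per-weight values are an assumed range, since the exact average for Q2_K_XL is not stated here, and the estimate ignores the KV cache and runtime overhead.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 1e12   # "1T parameters" from the item above
MEMORY_GB = 768   # Optane capacity from the item above

# Assumed range of average bits per weight; not a published figure.
for bpw in (2.0, 2.5, 3.0, 3.5):
    size = weight_footprint_gb(N_PARAMS, bpw)
    fits = size <= MEMORY_GB
    print(f"{bpw} bits/weight -> {size:.0f} GB, fits in {MEMORY_GB} GB: {fits}")
```

Across this assumed range the weights alone run to hundreds of gigabytes, which is the regime where a large, cheap memory tier becomes the deciding factor.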