OpenAI released GPT-5.5 on April 23, citing strong gains in agentic coding, computer use, knowledge work, and early scientific research — areas where reasoning across context over time matters most.
It surpasses Claude Opus 4.7 on Terminal-Bench 2.0 (82.7% vs 69.4%) and GDPval (84.9% vs 80.3%), while matching GPT-5.4's per-token latency and using fewer tokens to complete the same Codex tasks.
Rolling out to Plus, Pro, Business, and Enterprise in ChatGPT and Codex; API access coming soon with additional safety requirements for partners.
DeepSeek released V4-Pro (1.6T params, 49B active) and V4-Flash (284B, 13B active), both with 1M token context, under an MIT licence — arriving a year after the original R1 moment and already being hailed as a second DeepSeek moment.
At $5.22 per million tokens (blended), it costs roughly one-sixth as much as GPT-5.5 ($35.00) and Opus 4.7 ($30.00) for comparable performance on most benchmarks.
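As a quick sanity check on those blended-price ratios (illustrative arithmetic only, using the per-million-token figures quoted above):

```python
v4, gpt55, opus47 = 5.22, 35.00, 30.00  # $ per million tokens, blended

print(f"vs GPT-5.5:  {gpt55 / v4:.1f}x cheaper")   # ~6.7x
print(f"vs Opus 4.7: {opus47 / v4:.1f}x cheaper")  # ~5.7x
```

So "roughly one-sixth" is slightly generous against Opus 4.7 and slightly conservative against GPT-5.5.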
Key architectural advance: a hybrid CSA+HCA attention mechanism that reduces 1M-context inference to just 27% of the FLOPs and 10% of the KV cache required by V3.2.
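DeepSeek has not published the cache layout, so the model dimensions below are invented for illustration — but a back-of-envelope sketch shows why a 10% KV-cache footprint matters at 1M context:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Approximate dense-attention KV cache size: K and V each store
    layers x kv_heads x head_dim values per token (fp16 by default)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical dense baseline at 1M context (layer/head counts are made up)
baseline = kv_cache_bytes(num_layers=61, num_kv_heads=128, head_dim=128,
                          seq_len=1_000_000)
hybrid = baseline // 10  # the reported 10% KV-cache footprint

print(f"dense baseline: {baseline / 2**40:.1f} TiB")
print(f"CSA+HCA hybrid: {hybrid / 2**40:.2f} TiB")
```

Under these assumed dimensions, a terabyte-scale cache shrinks to a few hundred gigabytes — the difference between sharding a single request across many accelerators and fitting it on far fewer.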
Moonshot AI open-sourced Kimi K2.6, featuring long-horizon execution and agent swarm capabilities built for real engineering tasks, not just benchmarks.
In one demo, the model autonomously overhauled an 8-year-old financial matching engine over 13 hours and 1,000+ tool calls, delivering a 185% throughput improvement without human intervention.
A second demo deployed and optimised Qwen3.5-0.8B inference in Zig — a niche, out-of-distribution language — over 12 hours, achieving 20% higher throughput than LM Studio.
Alibaba's Qwen team released Qwen 3.6 27B, a dense model that outperforms the much larger Qwen 397B MoE variant on several key benchmarks — a striking demonstration of training efficiency over raw scale.
Available under Apache 2.0, it runs on consumer-grade hardware and has quickly become the top discussion thread on r/LocalLLaMA this week.
Part of the continuing Qwen series that has consistently punched above its weight class; follows last week's Qwen3.6-35B-A3B MoE release.
SpaceX — now merged with xAI in a combined $1.25 trillion entity — struck a deal giving it the right to acquire Cursor for $60 billion later this year, or pay $10 billion for their work together.
Cursor CEO Michael Truell confirmed the partnership, saying he's excited to scale Composer with the SpaceX team; Cursor was simultaneously raising $2 billion at a $50B+ valuation from a16z, NVIDIA, and Thrive Capital.
xAI's direct entry into the coding wars positions it against OpenAI's Codex and Anthropic's Claude Code — via the same Cursor whose Composer 2 model was found in March to have used Kimi K2.5 without attribution.
Canadian AI lab Cohere is acquiring German startup Aleph Alpha in a deal that creates a $20 billion combined entity, with Cohere shareholders receiving ~90% and Aleph Alpha shareholders ~10%.
The deal is explicitly positioned to give European governments and enterprises an AI provider that is not a US tech giant; the German government is set to become an anchor customer.
Both governments facilitated the merger — the first major transatlantic AI consolidation, and a signal that the European AI sovereignty push is moving from policy to M&A.
Anthropic signed a new agreement with Amazon securing up to 5 gigawatts of compute capacity covering Trainium2 (shipping Q2 2026), Trainium3, Trainium4, and future custom silicon generations.
The full Claude Platform will now be available natively inside AWS — same account, same billing, no new credentials or contracts required for existing AWS customers.
Coming the same week as Meta's deal for tens of millions of Graviton cores, it confirms AWS's consolidation as the infrastructure backbone for multiple competing frontier AI labs simultaneously.
Anthropic published a detailed postmortem explaining that three separate bugs — a reasoning effort downgrade in March, a memory-clearing bug in late March, and a verbosity prompt change in April — combined to produce month-long quality complaints.
Because each change affected a different slice of traffic on a different timeline, the aggregate effect looked like unexplained broad degradation that was difficult to distinguish from normal feedback variation.
All three issues are resolved as of v2.1.116; Anthropic reset usage limits for all subscribers as a goodwill gesture — published on the same day GPT-5.5 launched.
A four-institution study (UCLA, MIT, Oxford, CMU; n=1,222) gave participants AI assistants for cognitive tasks and then removed access — those who had briefly used AI performed significantly worse than controls who never had access at all.
Replicated across three experiments scaling from 350 participants to the full cohort, the effect held across both mathematical reasoning and reading comprehension tasks.
The study is the first controlled causal evidence of AI-induced cognitive atrophy: short AI exposure degrades unaided performance and reduces willingness to try independently.
The UK government is reportedly considering invoking the break clause on Palantir's NHS Federated Data Platform contract, following sustained pressure from MPs, unions, and digital rights campaigners over data privacy concerns.
The contract has been controversial since its award, with critics arguing the platform gives a US defence contractor excessive access to NHS patient data.
If invoked, it would be one of the largest public-sector AI contract cancellations in the UK and would significantly set back the NHS digital data agenda.
OpenAI launched Workspace Agents, an evolution of GPTs powered by Codex — shared, organisation-wide agents that can handle complex long-running tasks across Slack, email, and internal tools.
Unlike GPTs, Workspace Agents run persistently in the cloud, operate across team workflows with approval gates, and are built for handoffs and context sharing between team members.
Available in research preview for Business, Enterprise, and Edu plans; GPTs will eventually migrate to the Workspace Agents format.
OpenAI's Images 2.0 adds a 'thinking' mode with built-in compositional reasoning, up to 2K resolution, and dramatically improved text rendering — handling small text, UI elements, iconography, and dense layouts that break most image models.
With a December 2025 knowledge cutoff, it can handle end-to-end creative workflows from copywriting through design composition — OpenAI's response to the Claude Design launch the previous week.
Available now to ChatGPT users with flexible aspect ratio support.
xAI launched Grok Voice Think Fast 1.0, a voice model designed for complex multi-step enterprise workflows including customer support, sales, and operational automation.
It tops the Tau Voice Bench leaderboard and ships directly embedded in Starlink — the first frontier AI voice model deployed inside a satellite internet service.
Extends the Audio AI race (alongside Mistral Voxtral, Gemini Flash Live, MAI-Voice-1) into a new distribution channel: satellite connectivity infrastructure.
Mistral's Leanstral is an open-source coding agent that uses Lean 4 theorem proving to produce mathematically guaranteed-correct code — the first serious attempt to eliminate human-in-the-loop code review through formal verification rather than testing.
A 119B MoE model with 6.5B active parameters, it outperforms Claude 4.6, Qwen, and Kimi on its target tasks; the catch is that developers write specs in Lean rather than reviewing generated output.
The debate is whether this shifts human oversight from code review to spec quality — and whether that is genuinely safer, or just harder to audit.
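Leanstral's actual spec format is not public; as an illustrative sketch of the spec-as-theorem workflow (function and theorem names here are invented), the developer states the contract in Lean 4 and the agent must produce both an implementation and a machine-checked proof:

```lean
-- Hypothetical spec: the developer writes only the theorem statement.
-- The agent's job is to supply `double` and a proof that it meets the spec.
def double (n : Nat) : Nat := n + n

theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The audit question shifts accordingly: the proof guarantees `double` satisfies `double_spec`, but nothing checks whether `double_spec` is the property the developer actually wanted.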
FairyFuse runs ternary-weight LLMs on commodity CPUs using AVX-512 fused masked add/subtract loops — eliminating floating-point multiplications entirely from the inference hot path.
Achieves a 29.6x kernel speedup and 32.4 tokens/second on a single Intel Xeon 8558P, outperforming llama.cpp Q4_K_M by 1.24x with near-lossless quality (WikiText-2 perplexity 5.52 vs 5.47 for FP16).
Key insight: ternary 16x weight compression shifts the memory-bound GEMV bottleneck toward compute — which is exactly where AVX-512 wins.
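The multiplication-free idea is easy to see in scalar form. The sketch below (NumPy, not the actual AVX-512 kernel) computes a ternary matrix-vector product using only masked adds and subtracts, which is what the fused vector loops do lane-wise in hardware:

```python
import numpy as np

def ternary_gemv(w, x):
    """Matrix-vector product for ternary weights in {-1, 0, +1} using only
    masked adds and subtracts -- no weight multiplications. A scalar sketch
    of the fused masked add/subtract trick, not the real kernel."""
    pos = (w == 1)    # mask of +1 weights
    neg = (w == -1)   # mask of -1 weights
    # Per output row: add activations where w = +1, subtract where w = -1
    return (np.where(pos, x, 0.0).sum(axis=1)
            - np.where(neg, x, 0.0).sum(axis=1))

rng = np.random.default_rng(0)
w = rng.integers(-1, 2, size=(4, 8))   # ternary weight matrix
x = rng.standard_normal(8)             # fp activations
assert np.allclose(ternary_gemv(w, x), w @ x)  # matches ordinary GEMV
```

Because every weight is ±1 or 0, the multiply in each dot-product term degenerates to a sign flip or a skip — which is why the compressed, compute-bound workload maps so well onto wide masked-add units.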