AI Summary
MiniMax launches M2.7, a Chinese open source model that matches GLM-5's performance (50 on Artificial Analysis' Intelligence Index) at only a third of its cost: $0.30/$1.20 per million tokens. The model posts 56.22% on SWE-Pro and 57% on Terminal Bench 2, ranks above MiMo-V2-Pro and Kimi K2.5, and ships a first self-evolution capability able to handle 30-50% of its own development workflow. As a direct competitor, Xiaomi's MiMo-V2-Pro also establishes itself as an API-only reasoning model with 1M tokens of context and better token efficiency than its peers.
Barely two months after their IPO and first public quarter, MiniMax is back in the news with MiniMax 2.7, a nice bright spot in Chinese open models after the changeover in Qwen. They match Z.ai's GLM-5, last month's SOTA open model, but the story here is efficiency (see the green quadrant in Artificial Analysis' chart).

The team calls out "Early Echoes of Self-Evolution," describing M2.7 as "our first model deeply participating in its own evolution," recalling Karpathy's Autoresearch, although they only claim that "M2.7 is capable of handling 30%-50% of the workflow."

They also report some work on multi-agent collaboration ("Agent Teams") and follow Anthropic and OpenAI's lead in applying their models to finance use cases. Finally, they launch OpenRoom, an open source demo for entertainment use cases.

AI News for 3/18/2026-3/19/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

MiniMax M2.7, Xiaomi MiMo-V2-Pro, and the expanding "self-evolving agent" model class

MiniMax M2.7 is the headline model release: MiniMax positioned M2.7 as its first model that "deeply participated in its own evolution," claiming 56.22% on SWE-Pro, 57.0% on Terminal Bench 2, 97% skill adherence across 40+ skills, and parity with Sonnet 4.6 in OpenClaw. A follow-up says the internal harness also recursively improved itself, collecting feedback, building eval sets, and iterating on skills/MCP, memory, and architecture (thread). Third-party coverage broadly echoed the "self-evolving" framing, including TestingCatalog and kimmonismus.

Artificial Analysis places M2.7 on the cost/performance frontier: Artificial Analysis reports 50 on its Intelligence Index, matching GLM-5 (Reasoning) while costing $176 to run the full index at $0.30/$1.20 per 1M input/output tokens, less than one-third of GLM-5's cost.
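To make the per-token rates above concrete, here is a minimal sketch of what they imply per request. The rates are the ones quoted in this issue for M2.7 ($0.30/$1.20 per 1M input/output tokens) and MiMo-V2-Pro ($1/$3); the workload token counts are hypothetical, chosen only to illustrate the arithmetic:

```python
# USD per 1M tokens: (input, output), as reported by Artificial Analysis.
RATES = {
    "MiniMax M2.7": (0.30, 1.20),
    "MiMo-V2-Pro": (1.00, 3.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the model's listed per-1M-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical context-heavy agentic task: 200k tokens in, 20k tokens out.
for model in RATES:
    print(f"{model}: ${request_cost(model, 200_000, 20_000):.4f} per task")
```

At these rates the hypothetical task costs about $0.084 on M2.7 versus $0.26 on MiMo-V2-Pro, which is the kind of gap that puts a model on the cost/performance frontier even at similar Intelligence Index scores.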
They also report GDPval-AA Elo 1494, ahead of MiMo-V2-Pro (1426), GLM-5 (1406), and Kimi K2.5 (1283), plus a large hallucination reduction vs M2.5. Distribution was immediate: Ollama cloud, Trae, Yupp, OpenRouter, Vercel, Zo, opencode, and kilocode.

Xiaomi's MiMo-V2-Pro looks like a serious Chinese API-only reasoning entrant: Artificial Analysis scores it at 49 on the Intelligence Index, with 1M context, $1/$3 per 1M tokens pricing, and GDPval-AA Elo 1426. Notably, they call out stronger token efficiency than peers and a relatively favorable AA-Omniscience score (+5) driven by lower hallucination. This follows Xiaomi's earlier open-weight MiMo-V2-Flash (309B total / 15B active, MIT); V2-Pro itself is API-only for now.

Mamba-3 is out and immediately being viewed through the hybrid-architecture lens: Cartesia announced Mamba-3 as an SSM optimized for an inference-heavy world, with Albert Gu noting Cartesia-backed testing and support (link). Early technical reactions focused less on standalone SSMs and more on plugging Mamba-3 into transformer hybrids: rasbt explicitly called out replacing Gated DeltaNet in next-gen hybrids like Qwen3.5 / Kimi Linear, while JG_Barthelemy highlighted hybrid integration and "unlocking Muon for SSMs."

Agent harnesses, skills, MCP, and the shift from "prompting" to systems design

The strongest recurring theme is that harness engineering is becoming the real differentiator: Multiple posts argued that the bottleneck is no longer just the base model but the surrounding execution environment. The Turing Post's interview with Michael Bolin frames coding agents as a problem of tools, repo legibility, constraints, and feedback loops, what many now call harness engineering. dbreunig made a similar point about why teams stick with DSPy, and nickbaumann_ argued GPT-5.4 mini matters specifically because cheap, fast subagents change what is worth delegating.
Skills are solidifying into a shared abstraction across agent stacks: A practical thread from mstockton lays out real usage patterns for SKILLS: progressive disclosure, trace inspection, session distillation, CI-triggered skills, and self-improving skills. RhysSullivan suggests distributing skills via MCP resources may solve staleness/versioning. Anthropic's Claude Code account clarifies that a skill is not just a text snippet but a folder with scripts/assets/data, and that the key description field should specify when to trigger it (tweet).

Open agent stacks are converging on model + runtime + harness: Harrison Chase published a walkthrough framing Claude Code, OpenClaw, Manus, etc. as the same decomposition: open model + runtime + harness, using Nemotron 3, NVIDIA's OpenShell, and DeepAgents. Related infrastructure releases include LangSmith Sandboxes for secure code execution, LangSmith Polly GA as an in-product debugging/improvement assistant, and a new LangChain guide on production observability for agents. MCP momentum continues, but