#llm-benchmarks

Kimi K2.7-Code: Open Weights, First-Party Numbers (2026)

June 13, 2026

Moonshot's Kimi K2.7-Code is a 1T open-weight coding model with cheap API pricing, but every launch benchmark is first-party. Here's what's actually verified.

#Kimi K2.7-Code #Moonshot AI

Claude Fable 5: Anthropic's Mythos-Class Model (2026)

June 10, 2026

Claude Fable 5 is Anthropic's most capable public model yet — a Mythos-class system at $10/$50 that quietly falls back to Claude Opus 4.8 on risky prompts.

#claude fable 5 #claude mythos 5

Nex-N2-Pro: Open-Weight Coder vs GPT-5.5 (2026)

June 10, 2026

Nex-N2-Pro is a free, open-weight 397B MoE coding model from Nex AGI, built on Qwen3.5. We check its benchmarks against GPT-5.5, Opus, and rival open models.

#nex-n2-pro #nex agi

MiniMax M3: Open-Weight Coding at 1/10 the Cost (2026)

June 9, 2026

MiniMax M3 is an open-weight coding model whose Sparse Attention runs 1M-token context at 1/20th the compute. Its benchmarks beat GPT-5.5 — with caveats.

#minimax m3 #minimax

DeepSeek V4: Open-Weight Frontier at 1/7 the Cost

May 2, 2026

DeepSeek V4 ships 1.6T MoE open weights with 1M-token context: 80.6% on SWE-bench Verified at $1.74/$3.48 per million — roughly 1/7 the output cost of Claude Opus 4.7.

#DeepSeek V4 #DeepSeek V4 Pro

GPT-5.5: OpenAI's First Retrained Base Since GPT-4.5

April 24, 2026

OpenAI released GPT-5.5 on April 23, 2026 — the first fully retrained base since GPT-4.5. Benchmarks, $5/$30 API pricing, 1M context, and Opus 4.7 compared.

#GPT-5.5 #GPT-5.5 Pro

Meta Muse Spark: Benchmarks and Strategy (2026)

April 9, 2026

Meta Muse Spark is MSL's first proprietary model, with top health and science benchmarks but coding gaps. Modes, scores, and what developers should know.

#Meta AI #Muse Spark

GPT-5.4 Beats Humans at Computer Use: What It Means

April 5, 2026

GPT-5.4 scores 75% on OSWorld, surpassing human experts at desktop tasks. What this means for AI agents, enterprise workflows, and the competition in 2026.

#GPT-5.4 #AI agents

GLM-4.7 Deep Dive: 355B MoE, 200K Context, $0.60/M Tokens

March 8, 2026

Zhipu AI's GLM-4.7 explained: 355B MoE architecture, 200K-token context, multimodal inputs, and $0.60 in / $2.20 out per million tokens on Z.ai.

#GLM‑4 #Zhipu AI