If you have struggled to keep up with open-weight releases this past month, here's an overview of the main ones in chronological order with a focus on the architecture designs.
1) Arcee AI Trinity Large (Jan 27)
Arcee's Trinity line came out of nowhere for many of us. It's a 400B MoE (13B active), plus two smaller variants. Architecturally, it combines familiar ingredients: Mixture-of-Experts (MoE) + Grouped Query Attention (GQA) + Sliding Window Attention (SWA).
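Since GQA and SWA come up repeatedly in this list, here is a minimal numpy sketch of the two ideas combined: query heads share a smaller set of KV heads (GQA), and each token only attends within a fixed trailing window (SWA). This is an illustrative toy, not any model's actual implementation; all shapes and function names are my own.

```python
import numpy as np

def sliding_window_causal_mask(seq_len, window):
    # Each token i may attend to positions j with i - window < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def gqa_attention(q, k, v, window):
    """Toy GQA + SWA. q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d),
    with n_q_heads a multiple of n_kv_heads. Each group of query heads
    shares one KV head, which is what shrinks the KV cache."""
    n_q, seq, d = q.shape
    n_kv = k.shape[0]
    group = n_q // n_kv
    mask = sliding_window_causal_mask(seq, window)
    out = np.empty_like(q)
    for h in range(n_q):
        kv = h // group  # query head h reads its group's shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores = np.where(mask, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out
```

With, say, 4 query heads over 2 KV heads, the KV cache halves relative to standard multi-head attention, and the window bound keeps attention cost linear in sequence length for long contexts.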
2) Moonshot Kimi K2.5 (Jan 27)
Similar to Kimi K2, Kimi K2.5 is one of the largest open-weight models at 1 trillion parameters. Also similar to Kimi K2, it follows a DeepSeek-like template.
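The "N total / M active" parameter counts that recur in this list (1T total here, 400B/13B for Trinity above) come from MoE routing: a router picks a few experts per token, so only those experts' weights participate in the forward pass. A minimal top-k routing sketch, with hypothetical shapes and names of my own choosing:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Toy top-k MoE layer. x: (tokens, d); gate_w: (d, n_experts);
    experts: list of (W1, W2) ReLU-MLP weight pairs. Only the top_k
    experts selected by the router run for each token."""
    logits = x @ gate_w                              # router scores per token
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        weights = np.exp(sel - sel.max())            # softmax over selected experts
        weights /= weights.sum()
        for w, e in zip(weights, top[t]):
            W1, W2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ W1, 0.0) @ W2)
    return out
```

With top_k=2 out of, say, 64 experts, only 2/64 of the expert weights are touched per token, which is why a 1T-parameter model can have a much smaller active-parameter count.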
3) StepFun Step 3.5 Flash (Feb 1)
Step 3.5 Flash mainly focuses on a performance-throughput balance. Overall, it is somewhat similar to Arcee Trinity in that it uses GQA + SWA, but it's 2x smaller.
4) Qwen3-Coder-Next (Feb 3)
Qwen3-Coder-Next continues Qwen's focus on hybrid-attention for coding models. The architecture (and size) is similar to the previous 80B Qwen3-Next model, except it's fine-tuned for coding contexts.
5) z.AI GLM-5 (Feb 12)
GLM-5 is a flagship-scale release that stays close to the current “large MoE + efficiency tweaks” consensus. It also includes DeepSeek-inspired choices (for example, MLA and DeepSeek Sparse Attention).
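The core idea behind MLA (Multi-Head Latent Attention) is worth a quick sketch: instead of caching full keys and values, the model caches a low-rank latent per token and reconstructs K and V from it via up-projections, shrinking the KV cache. The single-head toy below illustrates only that compression idea; dimensions and weight names are hypothetical, and details like RoPE handling are omitted.

```python
import numpy as np

# Toy MLA-style KV compression (single head, hypothetical sizes).
d_model, d_latent, d_head = 64, 8, 16
rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)

h = rng.standard_normal((10, d_model))   # hidden states for 10 tokens
latent_cache = h @ W_down                # (10, 8): the only thing cached per token
k = latent_cache @ W_up_k                # keys reconstructed on the fly (10, 16)
v = latent_cache @ W_up_v                # values reconstructed on the fly (10, 16)
# Cache cost drops from 2 * d_head = 32 floats/token to d_latent = 8 floats/token.
```

The trade is extra up-projection compute per step for a much smaller KV cache, which is what makes it attractive at flagship scale and long contexts.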
6) MiniMax M2.5 (Feb 12)
MiniMax M2.5 is especially interesting because it performs strongly while staying architecturally very classic, using plain GQA.
7) Nanbeige 4.1 3B (Feb 13)
Nanbeige 4.1 3B is one of the interesting “small” models of this release wave. It's essentially a Llama 3-style model (similar to the Qwen3 dense models).
8) Qwen3.5 (Feb 15)
Qwen3.5 (397B-A17B) is interesting because it suggests that the Qwen team is now also adopting the hybrid attention mechanism in their main (non-Next) line.
9) Ant Group Ling 2.5 / Ring 2.5 (Feb 16)
