2026-01-01 16:15:29
DeepSeek researchers detail a new mHC architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden (Vincent Chow/South China Morning Post)
https://www.scmp.com/tech/big-tech/article