
2025-07-23 11:45:32
🔧 Comparable performance to #ClaudeSonnet4 in agentic browser-use and tool-use tasks
💾 Trained on 7.5T tokens with a 70% code ratio while preserving general and math abilities
⚡ Long-horizon reinforcement learning using 20,000 parallel environments on #AlibabaCloud infrastru…