Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo
https://arxiv.org/abs/2603.28376 https://arxiv.org/pdf/2603.28376 https://arxiv.org/html/2603.28376
arXiv:2603.28376v1 Announce Type: new
Abstract: Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bottleneck in existing paradigms stems from the lack of explicit verification mechanisms in QA data synthesis, trajectory construction, and test-time scaling. Errors introduced at each stage propagate downstream and degrade the overall agent performance. To address this, we present Marco DeepResearch, a deep research agent optimized with a verification-centric framework design at three levels: (1) QA Data Synthesis: We introduce verification mechanisms to graph-based and agent-based QA synthesis to control question difficulty while ensuring answers are unique and correct; (2) Trajectory Construction: We design a verification-driven trajectory synthesis method that injects explicit verification patterns into training trajectories; and (3) Test-time scaling: We use Marco DeepResearch itself as a verifier at inference time and effectively improve performance on challenging questions. Extensive experimental results demonstrate that our proposed Marco DeepResearch agent significantly outperforms 8B-scale deep research agents on most challenging benchmarks, such as BrowseComp and BrowseComp-ZH. Crucially, under a maximum budget of 600 tool calls, Marco DeepResearch even surpasses or approaches several 30B-scale agents, like Tongyi DeepResearch-30B.
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[3/3]:
- Functional Continuous Decomposition
Teymur Aghayev
https://arxiv.org/abs/2602.20857 https://mastoxiv.page/@arXiv_eessSP_bot/116130499236089653
- SpatiaLQA: A Benchmark for Evaluating Spatial Logical Reasoning in Vision-Language Models
Xie, Zhang, Shan, Zhu, Tang, Wei, Song, Wan, Song
https://arxiv.org/abs/2602.20901 https://mastoxiv.page/@arXiv_csCV_bot/116130845273808954
- Some Simple Economics of AGI
Christian Catalini, Xiang Hui, Jane Wu
https://arxiv.org/abs/2602.20946 https://mastoxiv.page/@arXiv_econGN_bot/116130470423837005
- Multimodal MRI Report Findings Supervised Brain Lesion Segmentation with Substructures
Yubin Ge, Yongsong Huang, Xiaofeng Liu
https://arxiv.org/abs/2602.20994 https://mastoxiv.page/@arXiv_eessIV_bot/116130212832138624
- MIP Candy: A Modular PyTorch Framework for Medical Image Processing
Tianhao Fu, Yucheng Chen
https://arxiv.org/abs/2602.21033 https://mastoxiv.page/@arXiv_csCV_bot/116130864279556063
- Empirically Calibrated Conditional Independence Tests
Milleno Pan, Antoine de Mathelin, Wesley Tansey
https://arxiv.org/abs/2602.21036 https://mastoxiv.page/@arXiv_statME_bot/116130690605113562
- Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise
Rafael Hanashiro, Abhishek Shetty, Patrick Jaillet
https://arxiv.org/abs/2602.21039 https://mastoxiv.page/@arXiv_statML_bot/116130572661848449
- Position-Aware Sequential Attention for Accurate Next Item Recommendations
Timur Nabiev, Evgeny Frolov
https://arxiv.org/abs/2602.21052 https://mastoxiv.page/@arXiv_csIR_bot/116130263323086316
- Motivation is Something You Need
Mehdi Acheli, Walid Gaaloul
https://arxiv.org/abs/2602.21064 https://mastoxiv.page/@arXiv_csAI_bot/116130680774678580
- An Enhanced Projection Pursuit Tree Classifier with Visual Methods for Assessing Algorithmic Impr...
Natalia da Silva, Dianne Cook, Eun-Kyung Lee
https://arxiv.org/abs/2602.21130 https://mastoxiv.page/@arXiv_statML_bot/116130610674573081
- Complexity of Classical Acceleration for $\ell_1$-Regularized PageRank
Kimon Fountoulakis, David Martínez-Rubio
https://arxiv.org/abs/2602.21138 https://mastoxiv.page/@arXiv_mathOC_bot/116130547076073836
- LUMEN: Longitudinal Multi-Modal Radiology Model for Prognosis and Diagnosis
Jiang, Yang, Nath, Parida, Kulkarni, Xu, Xu, Anwar, Roth, Linguraru
https://arxiv.org/abs/2602.21142 https://mastoxiv.page/@arXiv_csCV_bot/116130871488694585
- A Benchmark for Deep Information Synthesis
Debjit Paul, et al.
https://arxiv.org/abs/2602.21143 https://mastoxiv.page/@arXiv_csAI_bot/116130692571594706
- Scaling State-Space Models on Multiple GPUs with Tensor Parallelism
Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi
https://arxiv.org/abs/2602.21144 https://mastoxiv.page/@arXiv_csDC_bot/116130520888343997
- Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions
Mame Diarra Toure, David A. Stephens
https://arxiv.org/abs/2602.21160 https://mastoxiv.page/@arXiv_statML_bot/116130618512594211
- Aletheia tackles FirstProof autonomously
Tony Feng, et al.
https://arxiv.org/abs/2602.21201 https://mastoxiv.page/@arXiv_csAI_bot/116130705679345625
- Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics
Abdulaziz Almuzairee, Henrik I. Christensen
https://arxiv.org/abs/2602.21203 https://mastoxiv.page/@arXiv_csRO_bot/116130765974498223