Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.

Even as Integrity, the mission moniker for the Orion capsule of Artemis II, ascended into the heavens days ago,
Donald Trump was announcing his intention to slash Nasa’s budget by 23%,
including a 46% cut for space science initiatives.
And the Artemis program that has run years behind schedule and billions of dollars over budget offers no guarantees that the next, far harder stages will run as smoothly.
“The path to the lunar surface is open, but the work ahead is gr…

@relcfp@mastodon.social
2026-02-18 06:25:25

Doctoral school “Anyue Buddhist Sites: Iconography, Inscriptions, and Fieldwork Methodologies”, April 20-24, 2026, Ghent University
ift.tt/LzPhUt4
Dissertation Fellowship, U.S. Army Center of Military History The U.S. Army Center of…
via Input 4 RELCFP

@raysofred@discordian.social
2026-04-15 19:41:19

#zentangle #art #circle #artist

zentangle circle
@arXiv_csOS_bot@mastoxiv.page
2026-02-04 07:41:57

ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation
Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Haitao Wang, Junhong Li, Chongyang Qiu, Pengfei Wang
arxiv.org/abs/2602.02579 arxiv.org/pdf/2602.02579 arxiv.org/html/2602.02579
arXiv:2602.02579v1 Announce Type: new
Abstract: The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of retrieved RAG documents (by a user query) and reprocess selected tokens to recover cross-attention between these pre-calculated KV caches. However, we identify a fundamental "crowding-out effect" in current token selection criteria: globally salient but user-query-irrelevant tokens saturate the limited recomputation budget, displacing the tokens truly essential for answering the user query and degrading inference accuracy.
We propose ProphetKV, a user-query-driven KV Cache reuse method for RAG scenarios. ProphetKV dynamically prioritizes tokens based on their semantic relevance to the user query and employs a dual-stage recomputation pipeline to fuse layer-wise attention metrics into a high-utility set. By ensuring the recomputation budget is dedicated to bridging the informational gap between retrieved context and the user query, ProphetKV achieves high-fidelity attention recovery with minimal overhead. Our extensive evaluation results show that ProphetKV retains 96%-101% of full-prefill accuracy with only a 20% recomputation ratio, while achieving accuracy improvements of 8.8%-24.9% on RULER and 18.6%-50.9% on LongBench over the state-of-the-art approaches (e.g., CacheBlend, EPIC, and KVShare).
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:44:51

Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
arxiv.org/abs/2602.21189 arxiv.org/pdf/2602.21189 arxiv.org/html/2602.21189
arXiv:2602.21189v1 Announce Type: new
Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@k improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@k policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@k update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.
toXiv_bot_toot

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 10:10:07

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo
arxiv.org/abs/2603.28376 arxiv.org/pdf/2603.28376 arxiv.org/html/2603.28376
arXiv:2603.28376v1 Announce Type: new
Abstract: Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bottleneck in existing paradigms stems from the lack of explicit verification mechanisms in QA data synthesis, trajectory construction, and test-time scaling. Errors introduced at each stage propagate downstream and degrade the overall agent performance. To address this, we present Marco DeepResearch, a deep research agent optimized with a verification-centric framework design at three levels: \textbf{(1)~QA Data Synthesis:} We introduce verification mechanisms to graph-based and agent-based QA synthesis to control question difficulty while ensuring answers are unique and correct; \textbf{(2)~Trajectory Construction:} We design a verification-driven trajectory synthesis method that injects explicit verification patterns into training trajectories; and \textbf{(3)~Test-time scaling:} We use Marco DeepResearch itself as a verifier at inference time and effectively improve performance on challenging questions. Extensive experimental results demonstrate that our proposed Marco DeepResearch agent significantly outperforms 8B-scale deep research agents on most challenging benchmarks, such as BrowseComp and BrowseComp-ZH. Crucially, under a maximum budget of 600 tool calls, Marco DeepResearch even surpasses or approaches several 30B-scale agents, like Tongyi DeepResearch-30B.
toXiv_bot_toot