🇺🇦 #NowPlaying on KEXP's #SundaySoul
Chuck Carbo:
🎵 Can I Be Your Squeeze
#ChuckCarbo
https://tuffcity.com/track/can-i-be-your-main-squeeze
https://open.spotify.com/track/4XSI9oimoA0oIPmpEOC6kZ
Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
https://arxiv.org/abs/2602.21189 https://arxiv.org/pdf/2602.21189 https://arxiv.org/html/2602.21189
arXiv:2602.21189v1 Announce Type: new
Abstract: Pass@$k$ is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It counts a prompt as solved if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@$k$ improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@$k$ policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@$k$ update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.
toXiv_bot_toot
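For context on the metric the abstract discusses: the standard unbiased pass@$k$ estimator computes, from $n$ sampled attempts of which $c$ pass the verifier, the probability that at least one of $k$ samples drawn without replacement passes. A minimal Python sketch (function and variable names are my own, not from the paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples drawn per prompt
    c: number of samples that passed the verifier
    k: number of samples allowed at inference (k <= n)
    """
    assert 0 <= c <= n and 1 <= k <= n
    if n - c < k:
        # Fewer than k failures exist, so every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: a prompt with 2 passing samples out of 10.
# pass@1 = 0.2, while pass@5 is much higher -- the gap such
# inference-aware fine-tuning methods try to exploit.
print(pass_at_k(10, 2, 1))
print(pass_at_k(10, 2, 5))
```

Optimizing the $k>1$ variant upweights exactly the low-success prompts (small $c/n$), which is the reweighting effect the abstract identifies as the source of gradient conflict.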
#LB Hahaha, AI in magic actually makes sense. Magic is nothing but hallucination, and AI is a specialist at that haha. [translated from Portuguese]
https://laserdisc.party/@checkervest/115690683668534351
India-listed RRP Semiconductor's stock surged 55,000% in the 20 months through Dec. 17, despite negligible revenue; source: India's SEBI is examining the surge (Chiranjivi Chakraborty/Bloomberg)
https://www.bloomberg.com/news/articles/20
New #ThingUmbrella example to create a parametric, grid layout-based calibration sheet for black and white photography development. The sheet includes different swatches and gradients to measure results/responses of different exposure times and developer solutions/processes. The sheet also includes a placeholder for a custom test image to be added later...
All sheet components are pa…
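The post above builds its sheet with thi.ng/umbrella (TypeScript); purely as an illustration of the same idea — a parametric, grid-layout step wedge of swatches for calibrating exposure and developer response — here is a dependency-free Python sketch that writes a printable grayscale grid as a plain PGM file (all names and parameters are my own, not from the original example):

```python
def step_wedge(rows: int = 4, cols: int = 8, cell: int = 32) -> list[list[int]]:
    """Build a parametric grid of gray swatches.

    Each cell's value steps linearly from black (0) to white (255)
    across the grid in row-major order, like a photographic step wedge.
    """
    steps = rows * cols
    img = [[0] * (cols * cell) for _ in range(rows * cell)]
    for i in range(steps):
        r, c = divmod(i, cols)
        gray = round(i * 255 / (steps - 1))
        for y in range(r * cell, (r + 1) * cell):
            for x in range(c * cell, (c + 1) * cell):
                img[y][x] = gray
    return img

def write_pgm(path: str, img: list[list[int]]) -> None:
    """Write the grid as a plain-text PGM (P2), printable as a negative."""
    h, w = len(img), len(img[0])
    with open(path, "w") as f:
        f.write(f"P2\n{w} {h}\n255\n")
        for row in img:
            f.write(" ".join(map(str, row)) + "\n")

write_pgm("calibration_sheet.pgm", step_wedge())
```

Because `rows`, `cols`, and `cell` are parameters, the same function regenerates the sheet at any density or print size, which is the point of making the layout parametric.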
"While the rest of us headed into years of immiseration, the filthy rich carried on regardless – and they did so with the willing aid of the centre-left elite, whether Peter #Mandelson or the French Socialists or the US Democrats." -- Aditya Chakrabortty
#Kleptocracy
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[3/5]:
- Look-Ahead Reasoning on Learning Platforms
Haiqing Zhu, Tijana Zrnic, Celestine Mendler-Dünner
https://arxiv.org/abs/2511.14745 https://mastoxiv.page/@arXiv_csLG_bot/115575981129228810
- Deep Gaussian Process Proximal Policy Optimization
Matthijs van der Lende, Juan Cardenas-Cartagena
https://arxiv.org/abs/2511.18214 https://mastoxiv.page/@arXiv_csLG_bot/115610315210502140
- Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory
Akira Tamamori
https://arxiv.org/abs/2511.23083 https://mastoxiv.page/@arXiv_csLG_bot/115644325602130493
- xGR: Efficient Generative Recommendation Serving at Scale
Sun, Liu, Zhang, Wu, Yang, Liang, Li, Ma, Liang, Ren, Zhang, Liu, Zhang, Qian, Yang
https://arxiv.org/abs/2512.11529 https://mastoxiv.page/@arXiv_csLG_bot/115723008170311172
- Credit Risk Estimation with Non-Financial Features: Evidence from a Synthetic Istanbul Dataset
Atalay Denknalbant, Emre Sezdi, Zeki Furkan Kutlu, Polat Goktas
https://arxiv.org/abs/2512.12783 https://mastoxiv.page/@arXiv_csLG_bot/115729287232895097
- The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems
Debu Sinha
https://arxiv.org/abs/2512.15068 https://mastoxiv.page/@arXiv_csLG_bot/115740048142898391
- Towards Reproducibility in Predictive Process Mining: SPICE -- A Deep Learning Library
Stritzel, Hühnerbein, Rauch, Zarate, Fleischmann, Buck, Lischka, Frey
https://arxiv.org/abs/2512.16715 https://mastoxiv.page/@arXiv_csLG_bot/115745910810427061
- Differentially private Bayesian tests
Abhisek Chakraborty, Saptati Datta
https://arxiv.org/abs/2401.15502 https://mastoxiv.page/@arXiv_statML_bot/111843467510507382
- SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning
Paul Mangold, Sergey Samsonov, Safwan Labbi, Ilya Levin, Reda Alami, Alexey Naumov, Eric Moulines
https://arxiv.org/abs/2402.04114
- Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough?
Guiomar Pescador-Barrios, Sarah Filippi, Mark van der Wilk
https://arxiv.org/abs/2408.07588 https://mastoxiv.page/@arXiv_statML_bot/112965266196097314
- Non-Perturbative Trivializing Flows for Lattice Gauge Theories
Mathis Gerdes, Pim de Haan, Roberto Bondesan, Miranda C. N. Cheng
https://arxiv.org/abs/2410.13161 https://mastoxiv.page/@arXiv_heplat_bot/113327593338897860
- Dynamic PET Image Prediction Using a Network Combining Reversible and Irreversible Modules
Sun, Zhang, Xia, Sun, Chen, Yang, Liu, Zhu, Liu
https://arxiv.org/abs/2410.22674 https://mastoxiv.page/@arXiv_eessIV_bot/113401026110345647
- Targeted Learning for Variable Importance
Xiaohan Wang, Yunzhe Zhou, Giles Hooker
https://arxiv.org/abs/2411.02221 https://mastoxiv.page/@arXiv_statML_bot/113429912435819479
- Refined Analysis of Federated Averaging and Federated Richardson-Romberg
Paul Mangold, Alain Durmus, Aymeric Dieuleveut, Sergey Samsonov, Eric Moulines
https://arxiv.org/abs/2412.01389 https://mastoxiv.page/@arXiv_statML_bot/113588027268311334
- Embedding-Driven Data Distillation for 360-Degree IQA With Residual-Aware Refinement
Abderrezzaq Sendjasni, Seif-Eddine Benkabou, Mohamed-Chaker Larabi
https://arxiv.org/abs/2412.12667 https://mastoxiv.page/@arXiv_csCV_bot/113672538318570349
- 3D Cell Oversegmentation Correction via Geo-Wasserstein Divergence
Peter Chen, Bryan Chang, Olivia A Creasey, Julie Beth Sneddon, Zev J Gartner, Yining Liu
https://arxiv.org/abs/2502.01890 https://mastoxiv.page/@arXiv_csCV_bot/113949981686723660
- DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Shashank Sharma, Janina Hoffmann, Vinay Namboodiri
https://arxiv.org/abs/2502.01956 https://mastoxiv.page/@arXiv_csRO_bot/113949997485625086
- Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling
Diana Koldasbayeva, Alexey Zaytsev
https://arxiv.org/abs/2502.03480
- GraphCompNet: A Position-Aware Model for Predicting and Compensating Shape Deviations in 3D Printing
Juheon Lee, Rachel (Lei) Chen, Juan Carlos Catana, Hui Wang, Jun Zeng
https://arxiv.org/abs/2502.09652 https://mastoxiv.page/@arXiv_csCV_bot/114017924551186136
- LookAhead Tuning: Safer Language Models via Partial Answer Previews
Liu, Wang, Luo, Yuan, Sun, Liang, Zhang, Zhou, Hooi, Deng
https://arxiv.org/abs/2503.19041 https://mastoxiv.page/@arXiv_csCL_bot/114227502448008352
- Constraint-based causal discovery with tiered background knowledge and latent variables in single...
Christine W. Bang, Vanessa Didelez
https://arxiv.org/abs/2503.21526 https://mastoxiv.page/@arXiv_statML_bot/114238919468512990
Made new test prints on some off-cuts, using a slightly stronger developer solution than usual to see its impact on max. depth. The main image (Eagle Creek, Oregon) uses 18% sodium acetate (curve-corrected negative); the test strips use 20% and 15% solutions (both uncorrected). The phone capture doesn't really show the differences too well, but I think I will go with 18-20% from now on...
(Btw. The original image is here:
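Assuming the percentages above follow the usual weight/volume convention for darkroom chemistry (grams of solute per 100 ml of solution — my assumption, not stated in the post), mixing the 15/18/20% test solutions is a one-line calculation:

```python
def grams_needed(percent_wv: float, volume_ml: float) -> float:
    """Grams of solute for a % w/v solution: percent grams per 100 ml."""
    return percent_wv * volume_ml / 100.0

# Salt mass for a 500 ml working solution at each tested concentration.
for pct in (15, 18, 20):
    print(f"{pct}% w/v in 500 ml -> {grams_needed(pct, 500):.0f} g")
```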