Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2025-09-11 09:45:53

LLM Ensemble for RAG: Role of Context Length in Zero-Shot Question Answering for BioASQ Challenge
Dima Galat, Diego Molla-Aliod
https://arxiv.org/abs/2509.08596 https://

Biomedical question answering (QA) poses significant challenges due to the need for precise interpretation of specialized knowledge drawn from a vast, complex, and rapidly evolving corpus. In this work, we explore how large language models (LLMs) can be used for information retrieval (IR), and an ensemble of zero-shot models can accomplish state-of-the-art performance on a domain-specific Yes/No QA task. Evaluating our approach on the BioASQ challenge tasks, we show that ensembles can outperfor…

@philip@mastodon.mallegolhansen.com
2025-09-12 00:51:15

@… Haha, I appreciate that.
It’s a good question. My mind immediately *wants* to answer all the ones you didn’t ask (Taking guns away from everyone, getting to pick and choose who gets them, etc.)
But the question as (purposefully I’m sure) posed, is a tough one.

@cowboys@darktundra.xyz
2025-09-09 20:29:41

Cowboys starter gets angry at reporters over Micah Parson question https://www.sportingnews.com/us/ncaa-football/penn-state/news/cowboys-starter-angry-reporters-over-micah-parson-question/80f3074307cb…

Dallas Cowboys starter expressed frustration at reporters when asked about Micah Parsons following his recent trade to the Green Bay Packers.

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:16:09

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Sharut Gupta, Shobhita Sundaram, Chenyu Wang, Stefanie Jegelka, Phillip Isola
https://arxiv.org/abs/2510.08492

Traditional multimodal learners find unified representations for tasks like visual question answering, but rely heavily on paired datasets. However, an overlooked yet potentially powerful question is: can one leverage auxiliary unpaired multimodal data to directly enhance representation learning in a target modality? We introduce UML: Unpaired Multimodal Learner, a modality-agnostic training paradigm in which a single model alternately processes inputs from different modalities while sharing pa…

@Techmeme@techhub.social
2025-10-11 19:45:54

Interviews with security researchers about AI's potential for large-scale destruction, as experts remain divided and global regulatory frameworks lag (Stephen Witt/New York Times)
https://www.

Opinion | The A.I. Prompt That Could End the World
A destructive A.I., like a nuclear bomb, is now a concrete possibility; the question is whether anyone will be reckless enough to build one.

@eglassman@hci.social
2025-09-11 04:18:25

Sincere question for the HCI community (and all other international research communities who disseminate research primarily at conferences):
1. When any conference is in the US, will international folks risk coming? I already know prominent folks who say they won't.
2. When any conference is outside the US, will any international students within the states risk going? My PhD student has been advised not to, so I'm giving his talk overseas for him.
How will we all …

@arXiv_statML_bot@mastoxiv.page
2025-10-10 09:27:09

High-dimensional Analysis of Synthetic Data Selection
Parham Rezaei, Filip Kovacevic, Francesco Locatello, Marco Mondelli
https://arxiv.org/abs/2510.08123 https://

Despite the progress in the development of generative models, their usefulness in creating synthetic data that improve prediction performance of classifiers has been put into question. Besides heuristic principles such as "synthetic data should be close to the real data distribution", it is actually not clear which specific properties affect the generalization error. Our paper addresses this question through the lens of high-dimensional regression. Theoretically, we show that, for linear models…

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:13:31

StaR-KVQA: Structured Reasoning Traces for Implicit-Knowledge Visual Question Answering
Zhihao Wen, Wenkang Wei, Yuan Fang, Xingtong Yu, Hui Zhang, Weicheng Zhu, Xin Zhang
https://arxiv.org/abs/2510.06638

Knowledge-based Visual Question Answering (KVQA) requires models to ground entities in images and reason over factual knowledge. We study its implicit-knowledge variant, IK-KVQA, where a multimodal large language model (MLLM) is the sole knowledge source, without external retrieval. Yet, MLLMs lack explicit reasoning supervision and produce inconsistent justifications, and generalize poorly after standard supervised fine-tuning (SFT). We present StaR-KVQA (Structured Reasoning Traces for IK-KVQ…

@kurtsh@mastodon.social
2025-09-11 05:08:27

What a cool trivia question from the Lord of the Rings...
▶️ There was Another Way to Destroy the Ring and Defeat Sauron
https://youtube.com/watch?v=Qhnc8TbUrKM&si=gqLtw6pBQcg4byg7

@egallager@social.treehouse.systems
2025-09-11 19:36:11

Unix question: is there a version of seq(1) for letters instead of numbers?

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 10:20:11

Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition
Yi Liu, Xiangrong Zhu, Xiangyu Liu, Wei Wei, Wei Hu
https://arxiv.org/abs/2509.07555 h…

In a rapidly evolving world where information updates swiftly, knowledge in large language models (LLMs) becomes outdated quickly. Retraining LLMs is not a cost-effective option, making knowledge editing (KE) without modifying parameters particularly necessary. We find that although existing retrieval-augmented generation (RAG)-based KE methods excel at editing simple knowledge, they struggle with KE in multi-hop question answering due to the issue of "edit skipping", which refers to skipping t…

@arXiv_statME_bot@mastoxiv.page
2025-09-11 08:33:23

Generative AI as a Safety Net for Survey Question Refinement
Erica Ann Metheney, Lauren Yehle
https://arxiv.org/abs/2509.08702 https://arxiv.org/pdf/2509.0…

Writing survey questions that easily and accurately convey their intent to a variety of respondents is a demanding and high-stakes task. Despite the extensive literature on best practices, the number of considerations to keep in mind is vast and even small errors can render collected data unusable for its intended purpose. The process of drafting initial questions, checking for known sources of error, and developing solutions to those problems requires considerable time, expertise, and financia…

@saraislet@infosec.exchange
2025-09-11 00:02:50

#AdviceRequested!
We want to buy an electric car! It's exciting but also daunting to make car buying decisions, and harder to evaluate with electric than it was for gas.
Safety and reliability are the highest priorities, which were easier to evaluate with models like the Honda Civic that have been around for decades.
Lucid looks really nice, but I question the relia…

@brian_gettler@mas.to
2025-11-10 13:20:24

Dear author,
Thank you for submitting your article manuscript. While we trust that many in the research community would welcome a study of vitamin, drug, and disease interaction as a timely intervention, we question its suitability for publication in a history journal.
The editors

@hex@kolektiva.social
2025-10-11 18:28:48

Here's your regular reminder:
There is no debate over if cars will or will not be part of the future. They will not. They are a luxury we can no longer afford. The question is only if we will choose to rid our future of cars, or allow cars to rid us of our future.
#FuckCars

@frankel@mastodon.top
2025-11-11 09:30:05

Monorepo vs Multi-repo vs #Git submodule vs Git Subtree: A Complete Guide for Developers
https://levelup.gitcon…

Monorepo vs Multi-repo vs Git submodule vs Git Subtree: A Complete Guide for Developers
As projects grow, one question quietly becomes a big one:

@cdamian@rls.social
2025-10-11 15:38:00

‘It’s a question of humanity’: how a small Spanish town made headlines over its immigration stance | Spain | The Guardian
https://www.theguardian.com/world/2025/oct/11/small-spanish-town-headlines-immigration-villamalea
> …

@akosma@mastodon.online
2025-10-10 18:46:55

"Chatbots are turning on the flattery, patience, and support. Microsoft AI CEO Mustafa Suleyman said the “cool thing” about the company’s AI personal assistant is that it doesn’t “judge you for asking a stupid question.” It exhibits “kindness and empathy.” Here’s the rub: We need people to judge us. We need people to call us out for making stupid statements. Friction and conflict are key to developing resilience and learning how to function in society."

Love Algorithmically | No Mercy / No Malice
For six hours, my AI avatar roamed the Earth. I receive 20 to 30 thoughtful emails a day asking for professional and investment advice. I can only answer a fraction of them. One of my former graduate student instructors, now at Google, approached me with a solution. The Google Labs project ingested my podcasts, newsletters, […]

@arXiv_csCY_bot@mastoxiv.page
2025-10-09 08:46:51

Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
Mattia Samory, Diana Pamfile, Andrew To, Shruti Phadke
https://arxiv.org/abs/2510.06350

Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior clas…

@sascha_wolfer@fediscience.org
2025-10-10 06:05:44

One other thing: while we don't claim that our mixed-effects logit model is the perfect way to account for non-independence between languages, we don't think it's correct, as Xia & Lindell assert, to just claim that our results are "counterintuitive", that the fixed-effects estimates are "unreliable", and that the high model fits are "unrealistic." Whether a mixed model better captures the data-generating process is ultimately an empirical question, not one to be decided by assertion. Take, for instance, our finding that once random effects for either subregion or language family are included, the estimated effect of L1_population reverses direction, from the negative value reported by Xia & Lindell et al. to a positive one.

@NFL@darktundra.xyz
2025-10-10 19:11:32

NFL Week 6 injury report: Bengals' Ja'Marr Chase questionable vs. Packers due to illness

https://www.cbssports.com/nfl/news/nfl-week-6-injury-report-updates-tracker/

NFL Week 6 injury report: Jalen Carter's status in question for Eagles; Giants shorthanded at WR
Injuries are piling up around the league, with playoff hopefuls monitoring key starters ahead of this weekend's slate

@BBC3MusicBot@mastodonapp.uk
2025-10-10 20:45:50

🔊 #NowPlaying on #BBCRadio3:
#TheEssay
- The Meaning and Magic of Music
How does music convey meaning to the listener? Catherine Coldstream examines this question in the context of classical music and religious faith.
Relisten now 👇
https://www.bbc.co.uk/programmes/m0029pvm

@arXiv_mathDS_bot@mastoxiv.page
2025-10-10 09:00:19

Stability with respect to periodic switching laws does not imply global stability under arbitrary switching
Ian D. Morris
https://arxiv.org/abs/2510.08074 https://

R. Shorten, F. Wirth, O. Mason, K. Wulff and C. King have asked whether a linear switched system is guaranteed to be globally uniformly stable under arbitrary switching if it is known that every trajectory induced by a periodic switching law converges exponentially to the origin. Positive answers to this question have previously been announced for linear switched systems of order two and three. We answer this question negatively in all higher orders by constructing a fourth-order linear switche…

@kexpmusicbot@mastodonapp.uk
2025-09-08 06:23:37

🇺🇦 #NowPlaying on KEXP's #Expansions
A.B.O.:
🎵 This Question
#ABO
https://deepclicks.bandcamp.com/track/this-question-original-mix

@ThatHoarder@mastodon.online
2025-11-10 09:34:03

It's a question that comes up for a lot of us, so in the latest episode I talk about potential ways to cope with it.
Find the podcast by searching for That Hoarder: Overcome Compulsive Hoarding podcast in your podcast player.
#hoarding #hoardingdisorder

@raiders@darktundra.xyz
2025-11-06 16:28:05

The Biggest Question Facing Each NFL Team in the Second Half of 2025 https://www.foxsports.com/stories/nfl/biggest-question-facing-each-nfl-team-second-half-2025

The NFL has hit the mid-point of the 2025 season, and every team has at least one huge question they'll need to answer in the second half.

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 10:02:41

The Role of Exploration Modules in Small Language Models for Knowledge Graph Question Answering
Yi-Jie Cheng, Oscar Chew, Yun-Nung Chen
https://arxiv.org/abs/2509.07399 https://…

Integrating knowledge graphs (KGs) into the reasoning processes of large language models (LLMs) has emerged as a promising approach to mitigate hallucination. However, existing work in this area often relies on proprietary or extremely large models, limiting accessibility and scalability. In this study, we investigate the capabilities of existing integration methods for small language models (SLMs) in KG-based question answering and observe that their performance is often constrained by their l…

@bobmueller@mastodon.world
2025-10-10 22:00:06

An interesting piece about the death and life of Edgar Allan Poe. #taphephobia
https://lithub.com/to-haunt-and-be-haunted-on-the-exhumation-of-edgar-allen-po…

To Haunt and Be Haunted: On the Exhumation of Edgar Allen Poe
Imagine that you’ve been buried alive. Don’t question the specifics, whether you’ve been kidnapped or unfortunate enough to fall victim to poorly trained pathologists and morticians, for either way…

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:18:29

VoiceAgentBench: Are Voice Assistants ready for agentic tasks?
Dhruv Jain, Harshit Shukla, Gautam Rajeev, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal
https://arxiv.org/abs/2510.07978

Large-scale Speech Language Models (SpeechLMs) have enabled voice assistants capable of understanding natural spoken queries and performing complex tasks. However, existing speech benchmarks primarily focus on isolated capabilities such as transcription, or question-answering, and do not systematically evaluate agentic scenarios encompassing multilingual and cultural understanding, as well as adversarial robustness. To address this, we introduce VoiceAgentBench, a comprehensive benchmark design…

@tante@tldr.nettime.org
2025-10-09 21:20:05

"In the end, we are defined not just by our actions, but by the actions we tolerate."
Mike Monteiro with another banger
(Original title: How to eat with others)
https://buttondown.com/monteiro/archive/how-to-eat-with-others/

How to eat with others
This is the last painting I did at my old studio. I left it behind. You can support my shenanigans for a mere $2/mo. This week’s question comes to us from...

@arXiv_physicsdataan_bot@mastoxiv.page
2025-09-10 08:32:01

Revisiting the Question of Information Content of EXAFS Spectra through a Bayesian Approach
Lucy Haddad, Diego Gianolio, Andrei Sapelkin
https://arxiv.org/abs/2509.07950 https:/…

Over the last several decades the Shannon-Nyquist criterion has been widely used as a measure of the maximum information content in EXAFS spectra and provided an upper limit on the number of parameters used in fitting data. However, the criterion implicitly assumes independent parameters which is never the case in EXAFS analysis. Here we introduce a new criterion to measure the information content in EXAFS based on Bayesian approach that lifts the above condition. We test the new criterion by f…

@memeorandum@universeodon.com
2025-11-06 12:51:04

The question Mamdani won't answer (Politico)
https://www.politico.com/newsletters/playbook/2025/11/06/the-question-mamdani-wont-answer-00639459
http://www.memeorandum.com/251106/p8#a251106p8

@arXiv_csRO_bot@mastoxiv.page
2025-09-10 10:16:41

Temporal Counterfactual Explanations of Behaviour Tree Decisions
Tamlin Love, Antonio Andriella, Guillem Alenyà
https://arxiv.org/abs/2509.07674 https://…

Explainability is a critical tool in helping stakeholders understand robots. In particular, the ability for robots to explain why they have made a particular decision or behaved in a certain way is useful in this regard. Behaviour trees are a popular framework for controlling the decision-making of robots and other software systems, and thus a natural question to ask is whether or not a system driven by a behaviour tree is capable of answering "why" questions. While explainability for behaviour…

@arXiv_mathNT_bot@mastoxiv.page
2025-09-11 08:32:03

Number of integers represented by families of binary forms III: fewnomials
Etienne Fouvry, Michel Waldschmidt
https://arxiv.org/abs/2509.08335 https://arxi…

In a series of papers we investigated the following question: given a family $\mathcal{F}$ of binary forms having nonzero discriminant and integer coefficients, for each $d\geqslant 3$, we estimate the number of integers $m$ with $|m|\leqslant N$ which are represented by an element in $\mathcal{F}$ of degree $\geqslant d$. Under suitable assumptions, asymptotically as $N\to\infty$, the main term in the estimate is given by the forms in $\mathcal{F}$ having degree $d$ (if any), while the forms of degree…

@arXiv_quantph_bot@mastoxiv.page
2025-09-10 10:25:01

All you need is controlled-V: universality of a standard two-qubit gate by catalytic embedding
Robin Kaarsgaard
https://arxiv.org/abs/2509.07578 https://ar…

We present an encoding that renders the controlled-V gate (also known as controlled-$\sqrt{X}$) computationally universal in isolation. Specifically, we show that this gate can simulate the universal Clifford+Toffoli gate set with at most two clean auxiliary qubits and a constant overhead in gate count, and that an additional auxiliary qubit suffices to simulate Clifford+T. Our result settles an open question on the expressiveness of De Vos' gate set based on Negators, and shows that the two-qu…

@arXiv_mathAG_bot@mastoxiv.page
2025-10-09 10:03:51

On the diagonal of quartic hypersurfaces and $(2,3)$-complete intersection $n$-folds
Elia Fiammengo, Morten Lüders
https://arxiv.org/abs/2510.07111 https://

We study the question of the existence of a decomposition of the diagonal for quartic and $(2,3)$-complete intersection $n$-folds. Using cycle-theoretic techniques of Lange, Pavic and Schreieder we reduce the question via a degeneration argument to the existence of such a decomposition for cubic hypersurfaces and their essential dimension. A result of Voisin on the essential dimension of complex cubic hypersurfaces of odd dimension (and of dimension four) then yields conditional statements that…

@servelan@newsie.social
2025-11-09 18:24:34

Now you know they're lying: different responses to the same question, multiple answers, indicate lying about the real reason
Tariffs aren't meant for revenue and will shrink over time, Bessent says
https://www.axios.com/2025/11/09/trump-tariffs-bessent-tax-revenue

@hex@kolektiva.social
2025-09-11 19:48:52

I also have this idea that somewhere there's a transbian polycule coven who take money for hexes and curses against fascists and use it to fund the revolution.
Unrelated question... if a coven is a legally registered church, shouldn't paying for a hex be a legally tax-deductible charitable donation? Asking for a friend.

@drgeraint@glasgow.social
2025-10-09 21:29:23

Interesting question on road.cc: is there a bit of road that you just hate #cycling on?
For me, a short stretch of southbound Mearns Rd at Mearns Kirk. Slight rise, poor surface, from Eaglesham Rd to the roundabout:
https://www.…

OpenStreetMap
OpenStreetMap is a map of the world, created by people like you and free to use under an open license.

@arXiv_csHC_bot@mastoxiv.page
2025-10-07 09:59:32

Multi-Hop Question Answering: When Can Humans Help, and Where do They Struggle?
Jinyan Su, Claire Cardie, Jennifer Healey
https://arxiv.org/abs/2510.04493 https://

Multi-hop question answering is a challenging task for both large language models (LLMs) and humans, as it requires recognizing when multi-hop reasoning is needed, followed by reading comprehension, logical reasoning, and knowledge integration. To better understand how humans might collaborate effectively with AI, we evaluate the performance of crowd workers on these individual reasoning subtasks. We find that while humans excel at knowledge integration (97% accuracy), they often fail to recog…

@grifferz@social.bitfolk.com
2025-10-09 13:04:12

Love it when someone finally responds to me after literally months of silence, I ask a question and they respond correcting me and saying they've taken action now anyway "to avoid further iteration and delay".

@arXiv_astrophSR_bot@mastoxiv.page
2025-09-10 09:33:51

SN 2022xlp: The second-known well-observed, intermediate-luminosity Iax supernova
D. Bánhidi, B. Barna, T. Szalai, J. Vinkó, I. B. Bíró, K. A. Bostroem, I. Csányi, K. W. Davis, R. J. Foley, L. Galbany, S. W. Jha, D. A. Howell, L. A. Kwok, A. Pál, C. Pellegrino, C. Rojas-Bravo, P. Székely, K. Taggart, G. Terreran, S. Tinyanont

We present a detailed analysis of type Iax supernova SN 2022xlp. With a V-band absolute magnitude light curve peaking at $M_{max}(V) = -16.04 \pm 0.25$ mag, this object is regarded as the second determined well-observed Iax supernova in the intermediate luminosity range after SN 2019muj. Our research aims to explore the question of whether the physical properties vary continuously across the entire luminosity range. We also investigate the chemical abundance profiles and the characteristic phys…

@arXiv_csSI_bot@mastoxiv.page
2025-10-10 08:26:09

From Keywords to Clusters: AI-Driven Analysis of YouTube Comments to Reveal Election Issue Salience in 2024
Raisa M. Simoes, Timoteo Kelly, Eduardo J. Simoes, Praveen Rao
https://arxiv.org/abs/2510.07821

This paper aims to explore two competing data science methodologies to attempt answering the question, "Which issues contributed most to voters' choice in the 2024 presidential election?" The methodologies involve novel empirical evidence driven by artificial intelligence (AI) techniques. By using two distinct methods based on natural language processing and clustering analysis to mine over eight thousand user comments on election-related YouTube videos from one right leaning journal, Wall Stre…

@arXiv_mathGT_bot@mastoxiv.page
2025-09-10 09:11:51

The $L^p$-diameter of the space of contractible loops
Michael Brandenbursky, Egor Shelukhin
https://arxiv.org/abs/2509.07270 https://arxiv.org/pdf/2509.072…

We prove that the space of contractible simple loops of a given fixed area in any compact oriented surface has infinite diameter as a homogeneous space of the group of area-preserving diffeomorphisms endowed with the $L^p$-metric. As a special case, this resolves the $L^p$-metric analogue of the well-known question in symplectic topology regarding the space of equators on the two-sphere. Our methods involve a new class of functionals on a normed group, which are more general than quasi-morphism…

@aral@mastodon.ar.al
2025-11-07 18:04:48

“The real question, then, is not ‘what can we do?’, but ‘what are we afraid to do?’ Whose comfort are we protecting when we ask safe questions? Whose illusions do we preserve through politeness? Solidarity is not an optic; it is a disruption. It is noisy, uncomfortable, often isolating. It pulls reputation apart rather than polishing it.
…
We are too fluent in the language of outrage, too comfortable in the posture of virtue. History will not absolve spectatorship, even when specta…

When solidarity becomes spectacle
Francesca Albanese’s visit to South Africa exposed a truth we prefer not to face: that our moral witness has hardened into ritual. We watch, we clap, we call it solidarity.

@cellfourteen@social.petertoushkov.eu
2025-10-07 19:28:19

If you haven't re-watched your 720p pirated copy from 2016 of The Matrix lately, do it now, if only for the showy dialogue at the beginning. So full of hope and righteousness, pure 90s bullet-time fuck-the-establishment attitude.
#Movies #SciFi

Trinity: "It's the question that brought you here. You know the question?"

@arXiv_csDS_bot@mastoxiv.page
2025-10-10 08:36:09

Dynamic Connectivity with Expected Polylogarithmic Worst-Case Update Time
Simon Meierhans, Maximilian Probst Gutenberg
https://arxiv.org/abs/2510.08297 https://

Whether a graph $G=(V,E)$ is connected is arguably its most fundamental property. Naturally, connectivity was the first characteristic studied for dynamic graphs, i.e. graphs that undergo edge insertions and deletions. While connectivity algorithms with polylogarithmic amortized update time have been known since the 90s, achieving worst-case guarantees has proven more elusive. Two recent breakthroughs have made important progress on this question: (1) Kapron, King and Mountjoy [SODA'13; Best …

@davidaugust@mastodon.online
2025-09-08 04:16:33

"When you work on anything, you want to find the range of impulses. Which ones get portrayed is another question, but you want to have that complexity and that fullness, even if you’re playing a cartoon character."
—Willem Dafoe
#acting #coaching

@arXiv_astrophIM_bot@mastoxiv.page
2025-10-10 08:38:39

Probing the Origin of Water in Planets within Habitable Zones by HWO
Yasuhiro Hasegawa, Courtney Dressing, Ludmila Carone
https://arxiv.org/abs/2510.07349 https://

How do habitable environments arise and evolve within the context of their planetary systems? This is one fundamental question, and it can be addressed partly by identifying how planets in habitable zones obtain water. Historically, astronomers considered that water was delivered to the Earth via dynamical shake-up by Jupiter, which took place during the formation and post-formation eras (e.g., $\lesssim 100$ Myr). This hypothesis has recently been challenged by a more dynamic view of planet fo…

@matzekult@chaos.social
2025-09-09 12:06:08

Martial arts have always been a part of #StarTrek. But we have come quite a way since the famous hand chop, as we can see with the subject of today's #TrekTriviaTuesday question.
As always no googling and no spoiling the answer for others. Please boost after voting! :BoostOK:
V…

@sascha_wolfer@fediscience.org
2025-10-10 06:06:17

Finally, what Xia & Lindell call a "separation problem" is, in our view, a feature of our approach and not a bug.
If, e.g., all languages in a family are polysynthetic (or none are), that’s not a statistical artefact – it’s the signal. The outcome is well associated with genealogy, showing that family membership captures something genuinely informative about the process. When the model finds that family explains a large share of the variance, that's not a failure – it's evidence that phylogenetic structure dominates the pattern.
So while Xia & Lindell insist that "autocorrelation due to relationships and distance cannot be captured in family or regional-level analyses", we see that as an empirical question – and we treated it as one.
The real test is whether a mixed model that explicitly represents phylogeny and geography performs worse than their alternative, where the entire shared history of languages and environments is effectively collapsed into a single dimension (an eigenvector).
In other words: we model relationships – Xia & Lindell summarise them into one number per language.

@arXiv_mathPR_bot@mastoxiv.page
2025-09-11 08:49:23

The Random Walk Pinning Model II: Upper bounds on the free energy and disorder relevance
Quentin Berger, Hubert Lacoin
https://arxiv.org/abs/2509.08769 https://

This article investigates the question of disorder relevance for the continuous-time Random Walk Pinning Model (RWPM) and completes the results of our companion paper. The RWPM considers a continuous time random walk $X=(X_t)_{t\geq 0}$, whose law is modified by a Gibbs weight given by $\exp(\beta\int_0^T \mathbf{1}_{\{X_t=Y_t\}} dt)$, where $Y=(Y_t)_{t\geq 0}$ is a quenched trajectory of a second (independent) random walk and $\beta\geq 0$ is the inverse temperature. The random walk $Y$ has the same…

@arXiv_csNI_bot@mastoxiv.page
2025-09-10 07:51:41

TEGRA: A Flexible & Scalable NextGen Mobile Core
Bilal Saleem, Omar Basit, Jiayi Meng, Iftekhar Alam, Ajay Thakur, Christian Maciocco, Muhammad Shahbaz, Y. Charlie Hu, Larry Peterson
https://arxiv.org/abs/2509.07410

To support emerging mobile use cases (e.g., AR/VR, autonomous driving, and massive IoT), next-generation mobile cores for 5G and 6G are being re-architected as service-based architectures (SBAs) running on both private and public clouds. However, current performance optimization strategies for scaling these cores still revert to traditional NFV-based techniques, such as consolidating functions into rigid, monolithic deployments on dedicated servers. This raises a critical question: Is there an …

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:05:59

Counterfactual Identifiability via Dynamic Optimal Transport
Fabio De Sousa Ribeiro, Ainkaran Santhirasekaram, Ben Glocker
https://arxiv.org/abs/2510.08294 https://

We address the open question of counterfactual identification for high-dimensional multivariate outcomes from observational data. Pearl (2000) argues that counterfactuals must be identifiable (i.e., recoverable from the observed data distribution) to justify causal claims. A recent line of work on counterfactual inference shows promising results but lacks identification, undermining the causal validity of its estimates. To address this, we establish a foundation for multivariate counterfactual …

@karlauerbach@sfba.social
2025-09-08 19:33:49

I have a question about the following:
SpaceX satellites are in relatively low orbit: they go around the world, and their radio/signal footprint on the ground goes around the world with them.
So how does a single country, the US, issue a "license" for radio spectrum that applies outside of US geographic borders?
"SpaceX buys wireless spectrum from EchoStar in $17 billion deal"

@blackknight95857669@social.linux.pizza
2025-11-09 02:02:51

Ordered a refurb Samsung S20 FE from Newegg, arrived today. I question the "Grade A - Excellent" rating that Reebeio gave it. Has mars all around the edge of the case and the back. Also has what looks to be a pressure crack in the back panel next to the cameras, like someone sat on it while in some kind of protective case and just put enough weight on it to crack the phone case a tiny bit.
However, the screen looks perfect and it powers on and boots to initialization mode no …

@arXiv_csCV_bot@mastoxiv.page
2025-09-10 10:43:31

D-LEAF: Localizing and Correcting Hallucinations in Multimodal LLMs via Layer-to-head Attention Diagnostics
Tiancheng Yang, Lin Zhang, Jiaye Lin, Guimin Hu, Di Wang, Lijie Hu
https://arxiv.org/abs/2509.07864

D-LEAF: Localizing and Correcting Hallucinations in Multimodal LLMs via Layer-to-head Attention Diagnostics
Multimodal Large Language Models (MLLMs) achieve strong performance on tasks like image captioning and visual question answering, but remain prone to hallucinations, where generated text conflicts with the visual input. Prior work links this partly to insufficient visual attention, but existing attention-based detectors and mitigation typically apply uniform adjustments across layers and heads, obscuring where errors originate. In this paper, we first show these methods fail to accurately local…

@Techmeme@techhub.social
2025-09-09 20:16:23

Ramp says it has hit $1B in annualized revenue, after saying it had hit $700M in March; it was valued at $22.5B in July (Julie Bort/TechCrunch)
https://techcrunch.com/2025/09/09/ramp-says-it-has-hit-1b-in-annualized-revenue/

Ramp says it has hit $1B in annualized revenue | TechCrunch
Ramp answered any lingering question as to why investors valued the company at $22.5 billion just 45 days after they valued it at $16 billion.

@arXiv_csCL_bot@mastoxiv.page
2025-09-11 09:47:23

Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications
Anran Li, Lingfei Qian, Mengmeng Du, Yu Yin, Yan Hu, Zihao Sun, Yihang Fu, Erica Stutz, Xuguang Ai, Qianqian Xie, Rui Zhu, Jimin Huang, Yifan Yang, Siru Liu, Yih-Chung Tham, Lucila Ohno-Machado, Hyunghoon Cho, Zhiyong Lu, Hua Xu, Qingyu Chen
https://

Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications
Large Language Models (LLMs) have demonstrated significant potential in medicine. To date, LLMs have been widely applied to tasks such as diagnostic assistance, medical question answering, and clinical information synthesis. However, a key open question remains: to what extent do LLMs memorize medical training data. In this study, we present the first comprehensive evaluation of memorization of LLMs in medicine, assessing its prevalence (how frequently it occurs), characteristics (what is memor…

@cowboys@darktundra.xyz
2025-09-11 15:04:58

Cowboys Defense Faces HUGE Question vs Giants! https://www.youtube.com/watch?v=Yy1kKR4n28A

@arXiv_csAI_bot@mastoxiv.page
2025-09-10 10:12:31

Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study
Amay Jain, Liu Cui, Si Chen
https://arxiv.org/abs/2509.07846 https://

Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study
Large language models like ChatGPT are increasingly used in classrooms, but they often provide outdated or fabricated information that can mislead students. Retrieval Augmented Generation (RAG) improves reliability of LLMs by grounding responses in external resources. We investigate two accessible RAG paradigms, vector-based retrieval and graph-based retrieval to identify best practices for classroom question answering (QA). Existing comparative studies fail to account for pedagogical factors s…

@memeorandum@universeodon.com
2025-11-06 01:50:56

Hakeem Jeffries dodges question on whether Mamdani is future of Democratic Party (Fox News)
https://www.foxnews.com/politics/hakeem-jeffries-dodges-question-whether-mamdani-future-democratic-party
http://www.memeorandum.com/251105/p164#a251105p164

@arXiv_statML_bot@mastoxiv.page
2025-10-10 09:37:19

PAC Learnability in the Presence of Performativity
Ivan Kirev, Lyuben Baltadzhiev, Nikola Konstantinov
https://arxiv.org/abs/2510.08335 https://arxiv.org/p…

PAC Learnability in the Presence of Performativity
Following the wide-spread adoption of machine learning models in real-world applications, the phenomenon of performativity, i.e. model-dependent shifts in the test distribution, becomes increasingly prevalent. Unfortunately, since models are usually trained solely based on samples from the original (unshifted) distribution, this performative shift may lead to decreased test-time performance. In this paper, we study the question of whether and when performative binary classification problems are…

@NFL@darktundra.xyz
2025-10-07 11:59:49

Could early bye weeks be a good thing? Why there's an advantage and how six teams are approaching them https://www.espn.com/nfl/story/_/id/46509162/nfl-bye-weeks-2025-advantage-question-steelers-packers-falcons-be…

Does an early bye week offer an advantage for NFL teams? - ESPN
The Eagles and Chargers revamped their offenses during a Week 5 bye last year. Will this year's early bye group (Steelers, Packers, Bears, Falcons) be as aggressive?

@arXiv_mathAG_bot@mastoxiv.page
2025-09-10 08:56:51

The nonexistence of sections of Stiefel varieties and stably free modules
Sebastian Gant
https://arxiv.org/abs/2509.07263 https://arxiv.org/pdf/2509.07263

The nonexistence of sections of Stiefel varieties and stably free modules
Let $V_r(\mathbb{A}^n)$ denote the Stiefel variety ${\rm GL}_n/{\rm GL}_{n-r}$ over a field. There is a natural projection $p: V_{r+\ell}(\mathbb{A}^n) \to V_r(\mathbb{A}^n)$. The question of whether this projection admits a section was asked by M. Raynaud in 1968. We focus on the case of $r \ge 2$ and provide examples of triples $(r,n,\ell)$ for which a section does not exist. Our results produce examples of stably free modules that do not have free summands of a given rank. To this end, we al…

@arXiv_csCY_bot@mastoxiv.page
2025-10-10 07:54:18

Exploring the Viability of the Updated World3 Model for Examining the Impact of Computing on Planetary Boundaries
Nara Guliyeva, Eshta Bhardwaj, Christoph Becker
https://arxiv.org/abs/2510.07634

Exploring the Viability of the Updated World3 Model for Examining the Impact of Computing on Planetary Boundaries
The influential Limits to Growth report introduced a system dynamics-based model to demonstrate global dynamics of the world's population, industry, natural resources, agriculture, and pollution between 1900-2100. In current times, the rapidly expanding trajectory of data center development, much of it linked to AI, uses increasing amounts of natural resources. The extraordinary amount of resources claimed warrants the question of how computing trajectories contribute to exceeding planetary bou…

@arXiv_csCL_bot@mastoxiv.page
2025-09-11 09:19:43

Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression
Nathaniel Imel, Noga Zaslavsky
https://arxiv.org/abs/2509.08093 https://

Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression
Converging evidence suggests that systems of semantic categories across human languages achieve near-optimal compression via the Information Bottleneck (IB) complexity-accuracy principle. Large language models (LLMs) are not trained for this objective, which raises the question: are LLMs capable of evolving efficient human-like semantic systems? To address this question, we focus on the domain of color as a key testbed of cognitive theories of categorization and replicate with LLMs (Gemini 2.0-…

@raiders@darktundra.xyz
2025-09-05 19:03:48

Patriots Rookie’s Status in Question vs. Raiders https://www.si.com/nfl/patriots/news/new-england-patriots-will-campbell-status-question-raiders

New England Patriots Rookie’s Status in Question vs. Las Vegas Raiders
The New England Patriots could be losing a major piece on offense vs. the Las Vegas Raiders.

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:05:19

Opponent Shaping in LLM Agents
Marta Emili Garcia Segura, Stephen Hailes, Mirco Musolesi
https://arxiv.org/abs/2510.08255 https://arxiv.org/pdf/2510.08255

Opponent Shaping in LLM Agents
Large Language Models (LLMs) are increasingly being deployed as autonomous agents in real-world environments. As these deployments scale, multi-agent interactions become inevitable, making it essential to understand strategic behavior in such systems. A central open question is whether LLM agents, like reinforcement learning agents, can shape the learning dynamics and influence the behavior of others through interaction alone. In this paper, we present the first investigation of opponent shapin…

@arXiv_mathDS_bot@mastoxiv.page
2025-10-10 09:14:39

On roundness of rotation sets
Boris Perrot, Jan Boroński, Alex Clark
https://arxiv.org/abs/2510.08235 https://arxiv.org/pdf/2510.08235

On roundness of rotation sets
Motivated by the question whether a round disk can be realized as the rotation set of a torus diffeomorphism, we study the roundness of rotation sets of a parametric family of torus diffeomorphisms $F_ρ$, where the parameter $ρ$ ranges over irrational numbers in $(0,1)$. Each $F_ρ$ is a Kwapisz-like diffeomorphism with a 2-dimensional non-polygonal rotation set $$Λ'_ρ= \operatorname{conv}\left(\left\{(\pm\frac{\lceil mρ\rceil}{m+n+1}, \pm\frac{\lceil nρ\rceil}{m+n+1}): m, n \in \mathbb…

@arXiv_csCV_bot@mastoxiv.page
2025-09-11 10:07:33

BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion
Sike Xiang, Shuang Chen, Amir Atapour-Abarghouei
https://arxiv.org/abs/2509.08715 https:…

BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion
As multimodal large language models (MLLMs) advance, their large-scale architectures pose challenges for deployment in resource-constrained environments. In the age of large models, where energy efficiency, computational scalability and environmental sustainability are paramount, the development of lightweight and high-performance models is critical for real-world applications. As such, we propose a lightweight MLLM framework for end-to-end visual question answering. Our proposed approach centr…

@servelan@newsie.social
2025-09-04 15:01:10

Firefighters question leaders’ role in Washington immigration raid
https://www.dailykos.com/stories/2025/9/4/2341497/-Firefighters-question-leaders-role-in-Washington-immigration-raid

@cowboys@darktundra.xyz
2025-10-10 15:04:23

Mailbag: Pick your poison stopping run, pass? https://www.dallascowboys.com/news/mailbag-pick-your-poison-stopping-run-pass

Mailbag: Pick your poison stopping run, pass?
Having seen improvements on defense the last two weeks, the question becomes, is this a "pick your poison" situation? Defend the pass or stop the run?

@davidaugust@mastodon.online
2025-11-08 16:16:33

"When you work on anything, you want to find the range of impulses. Which ones get portrayed is another question, but you want to have that complexity and that fullness, even if you’re playing a cartoon character."
—Willem Dafoe
#acting #coaching

@aral@mastodon.ar.al
2025-09-07 08:13:04

Hey @… – you are complicit in genocide.
Just noting it down for the history books.
#EU #israel

Looping (@Looping@anticapitalist.party)
The war in Gaza, a "genocide" according to Teresa Ribera: the EU distances itself from its commissioner. "It is not for the Commission to judge this question and this definition, but rather for the courts, and there has been no decision by the College [of Commissioners] on this particular subject," said Paula Pinho, the Commission's chief spokesperson. No, it is not for the courts to decide. https://fr.euronews.com/2025/09/05/la-guerre-a-gaza-un-genocide-pour-teresa-ribera-lue-se-d…

@NFL@darktundra.xyz
2025-10-08 20:21:21

NFL Week 6 injury report: Jalen Carter's status in question for Eagles; Giants shorthanded at WR

https://www.cbssports.com/nfl/news/nfl-week-6-injury-report-updates-tracker/

NFL Week 6 injury report: Jalen Carter's status in question for Eagles; Giants shorthanded at WR
Injuries are piling up around the league, with playoff hopefuls monitoring key starters ahead of this weekend's slate

@cowboys@darktundra.xyz
2025-10-10 14:13:33

Mailbag: Pick your poison stopping run, pass? https://www.dallascowboys.com/news/mailbag-pick-your-poison-stopping-run-pass

Mailbag: Pick your poison stopping run, pass?
Having seen improvements on defense the last two weeks, the question becomes, is this a "pick your poison" situation? Defend the pass or stop the run?

@arXiv_csCL_bot@mastoxiv.page
2025-09-08 10:10:40

KERAG: Knowledge-Enhanced Retrieval-Augmented Generation for Advanced Question Answering
Yushi Sun, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen
https://arxiv.org/abs/2509.04716

KERAG: Knowledge-Enhanced Retrieval-Augmented Generation for Advanced Question Answering
Retrieval-Augmented Generation (RAG) mitigates hallucination in Large Language Models (LLMs) by incorporating external data, with Knowledge Graphs (KGs) offering crucial information for question answering. Traditional Knowledge Graph Question Answering (KGQA) methods rely on semantic parsing, which typically retrieves knowledge strictly necessary for answer generation, thus often suffer from low coverage due to rigid schema requirements and semantic ambiguity. We present KERAG, a novel KG-based…

@servelan@newsie.social
2025-09-04 16:14:19

Bill Cassidy Traps RFK Jr. With Nobel Prize Question
https://www.mediaite.com/media/news/senate-republican-bill-cassidy-masterfully-traps-rfk-jr-with-nobel-prize-question/

@raiders@darktundra.xyz
2025-11-04 23:18:22

Cowboys will get a chance to answer the question of whether a big trade would help https://www.foxsports.com/articles/nfl/cowboys-will-get-a-chance-to-answer-the-question-of-whether-a-big-trade-would-help

Cowboys will get a chance to answer the question of whether a big trade would help
Trade was the talk of opening week after the Dallas Cowboys sent star pass rusher Micah Parsons to the Green Bay Packers in a blockbuster

@arXiv_mathDS_bot@mastoxiv.page
2025-10-10 08:59:19

Mean dimension and rate-distortion function revisited
Rui Yang
https://arxiv.org/abs/2510.08051 https://arxiv.org/pdf/2510.08051

Mean dimension and rate-distortion function revisited
Around the mean dimensions and rate-distortion functions, using some tools from local entropy theory this paper establishes the following main results: $(1)$ We prove that for non-ergodic measures associated with almost sure processes, the mean Rényi information dimension coincides with the information dimension rate. This answers a question posed by Gutman and Śpiewak (in Around the variational principle for metric mean dimension, \emph{Studia Math.} \textbf{261}(2021) 345-360). $(2)$ We…

@Techmeme@techhub.social
2025-08-28 05:31:00

A look at India's rationale for banning online real-money games, with IT minister Ashwini Vaishnaw citing 450M people losing a combined ~$2.3B to them (Vivek Kaul/Newslaundry)
https://www.newslaundry.com/2025/08/27/the-rs-444-question-why-in…

The Rs 444 question: Why India banned online money games
The real challenge lies in balancing innovation with protection, profit with responsibility, and freedom with regulation.

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:02:02

FocusMed: A Large Language Model-based Framework for Enhancing Medical Question Summarization with Focus Identification
Chao Liu, Ling Luo, Tengxiao Lv, Huan Zhuang, Lejing Yu, Jian Wang, Hongfei Lin
https://arxiv.org/abs/2510.04671

FocusMed: A Large Language Model-based Framework for Enhancing Medical Question Summarization with Focus Identification
With the rapid development of online medical platforms, consumer health questions (CHQs) are inefficient in diagnosis due to redundant information and frequent non-professional terms. The medical question summary (MQS) task aims to transform CHQs into streamlined doctors' frequently asked questions (FAQs), but existing methods still face challenges such as poor identification of question focus and model hallucination. This paper explores the potential of large language models (LLMs) in the MQS …

@NFL@darktundra.xyz
2025-09-03 22:16:23

Lamar Jackson contract: Ravens QB sidesteps question about extension, says he's 'not worried about that'

https://www.cbssports.com/nfl/news/lamar-j…

Lamar Jackson contract: Ravens QB sidesteps question about extension, says he's 'not worried about that'
Jackson is due for a new deal that would benefit both him and the Ravens

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 10:22:41

MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval
Xixi Wu, Yanchao Tan, Nan Hou, Ruiyang Zhang, Hong Cheng
https://arxiv.org/abs/2509.07666 htt…

MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval
Document Understanding is a foundational AI capability with broad applications, and Document Question Answering (DocQA) is a key evaluation task. Traditional methods convert the document into text for processing by Large Language Models (LLMs), but this process strips away critical multi-modal information like figures. While Large Vision-Language Models (LVLMs) address this limitation, their constrained input size makes multi-page document comprehension infeasible. Retrieval-augmented generatio…

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 16:33:51

Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/7]:
- Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering
Pragya Srivastava, Manuj Malik, Vivek Gupta, Tanuja Ganu, Dan Roth

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 10:31:01

SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
Lukas Haas, Gal Yona, Giovanni D'Antonio, Sasha Goldshtein, Dipanjan Das
https://arxiv.org/abs/2509.07968

SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
We introduce SimpleQA Verified, a 1,000-prompt benchmark for evaluating Large Language Model (LLM) short-form factuality based on OpenAI's SimpleQA. It addresses critical limitations in OpenAI's benchmark, including noisy and incorrect labels, topical biases, and question redundancy. SimpleQA Verified was created through a rigorous multi-stage filtering process involving de-duplication, topic balancing, and source reconciliation to produce a more reliable and challenging evaluation set, alongsi…

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 10:28:41

Are Humans as Brittle as Large Language Models?
Jiahui Li, Sean Papay, Roman Klinger
https://arxiv.org/abs/2509.07869 https://arxiv.org/pdf/2509.07869

Are Humans as Brittle as Large Language Models?
The output of large language models (LLMs) is unstable, due both to non-determinism of the decoding process and to prompt brittleness. While the intrinsic non-determinism of LLM generation may mimic existing uncertainty in human annotations through distributional shifts in outputs, it is largely assumed, yet unexplored, that the prompt brittleness effect is unique to LLMs. This raises the question: do human annotators show similar sensitivity to instruction changes? If so, should prompt b…

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 11:02:59

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
Qiaoyu Tang, Hao Xiang, Le Yu, Bowen Yu, Yaojie Lu, Xianpei Han, Le Sun, WenJuan Zhang, Pengbo Wang, Shixuan Liu, Zhenru Zhang, Jianhong Tu, Hongyu Lin, Junyang Lin
https://arxiv.org/abs/2510.08276

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
While recent advances in reasoning models have demonstrated cognitive behaviors through reinforcement learning, existing approaches struggle to invoke deep reasoning capabilities in multi-turn agents with long-horizon interactions. We propose DeepMiner, a novel framework that elicits such abilities by introducing high-difficulty training tasks and dynamic context window. DeepMiner presents a reverse construction method to generate complex but verifiable question-answer pairs from authentic web …

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 10:52:09

AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents
Md Tahmid Rahman Laskar, Julien Bouvier Tremblay, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN
https://arxiv.org/abs/2510.08149

AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents
The utilization of conversational AI systems by leveraging Retrieval Augmented Generation (RAG) techniques to solve customer problems has been on the rise with the rapid progress of Large Language Models (LLMs). However, the absence of a company-specific dedicated knowledge base is a major barrier to the integration of conversational AI systems in contact centers. To this end, we introduce AI Knowledge Assist, a system that extracts knowledge in the form of question-answer (QA) pairs from histo…

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 10:17:59

StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering
Tengjun Ni, Xin Yuan, Shenghong Li, Kai Wu, Ren Ping Liu, Wei Ni, Wenjie Zhang
https://arxiv.org/abs/2510.02827

StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering
Recent progress in retrieval-augmented generation (RAG) has led to more accurate and interpretable multi-hop question answering (QA). Yet, challenges persist in integrating iterative reasoning steps with external knowledge retrieval. To address this, we introduce StepChain GraphRAG, a framework that unites question decomposition with a Breadth-First Search (BFS) Reasoning Flow for enhanced multi-hop QA. Our approach first builds a global index over the corpus; at inference time, only retrieved …

@arXiv_csCL_bot@mastoxiv.page
2025-09-08 10:11:10

Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework
Zucheng Liang, Wenxin Wei, Kaijie Zhang, Hongyi Chen
https://arxiv.org/abs/2509.04770 https://

Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework
Accurately answering complex questions has consistently been a significant challenge for Large Language Models (LLMs). To address this, this paper proposes a multi-hop question decomposition method for complex questions, building upon research within the MQUAKE framework. Utilizing the LLAMA3 model, we systematically investigate the impact of multi-hop question decomposition within knowledge graphs on model comprehension and reasoning accuracy, both before and after model training. In our exper…

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 08:45:09

AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering
Ziqing Wang, Chengsheng Mao, Xiaole Wen, Yuan Luo, Kaize Ding
https://arxiv.org/abs/2510.02328

AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering
Medical Multimodal Large Language Models (Med-MLLMs) have shown great promise in medical visual question answering (Med-VQA). However, when deployed in low-resource settings where abundant labeled data are unavailable, existing Med-MLLMs commonly fail due to their medical reasoning capability bottlenecks: (i) the intrinsic reasoning bottleneck that ignores the details from the medical image; (ii) the extrinsic reasoning bottleneck that fails to incorporate specialized medical knowledge. To addr…

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 10:14:09

Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Yavuz Bakman, Sungmin Kang, Zhiqi Huang, Duygu Nur Yaldiz, Catarina G. Belém, Chenyang Zhu, Anoop Kumar, Alfy Samuel, Salman Avestimehr, Daben Liu, Sai Praneeth Karimireddy
https://arxiv.org/abs/2510.02671

Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Uncertainty Quantification (UQ) research has primarily focused on closed-book factual question answering (QA), while contextual QA remains unexplored, despite its importance in real-world applications. In this work, we focus on UQ for the contextual QA task and propose a theoretically grounded approach to quantify epistemic uncertainty. We begin by introducing a task-agnostic, token-level uncertainty measure defined as the cross-entropy between the predictive distribution of the given model and…
