Tootfinder

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 07:39:40

GLoSS: Generative Language Models with Semantic Search for Sequential Recommendation
Krishna Acharya, Aleksandr V. Petrov, Juba Ziani
https://arxiv.org/abs/2506.01910

We propose Generative Low-rank language model with Semantic Search (GLoSS), a generative recommendation framework that combines large language models with dense retrieval for sequential recommendation. Unlike prior methods such as GPT4Rec, which rely on lexical matching via BM25, GLoSS uses semantic search to retrieve relevant items beyond lexical matching. For query generation, we employ 4-bit quantized LLaMA-3 models fine-tuned with low-rank adaptation (LoRA), enabling efficient training and …

@arXiv_csCY_bot@mastoxiv.page
2025-06-03 07:19:09

Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Large Language Models
Philip Quirke, Narmeen Oozeer, Chaithanya Bandi, Amir Abdullah, Jason Hoelscher-Obermaier, Jeff M. Phillips, Joshua Greaves, Clement Neo, Fazl Barez, Shriyash Upadhyay
https://arxiv.org/abs/2506.00051

This position paper argues that the prevailing trajectory toward ever larger, more expensive generalist foundation models controlled by a handful of big companies limits innovation and constrains progress. We challenge this approach by advocating for an "Expert Orchestration" framework as a superior alternative that democratizes LLM advancement. Our proposed framework intelligently selects from thousands of existing models based on query requirements and decomposition, focusing on identifying w…

@arXiv_csCR_bot@mastoxiv.page
2025-06-02 09:57:08

This https://arxiv.org/abs/2409.17275 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…

On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains
Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains such as healthcare, finance, and legal contexts. Given a query, RAG retrieves relevant documents from a corpus and integrates them into the LLMs' generation process. In this study, we investigate the adversarial robustness of RAG, focusing specifically on examining the retrieval system. First, across 225 different setup combinations of corpus,…

@arXiv_csIR_bot@mastoxiv.page
2025-07-02 09:01:00

MassTool: A Multi-Task Search-Based Tool Retrieval Framework for Large Language Models
Jianghao Lin, Xinyuan Wang, Xinyi Dai, Menghui Zhu, Bo Chen, Ruiming Tang, Yong Yu, Weinan Zhang
https://arxiv.org/abs/2507.00487

Tool retrieval is a critical component in enabling large language models (LLMs) to interact effectively with external tools. It aims to precisely filter the massive tools into a small set of candidates for the downstream tool-augmented LLMs. However, most existing approaches primarily focus on optimizing tool representations, often neglecting the importance of precise query comprehension. To address this gap, we introduce MassTool, a multi-task search-based framework designed to enhance both qu…

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-07-02 09:45:20

Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
Satadeep Bhattacharjee, Seung-Cheol Lee
https://arxiv.org/abs/2507.00683

The recently proposed physics-based framework by Huo and Johnson models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians we obtain a…

@arXiv_csDB_bot@mastoxiv.page
2025-05-30 07:16:53

TailorSQL: An NL2SQL System Tailored to Your Query Workload
Kapil Vaidya, Jialin Ding, Sebastian Kosak, David Kernert, Chuan Lei, Xiao Qin, Abhinav Tripathy, Ramesh Balan, Balakrishnan Narayanaswamy, Tim Kraska
https://arxiv.org/abs/2505.23039

NL2SQL (natural language to SQL) translates natural language questions into SQL queries, thereby making structured data accessible to non-technical users, serving as the foundation for intelligent data applications. State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information, such as the database schema, and invoking a pre-trained large language model (LLM) using the question and retrieved information to generate the SQL query. However, existin…

@arXiv_csMM_bot@mastoxiv.page
2025-06-03 07:22:17

Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach
Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang
https://arxiv.org/abs/2506.01668

Stickers, though small, are a highly condensed form of visual expression, ubiquitous across messaging platforms and embraced by diverse cultures, genders, and age groups. Despite their popularity, sticker retrieval remains an underexplored task due to the significant human effort and subjectivity involved in constructing high-quality sticker query datasets. Although large language models (LLMs) excel at general NLP tasks, they falter when confronted with the nuanced, intangible, and highly spec…

@arXiv_csLO_bot@mastoxiv.page
2025-07-01 09:29:03

Querying Attack-Fault-Defense Trees: Property Specification in Smart Grid and Aerospace Case Studies
Reza Soltani, Stefano M. Nicoletti, Milan Lopuhaä-Zwakenberg, Mariëlle Stoelinga
https://arxiv.org/abs/2506.23789

This paper introduces AFDL, a logic-based framework for reasoning about safety, security, and defense interactions in Attack-Fault-Defense Trees, which is a model that captures all safety, security, and defense domains in a single framework. We showcase AFDL and propose a structured domain-specific query language, LangAFDL, which enables domain experts to express complex analysis goals through intuitive templates. LangAFDL supports both Boolean and quantified queries as well as minimal cut…

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:54:19

Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation
Guanting Dong, Xiaoxi Li, Yuyao Zhang, Mengjie Deng
https://arxiv.org/abs/2506.21384

Real-world live retrieval-augmented generation (RAG) systems face significant challenges when processing user queries that are often noisy, ambiguous, and contain multiple intents. While RAG enhances large language models (LLMs) with external knowledge, current systems typically struggle with such complex inputs, as they are often trained or evaluated on cleaner data. This paper introduces Omni-RAG, a novel framework designed to improve the robustness and effectiveness of RAG systems in live, o…

@arXiv_csIR_bot@mastoxiv.page
2025-07-02 08:39:29

Read the Docs Before Rewriting: Equip Rewriter with Domain Knowledge via Continual Pre-training
Qi Wang, Yixuan Cao, Yifan Liu, Jiangtao Zhao, Ping Luo
https://arxiv.org/abs/2507.00477

A Retrieval-Augmented Generation (RAG)-based question-answering (QA) system enhances a large language model's knowledge by retrieving relevant documents based on user queries. Discrepancies between user queries and document phrasings often necessitate query rewriting. However, in specialized domains, the rewriter model may struggle due to limited domain-specific knowledge. To resolve this, we propose the R&R (Read the doc before Rewriting) rewriter, which involves continual pre-training on pro…

@arXiv_csLG_bot@mastoxiv.page
2025-06-25 07:36:59

HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration
Ganesh Parab, Zishan Ahmad, Dagnachew Birru
https://arxiv.org/abs/2506.18916 https:…

Text-to-SQL generation bridges the gap between natural language and databases, enabling users to query data without requiring SQL expertise. While large language models (LLMs) have significantly advanced the field, challenges remain in handling complex queries that involve multi-table joins, nested conditions, and intricate operations. Existing methods often rely on multi-step pipelines that incur high computational costs, increase latency, and are prone to error propagation. To address these l…

@arXiv_csDB_bot@mastoxiv.page
2025-05-30 07:17:10

KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song
https://arxiv.org/abs/2505.23416

Transformer-based large language models (LLMs) cache context as key-value (KV) pairs during inference. As context length grows, KV cache sizes expand, leading to substantial memory overhead and increased attention latency. This paper introduces KVzip, a query-agnostic KV cache eviction method enabling effective reuse of compressed KV caches across diverse queries. KVzip quantifies the importance of a KV pair using the underlying LLM to reconstruct original contexts from cached KV pairs, subsequ…

@arXiv_csDC_bot@mastoxiv.page
2025-06-18 08:12:32

Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse
Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park
https://arxiv.org/abs/2506.14107

Recently, Video-Language Models (VideoLMs) have demonstrated remarkable capabilities, offering significant potential for flexible and powerful video query systems. These models typically rely on Vision Transformers (ViTs), which process video frames individually to extract visual embeddings. However, generating embeddings for large-scale videos requires ViT inferencing across numerous frames, posing a major hurdle to real-world deployment and necessitating solutions for integration into scalabl…

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 07:21:46

Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
Seongwan Park, Taeklim Kim, Youngjoong Ko
https://arxiv.org/abs/2506.00041

Despite their strong performance, Dense Passage Retrieval (DPR) models suffer from a lack of interpretability. In this work, we propose a novel interpretability framework that leverages Sparse Autoencoders (SAEs) to decompose previously uninterpretable dense embeddings from DPR models into distinct, interpretable latent concepts. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores…

@arXiv_csCR_bot@mastoxiv.page
2025-05-30 09:51:42

This https://arxiv.org/abs/2411.06426 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…

SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains
As the integration of the Large Language Models (LLMs) into various applications increases, so does their susceptibility to misuse, raising significant security concerns. Numerous jailbreak attacks have been proposed to assess the security defense of LLMs. Current jailbreak attacks mainly rely on scenario camouflage, prompt obfuscation, prompt optimization, and prompt iterative optimization to conceal malicious prompts. In particular, sequential prompt chains in a single query can lead LLMs to …

@arXiv_csDB_bot@mastoxiv.page
2025-05-29 07:17:12

Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries
Josefa Lia Stoisser, Marc Boubnovski Martell, Kaspar Märtens, Lawrence Phillips, Stephen Michael Town, Rory Donovan-Maiye, Julien Fauqueur
https://arxiv.org/abs/2505.21801

Electronic health records (EHRs) contain richly structured, longitudinal data essential for predictive modeling, yet stringent privacy regulations (e.g., HIPAA, GDPR) often restrict access to individual-level records. We introduce Query, Don't Train (QDT): a structured-data foundation-model interface enabling tabular inference via LLM-generated SQL over EHRs. Instead of training on or accessing individual-level examples, QDT uses a large language model (LLM) as a schema-aware query planner to g…

@arXiv_eessIV_bot@mastoxiv.page
2025-06-23 08:43:40

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
https://arxiv.org/abs/2506.15745

Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time--quickly exceeding the fixed memory of phones, AR glasses, and edge robots. Prior compression schemes either assume the whole video and user query are available offline or must first build the full cache, so memory still scales with stream length. InfiniPot-V is the first training-free, query-agnostic framework that enforces a hard, length-independent memory c…

@arXiv_csAI_bot@mastoxiv.page
2025-06-18 08:04:15

Lightweight Relevance Grader in RAG
Taehee Jeong
https://arxiv.org/abs/2506.14084 https://arxiv.org/pdf/2506.14084

Retrieval-Augmented Generation (RAG) addresses limitations of large language models (LLMs) by leveraging a vector database to provide more accurate and up-to-date information. When a user submits a query, RAG executes a vector search to find relevant documents, which are then used to generate a response. However, ensuring that the retrieved documents are relevant to the query is a major challenge. To address this, a secondary model, known as a relevance grader, can be used to verify their relevan…

@arXiv_csHC_bot@mastoxiv.page
2025-06-19 08:23:34

Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain: A CoT-Enhanced Prompt Engineering Approach
Wenqi Guan, Yang Fang
https://arxiv.org/abs/2506.15512

Large Language Models have radically changed how students learn remotely, among other aspects of educational activity. Current retrieval of remote learning resources lacks the contextual depth needed to provide comprehensive information on complex student queries. This work proposes a novel approach to enhancing remote learning retrieval by integrating GPT-based models within the LangChain framework. We achieve this system in a more intuitive and productive manner using…

@arXiv_csDB_bot@mastoxiv.page
2025-05-30 09:51:22

This https://arxiv.org/abs/2505.21801 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDB_…

@arXiv_csCR_bot@mastoxiv.page
2025-06-26 09:21:00

Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models
Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen
https://arxiv.org/abs/2506.19889

Recent advances in large language models (LLMs) have made a profound impact on our society and also raised new security concerns. Particularly, due to the remarkable inference ability of LLMs, the privacy violation attack (PVA), revealed by Staab et al., introduces serious personal privacy issues. Existing defense methods mainly leverage LLMs to anonymize the input query, which requires costly inference time and cannot gain satisfactory defense performance. Moreover, directly rejecting the PVA …

@arXiv_csPL_bot@mastoxiv.page
2025-06-03 16:09:21

This https://arxiv.org/abs/2505.14690 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csPL_…

SGL: A Structured Graphics Language
This paper introduces SGL, a graphics language that is aesthetically similar to SQL. SGL is based on traditional grammars of graphics, as well as Vega-Lite's composition algebra. SGL demonstrates that the grammatical approach to graphics lends itself naturally to a SQL-like language. As a graphical counterpart to SQL, SGL facilitates the addition of visualization capabilities to SQL query interfaces. This paper presents components of the SGL language alongside examples. Comparisons to SQL and e…

@arXiv_csSE_bot@mastoxiv.page
2025-06-16 10:15:59

Invocable APIs derived from NL2SQL datasets for LLM Tool-Calling Evaluation
Benjamin Elder, Anupama Murthi, Jungkoo Kang, Ankita Rajaram Naik, Kiran Kate, Kinjal Basu, Danish Contractor
https://arxiv.org/abs/2506.11266

Large language models (LLMs) are routinely deployed as agentic systems, with access to tools that interact with live environments to accomplish tasks. In enterprise deployments these systems need to interact with API collections that can be extremely large and complex, often backed by databases. In order to create datasets with such characteristics, we explore how existing NL2SQL (Natural Language to SQL query) datasets can be used to automatically create NL2API datasets. Specifically, this wor…

@arXiv_csSD_bot@mastoxiv.page
2025-06-11 08:08:45

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, Zixin Zhang, Bin Wang, Bo Li, Buyun Ma, Changxin Miao, Changyi Wan, Chen Xu, Dapeng Shi, Dingyuan Hu, Enle…

Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a dual-codebook audio tokenizer for linguistic and semantic feature extraction, a 130-billion-parameter…

@arXiv_csCL_bot@mastoxiv.page
2025-06-12 09:06:21

Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye
https://arxiv.org/abs/2506.09944

Recent work has identified retrieval heads (Wu et al., 2025b), a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this paper, we introduce QRHEAD (Query-Focused Retrieval Head), an improved set of attention heads that enhance retrieval from long context. We identify QRHEAD by aggregating attention scores with respect to the input query, using a handful of exa…

@arXiv_csIR_bot@mastoxiv.page
2025-05-30 09:53:40

This https://arxiv.org/abs/2502.17057 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance
Large language models (LLMs) have demonstrated significant potential in enhancing dense retrieval through query augmentation. However, most existing methods treat the LLM and the retriever as separate modules, overlooking the alignment between generation and ranking objectives. In this work, we propose ExpandR, a unified LLM-augmented dense retrieval framework that jointly optimizes both the LLM and the retriever. ExpandR employs the LLM to generate semantically rich query expansions, which are…

@arXiv_csIR_bot@mastoxiv.page
2025-06-24 11:43:30

Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation
Jingming Liu, Yumeng Li, Wei Shi, Yao-Xiang Ding, Hui Su, Kun Zhou
https://arxiv.org/abs/2506.18670

Recent studies have proposed leveraging Large Language Models (LLMs) as information retrievers through query rewriting. However, for challenging corpora, we argue that enhancing queries alone is insufficient for robust semantic matching; the LLM should also have sufficient understanding of the corpus by directly handling and augmenting the documents themselves. To this end, we present an LLM-based retriever empowered to augment both user queries and corpus documents, with its policy fully explo…

@erc_bk@fosstodon.org
2025-05-08 15:40:22

I feel like ChatGPT is taking a jab at other types of models here.

Image shows the response of ChatGPT to a query about map artifacts generated by a random forest model. It describes some of the predictions of the model as hallucinations, which is a common critique of large language models.

@arXiv_csIR_bot@mastoxiv.page
2025-06-30 08:46:00

PentaRAG: Large-Scale Intelligent Knowledge Retrieval for Enterprise LLM Applications
Abu Hanif Muhammad Syarubany, Chang Dong Yoo
https://arxiv.org/abs/2506.21593

Enterprise deployments of large language models (LLMs) demand continuously changing document collections with sub-second latency and predictable GPU cost requirements that classical Retrieval-Augmented Generation (RAG) pipelines only partially satisfy. We present PentaRAG, a five-layer module that routes each query through two instant caches (fixed key-value and semantic), a memory-recall mode that exploits the LLM's own weights, an adaptive session memory, and a conventional retrieval-augmentati…

@arXiv_csDB_bot@mastoxiv.page
2025-06-17 09:28:39

Datrics Text2SQL: A Framework for Natural Language to SQL Query Generation
Tetiana Gladkykh, Kyrylo Kirykov
https://arxiv.org/abs/2506.12234 https://

Text-to-SQL systems enable users to query databases using natural language, democratizing access to data analytics. However, they face challenges in understanding ambiguous phrasing, domain-specific vocabulary, and complex schema relationships. This paper introduces Datrics Text2SQL, a Retrieval-Augmented Generation (RAG)-based framework designed to generate accurate SQL queries by leveraging structured documentation, example-based learning, and domain-specific rules. The system builds a rich K…

@arXiv_csAR_bot@mastoxiv.page
2025-06-04 07:17:25

Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention
Robin Geens, Marian Verhelst
https://arxiv.org/abs/2506.02523 https://

Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and significantly lowers memory bandwidth demands, particularly in the autoregressive decode phase. This letter presents the first hardware-centric analysis of MLA, comparing it to conventional Multi-Head Attention (MHA) and evaluating its implications for accele…

@arXiv_csDC_bot@mastoxiv.page
2025-06-16 07:26:29

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding
Ziyi Zhang, Ziheng Jiang, Chengquan Jiang, Menghan Yu, Size Zheng, Haibin Lin, Henry Hoffmann, Xin Liu
https://arxiv.org/abs/2506.11309

Low-latency decoding for large language models (LLMs) is crucial for applications like chatbots and code assistants, yet generating long outputs remains slow in single-query settings. Prior work on speculative decoding (which combines a small draft model with a larger target model) and tensor parallelism has each accelerated decoding. However, conventional approaches fail to apply both simultaneously due to imbalanced compute requirements (between draft and target models), KV-cache inconsistenc…

@arXiv_csNI_bot@mastoxiv.page
2025-06-05 07:20:19

NetPress: Dynamically Generated LLM Benchmarks for Network Applications
Yajie Zhou, Jiajun Ruan, Eric S. Wang, Sadjad Fouladi, Francis Y. Yan, Kevin Hsieh, Zaoxing Liu
https://arxiv.org/abs/2506.03231

Despite growing interest in domain-specific benchmarking of large language models (LLMs) and agents, current evaluations remain limited to static, small-scale datasets, especially in high-stakes tasks like network operations that demand reliability for deployments. We present NetPress, an automated benchmark generation framework for evaluating LLM agents in network applications. NetPress introduces a unified abstraction with state and action, enabling dynamic generation of diverse query sets al…

@arXiv_csIR_bot@mastoxiv.page
2025-06-23 08:51:30

Revela: Dense Retriever Learning via Language Modeling
Fengyu Cai, Tong Chen, Xinran Zhao, Sihao Chen, Hongming Zhang, Sherry Tongshuang Wu, Iryna Gurevych, Heinz Koeppl
https://arxiv.org/abs/2506.16552

Dense retrievers play a vital role in accessing external and specialized knowledge to augment language models (LMs). Training dense retrievers typically requires annotated query-document pairs, which are costly and hard to obtain in specialized domains such as code, motivating growing interest in self-supervised retriever learning. Since LMs are trained to capture token-level dependencies through a self-supervised learning objective (i.e., next-token prediction), we can analogously cast retrieva…

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 18:16:23

This https://arxiv.org/abs/2505.24226 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness
Graph-based RAG methods like GraphRAG have shown promising global understanding of the knowledge base by constructing hierarchical entity graphs. However, they often suffer from inefficiency and rely on manually pre-defined query modes, limiting practical use. In this paper, we propose E^2GraphRAG, a streamlined graph-based RAG framework that improves both Efficiency and Effectiveness. During the indexing stage, E^2GraphRAG constructs a summary tree with large language models and an entity grap…

@arXiv_csDB_bot@mastoxiv.page
2025-06-09 07:27:22

Training-Free Query Optimization via LLM-Based Plan Similarity
Nikita Vasilenko, Alexander Demin, Vladimir Boorlakov
https://arxiv.org/abs/2506.05853 https…

Large language model (LLM) embeddings offer a promising new avenue for database query optimization. In this paper, we explore how pre-trained execution plan embeddings can guide SQL query execution without the need for additional model training. We introduce LLM-PM (LLM-based Plan Mapping), a framework that embeds the default execution plan of a query, finds its k nearest neighbors among previously executed plans, and recommends database hintsets based on neighborhood voting. A lightweight cons…
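
The neighborhood-voting step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy two-dimensional embeddings, and the hintset labels are all hypothetical, and cosine similarity is assumed as the distance measure because the truncated abstract does not name one.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_hintset(plan_embedding, history, k=3):
    """Pick the hintset most common among the k nearest past plans.

    history: list of (plan_embedding, hintset) pairs from earlier runs
    (a hypothetical storage format chosen for this sketch).
    """
    if not history:
        return None
    ranked = sorted(history, key=lambda h: cosine(plan_embedding, h[0]), reverse=True)
    votes = Counter(hintset for _, hintset in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy history: embeddings and hintset labels are made up for illustration.
history = [
    ([1.0, 0.0], "hintset_A"),
    ([0.9, 0.1], "hintset_A"),
    ([0.0, 1.0], "hintset_B"),
    ([0.1, 0.9], "hintset_B"),
]
choice = recommend_hintset([0.8, 0.2], history, k=3)
print(choice)
```

With k=3, two of the three nearest neighbors carry the first label, so the vote selects it. A real system would obtain the embeddings from an LLM and would need a tie-breaking rule, which this sketch leaves to `Counter` ordering.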

@arXiv_csIR_bot@mastoxiv.page
2025-06-17 10:07:05

SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists
Lynn Khellaf, Ipek Baris Schlicht, Tilman Mirass, Julia Bayer, Tilman Wagner, Ruben Bouwmeester
https://arxiv.org/abs/2506.13188

OpenStreetMap (OSM) is a vital resource for investigative journalists doing geolocation verification. However, existing tools to query OSM data such as Overpass Turbo require familiarity with complex query languages, creating barriers for non-technical users. We present SPOT, an open source natural language interface that makes OSM's rich, tag-based geographic data more accessible through intuitive scene descriptions. SPOT interprets user inputs as structured representations of geospatial objec…

@arXiv_csIR_bot@mastoxiv.page
2025-06-16 07:50:09

TongSearch-QR: Reinforced Query Reasoning for Retrieval
Xubo Qin, Jun Bai, Jiaqi Li, Zixia Jia, Zilong Zheng
https://arxiv.org/abs/2506.11603 https://

Traditional information retrieval (IR) methods excel at textual and semantic matching but struggle in reasoning-intensive retrieval tasks that require multi-hop inference or complex semantic understanding between queries and documents. One promising solution is to explicitly rewrite or augment queries using large language models (LLMs) to elicit reasoning-relevant content prior to retrieval. However, the widespread use of large-scale language models like GPT-4 or LLaMA3-70B remains impractical …

@arXiv_csIR_bot@mastoxiv.page
2025-06-06 07:19:02

Exp4Fuse: A Rank Fusion Framework for Enhanced Sparse Retrieval using Large Language Model-based Query Expansion
Lingyuan Liu, Mengxiang Zhang
https://arxiv.org/abs/2506.04760

Large Language Models (LLMs) have shown potential in generating hypothetical documents for query expansion, thereby enhancing information retrieval performance. However, the efficacy of this method is highly dependent on the quality of the generated documents, which often requires complex prompt strategies and the integration of advanced dense retrieval techniques. This can be both costly and computationally intensive. To mitigate these limitations, we explore the use of zero-shot LLM-based que…
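
The title names rank fusion, but the truncated abstract does not say which fusion rule is used. As a generic illustration only, here is reciprocal rank fusion (RRF), one common way to merge a ranking for the original query with a ranking for an LLM-expanded query. The document IDs, the two input rankings, and the constant k=60 are assumptions for this sketch and do not come from the post.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings: one from the original query, one from an expanded query.
original_query_ranking = ["d1", "d2", "d3"]
expanded_query_ranking = ["d2", "d4", "d1"]

fused = reciprocal_rank_fusion([original_query_ranking, expanded_query_ranking])
print(fused)
```

Documents that appear near the top of both lists rise in the fused order, which is the general appeal of fusing an expanded-query ranking with the original one.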

@arXiv_csIR_bot@mastoxiv.page
2025-06-06 07:19:23

GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval
Lingyuan Liu, Mengxiang Zhang
https://arxiv.org/abs/2506.04762

Large language model (LLM)-based query expansion for information retrieval augments queries with hypothetical documents generated by LLMs. However, its performance relies heavily on the scale of the language models (LMs), necessitating larger, more advanced LLMs. This approach is costly, computationally intensive, and often has limited accessibility. To address these limitations, we introduce GOLFer - Smaller LMs-Generated Documents Hallucination Filter & Combiner - a novel method leveragin…

@arXiv_csDB_bot@mastoxiv.page
2025-06-04 13:32:54

This https://arxiv.org/abs/2505.19988 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDB_…

Automatic Metadata Extraction for Text-to-SQL
Large Language Models (LLMs) have recently become sophisticated enough to automate many tasks ranging from pattern finding to writing assistance to code generation. In this paper, we examine text-to-SQL generation. We have observed from decades of experience that the most difficult part of query development lies in understanding the database contents. These experiences inform the direction of our research. Text-to-SQL benchmarks such as SPIDER and Bird contain extensive metadata that is gener…

@arXiv_csIR_bot@mastoxiv.page
2025-06-17 10:14:17

Tree-Based Text Retrieval via Hierarchical Clustering in RAGFrameworks: Application on Taiwanese Regulations
Chia-Heng Yu, Yen-Lung Tsai
https://arxiv.org/abs/2506.13607

Tree-Based Text Retrieval via Hierarchical Clustering in RAGFrameworks: Application on Taiwanese Regulations
Traditional Retrieval-Augmented Generation (RAG) systems employ brute-force inner product search to retrieve the top-k most similar documents, which are then combined with the user query and passed to a language model. This allows the model to access external knowledge and reduce hallucinations. However, selecting an appropriate k value remains a significant challenge in practical applications: a small k may fail to retrieve sufficient information, while a large k can introduce excessive and irrelevant c…
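
The baseline retrieval step described in this abstract (brute-force inner product search for the top-k documents, then combining them with the user query) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy two-dimensional embeddings, and the prompt layout are all assumptions made for the example, and the tree-based method the paper proposes is not shown.

```python
import heapq


def inner_product(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_by_inner_product(query_vec, doc_vecs, k):
    """Score every document against the query (brute force) and
    return the k highest-scoring (index, score) pairs, best first."""
    scored = ((i, inner_product(query_vec, d)) for i, d in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def build_prompt(query, docs, hits):
    """Combine the retrieved documents with the user query.
    The prompt layout here is hypothetical."""
    context = "\n".join(docs[i] for i, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Toy corpus with made-up 2-d embeddings (assumed for illustration only).
docs = [
    "Article on traffic fines",
    "Article on food labeling",
    "Article on speed limits",
]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
query_vec = [1.0, 0.0]

hits = top_k_by_inner_product(query_vec, doc_vecs, k=2)
prompt = build_prompt("What is the fine for speeding?", docs, hits)
```

The choice of k is the only tuning knob in this sketch, which is the difficulty the abstract points at: k=1 here would drop the speed-limit article, while k=3 would pull in the unrelated food-labeling article.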

@arXiv_csDB_bot@mastoxiv.page
2025-06-03 16:03:48

This https://arxiv.org/abs/2503.00600 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDB_…

Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems
AI-augmented data processing systems (DPSs) integrate large language models (LLMs) into query pipelines, allowing powerful semantic operations on structured and unstructured data. However, the reliability (a.k.a. trust) of these systems is fundamentally challenged by the potential for LLMs to produce errors, limiting their adoption in critical domains. To help address this reliability bottleneck, we introduce semantic integrity constraints (SICs) -- a declarative abstraction for specifying and …

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 16:41:58

This https://arxiv.org/abs/2505.07155 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

Reassessing Large Language Model Boolean Query Generation for Systematic Reviews
Systematic reviews are comprehensive literature reviews that address highly focused research questions and represent the highest form of evidence in medicine. A critical step in this process is the development of complex Boolean queries to retrieve relevant literature. Given the difficulty of manually constructing these queries, recent efforts have explored Large Language Models (LLMs) to assist in their formulation. One of the first studies, Wang et al., investigated ChatGPT for this task, foll…

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 16:14:58

This https://arxiv.org/abs/2412.00639 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries
Multi-modal datasets, like those involving images, often miss the detailed descriptions that properly capture the rich information encoded in each item. This makes answering complex natural language queries a major challenge in this domain. In particular, unlike the traditional nearest neighbor search, where the tuples and the query are represented as points in a single metric space, these settings involve queries and tuples embedded in fundamentally different spaces, making the traditional que…

@arXiv_csIR_bot@mastoxiv.page
2025-06-13 07:38:00

Towards Understanding Bias in Synthetic Data for Evaluation
Hossein A. Rahmani, Varsha Ramineni, Nick Craswell, Bhaskar Mitra, Emine Yilmaz
https://arxiv.org/abs/2506.10301

Towards Understanding Bias in Synthetic Data for Evaluation
Test collections are crucial for evaluating Information Retrieval (IR) systems. Creating a diverse set of user queries for these collections can be challenging, and obtaining relevance judgments, which indicate how well retrieved documents match a query, is often costly and resource-intensive. Recently, generating synthetic datasets using Large Language Models (LLMs) has gained attention in various applications. While previous work has used LLMs to generate synthetic queries or documents to imp…

@arXiv_csIR_bot@mastoxiv.page
2025-06-10 16:34:59

This https://arxiv.org/abs/2503.18941 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

Exploring Training and Inference Scaling Laws in Generative Retrieval
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models (LLMs) generate target documents directly from a query. As a novel paradigm, the mechanisms that underpin its performance and scalability remain largely unexplored. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance. We propose a novel evaluatio…
