Tootfinder

@lysander07@sigmoid.social
2025-05-08 08:03:00

Next stop on our NLP timeline (as part of the #ISE2025 lecture) was Terry Winograd's SHRDLU, an early natural language understanding system developed in 1968-70 that could manipulate blocks in a virtual world.
Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235.

Slide from the Information Service Engineering 2025 lecture, Natural Language Processing 01, A Brief History of NLP, NLP Timeline. The picture depicts a timeline in the middle from top to bottom. There is a marker placed at 1970. Left of the timeline, a screenshot of the SHRDLU system is shown displaying a block world in simple line graphics. On the right side, the following text is displayed: SHRDLU was an early natural language understanding system developed by Terry Winograd in 1968-70 that …

@arXiv_csCR_bot@mastoxiv.page
2025-07-08 07:48:00

Unveiling Privacy Policy Complexity: An Exploratory Study Using Graph Mining, Machine Learning, and Natural Language Processing
Vijayalakshmi Ramasamy, Seth Barrett, Gokila Dorai, Jessica Zumbach
https://arxiv.org/abs/2507.02968

Privacy policy documents are often lengthy, complex, and difficult for non-expert users to interpret, leading to a lack of transparency regarding the collection, processing, and sharing of personal data. As concerns over online privacy grow, it is essential to develop automated tools capable of analyzing privacy policies and identifying potential risks. In this study, we explore the potential of interactive graph visualizations to enhance user understanding of privacy policies by representing p…

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 21:51:54

This https://arxiv.org/abs/2505.19433 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks only focus on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring the agentic capabilities - workflow, tool use/function call, long-context understanding and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehens…

@arXiv_csCV_bot@mastoxiv.page
2025-07-04 10:24:31

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
Jiaer Xia, Bingkui Tong, Yuhang Zang, Rui Shao, Kaiyang Zhou
https://arxiv.org/abs/2507.02859

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in interpreting images using natural language. However, without using large-scale datasets for retraining, these models are difficult to adapt to specialized vision tasks, e.g., chart understanding. This problem is caused by a mismatch between pre-training and downstream datasets: pre-training datasets primarily concentrate on scenes and objects but contain limited information about specialized, non-object images…

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 18:04:31

This https://arxiv.org/abs/2505.07453 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

How well do LLMs reason over tabular data, really?
Large Language Models (LLMs) excel in natural language tasks, but less is known about their reasoning capabilities over tabular data. Prior analyses devise evaluation strategies that poorly reflect an LLM's realistic performance on tabular queries. Moreover, we have a limited understanding of the robustness of LLMs towards realistic variations in tabular inputs. Therefore, we ask: Can general-purpose LLMs reason over tabular data, really?, and focus on two questions 1) are tabular reasoning cap…

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:39

skLEP: A Slovak General Language Understanding Benchmark
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko
https://arxiv.org/abs/2506.21508

In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this p…

@arXiv_csSE_bot@mastoxiv.page
2025-06-03 17:04:18

This https://arxiv.org/abs/2411.03079 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation
Static Application Security Testing (SAST) tools are critical to software quality, identifying potential code issues early in development. However, they often produce false positive warnings that require manual review, slowing down development. Thus, automating false positive mitigation (FPM) is essential. The advent of Large Language Models (LLMs), with their strong abilities in natural language and code understanding, offers promising avenues for FPM. Yet current LLM-based FPM methods face tw…

@arXiv_astrophIM_bot@mastoxiv.page
2025-07-03 08:32:00

SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
Xiaosheng Zhao, Yang Huang, Guirong Xue, Xiao Kong, Jifeng Liu, Xiaoyu Tang, Timothy C. Beers, Yuan-Sen Ting, A-Li Luo
https://arxiv.org/abs/2507.01939

In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and in…

@arXiv_csMM_bot@mastoxiv.page
2025-07-04 08:44:01

VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Siran Chen, Boyu Chen, Chenyun Yu, Yuxiao Luo, Ouyang Yi, Lei Cheng, Chengxiang Zhuo, Zang Li, Yali Wang
https://arxiv.org/abs/2507.02626

Owing to powerful natural language processing and generative capabilities, large language model (LLM) agents have emerged as a promising solution for enhancing recommendation systems via user simulation. However, in the realm of video recommendation, existing studies predominantly resort to prompt-based simulation using frozen LLMs and encounter the intricate challenge of multimodal content understanding. This frequently results in suboptimal item modeling and user preference learning, thereby …

@arXiv_csLO_bot@mastoxiv.page
2025-06-13 07:43:10

StepProof: Step-by-step verification of natural language mathematical proofs
Xiaolin Hu, Qinghua Zhou, Bogdan Grechuk, Ivan Y. Tyukin
https://arxiv.org/abs/2506.10558

Interactive theorem provers (ITPs) are powerful tools for the formal verification of mathematical proofs down to the axiom level. However, their lack of a natural language interface remains a significant limitation. Recent advancements in large language models (LLMs) have enhanced the understanding of natural language inputs, paving the way for autoformalization - the process of translating natural language proofs into formal proofs that can be verified. Despite these advancements, existing aut…

@arXiv_csDB_bot@mastoxiv.page
2025-05-29 07:17:06

StreamLink: Large-Language-Model Driven Distributed Data Engineering System
Dawei Feng, Di Mei, Huiri Tan, Lei Ren, Xianying Lou, Zhangxi Tan
https://arxiv.org/abs/2505.21575

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the efficiency and accessibility of data engineering tasks. We build StreamLink on top of distributed frameworks such as Apache Spark and Hadoop to handle large data at scale. One of the important design philosophies of StreamLink is to respect user data privacy by ut…

@arXiv_csMA_bot@mastoxiv.page
2025-07-02 08:34:30

State and Memory is All You Need for Robust and Reliable AI Agents
Matthew Muhoberac, Atharva Parikh, Nirvi Vakharia, Saniya Virani, Aco Radujevic, Savannah Wood, Meghav Verma, Dimitri Metaxotos, Jeyaraman Soundararajan, Thierry Masquelin, Alexander G. Godfrey, Sean Gardner, Dobrila Rudnicki, Sam Michael, Gaurav Chopra
https://a…

Large language models (LLMs) have enabled powerful advances in natural language understanding and generation. Yet their application to complex, real-world scientific workflows remains limited by challenges in memory, planning, and tool integration. Here, we introduce SciBORG (Scientific Bespoke Artificial Intelligence Agents Optimized for Research Goals), a modular agentic framework that allows LLM-based agents to autonomously plan, reason, and achieve robust and reliable domain-specific task ex…

@arXiv_qbioNC_bot@mastoxiv.page
2025-06-24 09:19:19

Challenges in Grounding Language in the Real World
Peter Lindes, Kaoutar Skiker
https://arxiv.org/abs/2506.17375 https://arxiv.org/pd…

A long-term goal of Artificial Intelligence is to build a language understanding system that allows a human to collaborate with a physical robot using language that is natural to the human. In this paper we highlight some of the challenges in doing this, and propose a solution that integrates the abilities of a cognitive agent capable of interactive task learning in a physical robot with the linguistic abilities of a large language model. We also point the way to an initial implementation of th…

@arXiv_csHC_bot@mastoxiv.page
2025-06-23 11:40:20

Capturing Visualization Design Rationale
Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha
https://arxiv.org/abs/2506.16571 https:/…

Prior natural language datasets for data visualization have focused on tasks such as visualization literacy assessment, insight generation, and visualization generation from natural language instructions. These studies often rely on controlled setups with purpose-built visualizations and artificially constructed questions. As a result, they tend to prioritize the interpretation of visualizations, focusing on decoding visualizations rather than understanding their encoding. In this paper, we pre…

@arXiv_econGN_bot@mastoxiv.page
2025-06-03 16:34:48

This https://arxiv.org/abs/2504.15448 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_eco…

Visualizing Public Opinion on X: A Real-Time Sentiment Dashboard Using VADER and DistilBERT
In the age of social media, understanding public sentiment toward major corporations is crucial for investors, policymakers, and researchers. This paper presents a comprehensive sentiment analysis system tailored for corporate reputation monitoring, combining Natural Language Processing (NLP) and machine learning techniques to accurately interpret public opinion in real time. The methodology integrates a hybrid sentiment detection framework leveraging both rule-based models (VADER) and transfor…

@arXiv_csRO_bot@mastoxiv.page
2025-06-13 07:53:30

A Navigation Framework Utilizing Vision-Language Models
Yicheng Duan, Kaiyu Tang
https://arxiv.org/abs/2506.10172 https://arxiv.org/p…

Vision-and-Language Navigation (VLN) presents a complex challenge in embodied AI, requiring agents to interpret natural language instructions and navigate through visually rich, unfamiliar environments. Recent advances in large vision-language models (LVLMs), such as CLIP and Flamingo, have significantly improved multimodal understanding but introduced new challenges related to computational cost and real-time deployment. In this project, we propose a modular, plug-and-play navigation framework…

@arXiv_csCR_bot@mastoxiv.page
2025-06-26 09:45:50

SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models
Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nalluri, Md. Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi
https://arxiv.org/abs/2506.20415…

Ensuring the security of complex system-on-chip (SoC) designs is a critical imperative, yet traditional verification techniques struggle to keep pace due to significant challenges in automation, scalability, comprehensiveness, and adaptability. The advent of large language models (LLMs), with their remarkable capabilities in natural language understanding, code generation, and advanced reasoning, presents a new paradigm for tackling these issues. Moving beyond monolithic models, an agentic ap…

@arXiv_csIR_bot@mastoxiv.page
2025-06-23 09:44:00

eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing
Isaac Shi, Zeyuan Li, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi
https://arxiv.org/abs/2506.16768

We introduce eSapiens, a unified question-answering system designed for enterprise settings, which bridges structured databases and unstructured textual corpora via a dual-module architecture. The system combines a Text-to-SQL planner with a hybrid Retrieval-Augmented Generation (RAG) pipeline, enabling natural language access to both relational data and free-form documents. To enhance answer faithfulness, the RAG module integrates dense and sparse retrieval, commercial reranking, and a citatio…

@arXiv_csSE_bot@mastoxiv.page
2025-06-24 12:06:40

Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories
Islem Bouzenia, Michael Pradel
https://arxiv.org/abs/2506.18824 ht…

Large Language Model (LLM)-based agents are increasingly employed to automate complex software engineering tasks such as program repair and issue resolution. These agents operate by autonomously generating natural language thoughts, invoking external tools, and iteratively refining their solutions. Despite their widespread adoption, the internal decision-making processes of these agents remain largely unexplored, limiting our understanding of their operational dynamics and failure modes. In thi…

@arXiv_csAR_bot@mastoxiv.page
2025-06-23 08:05:39

DeepRTL2: A Versatile Model for RTL-Related Tasks
Yi Liu, Hongji Zhang, Yunhao Zhou, Zhengyuan Shi, Changran Xu, Qiang Xu
https://arxiv.org/abs/2506.15697 …

The integration of large language models (LLMs) into electronic design automation (EDA) has significantly advanced the field, offering transformative benefits, particularly in register transfer level (RTL) code generation and understanding. While previous studies have demonstrated the efficacy of fine-tuning LLMs for these generation-based tasks, embedding-based tasks, which are equally critical to EDA workflows, have been largely overlooked. These tasks, including natural language code search,…

@arXiv_csDB_bot@mastoxiv.page
2025-06-17 09:28:39

Datrics Text2SQL: A Framework for Natural Language to SQL Query Generation
Tetiana Gladkykh, Kyrylo Kirykov
https://arxiv.org/abs/2506.12234 https://

Text-to-SQL systems enable users to query databases using natural language, democratizing access to data analytics. However, they face challenges in understanding ambiguous phrasing, domain-specific vocabulary, and complex schema relationships. This paper introduces Datrics Text2SQL, a Retrieval-Augmented Generation (RAG)-based framework designed to generate accurate SQL queries by leveraging structured documentation, example-based learning, and domain-specific rules. The system builds a rich K…

@arXiv_csRO_bot@mastoxiv.page
2025-06-23 11:50:30

CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity
Guang Yin, Yitong Li, Yixuan Wang, Dale McConachie, Paarth Shah, Kunimatsu Hashimoto, Huan Zhang, Katherine Liu, Yunzhu Li
https://arxiv.org/abs/2506.16652

Natural language instructions for robotic manipulation tasks often exhibit ambiguity and vagueness. For instance, the instruction "Hang a mug on the mug tree" may involve multiple valid actions if there are several mugs and branches to choose from. Existing language-conditioned policies typically rely on end-to-end models that jointly handle high-level semantic understanding and low-level action generation, which can result in suboptimal performance due to their lack of modularity and interpret…

@arXiv_csDC_bot@mastoxiv.page
2025-06-12 07:25:31

EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model
Alyssa Pinnock, Shakya Jayakody, Kawsher A Roxy, Md Rubel Ahmed
https://arxiv.org/abs/2506.09061

This paper introduces EdgeProfiler, a fast profiling framework designed for evaluating lightweight Large Language Models (LLMs) on edge systems. While LLMs offer remarkable capabilities in natural language understanding and generation, their high computational, memory, and power requirements often confine them to cloud environments. EdgeProfiler addresses these challenges by providing a systematic methodology for assessing LLM performance in resource-constrained edge settings. The framework pro…

@arXiv_physicsgeoph_bot@mastoxiv.page
2025-05-28 07:34:40

SeisCoDE: 3D Seismic Interpretation Foundation Model with Contrastive Self-Distillation Learning
Goodluck Archibong, Ardiansyah Koeshidayatullah, Umair Waheed, Weichang Li, Dicky Harishidayat, Motaz Alfarraj
https://arxiv.org/abs/2505.20518

Seismic interpretation is vital for understanding subsurface structures but remains labor-intensive, subjective, and computationally demanding. While deep learning (DL) offers promise, its success hinges on large, high-quality datasets, often scarce in geophysics. Foundation Models (FMs), which have shown significant success in fields like natural language processing and computer vision, offer a transformative opportunity for seismic interpretation by enabling knowledge transfer and generalizat…

@arXiv_csCL_bot@mastoxiv.page
2025-06-23 08:16:40

Rethinking LLM Training through Information Geometry and Quantum Metrics
Riccardo Di Sipio
https://arxiv.org/abs/2506.15830 https://a…

Optimization in large language models (LLMs) unfolds over high-dimensional parameter spaces with non-Euclidean structure. Information geometry frames this landscape using the Fisher information metric, enabling more principled learning via natural gradient descent. Though often impractical, this geometric lens clarifies phenomena such as sharp minima, generalization, and observed scaling laws. We argue that curvature-aware approaches deepen our understanding of LLM training. Finally, we specula…

@lysander07@sigmoid.social
2025-05-13 16:25:32

Last week, our students learned how to conduct a proper evaluation for an NLP experiment. To this end, we introduced a small text corpus with sentences about Joseph Fourier, who is regarded as one of the discoverers of the greenhouse effect, which is responsible for global warming.
