Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2025-06-30 10:21:30

Why Are Parsing Actions for Understanding Message Hierarchies Not Random?
Daichi Kato, Ryo Ueda, Yusuke Miyao
https://arxiv.org/abs/2506.22366

If humans understood language by randomly selecting parsing actions, it might have been necessary to construct a robust symbolic system capable of being interpreted under any hierarchical structure. However, human parsing strategies do not seem to follow such a random pattern. Why is that the case? In fact, a previous study on emergent communication using models with hierarchical biases has reported that agents adopting random parsing strategies–ones that deviate significantly …

@arXiv_csSE_bot@mastoxiv.page
2025-06-25 09:35:30

Lost in Translation? Converting RegExes for Log Parsing into Dynatrace Pattern Language
Julian Fragner, Christian Macho, Bernhard Dieber, Martin Pinzger
https://arxiv.org/abs/2506.19539

Log files provide valuable information for detecting and diagnosing problems in enterprise software applications and data centers. Several log analytics tools and platforms were developed to help filter and extract information from logs, typically using regular expressions (RegExes). Recent commercial log analytics platforms provide domain-specific languages specifically designed for log parsing, such as Grok or the Dynatrace Pattern Language (DPL). However, users who want to migrate to these p…

@arXiv_hepth_bot@mastoxiv.page
2025-06-26 09:50:30

The Phases of Chaos
Tarek Anous, Diego M. Hofman
https://arxiv.org/abs/2506.20542 https://arxiv.org/pdf/2506.20542

We develop a novel physical picture to understand certain universal properties of the GUE matrix model which are typically ascribed to quantum chaos, i.e. the ramp and the plateau. We argue that these features should instead be associated with a pattern of spontaneous (or weak explicit) symmetry breaking. In this language, the GUE matrix model corresponds to an effective theory that describes the symmetry-broken phase, and where the Hermitian matrix of the GUE should be understood as a massive …

@arXiv_csFL_bot@mastoxiv.page
2025-05-27 13:29:02

This https://arxiv.org/abs/2405.07671 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csFL_…

Constructing a BPE Tokenization DFA
Many natural language processing systems operate over tokenizations of text to address the open-vocabulary problem. In this paper, we give and analyze an algorithm for the efficient construction of deterministic finite automata (DFA) designed to operate directly on tokenizations produced by the popular byte pair encoding (BPE) technique. This makes it possible to apply many existing techniques and algorithms to the tokenized case, such as pattern matching, equivalence checking of tokenization d…

@arXiv_csCY_bot@mastoxiv.page
2025-06-25 08:47:10

LLM-Based Social Simulations Require a Boundary
Zengqing Wu, Run Peng, Takayuki Ito, Chuan Xiao
https://arxiv.org/abs/2506.19806

This position paper argues that large language model (LLM)-based social simulations should establish clear boundaries to meaningfully contribute to social science research. While LLMs offer promising capabilities for modeling human-like agents compared to traditional agent-based modeling, they face fundamental limitations that constrain their reliability for social pattern discovery. The core issue lies in LLMs' tendency towards an "average persona" that lacks sufficient behavioral heterogene…

@arXiv_csCV_bot@mastoxiv.page
2025-06-19 14:35:59

Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/4]:
- Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

@stsquad@mastodon.org.uk
2025-06-17 20:28:33

This is everything I could never get out of #sonicpi:

Strudel REPL
Strudel is a music live coding environment for the browser, porting the TidalCycles pattern language to JavaScript.

@arXiv_csSE_bot@mastoxiv.page
2025-06-05 07:23:17

Multi-Language Detection of Design Pattern Instances
Hugo Andrade, João Bispo, Filipe F. Correia
https://arxiv.org/abs/2506.03903

Code comprehension is often supported by source code analysis tools which provide more abstract views over software systems, such as those detecting design patterns. These tools encompass analysis of source code and ensuing extraction of relevant information. However, the analysis of the source code is often specific to the target programming language. We propose DP-LARA, a multi-language pattern detection tool that uses the multi-language capability of the LARA framework to support finding p…

@arXiv_csCR_bot@mastoxiv.page
2025-06-16 07:22:59

Bhatt Conjectures: On Necessary-But-Not-Sufficient Benchmark Tautology for Human Like Reasoning
Manish Bhatt
https://arxiv.org/abs/2506.11423

Debates about whether Large Language or Reasoning Models (LLMs/LRMs) truly reason or merely pattern-match suffer from shifting goal posts. In my personal opinion, two analytic–hence "tautological"–benchmarks cut through that fog in my mental model. In this paper, I attempt to write down my mental model in concrete terms.

@arXiv_csCV_bot@mastoxiv.page
2025-06-17 18:53:11

Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/5]:
Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models

@arXiv_csCE_bot@mastoxiv.page
2025-06-12 07:18:49

Superstudent intelligence in thermodynamics
Rebecca Loubet, Pascal Zittlau, Marco Hoffmann, Luisa Vollmer, Sophie Fellenz, Heike Leitte, Fabian Jirasek, Johannes Lenhard, Hans Hasse
https://arxiv.org/abs/2506.09822

In this short note, we report and analyze a striking event: OpenAI's large language model o3 has outwitted all students in a university exam on thermodynamics. The thermodynamics exam is a difficult hurdle for most students, where they must show that they have mastered the fundamentals of this important topic. Consequently, the failure rates are very high, A-grades are rare – and they are considered proof of the students' exceptional intellectual abilities. This is because pattern learning does…

@arXiv_csSE_bot@mastoxiv.page
2025-06-06 09:40:38

This https://arxiv.org/abs/2506.03903 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

@arXiv_csDB_bot@mastoxiv.page
2025-06-04 13:32:54

This https://arxiv.org/abs/2505.19988 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDB_…

Automatic Metadata Extraction for Text-to-SQL
Large Language Models (LLMs) have recently become sophisticated enough to automate many tasks ranging from pattern finding to writing assistance to code generation. In this paper, we examine text-to-SQL generation. We have observed from decades of experience that the most difficult part of query development lies in understanding the database contents. These experiences inform the direction of our research. Text-to-SQL benchmarks such as SPIDER and Bird contain extensive metadata that is gener…

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:19:57

Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Jiandong Shao, Yao Lu, Jianfei Yang
https://arxiv.org/abs/2506.01734

Large Language Models (LLMs) exhibit impressive performance on complex reasoning tasks, yet they frequently fail on basic numerical problems, producing incorrect outputs. Inspired by Benford's Law – a statistical pattern where lower digits occur more frequently as leading digits – we hypothesize that the long-tailed digit distributions in web-collected corpora may be learned by LLMs during pretraining, leading to biased numerical generation. To investigate the hypothesis, we first examine whe…
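
For readers unfamiliar with Benford's Law, mentioned in the abstract above: in its standard form, the probability that d is the leading digit is log10(1 + 1/d). The sketch below only illustrates that textbook formula. It is not taken from the paper, and the names benford_probability and benford_distribution are hypothetical.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability of `digit` (1-9) as a leading digit under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Hypothetical helper name: the full distribution over leading digits 1..9.
benford_distribution = {d: benford_probability(d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in benford_distribution.items():
        print(f"{d}: {p:.3f}")
```

The probabilities decrease monotonically from digit 1 to digit 9, which is the long-tailed shape the abstract refers to.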

@arXiv_csSE_bot@mastoxiv.page
2025-06-09 07:53:42

Survey of LLM Agent Communication with MCP: A Software Design Pattern Centric Review
Anjana Sarkar, Soumyendu Sarkar
https://arxiv.org/abs/2506.05364

This survey investigates how classical software design patterns can enhance the reliability and scalability of communication in Large Language Model (LLM)-driven agentic AI systems, focusing particularly on the Model Context Protocol (MCP). It examines the foundational architectures of LLM-based agents and their evolution from isolated operation to sophisticated, multi-agent collaboration, addressing key communication hurdles that arise in this transition. The study revisits well-established pa…
