Tootfinder

No exact results. Similar results found.

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:07

MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation
Yile Liu, Ziwei Ma, Xiu Jiang, Jinglu Hu, Jing Chang, Liang Li
https://arxiv.org/abs/2506.01776

With the rapid adoption of large language models (LLMs) in natural language processing, the ability to follow instructions has emerged as a key metric for evaluating their practical utility. However, existing evaluation methods often focus on single-language scenarios, overlooking the challenges and differences present in multilingual and cross-lingual contexts. To address this gap, we introduce MaXIFE: a comprehensive evaluation benchmark designed to assess instruction-following capabilities a…

@arXiv_csNI_bot@mastoxiv.page
2025-06-06 07:19:59

A Framework Leveraging Large Language Models for Autonomous UAV Control in Flying Networks
Diana Nunes, Ricardo Amorim, Pedro Ribeiro, André Coelho, Rui Campos
https://arxiv.org/abs/2506.04404

This paper proposes FLUC, a modular framework that integrates open-source Large Language Models (LLMs) with Unmanned Aerial Vehicle (UAV) autopilot systems to enable autonomous control in Flying Networks (FNs). FLUC translates high-level natural language commands into executable UAV mission code, bridging the gap between operator intent and UAV behaviour. FLUC is evaluated using three open-source LLMs - Qwen 2.5, Gemma 2, and LLaMA 3.2 - across scenarios involving code generation and mission …

@arXiv_csMA_bot@mastoxiv.page
2025-06-06 07:19:22

Spore in the Wild: Case Study on Spore.fun, a Real-World Experiment of Sovereign Agent Open-ended Evolution on Blockchain with TEEs
Botao Amber Hu, Helena Rong
https://arxiv.org/abs/2506.04236

In Artificial Life (ALife) research, replicating Open-Ended Evolution (OEE), the continuous emergence of novelty observed in biological life, has traditionally been pursued within isolated closed-system simulations, such as Tierra and Avida, which have typically plateaued after an initial burst of novelty, failing to achieve sustained OEE. Scholars suggest that OEE requires an "open" system that continually exchanges information or energy with its environment. A recent technological innovation in…

@arXiv_csCR_bot@mastoxiv.page
2025-06-04 07:28:30

CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale
Zhun Wang, Tianneng Shi, Jingxuan He, Matthew Cai, Jialin Zhang, Dawn Song
https://arxiv.org/abs/2506.02548

Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Thoroughly assessing their cybersecurity capabilities is critical and urgent, given the high stakes in this domain. However, existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope. To address this gap, we introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities fo…

@arXiv_hepph_bot@mastoxiv.page
2025-06-03 17:53:42

This https://arxiv.org/abs/2501.16423 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_hepp…

Theory of neutrino slow flavor evolution. Part II. Space-time evolution of linear instabilities
Slow flavor evolution (defined as driven by neutrino masses and not necessarily "slow") is receiving fresh attention in the context of compact astrophysical environments. In Part I of this series, we have studied the slow-mode dispersion relation following our recently developed analogy to plasma waves. The concept of resonance between flavor waves in the linear regime and propagating neutrinos is the defining feature of this approach. It is best motivated for weak instabilities, which probab…

@hanno@mastodon.social
2025-07-03 09:25:56

Do we do #ICanHazPDF on Mastodon? https://saemobilus.sae.org/papers/demonstration-a-dme-dimethyl-ether-fuelled-city-bus-2000-01-2005

2000-01-2005: Demonstration of a DME (Dimethyl Ether) Fuelled City Bus - Technical Paper

The aim of the project was to demonstrate and evaluate the feasibility of DME (Dimethyl Ether) fuelled buses, through laboratory and field tests.

The performance and emission targets of the HD DME engine have been successfully demonstrated, and the bus has been converted to accommodate the DME engine and the fuel tank system. Two DME filling stations have been built. Additives for DME lubrication and odor have been selected. A …

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 17:57:30

This https://arxiv.org/abs/2503.07792 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

Efficient Neural Clause-Selection Reinforcement
Clause selection is arguably the most important choice point in saturation-based theorem proving. Framing it as a reinforcement learning (RL) task is a way to challenge the human-designed heuristics of state-of-the-art provers and to instead automatically evolve, just from prover experiences, their potentially optimal replacement. In this work, we present a neural network architecture for scoring clauses for clause selection that is powerful yet efficient to evaluate. Following RL principle…

@arXiv_condmatstrel_bot@mastoxiv.page
2025-06-05 07:31:15

Violation of Luttinger's theorem in one-dimensional interacting fermions
Meng Gao, Yin Zhong
https://arxiv.org/abs/2506.04064 https://

Using the density matrix renormalization group method, we systematically investigate the evolution of the Luttinger integral in the one-dimensional generalized t-V model as a function of filling and interaction strength, identifying three representative phases. In the weak-coupling regime, the zero-frequency Green's function shows a branch-cut structure at the Fermi momentum, and the Luttinger integral accurately reflects the particle density. As the interaction increases, the spectral weight n…

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:56

RewardBench 2: Advancing Reward Model Evaluation
Saumya Malik, Valentina Pyatkin, Sander Land, Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Nathan Lambert
https://arxiv.org/abs/2506.01937

Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, reasoning, safety, and more domains. The community has begun establishing best practices for evaluating reward models, from the development of benchmarks that test capabilities in specific skill areas to others that test agreement with human preferences. At the same time, progress in evaluation has not…

@arXiv_csMA_bot@mastoxiv.page
2025-06-06 07:19:51

CPU-Based Layout Design for Picker-to-Parts Pallet Warehouses
Timo Looms, Lin Xie
https://arxiv.org/abs/2506.04266 https://arxiv.org/…

Picker-to-parts pallet warehouses often face inefficiencies due to conventional layouts causing excessive travel distances and high labor requirements. This study introduces a novel layout design inspired by CPU architecture, partitioning warehouse space into specialized zones, namely Performance (P), Efficiency (E), and Shared (S). Discrete-event simulation is used to evaluate this design against traditional rectangular (random and ABC storage) and Flying-V layouts. Results demonstrate signifi…
