Tootfinder

No exact results. Similar results found.

@arXiv_csRO_bot@mastoxiv.page
2025-09-03 13:20:13

Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control
Skand Peri, Akhil Perincherry, Bikram Pandit, Stefan Lee
https://arxiv.org/abs/2509.01765 https…

Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control
Efficient robot control often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly as part of the reward function. This requires carefully tuning weight terms to avoid undesirable trade-offs where energy minimization harms task success. In this work, we propose a hyperparameter-free gradient optimization method to minimize energy expenditure without conflicting with task performance. Inspired by recent w…

@arXiv_csLG_bot@mastoxiv.page
2025-07-04 10:22:21

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
Ruiyang Zhou, Shuozhe Li, Amy Zhang, Liu Leqi
https://arxiv.org/abs/2507.02834

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
Recent advances in large language models have been driven by reinforcement learning (RL)-style post-training, which improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. However, these methods depend heavily on the model's initial ability to produce positive samples. They primarily refine what the model already knows (distribution sharpening) rather than ena…

@arXiv_csCV_bot@mastoxiv.page
2025-07-03 10:28:30

Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification
Kunlun Xu, Fan Zhuo, Jiangmeng Li, Xu Zou, Jiahuan Zhou
https://arxiv.org/abs/2507.01884

Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification
Current lifelong person re-identification (LReID) methods predominantly rely on fully labeled data streams. However, in real-world scenarios where annotation resources are limited, a vast amount of unlabeled data coexists with scarce labeled samples, leading to the Semi-Supervised LReID (Semi-LReID) problem where LReID methods suffer severe performance degradation. Existing LReID methods, even when combined with semi-supervised strategies, suffer from limited long-term adaptation performance du…

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:13:10

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He
https://arxiv.org/abs/2507.01915

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when they are conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization…

@arXiv_csNI_bot@mastoxiv.page
2025-08-04 09:37:30

Energy Efficient Trajectory Control and Resource Allocation in Multi-UAV-assisted MEC via Deep Reinforcement Learning
Saichao Liu, Geng Sun, Chuang Zhang, Xuejie Liu, Jiacheng Wang, Changyuan Zhao, Dusit Niyato
https://arxiv.org/abs/2508.00261

Energy Efficient Trajectory Control and Resource Allocation in Multi-UAV-assisted MEC via Deep Reinforcement Learning
Mobile edge computing (MEC) is a promising technique to improve the computational capacity of smart devices (SDs) in Internet of Things (IoT). However, the performance of MEC is restricted due to its fixed location and limited service scope. Hence, we investigate an unmanned aerial vehicle (UAV)-assisted MEC system, where multiple UAVs are dispatched and each UAV can simultaneously provide computing service for multiple SDs. To improve the performance of system, we formulated a UAV-based trajec…

@arXiv_csDC_bot@mastoxiv.page
2025-07-04 07:55:01

SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan
Fumikazu Konishi
https://arxiv.org/abs/2507.02124 ht…

SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan
SAKURAONE is a managed high performance computing (HPC) cluster developed and operated by the SAKURA Internet Research Center. It reinforces the ``KOKARYOKU PHY'' configuration of bare-metal GPU servers and is designed as a cluster computing resource optimized for advanced workloads, including large language model (LLM) training. In the ISC 2025 edition of the TOP500 list, SAKURAONE was ranked \textbf{49th} in the world based on its High Performance Linpack (HPL) score, demonstrating its glob…

@arXiv_csMM_bot@mastoxiv.page
2025-07-04 08:44:01

VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Siran Chen, Boyu Chen, Chenyun Yu, Yuxiao Luo, Ouyang Yi, Lei Cheng, Chengxiang Zhuo, Zang Li, Yali Wang
https://arxiv.org/abs/2507.02626

VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Owing to powerful natural language processing and generative capabilities, large language model (LLM) agents have emerged as a promising solution for enhancing recommendation systems via user simulation. However, in the realm of video recommendation, existing studies predominantly resort to prompt-based simulation using frozen LLMs and encounter the intricate challenge of multimodal content understanding. This frequently results in suboptimal item modeling and user preference learning, thereby …

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:47:22

Beyond expected value: geometric mean optimization for long-term policy performance in reinforcement learning
Xinyi Sheng, Dominik Baumann
https://arxiv.org/abs/2508.21443 https…

Beyond expected value: geometric mean optimization for long-term policy performance in reinforcement learning
Reinforcement learning (RL) algorithms typically optimize the expected cumulative reward, i.e., the expected value of the sum of scalar rewards an agent receives over the course of a trajectory. The expected value averages the performance over an infinite number of trajectories. However, when deploying the agent in the real world, this ensemble average may be uninformative for the performance of individual trajectories. Thus, in many applications, optimizing the long-term performance of individ…

@arXiv_csRO_bot@mastoxiv.page
2025-09-04 09:46:21

CTBC: Contact-Triggered Blind Climbing for Wheeled Bipedal Robots with Instruction Learning and Reinforcement Learning
Rankun Li, Hao Wang, Qi Li, Zhuo Han, Yifei Chu, Linqi Ye, Wende Xie, Wenlong Liao
https://arxiv.org/abs/2509.02986

CTBC: Contact-Triggered Blind Climbing for Wheeled Bipedal Robots with Instruction Learning and Reinforcement Learning
In recent years, wheeled bipedal robots have gained increasing attention due to their advantages in mobility, such as high-speed locomotion on flat terrain. However, their performance on complex environments (e.g., staircases) remains inferior to that of traditional legged robots. To overcome this limitation, we propose a general contact-triggered blind climbing (CTBC) framework for wheeled bipedal robots. Upon detecting wheel-obstacle contact, the robot triggers a leg-lifting motion to overcom…

@arXiv_csLG_bot@mastoxiv.page
2025-09-04 13:37:16

Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[5/5]:
- Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
Xiancheng Gao, Yufeng Shi, Wengang Zhou, Houqiang Li

Tootfinder

Opt-in global Mastodon full text search. Join the index!