Tootfinder

@arXiv_csLG_bot@mastoxiv.page
2025-08-12 12:07:03

Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Fernando Martinez, Tao Li, Yingdong Lu, Juntao Chen
https://arxiv.org/abs/2508.07452 https://…

Integrated, end-to-end learning of representations and policies remains a cornerstone of deep reinforcement learning (RL). However, to address the challenge of learning effective features from a sparse reward signal, recent trends have shifted towards adding complex auxiliary objectives or fully decoupling the two processes, often at the cost of increased design complexity. This work proposes an alternative to both decoupling and naive end-to-end learning, arguing that performance can be signif…

@muz4now@mastodon.world
2025-08-13 15:16:02

Guru Puppy – I’m Learning From My Dog #puppy https://muz4now.com/2022/guru-puppy-im-learning-from-my-dog

Guru Puppy - I'm Learning From My Dog
Stan Stewart - @muz4now

@arXiv_csCL_bot@mastoxiv.page
2025-09-12 09:53:49

DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning
Daniil Ignatev, Nan Li, Hugh Mee Wong, Anh Dang, Shane Kaszefski Yaschuk
https://arxiv.org/abs/2509.09524

This system paper presents the DeMeVa team's approaches to the third edition of the Learning with Disagreements shared task (LeWiDi 2025; Leonardelli et al., 2025). We explore two directions: in-context learning (ICL) with large language models, where we compare example sampling strategies; and label distribution learning (LDL) methods with RoBERTa (Liu et al., 2019b), where we evaluate several fine-tuning methods. Our contributions are twofold: (1) we show that ICL can effectively predict anno…

@arXiv_csAI_bot@mastoxiv.page
2025-09-12 08:14:09

Anti-Money Laundering Machine Learning Pipelines; A Technical Analysis on Identifying High-risk Bank Clients with Supervised Learning
Khashayar Namdar, Pin-Chien Wang, Tushar Raju, Steven Zheng, Fiona Li, Safwat Tahmin Khan
https://arxiv.org/abs/2509.09127

Anti-money laundering (AML) actions and measurements are among the priorities of financial institutions, for which machine learning (ML) has been shown to have high potential. In this paper, we propose a comprehensive and systematic approach for developing ML pipelines to identify high-risk bank clients in a dataset curated for Task 1 of the University of Toronto 2023-2024 Institute for Management and Innovation (IMI) Big Data and Artificial Intelligence Competition. The dataset included 195,789 c…

@arXiv_csLG_bot@mastoxiv.page
2025-09-12 09:53:39

Quantum Machine Learning, Quantitative Trading, Reinforcement Learning, Deep Learning
Jun-Hao Chen, Yu-Chien Huang, Yun-Cheng Tsai, Samuel Yen-Chi Chen
https://arxiv.org/abs/2509.09176

The convergence of quantum-inspired neural networks and deep reinforcement learning offers a promising avenue for financial trading. We implemented a trading agent for USD/TWD by integrating Quantum Long Short-Term Memory (QLSTM) for short-term trend prediction with Quantum Asynchronous Advantage Actor-Critic (QA3C), a quantum-enhanced variant of the classical A3C. Trained on data from 2000-01-01 to 2025-04-30 (80% training, 20% testing), the long-only agent achieves 11.87% return over aroun…

@arXiv_csAI_bot@mastoxiv.page
2025-08-14 07:38:52

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
Weitao Jia, Jinghui Lu, Haiyang Yu, Siqi Wang, Guozhi Tang, An-Lan Wang, Weijie Yin, Dingkang Yang, Yuxiang Nie, Bin Shan, Hao Feng, Irene Li, Kun Yang, Han Wang, Jingqun Tang, Teng Fu, Changhong Jin, Chao Feng, Xiaohui Lv, Can Huang
https://arxiv.org/abs/2508.09670…

Recent advances demonstrate that reinforcement learning with verifiable rewards (RLVR) significantly enhances the reasoning capabilities of large language models (LLMs). However, standard RLVR faces challenges with reward sparsity, where zero rewards from consistently incorrect candidate answers provide no learning signal, particularly in challenging tasks. To address this, we propose Multi-Expert Mutual Learning GRPO (MEML-GRPO), an innovative framework that utilizes diverse expert prompts as …

@arXiv_csLG_bot@mastoxiv.page
2025-08-12 11:45:33

Pref-GUIDE: Continual Policy Learning from Real-Time Human Feedback via Preference-Based Learning
Zhengran Ji, Boyuan Chen
https://arxiv.org/abs/2508.07126 https://

Training reinforcement learning agents with human feedback is crucial when task objectives are difficult to specify through dense reward functions. While prior methods rely on offline trajectory comparisons to elicit human preferences, such data is unavailable in online learning scenarios where agents must adapt on the fly. Recent approaches address this by collecting real-time scalar feedback to guide agent behavior and train reward models for continued learning after human feedback becomes un…

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:44:00

What Do Temporal Graph Learning Models Learn?
Abigail J. Hayes, Tobias Schumacher, Markus Strohmaier
https://arxiv.org/abs/2510.09416 https://arxiv.org/pdf…

Learning on temporal graphs has become a central topic in graph representation learning, with numerous benchmarks indicating the strong performance of state-of-the-art models. However, recent work has raised concerns about the reliability of benchmark results, noting issues with commonly used evaluation protocols and the surprising competitiveness of simple heuristics. This contrast raises the question of which properties of the underlying graphs temporal graph learning models actually use to f…

@arXiv_csLG_bot@mastoxiv.page
2025-09-12 10:08:39

Cough Classification using Few-Shot Learning
Yoga Disha Sendhil Kumar, Manas V Shetty, Sudip Vhaduri
https://arxiv.org/abs/2509.09515 https://arxiv.org/pdf…

This paper investigates the effectiveness of few-shot learning for respiratory sound classification, focusing on cough-based detection of COVID-19, Flu, and healthy conditions. We leverage Prototypical Networks with spectrogram representations of cough sounds to address the challenge of limited labeled data. Our study evaluates whether few-shot learning can enable models to achieve performance comparable to traditional deep learning approaches while using significantly fewer training samples. Ad…

@arXiv_csLG_bot@mastoxiv.page
2025-09-12 10:08:19

PIPES: A Meta-dataset of Machine Learning Pipelines
Cynthia Moreira Maia, Lucas B. V. de Amorim, George D. C. Cavalcanti, Rafael M. O. Cruz
https://arxiv.org/abs/2509.09512 http…

Solutions to the Algorithm Selection Problem (ASP) in machine learning face the challenge of high computational costs associated with evaluating various algorithms' performances on a given dataset. To mitigate this cost, the meta-learning field can leverage previously executed experiments shared in online repositories such as OpenML. OpenML provides an extensive collection of machine learning experiments. However, an analysis of OpenML's records reveals limitations. It lacks diversity in pipeli…
