Tootfinder

@arXiv_csHC_bot@mastoxiv.page
2025-09-15 09:37:11

The Language of Approval: Identifying the Drivers of Positive Feedback Online
Agam Goyal, Charlotte Lambert, Eshwar Chandrasekharan
https://arxiv.org/abs/2509.10370 https://

Positive feedback via likes and awards is central to online governance, yet which attributes of users' posts elicit rewards -- and how these vary across authors and communities -- remains unclear. To examine this, we combine quasi-experimental causal inference with predictive modeling on 11M posts from 100 subreddits. We identify linguistic patterns and stylistic attributes causally linked to rewards, controlling for author reputation, timing, and community context. For example, overtly complic…

@Mediagazer@mstdn.social
2025-11-13 22:31:05

Sources: the first-round bid deadline for WBD is November 20; Paramount wants the full company, and Comcast and Netflix are eying movie/TV studios and HBO Max (Wall Street Journal)
https://www.wsj.com/business/media/paramou

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:33:30

Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
Xingyu Lin, Yilin Wen, En Wang, Du Su, Wenbin Liu, Chenfu Bao, Zhonghou Lv
https://arxiv.org/abs/2510.09369

Group Relative Policy Optimization (GRPO) has significantly advanced the reasoning ability of large language models (LLMs), particularly by boosting their mathematical performance. However, GRPO and related entropy-regularization methods still face challenges rooted in the sparse token rewards inherent to chain-of-thought (CoT). Current approaches often rely on undifferentiated token-level entropy adjustments, which frequently lead to entropy collapse or model collapse. In this work, we propose…

@arXiv_csIR_bot@mastoxiv.page
2025-10-15 08:48:12

Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang Wang
https://arxiv.org/abs/2510.12211

Recent breakthroughs in large language models (LLMs) have fundamentally shifted recommender systems from discriminative to generative paradigms, where user behavior modeling is achieved by generating target items conditioned on historical interactions. Yet current generative recommenders still suffer from two core limitations: the lack of high-quality negative modeling and the reliance on implicit rewards. Reinforcement learning with verifiable rewards (RLVR) offers a natural solution by enabli…

@arXiv_csSI_bot@mastoxiv.page
2025-09-15 08:31:21

TikTok Rewards Divisive Political Messaging During the 2025 German Federal Election
Kirill Solovev, Chiara Drolsbach, Emma Demirel, Nicolas Pröllochs
https://arxiv.org/abs/2509.10336

Short-form video platforms like TikTok reshape how politicians communicate and have become important tools for electoral campaigning. Yet it remains unclear what kinds of political messages gain traction in these fast-paced, algorithmically curated environments, which are particularly popular among younger audiences. In this study, we use computational content analysis to analyze a comprehensive dataset of N=25,292 TikTok videos posted by German politicians in the run-up to the 2025 German fede…

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:46:40

BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards
Sangyun Lee, Brandon Amos, Giulia Fanti
https://arxiv.org/abs/2510.09596 https://

Today's generative models thrive with large amounts of supervised data and informative reward functions characterizing the quality of the generation. They work under the assumptions that the supervised data provides knowledge to pre-train the model, and the reward function provides dense information about how to further improve the generation quality and correctness. However, in the hardest instances of important problems, two problems arise: (1) the base generative model attains a near-zero re…

@Techmeme@techhub.social
2025-11-10 23:20:45

The US Treasury and IRS issue guidance allowing crypto products to offer staking rewards under a new safe harbor (Sander Lutz/Decrypt)
https://decrypt.co/348044/ethereum-solana-etfs-green-light-staking-us-treasury-irs-guidance

Ethereum, Solana ETFs Get Green Light for Staking via US Treasury, IRS Crypto Fund Guidance
The IRS and Treasury said trusts can now generate staking rewards for crypto ETF investors without fear of tax or regulatory repercussions.

@arXiv_csAI_bot@mastoxiv.page
2025-10-14 12:23:38

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
https://arxiv.org/abs/2510.11457

Improving the multi-step reasoning ability of Large Language Models (LLMs) is a critical yet challenging task. The dominant paradigm, outcome-supervised reinforcement learning (RLVR), rewards only correct final answers, often propagating flawed reasoning and suffering from sparse reward signals. While process-level reward models (PRMs) provide denser, step-by-step feedback, they lack generalizability and interpretability, requiring task-specific segmentation of the reasoning process. To this en…

@arXiv_csRO_bot@mastoxiv.page
2025-10-13 10:13:20

Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Chenghao Wang, Arjun Viswanathan, Eric Sihite, Alireza Ramezani
https://arxiv.org/abs/2510.09543 https://…

Animals achieve energy-efficient locomotion through their implicit passive dynamics, a marvel that has captivated roboticists for decades. Recently, methods incorporating Adversarial Motion Prior (AMP) and reinforcement learning (RL) have shown promising progress in replicating animals' naturalistic motion. However, such imitation learning approaches predominantly capture explicit kinematic patterns, so-called gaits, while overlooking the implicit passive dynamics. This work bridges this gap by incorporating…

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:44:11

Reasoning Pattern Matters: Learning to Reason without Human Rationales
Chaoxu Pang, Yixuan Cao, Ping Luo
https://arxiv.org/abs/2510.12643 https://arxiv.org…

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm, which first performs Supervised Fine-Tuning (SFT) on human-annotated reasoning trajectories (rationales) to establish initial reasoning behaviors, then applies Reinforcement Learning with Verifiable Rewards (RLVR) to optimize the model using verifiable signals without golden rationales. However, annotating high-quality rationales for the SFT stage remains prohibitively ex…
