LLMs are now part of our daily work, making coding easier. Join Ivan Dolgov at this year's Berlin Buzzwords to learn how JetBrains built an in-house LLM for AI code completion in its products, covering design choices, data preparation, training and model evaluation.
Learn more: https://
Chance and Mass Interpretations of Probabilities in Markov Decision Processes (Extended Version)
Yun Chen Tsai, Kittiphon Phalakarn, S. Akshay, Ichiro Hasuo
https://arxiv.org/abs/2506.10377
EXPO: Stable Reinforcement Learning with Expressive Policies
Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn
https://arxiv.org/abs/2507.07986 https://arxiv.org/pdf/2507.07986 https://arxiv.org/html/2507.07986
arXiv:2507.07986v1 Announce Type: new
Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead constructing an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies: a larger expressive base policy trained with a stable imitation learning objective and a light-weight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.
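The on-the-fly action selection described in the EXPO abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of base samples (`num_base`), and the toy Q-function, base sampler, and fixed-offset edit are all hypothetical stand-ins. In EXPO the base policy is a diffusion or flow-matching model and the edit policy is a learned Gaussian.

```python
import random
from typing import Callable, List

Action = List[float]
State = object  # placeholder; the abstract does not fix a state type


def on_the_fly_action(
    state: State,
    base_sample: Callable[[State], Action],
    edit_sample: Callable[[State, Action], Action],
    q_value: Callable[[State, Action], float],
    num_base: int = 4,  # assumption: how many base actions to draw
) -> Action:
    """Return the highest-Q action among base samples and their edited versions."""
    candidates: List[Action] = []
    for _ in range(num_base):
        base_action = base_sample(state)  # draw from the (expressive) base policy
        candidates.append(base_action)
        candidates.append(edit_sample(state, base_action))  # edited version of that action
    # Value maximization happens by selection, so no gradient is pushed
    # back through the base policy's denoising chain.
    return max(candidates, key=lambda a: q_value(state, a))


def td_target(
    reward: float,
    next_state: State,
    done: bool,
    gamma: float,
    base_sample: Callable[[State], Action],
    edit_sample: Callable[[State, Action], Action],
    target_q: Callable[[State, Action], float],
    num_base: int = 4,
) -> float:
    """One-step TD target that bootstraps from the on-the-fly action at next_state."""
    if done:
        return reward
    next_action = on_the_fly_action(next_state, base_sample, edit_sample, target_q, num_base)
    return reward + gamma * target_q(next_state, next_action)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy 1-D problem whose best action is 1.0.
    q = lambda s, a: -(a[0] - 1.0) ** 2
    # Stand-in for a diffusion/flow base policy: uniform samples.
    base = lambda s: [random.uniform(-1.0, 1.0)]
    # Stand-in for the edit policy: a fixed Gaussian offset, not a learned one.
    edit = lambda s, a: [a[0] + random.gauss(0.3, 0.1)]
    print(on_the_fly_action(None, base, edit, q))
    print(td_target(1.0, None, False, 0.99, base, edit, q))
```

The same selection routine is reused for both acting and the TD backup, mirroring the abstract's statement that the value-maximizing action is chosen "for both sampling and temporal-difference (TD) backup."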
Revealing Dark Matter's Role in Neutron Stars Anisotropy: A Bayesian Approach Using Multi-messenger Observations
Xue-Zhi Liu, Premachand Mahapatra, Chun Huang, Ayush Hazarika, Chiranjeeb Singha, Prasanta Kumar Das
https://arxiv.org/abs/2506.08376
When Trump encounters a demographic he doesn’t care for, he “disappears” them and, in many cases, the very record of their existence.
Meet America’s new “missing persons,” and the government campaign to delete statistical evidence of anyone who doesn’t fit Trump’s idea of our new golden age... including Jackie Robinson & Rosa Parks.
(taken from Catherine Rampell on Bluesky)
☑️ The people Trump is trying to delete from government records - Washington Post
Weighted Parameter Estimators of the Generalized Extreme Value Distribution in the Presence of Missing Observations
James H. McVittie, Orla A. Murphy
https://arxiv.org/abs/2506.15964
A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval
Puspendu Banerjee, Aritra Mazumdar, Wazib Ansar, Saptarsi Goswami, Amlan Chakrabarti
https://arxiv.org/abs/2507.01058
Reinforcement Learning for Optimal Control of Spin Magnetometers
Logan W. Cooke, Stefanie Czischek
https://arxiv.org/abs/2506.21475 https://
Excellent analysis of the COVID state of play.
From Julia Doubleday.
"Each time a new wave crops up, the media scrambles to let the public know that COVID is spreading “again” ... But each time, it fails to inform the public that nearly half of COVID cases are asymptomatic, that COVID looks different in different patients, that vaccines do not prevent infections, that rapid tests have high false negative rates, and that COVID is fully airborne.
"Taken together, the virus I’m describing is much more difficult to control than the one the press presents. The press frames the virus as something that can be halted by familiarizing yourself with the symptoms, staying home once you feel sick and test positive, and avoided altogether by simply getting vaccinated or keeping one’s distance from sick people. ...
"The misinformation that reigns in liberal spaces is not the result of accidental miscommunication. People don’t know that the virus is asymptomatic 40% of the time because there is simply no universe where that virus is controllable without an elimination strategy, or a day-to-day mitigation strategy."
#covid #misinformation #denial #CovidIsAirborne
Constructing Evidence-Based Tailoring Variables for Adaptive Interventions
John J. Dziak, Inbal Nahum-Shani
https://arxiv.org/abs/2506.03054 https://