Tootfinder

@arXiv_mathOC_bot@mastoxiv.page
2025-06-02 07:27:40

Convex Approximations of Random Constrained Markov Decision Processes
V Varagapriya, Vikas Vikram Singh, Abdel Lisser
https://arxiv.org/abs/2505.24815 http…

Constrained Markov decision processes (CMDPs) are used as a decision-making framework to study the long-run performance of a stochastic system. It is well known that a stationary optimal policy of a CMDP problem under the discounted cost criterion can be obtained by solving a linear programming problem when the running costs and transition probabilities are exactly known. In this paper, we consider a discounted cost CMDP problem where the running costs and transition probabilities are defined using ran…

@arXiv_csRO_bot@mastoxiv.page
2025-06-30 09:37:00

An Introduction to Zero-Order Optimization Techniques for Robotics
Armand Jordana, Jianghan Zhang, Joseph Amigo, Ludovic Righetti
https://arxiv.org/abs/2506.22087

Zero-order optimization techniques are becoming increasingly popular in robotics due to their ability to handle non-differentiable functions and escape local minima. These advantages make them particularly useful for trajectory optimization and policy optimization. In this work, we present a mathematical tutorial on random search. It offers a simple and unifying perspective for understanding a wide range of algorithms commonly used in robotics. Leveraging this viewpoint, we classify many trajec…

@arXiv_econEM_bot@mastoxiv.page
2025-07-29 07:58:31

Sequential Decision Problems with Missing Feedback
Filippo Palomba
https://arxiv.org/abs/2507.19596 https://arxiv.org/pdf/2507.19596

This paper investigates the challenges of optimal online policy learning under missing data. State-of-the-art algorithms implicitly assume that rewards are always observable. I show that when rewards are missing at random, the Upper Confidence Bound (UCB) algorithm maintains optimal regret bounds; however, it selects suboptimal policies with high probability as soon as this assumption is relaxed. To overcome this limitation, I introduce a fully nonparametric algorithm, Doubly-Robust Upper Confid…

@arXiv_physicsoptics_bot@mastoxiv.page
2025-06-17 12:03:29

Inverse design of the transmission matrix in a random system using Reinforcement Learning
Yuhao Kang
https://arxiv.org/abs/2506.13057 https://

This work presents an approach to the inverse design of scattering systems by modifying the transmission matrix using reinforcement learning. We utilize Proximal Policy Optimization to navigate the highly non-convex landscape of the objective function to achieve three types of transmission matrices: (1) fixed-ratio power conversion and zero-transmission mode in rank-1 matrices, (2) exceptional points with degenerate eigenvalues and unidirectional mode conversion, and (3) uniform channel participat…
