LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction
Jing Chang, Chang Liu, Jinbin Huang, Rui Mao, Jianbin Qin
https://arxiv.org/abs/2507.13712

Automated data preparation is crucial for democratizing machine learning, yet existing reinforcement learning (RL) based approaches suffer from inefficient exploration in the vast space of possible preprocessing pipelines. We present LLaPipe, a novel framework that addresses this exploration bottleneck by integrating Large Language Models (LLMs) as intelligent policy advisors. Unlike traditional methods that rely solely on statistical features and blind trial-and-error, LLaPipe leverages the se…
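The abstract contrasts blind trial-and-error exploration with exploration guided by an LLM acting as a policy advisor. The snippet below is only a toy illustration of that contrast, not LLaPipe's actual algorithm, which the abstract does not spell out. It is a sketch assuming tabular epsilon-greedy Q-learning over a small, hypothetical vocabulary of preprocessing operators, where an exploratory step may take the advisor's suggestion instead of a uniformly random operator. `stub_advisor` is a fixed heuristic standing in for an LLM, `toy_reward` is a made-up stand-in for downstream model accuracy, and every name and parameter here is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical operator vocabulary (not from the paper); "stop" ends a pipeline.
OPERATORS = ["impute_mean", "one_hot", "standardize", "drop_outliers", "stop"]
MAX_LEN = 4  # hypothetical cap on pipeline length


def toy_reward(pipeline):
    """Hypothetical stand-in for the downstream model's validation score."""
    score = 0.5
    score += 0.2 if "impute_mean" in pipeline else 0.0
    score += 0.15 if "one_hot" in pipeline else 0.0
    score += 0.1 if "standardize" in pipeline else 0.0
    # Penalize repeated operators.
    score -= 0.05 * (len(pipeline) - len(set(pipeline)))
    return score


def stub_advisor(pipeline):
    """Hypothetical stand-in for an LLM advisor: a fixed heuristic that
    suggests the next 'sensible' operator not yet applied."""
    for op in ["impute_mean", "one_hot", "standardize"]:
        if op not in pipeline:
            return op
    return "stop"


def choose_action(q, pipeline, rng, advisor, epsilon, advice_prob):
    state = tuple(pipeline)
    if rng.random() < epsilon:
        # Exploration: consult the advisor with probability advice_prob,
        # otherwise fall back to a uniformly random operator.
        if advisor is not None and rng.random() < advice_prob:
            return advisor(pipeline)
        return rng.choice(OPERATORS)
    # Exploitation: greedy w.r.t. current Q-values (ties -> first operator).
    return max(OPERATORS, key=lambda a: q[(state, a)])


def train(episodes=300, advisor=None, epsilon=0.3, advice_prob=0.7,
          alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning; reward is given only when the pipeline ends."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        pipeline = []
        while True:
            state = tuple(pipeline)
            action = choose_action(q, pipeline, rng, advisor, epsilon, advice_prob)
            if action != "stop":
                pipeline.append(action)
            done = action == "stop" or len(pipeline) >= MAX_LEN
            if done:
                target = toy_reward(pipeline)  # terminal: no bootstrapping
            else:
                next_state = tuple(pipeline)
                target = gamma * max(q[(next_state, a)] for a in OPERATORS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            if done:
                break
    return q


def greedy_pipeline(q):
    """Roll out the learned greedy policy without exploration."""
    pipeline = []
    while len(pipeline) < MAX_LEN:
        state = tuple(pipeline)
        action = max(OPERATORS, key=lambda a: q[(state, a)])
        if action == "stop":
            break
        pipeline.append(action)
    return pipeline


if __name__ == "__main__":
    for name, adv in [("random exploration", None),
                      ("advised exploration", stub_advisor)]:
        q = train(advisor=adv)
        best = greedy_pipeline(q)
        print(name, best, round(toy_reward(best), 3))
```

Running it prints the greedy pipeline learned with and without the stub advisor. On a problem this small both variants may find good pipelines, so the output illustrates the mechanism only and says nothing about the results reported for LLaPipe.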