Understanding the Influence of Synthetic Data for Text Embedders
Jacob Mitchell Springer, Vaibhav Adlakha, Siva Reddy, Aditi Raghunathan, Marius Mosbach
https://arxiv.org/abs/2509.06184
SynGen-Vision: Synthetic Data Generation for training industrial vision models
Alpana Dubey, Suma Mani Kuriakose, Nitish Bhardwaj
https://arxiv.org/abs/2509.04894
Towards Label-Free Biological Reasoning Synthetic Dataset Creation via Uncertainty Filtering
Josefa Lia Stoisser, Lawrence Phillips, Aditya Misra, Tom A. Lamb, Philip Torr, Marc Boubnovski Martell, Julien Fauqueur, Kaspar Märtens
https://arxiv.org/abs/2510.05871
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma
https://arxiv.org/abs/2510.06719
A Synthetic-to-Real Dehazing Method based on Domain Unification
Zhiqiang Yuan, Jinchao Zhang, Jie Zhou
https://arxiv.org/abs/2509.05374
High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Zhuoyi Huang, Nutan Sahoo, Anamika Kumari, Girish Kumar, Kexuan Cai, Shixing Cao, Yue Kang, Tian Xia, Somya Chatterjee, Nicholas Hausman, Aidan Jay, Eric S. Rosenthal, Soundar Srinivasan, Sadid Hasan, Alex Fedorov, Sulaiman Vesal
SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation
Ayush Zenith, Arnold Zumbrun, Neel Raut, Jing Lin
https://arxiv.org/abs/2510.06596
Knowledge Collapse in LLMs: When Fluency Survives but Facts Fail under Recursive Synthetic Training
Figarri Keisha, Zekun Wu, Ze Wang, Adriano Koshiyama, Philip Treleaven
https://arxiv.org/abs/2509.04796
Tell-Tale Watermarks for Explanatory Reasoning in Synthetic Media Forensics
Ching-Chun Chang, Isao Echizen
https://arxiv.org/abs/2509.05753
Aligning Large Language Models via Fully Self-Synthetic Data
Shangjian Yin, Zhepei Wei, Xinyu Zhu, Wei-Lin Chen, Yu Meng
https://arxiv.org/abs/2510.06652