Interleaving Reasoning for Better Text-to-Image Generation
Wenxuan Huang, Shuang Chen, Zheyong Xie, Shaosheng Cao, Shixiang Tang, Yufan Shen, Qingyu Yin, Wenbo Hu, Xiaoman Wang, Yuntian Tang, Junbo Qiao, Yue Guo, Yao Hu, Zhenfei Yin, Philip Torr, Yu Cheng, Wanli Ouyang, Shaohui Lin
https://arxiv.org/abs/2509.06945
Stitch: Training-Free Position Control in Multimodal Diffusion Transformers
Jessica Bader, Mateusz Pach, Maria A. Bravo, Serge Belongie, Zeynep Akata
https://arxiv.org/abs/2509.26644