HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones
Hao Ruan, Jinliang Lin, Yingxin Lai, Zhiming Luo, Shaozi Li
https://arxiv.org/abs/2508.21539
Natural Language-Guided Drones (NLGD) provide a novel paradigm for tasks such as target matching and navigation. However, the wide field of view and complex compositional semantics in drone scenarios pose challenges for vision-language understanding. Mainstream Vision-Language Models (VLMs) emphasize global alignment while lacking fine-grained semantics, and existing hierarchical methods depend on precise entity partitioning and strict containment, limiting effectiveness in dynamic environments…