Abstract: Multi-modal remote sensing image template matching is an important topic in remote sensing image processing. However, due to different imaging mechanisms, there are significant ...
Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization to downstream tasks. However, they are sensitive to the choice of input text prompts and ...