Abstract: Multimodal large language models (MLLMs) act as essential interfaces, connecting humans with AI technologies in multimodal applications. However, current MLLMs face challenges in accurately ...
Abstract: Cross-view object geo-localization (CVOGL) determines the geographic location of an object in a satellite-view reference image. The object is indicated by a point prompt in a ground- or ...