Understanding Class and Object Models

Is ‘Right’ Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning

Abstract: Multimodal large language models (MLLMs) act as essential interfaces, connecting humans with AI technologies in multimodal applications. However, current MLLMs face challenges in accurately ...

IEEE

GeoFormer: Boosting Object Distinguishing and Prompt Understanding for Cross-View Object Geo-Localization

Abstract: Cross-view object geo-localization (CVOGL) determines the geographic location of an object on the satellite view reference image. The object is indicated by a point prompt in a ground- or ...
