Abstract: Omnidirectional images carry comprehensive scene representation and are widely useful for applications like AR/VR, robotics, and autonomous driving that require holistic scene understanding.
The dataset used for fine-tuning the model. Code for generating the dataset. Scripts for fine-tuning the model on high-performance GPUs. Inference scripts for real-time task execution. SG_VLM utilizes ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results