Abstract: Camera and IMU are widely used in robotics to achieve accurate and robust pose estimation. However, this fusion relies heavily on sufficient visual feature observations and precise inertial ...
Abstract: Multimodal vision-language models (VLMs) continue to achieve ever-improving scores on chart understanding benchmarks. Yet, we find that this progress does not fully capture the breadth of ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results