Abstract: Speech-based Visual Question Answering (SBVQA) is a challenging task that aims to answer spoken questions about images. The challenges of this task involve the variability of speakers, the ...
Abstract: Retrieval plays an important role in knowledge-based visual question answering (KB-VQA), which relies on external knowledge to answer questions related to an image. However, not all ...