Abstract: Existing video description approaches proposed in the literature rely on capturing the semantic relationships among concepts and visual features from training data specific to various ...