OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog

Published in International Conference on Computational Linguistics (COLING), 2024