Multi-Modal Video Dialog State Tracking in the Wild

Published in European Conference on Computer Vision (ECCV), 2024