Q-TriM: Question-Guided Tri-Modal Attention for Audio–Visual Question Answering

Publication
In European Conference on Computer Vision (ECCV ’26)