Abstract: This paper proposes a multimodal adaptive convolutional neural network (MA-CNN), which aims to improve the learning performance of complex tasks by efficiently fusing multiple modal data.
This paper focuses on the key challenges in multimodal semantic understanding and proposes a dual-path approach to systematically improve model performance. At the level of cross-modal feature ...