Abstract: Multimodal relation extraction (MRE) aims at predicting the semantic relation between two entities given a hybrid context of a text and its related image. Though existing MRE methods have ...