Abstract
In response to the challenges of multimodal data analysis in disaster events, this study proposes a two-stage technical framework of “feature alignment and evidence fusion”. In the feature alignment stage, a proxy-based cross-modal contrastive learning framework (PMCL) is constructed, which achieves cross-modal feature collaboration through a bimodal Transformer encoder, proxy-sample collaborative optimization, and geometric constraints. In the fusion and decision-making stage, a Cross-modal Enhanced Fusion Network (CEFN) is built that addresses semantic uncertainty through Dirichlet distribution parameterization, projection-distance evaluation, and an adaptive fusion mechanism. Experiments show that PMCL achieves 85% accuracy on a crisis multimodal information classification dataset, 30% higher than the single-modality baseline. On the same task, CEFN’s accuracy exceeds the second-best model by 1.16%, and its conflict loss function keeps performance degradation within 3.34% even when 100% of the samples are inconsistent. In addition, PMCL’s multimodal pre-training initialization strategy improves model accuracy by 7.1%. This study provides an efficient and interpretable technical solution for disaster emergency response, with practical significance for multimodal data-driven intelligent disaster-reduction decision-making.
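To make the evidence-fusion idea behind CEFN concrete, the sketch below shows the standard subjective-logic formulation of Dirichlet-parameterized evidence and a reduced Dempster-style combination of two modalities; it is a minimal illustration of that general technique, not the authors’ CEFN code, and the function names (`opinion`, `fuse`) and example evidence values are hypothetical.

```python
# Minimal sketch: Dirichlet-parameterized evidence and two-modality fusion
# (reduced Dempster's combination rule from subjective logic).
import numpy as np

def opinion(evidence):
    """Map non-negative class evidence to belief masses and an uncertainty mass."""
    k = evidence.shape[-1]
    alpha = evidence + 1.0              # Dirichlet parameters: alpha_k = e_k + 1
    s = alpha.sum(-1, keepdims=True)    # Dirichlet strength S
    belief = evidence / s               # b_k = e_k / S
    u = k / s                           # u = K / S (uncertainty mass)
    return belief, u

def fuse(evidence_a, evidence_b):
    """Combine two modalities' evidence; conflicting mass inflates uncertainty."""
    b1, u1 = opinion(evidence_a)
    b2, u2 = opinion(evidence_b)
    k = evidence_a.shape[-1]
    # Conflict: mass the two views assign to different classes.
    conflict = (b1.sum(-1, keepdims=True) * b2.sum(-1, keepdims=True)
                - (b1 * b2).sum(-1, keepdims=True))
    scale = 1.0 / (1.0 - conflict)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)   # fused belief masses
    u = scale * (u1 * u2)                       # fused uncertainty
    s = k / u                                   # recover Dirichlet strength
    return b * s                                # fused evidence e_k = b_k * S

# Example: an image view confident in class 0 and a text view that mildly
# disagrees; the fused evidence still favors class 0 but with more uncertainty.
img_evidence = np.array([[9.0, 1.0, 0.5]])
txt_evidence = np.array([[2.0, 3.0, 0.5]])
print(fuse(img_evidence, txt_evidence))
```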
