TY - GEN
T1 - Cross-modal similarity learning
T2 - 24th ACM International Conference on Information and Knowledge Management, CIKM 2015
AU - Kang, Cuicui
AU - Liao, Shengcai
AU - He, Yonghao
AU - Wang, Jian
AU - Niu, Wenjia
AU - Xiang, Shiming
AU - Pan, Chunhong
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/10/17
Y1 - 2015/10/17
N2 - The cross-media retrieval problem has received much attention in recent years due to the rapid increase of multimedia data on the Internet. A new approach to the problem has been proposed that aims to match features of different modalities directly. In this line of research, there are two critical issues: how to remove the heterogeneity between different modalities, and how to match cross-modal features of different dimensions. Recently, metric learning methods have shown a good capability in learning a distance metric to explore the relationships between data points. However, traditional metric learning algorithms focus only on single-modal features and thus have difficulty handling cross-modal features of different dimensions. In this paper, we propose a cross-modal similarity learning algorithm for cross-modal feature matching. The proposed method takes a bilinear formulation and, with nuclear-norm penalization, achieves a low-rank representation. Accordingly, the accelerated proximal gradient algorithm is adopted to find the optimal solution with a fast convergence rate of O(1/t²). Experiments on three well-known image-text cross-media retrieval databases show that the proposed method achieves the best performance compared to state-of-the-art algorithms.
AB - The cross-media retrieval problem has received much attention in recent years due to the rapid increase of multimedia data on the Internet. A new approach to the problem has been proposed that aims to match features of different modalities directly. In this line of research, there are two critical issues: how to remove the heterogeneity between different modalities, and how to match cross-modal features of different dimensions. Recently, metric learning methods have shown a good capability in learning a distance metric to explore the relationships between data points. However, traditional metric learning algorithms focus only on single-modal features and thus have difficulty handling cross-modal features of different dimensions. In this paper, we propose a cross-modal similarity learning algorithm for cross-modal feature matching. The proposed method takes a bilinear formulation and, with nuclear-norm penalization, achieves a low-rank representation. Accordingly, the accelerated proximal gradient algorithm is adopted to find the optimal solution with a fast convergence rate of O(1/t²). Experiments on three well-known image-text cross-media retrieval databases show that the proposed method achieves the best performance compared to state-of-the-art algorithms.
KW - Accelerated proximal gradient
KW - Cross-Modality
KW - Multimedia retrieval
KW - Nuclear norm
KW - Similarity learning
UR - http://www.scopus.com/inward/record.url?scp=84958233920&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958233920&partnerID=8YFLogxK
U2 - 10.1145/2806416.2806469
DO - 10.1145/2806416.2806469
M3 - Conference contribution
AN - SCOPUS:84958233920
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1251
EP - 1260
BT - CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 19 October 2015 through 23 October 2015
ER -