CN103049526A - Cross-media retrieval method based on double space learning - Google Patents

Cross-media retrieval method based on double space learning Download PDF

Info

Publication number
CN103049526A
Authority
CN
China
Prior art keywords
multimedia data
matrix
vector
feature vector
different modalities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210559081XA
Other languages
Chinese (zh)
Other versions
CN103049526B (en)
Inventor
王亮 (Liang Wang)
谭铁牛 (Tieniu Tan)
赫然 (Ran He)
王开业 (Kaiye Wang)
王威 (Wei Wang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201210559081.XA priority Critical patent/CN103049526B/en
Publication of CN103049526A publication Critical patent/CN103049526A/en
Application granted granted Critical
Publication of CN103049526B publication Critical patent/CN103049526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention discloses a cross-media retrieval method based on double space learning. The method comprises the following steps: first, features of multimedia data of different modalities are extracted, two mapping matrices are learned with the double space learning method, and the data of different modalities are mapped into the same space; the test sample set is divided into a query data set and a target data set, the learned mapping matrices are used to map the data of both sets into the unified space, the distance between query data and target data is then measured, and the target data closest to the query data are obtained. With this method, multimedia data of different modalities can be mapped into a unified space for measurement, and feature selection is performed during the mapping, which improves the robustness and accuracy of retrieval; the method therefore has good application prospects.

Description

Cross-media retrieval method based on double space learning
Technical field
The present invention relates to pattern recognition technology, and in particular to a cross-media retrieval method based on double space learning.
Background technology
In recent years, with the rise of Web 2.0 technology, multimedia data on the Internet has grown explosively, and users need to retrieve the articles, pictures, videos and other content they want from these massive data. However, current mainstream search engines such as Baidu and Google still adopt keyword-based search techniques, which cannot bridge the semantic gap that exists between different media. To realize cross-media retrieval, an integrated retrieval method is needed that can retrieve one class of multimedia data from another class of multimedia data.
Research on cross-media retrieval methods has attracted the attention of academia at home and abroad. Several methods have been proposed internationally to model the relationship between multimedia data of different modalities; existing methods can be divided into two classes: 1) multimedia data of two modalities are mapped into the same subspace, and the similarity between multimedia data of different modalities is then measured in the learned subspace to realize cross-media retrieval; 2) multimedia data of one modality are transformed into the feature space of the multimedia data of the other modality, where the similarity between the data of different modalities is measured. However, existing methods have ignored another important issue, namely how to simultaneously select the most relevant and most discriminative features from the feature spaces of the different modalities; this problem is called double space feature selection. To perform double space feature selection while learning the subspace, the present invention proposes a cross-media retrieval method based on double space learning. The method effectively bridges the semantic gap between media of different modalities, so that the results returned by a search engine are more accurate.
Summary of the invention
In order to solve the problems of the prior art, the purpose of the present invention is to provide a cross-media retrieval method based on double space learning. By minimizing a single objective function, the method performs subspace learning and double space feature selection simultaneously, and measures the similarity of multimedia data of different modalities in the learned subspace, thereby realizing cross-media retrieval.
The cross-media retrieval method based on double space learning proposed by the present invention comprises the following steps:
Step S1: collect multimedia data samples of different modalities, build a cross-media retrieval knowledge database, and divide the database into a training set and a test set;
Step S2: extract the feature vectors of the multimedia data samples of different modalities in the database;
Step S3: based on the feature vectors of the data of different modalities in the training set, learn a mapping matrix corresponding to each modality;
Step S4: according to the modality of each multimedia data sample in the test set, map the samples into the same space using the mapping matrices obtained in step S3;
Step S5: take the multimedia data of one modality in the test set mapped into the same space as the query set, and the multimedia data of the other modality as the target set;
Step S6: for a multimedia data item in the query set, find the most similar multimedia data item in the target set according to the similarity between multimedia data, thereby obtaining the cross-media retrieval result.
With the method according to the present invention, multimedia data of different modalities can be mapped into a unified space for measurement, and feature selection is performed during the mapping, which improves the robustness and accuracy of retrieval; the method therefore has good application prospects. The method effectively bridges the semantic gap between media of different modalities, so that the results returned by a cross-media search engine are more accurate.
Description of drawings
Fig. 1 is a flowchart of the cross-media retrieval method based on double space learning according to the present invention;
Fig. 2 is a schematic diagram of the text-to-image retrieval results of the cross-media retrieval method according to the present invention.
Embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The present invention learns two mapping matrices through double space learning, maps the data of different modalities into the same subspace, and performs feature selection during the mapping; the similarity of the data of different modalities is then measured in the learned subspace, thereby realizing cross-media retrieval.
Fig. 1 is a flowchart of the cross-media retrieval method based on double space learning according to the present invention. As shown in Fig. 1, the cross-modal retrieval method based on double space learning proposed by the present invention comprises the following steps:
Step S1: collect multimedia data samples of different modalities, build a cross-media retrieval knowledge database, and divide the database into a training set and a test set;
The different modalities are, for example, text and image; for ease of description, data of these two modalities, text and image, are used as an example hereinafter.
The database can be divided into a training set and a test set according to actual needs; for example, 80% of the data in the database may be assigned to the training set and the remaining 20% to the test set.
Step S2: extract the feature vectors of the multimedia data samples of different modalities in the database;
In the present invention, feature extraction is performed using the Latent Dirichlet Allocation (LDA) algorithm for text and the Scale-Invariant Feature Transform (SIFT) algorithm for images, as illustrated in the sketch below.
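As an illustration of the text-side feature extraction, the following sketch uses scikit-learn's LatentDirichletAllocation on a toy corpus; the corpus, the variable names and the choice of scikit-learn are assumptions for demonstration only, and the 10 topics follow the embodiment described later. The image side is not shown: it would analogously produce a fixed-length feature vector per image, for example a histogram over SIFT descriptors.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy text corpus standing in for the text-modality samples.
docs = [
    "a dog runs in the park",
    "cats and dogs are common pets",
    "stock markets fell sharply today",
    "investors worry about interest rates",
]

# Bag-of-words counts, then a 10-topic LDA (10-dimensional text features, as in the embodiment).
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=10, random_state=0)
text_features = lda.fit_transform(counts)   # shape (n_docs, 10); each row is a topic mixture

# Stacking the feature vectors column-wise gives the matrix X_a (d1 x n) used below.
X_a = text_features.T
print(X_a.shape)
```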
Step S3: based on the feature vectors of the data of different modalities in the training set, learn a mapping matrix corresponding to each modality;
Before this step is described in detail, some matrix notation is introduced:
For a matrix M ∈ R^{n×m}, where R^{n×m} denotes the space of n × m matrices, the i-th row of M is written m^i and the j-th column is written m_j. The Frobenius norm of M, abbreviated F-norm, is defined as:

\|M\|_F = \sqrt{\sum_{i=1}^{n} \|m^i\|_2^2}    (1)

The sum of the two-norms of the rows of M, written \|M\|_{21}, is defined as:

\|M\|_{21} = \sum_{i=1}^{n} \|m^i\|_2    (2)

The trace norm of M is defined as:

\|M\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i    (3)

where \sigma_i denotes the i-th singular value of M.
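To make the three norms concrete, here is a small NumPy sketch (illustrative only; the matrix is random) that evaluates equations (1) to (3):

```python
import numpy as np

M = np.random.randn(5, 3)

# Frobenius norm, equation (1): square root of the sum of squared row norms.
fro = np.linalg.norm(M, ord="fro")

# l2,1 norm, equation (2): sum of the l2 norms of the rows.
l21 = np.sum(np.linalg.norm(M, axis=1))

# Trace (nuclear) norm, equation (3): sum of the singular values.
trace_norm = np.sum(np.linalg.svd(M, compute_uv=False))

print(fro, l21, trace_norm)
```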
For a vector u, Diag(u) denotes the diagonal matrix whose i-th diagonal entry is u_i, where u_i is the i-th element of the vector u.
Let the matrix formed by the feature vectors of the text multimedia data in the training set be X_a ∈ R^{d1×n}, where d1 is the dimension of the feature vectors, n is the number of feature vectors, and a denotes the modality; let the matrix formed by the feature vectors of the image multimedia data be X_b ∈ R^{d2×n}, where d2 is the dimension of the feature vectors, n is the number of feature vectors, and b denotes the modality; and let the class label matrix be Y ∈ R^{n×c}, where n is the number of feature vectors and c is the number of classes.
Step S3 further comprises the following steps:
Step S31: establish an objective function based on the feature vectors of the data of different modalities in the training set:
\min_{U_a, U_b} \; \frac{1}{2}\left(\|X_a^{\mathrm{T}} U_a - Y\|_F^2 + \|X_b^{\mathrm{T}} U_b - Y\|_F^2\right) + \lambda_1\left(\|U_a\|_{21} + \|U_b\|_{21}\right) + \lambda_2\left\|\left[X_a^{\mathrm{T}} U_a \;\; X_b^{\mathrm{T}} U_b\right]\right\|_*    (4)
where U_a and U_b are the two mapping matrices corresponding to text and image respectively, and λ_1 and λ_2 are parameters to be determined, with values in the range 0.0001 to 0.1;
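The following NumPy sketch evaluates the objective value of equation (4) for given mapping matrices; it can serve as a sanity check when implementing the solver of step S32. The matrix sizes, the random data and the λ values are illustrative assumptions, not values from the patent.

```python
import numpy as np

def objective(Xa, Xb, Y, Ua, Ub, lam1=0.01, lam2=0.01):
    """Value of equation (4): least-squares fit + l2,1 sparsity + trace-norm coupling."""
    fit = 0.5 * (np.linalg.norm(Xa.T @ Ua - Y, "fro") ** 2
                 + np.linalg.norm(Xb.T @ Ub - Y, "fro") ** 2)
    l21 = np.sum(np.linalg.norm(Ua, axis=1)) + np.sum(np.linalg.norm(Ub, axis=1))
    coupled = np.hstack([Xa.T @ Ua, Xb.T @ Ub])                 # [Xa^T Ua  Xb^T Ub]
    trace_norm = np.sum(np.linalg.svd(coupled, compute_uv=False))
    return fit + lam1 * l21 + lam2 * trace_norm

# Illustrative sizes: d1 = 10 text dims, d2 = 128 image dims, n = 50 pairs, c = 10 classes.
rng = np.random.default_rng(0)
Xa, Xb = rng.standard_normal((10, 50)), rng.standard_normal((128, 50))
Y = np.eye(10)[rng.integers(0, 10, 50)]                         # one-hot class label matrix (n x c)
Ua, Ub = rng.standard_normal((10, 10)), rng.standard_normal((128, 10))
print(objective(Xa, Xb, Y, Ua, Ub))
```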
Step S32: solve the objective function (for example, by an iterative algorithm) to obtain the mapping matrix corresponding to each modality.
Step S32 further comprises the following steps:
Step S321: initialize the mapping matrices U_a and U_b to zero matrices, set the maximum number of iterations k, and set the initial iteration count to 1;
Step S322: perform an eigenvalue decomposition of the matrix given by the formula image in the original document, obtaining the decomposition result V Diag(d) V^T, where V is the matrix whose columns are the eigenvectors of that matrix, and d is the vector of the corresponding eigenvalues.
Step S323: set the intermediate variable S^{-1} = V Diag(c) V^T, where c is a vector whose i-th element is given by the formula image in the original document as a function of d_i and μ, with d_i the i-th element of the vector d and μ a very small number.
Step S324: compute the i-th elements p_i and q_i of the intermediate vectors p and q according to the following formula:
p_i = \frac{1}{2\sqrt{\|u_a^i\|_2^2 + \varepsilon}}, \qquad q_i = \frac{1}{2\sqrt{\|u_b^i\|_2^2 + \varepsilon}}    (5)
and let P = Diag(p) and Q = Diag(q), where ε is a very small number and u_a^i, u_b^i denote the i-th rows of U_a and U_b respectively.
Step S325: obtain the mapping matrices U_a and U_b by solving the two linear systems in the following formula:
(X_a X_a^{\mathrm{T}} + \lambda_1 P + \lambda_2 X_a S^{-1} X_a^{\mathrm{T}})\, U_a = X_a Y
(X_b X_b^{\mathrm{T}} + \lambda_1 Q + \lambda_2 X_b S^{-1} X_b^{\mathrm{T}})\, U_b = X_b Y    (6)
Step S326: judge whether the current iteration count is less than the maximum number of iterations k; if so, return to step S322 and continue the iteration; if not, stop the iteration and output the mapping matrices U_a and U_b. A sketch of this iterative procedure is given below.
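The following NumPy sketch shows one possible implementation of steps S321 to S326. Two formulas appear only as images in the source text: the matrix decomposed in step S322 and the elements c_i used in step S323. The sketch assumes the standard reweighted forms consistent with equations (4) and (6), namely that step S322 decomposes M M^T with M = [Xa^T Ua  Xb^T Ub] and that c_i = 1/(2·sqrt(d_i + μ)); these are assumptions, not a verbatim reproduction of the patent, and the parameter values are illustrative.

```python
import numpy as np

def learn_mappings(Xa, Xb, Y, lam1=0.01, lam2=0.01, k=50, mu=1e-8, eps=1e-8):
    """Sketch of the iterative solver of steps S321-S326 (assumed forms for S322/S323)."""
    d1, n = Xa.shape
    d2, _ = Xb.shape
    c = Y.shape[1]

    # Step S321: initialize the mapping matrices to zero.
    Ua = np.zeros((d1, c))
    Ub = np.zeros((d2, c))

    for _ in range(k):                                   # step S326: at most k iterations
        # Step S322 (assumed form): eigen-decompose the n x n coupling matrix M M^T,
        # with M = [Xa^T Ua  Xb^T Ub] as in the trace-norm term of equation (4).
        M = np.hstack([Xa.T @ Ua, Xb.T @ Ub])
        d, V = np.linalg.eigh(M @ M.T)

        # Step S323 (assumed form): S^{-1} = V Diag(c) V^T with c_i = 1 / (2 sqrt(d_i + mu)).
        S_inv = V @ np.diag(1.0 / (2.0 * np.sqrt(np.maximum(d, 0.0) + mu))) @ V.T

        # Step S324: row-wise reweighting for the l2,1 terms, equation (5).
        p = 1.0 / (2.0 * np.sqrt(np.sum(Ua ** 2, axis=1) + eps))
        q = 1.0 / (2.0 * np.sqrt(np.sum(Ub ** 2, axis=1) + eps))
        P, Q = np.diag(p), np.diag(q)

        # Step S325: solve the two linear systems of equation (6).
        Ua = np.linalg.solve(Xa @ Xa.T + lam1 * P + lam2 * Xa @ S_inv @ Xa.T, Xa @ Y)
        Ub = np.linalg.solve(Xb @ Xb.T + lam1 * Q + lam2 * Xb @ S_inv @ Xb.T, Xb @ Y)

    return Ua, Ub
```

With Xa, Xb and Y built as in the earlier sketch, Ua, Ub = learn_mappings(Xa, Xb, Y) yields the two mapping matrices used in steps S4 to S6.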
Step S4: according to the modality of each multimedia data sample in the test set, map the samples into the same space using the mapping matrices obtained in step S3;
Step S5: take the multimedia data of one modality in the test set mapped into the same space as the query set, and the multimedia data of the other modality as the target set;
Step S6: for a multimedia data item in the query set, find the most similar multimedia data item in the target set according to the similarity between multimedia data, thereby obtaining the cross-media retrieval result.
The similarity between multimedia data can be measured with the Euclidean distance; the smaller the Euclidean distance, the more similar the two multimedia data items are.
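A short NumPy sketch of steps S4 to S6, with text as the query modality and images as the target modality; the names Xa_test, Xb_test and the top_k value are illustrative assumptions, not terms from the patent.

```python
import numpy as np

def retrieve(Ua, Ub, Xa_test, Xb_test, top_k=5):
    """Steps S4-S6: map both modalities into the common space and rank the targets
    for each query by Euclidean distance (smaller distance = more similar)."""
    queries = Xa_test.T @ Ua            # mapped text queries,  shape (n_queries, c)
    targets = Xb_test.T @ Ub            # mapped image targets, shape (n_targets, c)

    # Pairwise Euclidean distances between every query and every target.
    dists = np.linalg.norm(queries[:, None, :] - targets[None, :, :], axis=2)

    # For each query, the indices of the top_k closest targets.
    return np.argsort(dists, axis=1)[:, :top_k]
```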
To verify the validity of the method, it is next applied to a multimedia database. The database contains multimedia data from 10 semantic categories, comprising 2866 pairs of texts and images with identical semantic information; 2173 text-image pairs are randomly chosen as the training set and the remaining 693 pairs as the test set. The concrete steps of applying the method are as follows:
1) extract different feature vectors for the data in the database according to their modality: for the text modality, extract 10-dimensional LDA semantic features; for the image modality, extract 128-dimensional SIFT features;
2) form feature matrices from the feature vectors of the text data and the image data in the training set, construct the objective function, and learn the two mapping matrices corresponding to text and image by minimizing the objective function; the learning of the mapping matrices is realized by an iterative process;
3) according to the modality, map the feature vectors of the text data and the image data in the test set into the same space using the mapping matrices obtained by learning;
4) take the text data in the test set as the query set and the image data as the target set; given a text document, the most similar images in the target set (i.e., those with the smallest Euclidean distance) can be obtained. Fig. 2 is a schematic diagram of the text-to-image retrieval results of the cross-media retrieval method according to the present invention: the left side of Fig. 2 shows two text documents, and the right side shows the images most similar to these two text documents obtained by the method.
In summary, the present invention proposes an effective cross-media retrieval method based on double space learning. The method is easy to implement, stable in performance and high in recognition accuracy. It effectively bridges the semantic gap between media of different modalities and, compared with traditional cross-media methods, shows broad application prospects and considerable market value.
The specific embodiments described above further illustrate the objectives, technical solutions and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (9)

  1. A cross-media retrieval method based on double space learning, characterized in that the method comprises the following steps:
    Step S1: collect multimedia data samples of different modalities, build a cross-media retrieval knowledge database, and divide the database into a training set and a test set;
    Step S2: extract the feature vectors of the multimedia data samples of different modalities in the database;
    Step S3: based on the feature vectors of the data of different modalities in the training set, learn a mapping matrix corresponding to each modality;
    Step S4: according to the modality of each multimedia data sample in the test set, map the samples into the same space using the mapping matrices obtained in step S3;
    Step S5: take the multimedia data of one modality in the test set mapped into the same space as the query set, and the multimedia data of the other modality as the target set;
    Step S6: for a multimedia data item in the query set, find the most similar multimedia data item in the target set according to the similarity between multimedia data, thereby obtaining the cross-media retrieval result.
  2. The method according to claim 1, characterized in that the different modalities are two modalities.
  3. The method according to claim 2, characterized in that the two modalities are text and image.
  4. The method according to claim 3, characterized in that, for text multimedia data samples, Latent Dirichlet Allocation feature vectors are extracted, and for image multimedia data samples, Scale-Invariant Feature Transform feature vectors are extracted.
  5. The method according to claim 1, characterized in that step S3 further comprises the following steps:
    Step S31: establish an objective function based on the feature vectors of the data of different modalities in the training set;
    Step S32: solve the objective function to obtain the mapping matrix corresponding to each modality.
  6. The method according to claim 5, characterized in that the objective function is:
    \min_{U_a, U_b} \; \frac{1}{2}\left(\|X_a^{\mathrm{T}} U_a - Y\|_F^2 + \|X_b^{\mathrm{T}} U_b - Y\|_F^2\right) + \lambda_1\left(\|U_a\|_{21} + \|U_b\|_{21}\right) + \lambda_2\left\|\left[X_a^{\mathrm{T}} U_a \;\; X_b^{\mathrm{T}} U_b\right]\right\|_* ,
    where U_a and U_b are the two mapping matrices corresponding respectively to the two modalities, λ_1 and λ_2 are parameters to be determined, ‖·‖_F denotes the F-norm, ‖·‖_21 denotes the sum of the two-norms of the rows, and ‖·‖_* denotes the trace norm.
  7. The method according to claim 5, characterized in that the objective function is solved with an iterative algorithm.
  8. The method according to claim 7, characterized in that step S32 further comprises the following steps:
    Step S321: initialize the mapping matrices U_a and U_b to zero matrices, set the maximum number of iterations k, and set the initial iteration count to 1;
    Step S322: perform an eigenvalue decomposition of the matrix given by the formula image in the original claim, obtaining the decomposition result V Diag(d) V^T, where X_a ∈ R^{d1×n} is the matrix formed by the feature vectors of the text multimedia data in the training set, d1 is the dimension of the feature vectors, n is the number of feature vectors, and a denotes the modality; X_b ∈ R^{d2×n} is the matrix formed by the feature vectors of the image multimedia data in the training set, d2 is the dimension of the feature vectors, n is the number of feature vectors, and b denotes the modality; V is the matrix whose columns are the eigenvectors of the decomposed matrix; and d is the vector of the corresponding eigenvalues;
    Step S323: set the intermediate variable S^{-1} = V Diag(c) V^T, where c is a vector whose i-th element is given by the formula image in the original claim as a function of d_i and μ, with d_i the i-th element of the vector d and μ a very small number;
    Step S324: compute the i-th elements p_i and q_i of the intermediate vectors p and q according to the following formula:
    p_i = \frac{1}{2\sqrt{\|u_a^i\|_2^2 + \varepsilon}}, \qquad q_i = \frac{1}{2\sqrt{\|u_b^i\|_2^2 + \varepsilon}},
    and let P = Diag(p) and Q = Diag(q), where ε is a very small number;
    Step S325: obtain the mapping matrices U_a and U_b by solving the two linear systems in the following formula:
    (X_a X_a^{\mathrm{T}} + \lambda_1 P + \lambda_2 X_a S^{-1} X_a^{\mathrm{T}})\, U_a = X_a Y
    (X_b X_b^{\mathrm{T}} + \lambda_1 Q + \lambda_2 X_b S^{-1} X_b^{\mathrm{T}})\, U_b = X_b Y ,
    where Y ∈ R^{n×c} is the class label matrix, n is the number of feature vectors, and c is the number of classes;
    Step S326: judge whether the current iteration count is less than the maximum number of iterations k; if so, return to step S322 and continue the iteration; if not, stop the iteration and obtain the mapping matrices U_a and U_b.
  9. The method according to claim 1, characterized in that the similarity between the multimedia data is measured with the Euclidean distance.
CN201210559081.XA 2012-12-20 2012-12-20 Cross-media retrieval method based on double space learning Active CN103049526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210559081.XA CN103049526B (en) 2012-12-20 2012-12-20 Cross-media retrieval method based on double space learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210559081.XA CN103049526B (en) 2012-12-20 2012-12-20 Cross-media retrieval method based on double space learning

Publications (2)

Publication Number Publication Date
CN103049526A true CN103049526A (en) 2013-04-17
CN103049526B CN103049526B (en) 2015-08-05

Family

ID=48062167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210559081.XA Active CN103049526B (en) Cross-media retrieval method based on double space learning

Country Status (1)

Country Link
CN (1) CN103049526B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995903A (en) * 2014-06-12 2014-08-20 武汉科技大学 Cross-media search method based on isomorphic subspace mapping and optimization
CN105574133A (en) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 Multi-mode intelligent question answering system and method
CN105701227A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media similarity measure method and search method based on local association graph
CN105701225A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media search method based on unification association supergraph protocol
CN106056043A (en) * 2016-05-19 2016-10-26 中国科学院自动化研究所 Animal behavior identification method and apparatus based on transfer learning
CN107169061A (en) * 2017-05-02 2017-09-15 广东工业大学 A kind of text multi-tag sorting technique for merging double information sources
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Picture and text cross-module state search method based on the embedded study of figure
WO2017210949A1 (en) * 2016-06-06 2017-12-14 北京大学深圳研究生院 Cross-media retrieval method
CN108399414A (en) * 2017-02-08 2018-08-14 南京航空航天大学 Method of Sample Selection and device
CN109284414A (en) * 2018-09-30 2019-01-29 中国科学院计算技术研究所 The cross-module state content search method and system kept based on semanteme
CN109992676A (en) * 2019-04-01 2019-07-09 中国传媒大学 Across the media resource search method of one kind and searching system
CN110851641A (en) * 2018-08-01 2020-02-28 杭州海康威视数字技术股份有限公司 Cross-modal retrieval method and device and readable storage medium
CN111639197A (en) * 2020-05-28 2020-09-08 山东大学 Cross-modal multimedia data retrieval method and system with label embedded online hash
CN116383342A (en) * 2023-04-07 2023-07-04 四川大学 Robust cross-domain text retrieval method under noise label

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220337B (en) * 2017-05-25 2020-12-22 北京大学 Cross-media retrieval method based on hybrid migration network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267774A1 (en) * 2003-06-30 2004-12-30 Ibm Corporation Multi-modal fusion in content-based retrieval
CN1920818A (en) * 2006-09-14 2007-02-28 浙江大学 Transmedia search method based on multi-mode information convergence analysis
CN101021849A (en) * 2006-09-14 2007-08-22 浙江大学 Transmedia searching method based on content correlation
CN102521368A (en) * 2011-12-16 2012-06-27 武汉科技大学 Similarity matrix iteration based cross-media semantic digesting and optimizing method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267774A1 (en) * 2003-06-30 2004-12-30 Ibm Corporation Multi-modal fusion in content-based retrieval
CN1920818A (en) * 2006-09-14 2007-02-28 浙江大学 Transmedia search method based on multi-mode information convergence analysis
CN101021849A (en) * 2006-09-14 2007-08-22 浙江大学 Transmedia searching method based on content correlation
CN102521368A (en) * 2011-12-16 2012-06-27 武汉科技大学 Similarity matrix iteration based cross-media semantic digesting and optimizing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAIYE WANG et al.: "Learning Coupled Feature Spaces for Cross-modal Matching", 2013 IEEE International Conference on Computer Vision *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995903B (en) * 2014-06-12 2017-04-12 武汉科技大学 Cross-media search method based on isomorphic subspace mapping and optimization
CN103995903A (en) * 2014-06-12 2014-08-20 武汉科技大学 Cross-media search method based on isomorphic subspace mapping and optimization
CN105574133A (en) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 Multi-mode intelligent question answering system and method
CN105701225A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media search method based on unification association supergraph protocol
CN105701227A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media similarity measure method and search method based on local association graph
CN105701227B (en) * 2016-01-15 2019-02-01 北京大学 A kind of across media method for measuring similarity and search method based on local association figure
CN105701225B (en) * 2016-01-15 2019-02-01 北京大学 A kind of cross-media retrieval method based on unified association hypergraph specification
CN106056043A (en) * 2016-05-19 2016-10-26 中国科学院自动化研究所 Animal behavior identification method and apparatus based on transfer learning
CN106056043B (en) * 2016-05-19 2019-07-30 中国科学院自动化研究所 Animal behavior recognition methods and device based on transfer learning
WO2017210949A1 (en) * 2016-06-06 2017-12-14 北京大学深圳研究生院 Cross-media retrieval method
CN108399414B (en) * 2017-02-08 2021-06-01 南京航空航天大学 Sample selection method and device applied to cross-modal data retrieval field
CN108399414A (en) * 2017-02-08 2018-08-14 南京航空航天大学 Method of Sample Selection and device
WO2018145604A1 (en) * 2017-02-08 2018-08-16 南京航空航天大学 Sample selection method, apparatus and server
US10885390B2 (en) 2017-02-08 2021-01-05 Nanjing University Of Aeronautics And Astronautics Sample selection method and apparatus and server
CN107169061A (en) * 2017-05-02 2017-09-15 广东工业大学 A kind of text multi-tag sorting technique for merging double information sources
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Picture and text cross-module state search method based on the embedded study of figure
CN107273517B (en) * 2017-06-21 2021-07-23 复旦大学 Graph-text cross-modal retrieval method based on graph embedding learning
CN110851641A (en) * 2018-08-01 2020-02-28 杭州海康威视数字技术股份有限公司 Cross-modal retrieval method and device and readable storage medium
CN109284414B (en) * 2018-09-30 2020-12-04 中国科学院计算技术研究所 Cross-modal content retrieval method and system based on semantic preservation
CN109284414A (en) * 2018-09-30 2019-01-29 中国科学院计算技术研究所 The cross-module state content search method and system kept based on semanteme
CN109992676A (en) * 2019-04-01 2019-07-09 中国传媒大学 Across the media resource search method of one kind and searching system
CN109992676B (en) * 2019-04-01 2020-12-25 中国传媒大学 Cross-media resource retrieval method and retrieval system
CN111639197A (en) * 2020-05-28 2020-09-08 山东大学 Cross-modal multimedia data retrieval method and system with label embedded online hash
CN116383342A (en) * 2023-04-07 2023-07-04 四川大学 Robust cross-domain text retrieval method under noise label
CN116383342B (en) * 2023-04-07 2023-11-14 四川大学 Robust cross-domain text retrieval method under noise label

Also Published As

Publication number Publication date
CN103049526B (en) 2015-08-05

Similar Documents

Publication Publication Date Title
CN103049526B (en) Cross-media retrieval method based on double space learning
Liu et al. Cross-modality binary code learning via fusion similarity hashing
Zhu et al. Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval
Saleh et al. Toward automated discovery of artistic influence
Gordo Supervised mid-level features for word image representation
Wang et al. Learning compact hash codes for multimodal representations using orthogonal deep structure
Qian et al. Unsupervised feature selection for multi-view clustering on text-image web news data
Lu et al. Image classification by visual bag-of-words refinement and reduction
CN104834693A (en) Depth-search-based visual image searching method and system thereof
CN104463247A (en) Extracting method of optical spectrum vector cross-correlation features in hyper-spectral image classification
CN102663447A (en) Cross-media searching method based on discrimination correlation analysis
CN105930873A (en) Self-paced cross-modal matching method based on subspace
CN105808752A (en) CCA and 2PKNN based automatic image annotation method
CN103995903B (en) Cross-media search method based on isomorphic subspace mapping and optimization
Afzalan et al. An automated spectral clustering for multi-scale data
Zheng et al. MMDF-LDA: An improved Multi-Modal Latent Dirichlet Allocation model for social image annotation
Xu et al. A lightweight intrinsic mean for remote sensing classification with lie group kernel function
Jiang et al. A hierarchal BoW for image retrieval by enhancing feature salience
Jiang et al. Co-regularized PLSA for multi-view clustering
CN106844481A (en) Font similarity and font replacement method
CN103177121A (en) Locality preserving projection method for adding pearson relevant coefficient
Ahsan et al. Clustering social event images using kernel canonical correlation analysis
CN103886072A (en) Retrieved result clustering system in coal mine search engine
Han et al. Dayside aurora classification via BIFs-based sparse representation using manifold learning
Zhang et al. Large-scale clustering with structured optimal bipartite graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant