WO2008069791A1 - Method and apparatus for improving image retrieval and search using latent semantic indexing - Google Patents


Info

Publication number
WO2008069791A1
Authority
WO
WIPO (PCT)
Prior art keywords
term
document matrix
vectors
vector
document
Prior art date
Application number
PCT/US2006/046394
Other languages
French (fr)
Inventor
Jonathon S. Hare
Paul H. Lewis
Original Assignee
General Instrument Corporation
Priority date
Filing date
Publication date
Application filed by General Instrument Corporation filed Critical General Instrument Corporation
Priority to PCT/US2006/046394 priority Critical patent/WO2008069791A1/en
Publication of WO2008069791A1 publication Critical patent/WO2008069791A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval of still image data
    • G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 - Retrieval using metadata automatically derived from the content

Definitions

  • An aspect of the invention is implemented as a program product for execution by a processor. Program(s) of the program product define functions of embodiments and can be contained on a variety of signal-bearing media (computer-readable media), which include, but are not limited to: (i) information permanently stored on non-writable storage media (e.g., read-only memory devices within a computer, such as CD-ROM or DVD-ROM disks readable by a CD-ROM or DVD drive); (ii) alterable information stored on writable storage media (e.g., floppy disks within a diskette drive, a hard-disk drive, or read/writable CDs or DVDs); or (iii) information conveyed to a computer by a communications medium, such as through a computer or telephone network, including wireless communications. The latter embodiment specifically includes information downloaded from the Internet and other networks. Such signal-bearing media, when carrying computer-readable instructions that direct the functions of the invention, represent embodiments of the invention.

Abstract

A method of creating an image database and searching the database is disclosed. A term-document matrix that includes at least two different domain features is created. Latent semantic indexing is applied to the term-document matrix to decompose the term-document matrix. Then a plurality of new vectors are added to a decomposed document matrix of the term-document matrix using a fold-in technique to complete the creation of the searchable image database. Consequently, images are retrieved from the image database by providing a query vector. The query vector is compared against each one of a plurality of document vectors of the decomposed document matrix. Finally, a plurality of images that are similar to the query vector are returned.

Description

METHOD AND APPARATUS FOR IMPROVING IMAGE RETRIEVAL AND SEARCH USING LATENT SEMANTIC INDEXING
BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present invention relates generally to computer-based information retrieval and, in particular, to the retrieval of images stored in a computer database.
2. Description of the Background Art
[0002] Research into content-based image retrieval has been ongoing for many years, and many algorithms have been developed for finding images similar to a query image. However, these algorithms have not been widely deployed because searching for an image or images with an example query image is not a natural thing to do, and it also requires being able to find a suitable query image in the first place.
[0003] Image retrieval using descriptors based on the pixel content of salient regions has been shown to outperform existing methods for retrieval based on global descriptors and to avoid the segmentation problems found with region-based indexing and retrieval, while being robust to various image transforms. However, a common problem with current retrieval algorithms based on salient regions is the computational complexity caused by the high dimensionality of the problem. With a salient-region-based approach, the cost of comparison rises with the number of regions, as each region may have to be compared to every other region. This cost can be massive, with the number of regions per image feasibly reaching into the thousands.
[0004] Therefore, a need exists for a method of creating an image database and image retrieval that reduces the computational complexity of image retrieval with the robustness to utilize searches based on human language, visual language, or both.
SUMMARY OF THE INVENTION
[0005] An aspect of the invention relates to creating an image database. First a term-document matrix comprising at least two different domain features is created. Next, latent semantic indexing is applied to the term-document matrix. Finally, a plurality of new vectors are added to a decomposed document matrix of the term- document matrix using a fold-in technique.
[0006] Another aspect of the invention relates to image retrieval comprising creating a term-document matrix comprising at least two different domain features, applying latent semantic indexing to the term-document matrix, adding a plurality of new vectors to a decomposed document matrix of the term-document matrix using a fold-in technique, providing a query vector, comparing the query vector against each one of a plurality of document vectors of the decomposed document matrix and returning a plurality of images that are similar to the query vector.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
[0008] FIG. 1 illustrates a flowchart of a method for creating an image database and image retrieval from the image database;
[0009] FIG. 2 is a block diagram depicting an exemplary annotated image in accordance with the invention;
[0010] FIG. 3 illustrates an exemplary embodiment of a term-document matrix in accordance with the invention;
[0011] FIG. 4 illustrates an exemplary embodiment of the term-document matrix after latent semantic indexing is applied;
[0012] FIG. 5 illustrates an exemplary embodiment of the fold-in technique; and
[0013] FIG. 6 illustrates a high-level block diagram of a general-purpose computer suitable for use in performing the functions described herein.
[0014] To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION OF THE INVENTION
[0015] FIG. 1 is a flow diagram depicting an exemplary embodiment of a method 100 for image retrieval from an image database in accordance with one or more aspects of the invention. FIG. 2 depicts an exemplary annotated image 200-1 of a small set of annotated images 200 in accordance with one or more aspects of the invention. The method 100 begins at step 110, where a small set of annotated images 200 is collected or generated from an entire image collection. Annotated image 200-1 may be described by at least two different domain features such as, for example, human language and visual language domains. Semantic annotations 210 are used to generate a human language vector 214. Human language vector 214 is a representation of the word occurrences in the semantic annotations 210 compared to the human language vocabulary 212.
[0016] For example, the first item of semantic annotation 210 represents the word 'sky' and 'sky' appears in the semantic annotations 210 only once. The closest term to 'sky' is located in human language vocabulary 212. The matching term found in human language vocabulary 212 is then assigned a value of '1' in the human language vector 214 for the first item. Any words that are in the semantic annotations 210, but are not found in the human language vocabulary 212, are represented as zeros in the human language vector 214. Although, in the exemplary embodiment, the human language vector 214 is shown with only five items, one skilled in the art will recognize that human language vector 214 may be any size suitable to capture all the words found in the semantic annotations 210.
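The counting scheme above can be sketched as follows; the five-word vocabulary and annotation list are hypothetical stand-ins for vocabulary 212 and annotations 210:

```python
import numpy as np

def human_language_vector(annotations, vocabulary):
    """Count how often each vocabulary word occurs in the annotations.

    Annotation words that are not in the vocabulary simply leave
    zeros behind, as described above.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary))
    for word in annotations:
        if word in index:
            vec[index[word]] += 1
    return vec

# Hypothetical five-term vocabulary and annotation set
vocabulary = ["sky", "tree", "water", "grass", "building"]
annotations = ["sky", "tree", "tree", "cloud"]  # "cloud" is out-of-vocabulary
print(human_language_vector(annotations, vocabulary))  # [1. 2. 0. 0. 0.]
```

In practice the vocabulary would hold thousands of terms, but the mechanics are the same.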
[0017] Annotated image 200-1 will also have visual language annotations 220. Visual language annotations 220 represent quantized local descriptors of salient regions of annotated image 200-1. Visual language annotations 220 may be represented by any descriptor capable of describing the image content as a set of discrete terms, such as, for example, RGB histograms, rg-chromaticity histograms or any other color-based histogram. The salient regions of annotated image 200-1 may be selected by any method known to those skilled in the art of image retrieval. The visual language annotations 220 are used to generate a visual language vector 224. Visual language vector 224, similar to human language vector 214, is a representation of the word occurrences in the visual language annotations 220 compared to the visual language vocabulary 222. Visual language vocabulary 222 may be represented, for example, by a matrix of vectors. Thus, visual language vector 224 is assigned by finding the closest match of each term of visual language annotations 220 with visual language vocabulary 222 in terms of a calculated distance, for example a Euclidean distance.
[0018] The method for generating both the human language vocabulary 212 and visual language vocabulary 222 may be any suitable method known in the art of image retrieval. In an exemplary embodiment, the human language vocabulary 212 and visual language vocabulary 222 were generated using a k-means clustering algorithm applied to a sample of local descriptors picked from a set of training images. Furthermore, in one embodiment, a human language vocabulary 212 and visual language vocabulary 222 are provided that include a multiplicity of terms (e.g., at least a few thousand terms). Finally, human language vectors 214 and visual language vectors 224 are generated for the remaining annotated images 200-2 through 200-n of the small set of annotated images 200.
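A minimal sketch of the quantization step, assuming the visual vocabulary 222 is a matrix whose rows are cluster centres (e.g., from k-means) and each local descriptor is assigned to its nearest row by Euclidean distance; all names and the toy 2-D descriptors are illustrative:

```python
import numpy as np

def visual_language_vector(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest visual term
    (smallest Euclidean distance to a vocabulary row) and count occurrences."""
    vec = np.zeros(len(vocabulary))
    for d in descriptors:
        distances = np.linalg.norm(vocabulary - d, axis=1)  # distance to every visual term
        vec[np.argmin(distances)] += 1
    return vec

# Two toy 2-D "cluster centres" and three toy descriptors
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
print(visual_language_vector(descriptors, vocabulary))  # [2. 1.]
```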
[0019] Referring back to FIG. 1, in step 120, cross-language vectors are created by appending the human language vectors and visual language vectors. For example, as illustrated in FIG. 2, the human language vector 214 and visual language vector 224 of annotated image 200-1 are appended to generate a cross-language vector 230. Notably, cross-language vector 230 contains both human language and visual language elements. Similarly, cross-language vectors 230 are generated for the remaining annotated images 200-2 through 200-n of the small set of annotated images 200.
[0020] Once all of the cross-language vectors 230 are generated for each of the images 200-1 through 200-n within the small set of annotated images 200, the cross-language vectors 230 are combined into a term-document matrix. FIG. 3 depicts an exemplary embodiment of a term-document matrix 300 in accordance with one or more aspects of the invention. Each column 320-1 to 320-j represents one of the annotated images 200-1 through 200-n, also referred to as documents, of the small set of annotated images 200. Each row 310-1 to 310-i represents a term found in the documents. Within term-document matrix 300 are elements a_ij, each representing the frequency of term i in document j. A weighting is applied to each element a_ij in term-document matrix 300 because term-document matrix 300 is usually very sparse, since every word does not normally occur in every document. In one embodiment, the weighting is calculated such that:

a_ij = L(i, j) x G(i)   (1)

where L(i, j) represents the local weighting for term i in document j and G(i) is the global weighting for term i. In an exemplary embodiment, log-entropy weighting is used. Log-entropy weighting is defined as:
L(i, j) = log(tf_ij + 1)   (2)

G(i) = 1 + Σ_j [ (tf_ij / gf_i) log(tf_ij / gf_i) ] / log N   (3)

where tf_ij is the frequency of term i in document j, gf_i is the total number of times term i occurs in the entire collection, and N is the total number of documents in the collection. It should be noted that although a particular weighting process is described above, the present invention is not so limited. Namely, any weighting process, or no weighting process, can be used in accordance with the requirements of a particular implementation.
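As a sketch, log-entropy weighting can be applied to a raw term-document count matrix like so, assuming the standard form L(i, j) = log(tf_ij + 1) with a global weight built from the entropy of each term's distribution across documents:

```python
import numpy as np

def log_entropy_weight(tf):
    """Weight a term-document count matrix tf (terms x documents).

    Assumes the standard log-entropy form:
      L(i, j) = log(tf_ij + 1)
      G(i)    = 1 + sum_j (p_ij * log p_ij) / log N,  with p_ij = tf_ij / gf_i
    A term spread evenly over all documents gets G(i) near 0 (uninformative);
    a term concentrated in a single document gets G(i) = 1.
    """
    n_docs = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)  # global frequency gf_i of each term
    p = np.divide(tf, gf, out=np.zeros_like(tf, dtype=float), where=gf > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(n_docs)  # global weight G(i)
    return np.log(tf + 1.0) * g[:, None]          # a_ij = L(i, j) * G(i)

counts = np.array([[1.0, 1.0],   # term occurring evenly in both documents
                   [2.0, 0.0]])  # term concentrated in one document
print(log_entropy_weight(counts))  # row 0 is all zeros; row 1 is [log 3, 0]
```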
[0021] Once the term-document matrix 300 is generated, Latent Semantic Indexing (LSI) is applied in step 130 as depicted in FIG. 1. LSI, as known in the art of image retrieval, is a technique for information retrieval that is related to the vector-space model of information retrieval. In the vector-space model, documents are represented in a multidimensional space, such as, for example, term-document matrix 300. LSI takes the vector-space model one stage further by applying linear algebra to attempt to factor out noise and deal with issues of polysemy (words with multiple meanings) and synonymy (different words with the same meaning). LSI works by constructing the term-document matrix 300 and factoring it using Singular Value Decomposition (SVD). From the factored data, a rank-k estimate of the original term-document matrix 300 can be reconstructed that removes much of the noise and reduces the dimensionality, thereby reducing the computational complexity of performing image search and retrieval.
[0022] FIG. 4 depicts an exemplary embodiment of a result of applying LSI to term-document matrix 300 using SVD in accordance with one or more aspects of the invention. Term-document matrix 300 is decomposed into a product of three separate matrices of vectors. Matrix U 402 represents an i x m matrix of term vectors, where i represents the terms of term-document matrix 300 and m ≤ min(i, j). Matrix Σ 404 represents an m x m diagonal matrix of singular values. Matrix V 406 represents an m x j matrix of document vectors, where j represents the documents of term-document matrix 300.
[0023] The dimensionality of the term-document matrix 300 can be further reduced by using a rank-k approximation of term-document matrix 300, selecting the k largest singular values 412 within matrix Σ 404. The remaining values within matrix Σ 404 are set to zero. The rows and columns within matrix U 402 and matrix V 406 corresponding to these zeros are deleted, thereby creating an i x k matrix U 402 represented by shaded region 410 and a k x j matrix V 406 represented by shaded region 414. Consequently, the dimensions of term-document matrix 300 are reduced as represented by reduced term-document matrix 400.
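The decomposition and rank-k truncation can be sketched with NumPy (names are illustrative; `np.linalg.svd` returns singular values in descending order, so keeping the k largest is a slice):

```python
import numpy as np

def lsi_decompose(A, k):
    """Factor the term-document matrix A with SVD and keep only the k
    largest singular values: U_k (terms x k), s_k (k,), Vt_k (k x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    return U[:, :k], s[:k], Vt[:k, :]

# Toy 4-term x 3-document matrix
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
U_k, s_k, Vt_k = lsi_decompose(A, 2)
A_k = U_k @ np.diag(s_k) @ Vt_k  # rank-2 estimate of A
```

This toy matrix already has rank 2, so the rank-2 estimate reproduces it exactly; for a real, noisy term-document matrix the truncation discards the smallest singular values and much of the noise with them.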
[0024] Referring back to FIG. 1, once LSI is applied and a reduced term-document matrix 400 is generated, remaining images from the entire collection of images may be added to the small set of annotated images 200 collected in step 110 by using a "fold-in" technique in step 140. The remaining images need only be annotated by visual language vectors. Consequently, the remaining images may be added without the need to semantically annotate each and every remaining image within the entire image collection. Notably, only a small set of the entire collection of images needs to be semantically annotated.
[0025] FIG. 5 depicts an exemplary illustration of the "fold-in" technique of step 140 in accordance with one or more aspects of the invention. The reduced decomposed matrices from step 130 are represented as matrix U_k 502, matrix Σ_k 504 and matrix V_k 506. The additional visual language vectors are generated similarly to visual language vector 224, as discussed above. The new vectors are projected into the reduced k-space as:

d_hat = d^T U_k Σ_k^(-1)   (4)
wherein d^T is the vector of visual language terms, padded by zeros in place of the unknown human language terms, and d_hat is the projected version of the d^T vector. If weighting is used as discussed above, then the same weighting must first be applied to d^T before projection. The new projected vector is then appended as a new column to the matrix V_k 506 and represented by shaded region 510. Thus, a complete term-document matrix 500 is created with all the images added via the "fold-in" technique. The additional added images are represented by shaded region 512.
[0026] Referring back to FIG. 1, the image database is now ready to be queried. A query is submitted in step 150. The query may include visual language, human language or a combination of the two. Notably, even though the remaining images added via the "fold-in" technique did not have semantic annotations that contain human language, these images may still be queried using human language, visual language or both. After the query is submitted, a query vector is created from the visual language terms, human language terms or both. The query vector is created similar to the way human language vector 214 and visual language vector 224 were created, as discussed above. Furthermore, the query vector is also weighted in the same way as each element a_ij was when creating term-document matrix 300, as discussed above. Finally, the query vector is reduced to k dimensions and represented as follows:
q̂ = qᵀUkΣk⁻¹ (5)
wherein qᵀ is the query vector, and q̂ is the projected version of the qᵀ vector. [0027] Referring back to FIG. 1, the next step 160 is to compare the query vector of equation (5) against each document represented by the columns of matrix Vkᵀ 506, including all the images added via the "fold-in" technique represented by shaded region 510. The comparison is based on a calculated distance, for example the Euclidean distance, between the query vector of equation (5) and each column of matrix Vkᵀ 506, including the columns for the images added via the "fold-in" technique.
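The fold-in projection of equation (4) above can be sketched in a few lines of NumPy. The toy term-document matrix, the chosen rank k, and the new document's term counts are illustrative assumptions only, not data from the embodiment:

```python
import numpy as np

# Toy term-document matrix A (terms x documents): rows are mixed
# human-language and visual-language terms, columns are documents.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])

k = 2  # reduced rank for the LSI space
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# New document: visual-language term counts only, with zeros in the
# positions of the (unknown) human-language terms.
d = np.array([1., 0., 0., 1.])

# Equation (4): project into the reduced k-space, d_hat = d^T Uk Sk^-1.
d_hat = d @ Uk @ np.linalg.inv(Sk)

# Fold in: append the projected document as a new column of Vk^T,
# without recomputing the SVD over the whole collection.
Vtk_extended = np.hstack([Vtk, d_hat.reshape(-1, 1)])
print(Vtk_extended.shape)  # one more document column than before
```

If term weighting were in use, the same local and global weights would be applied to `d` before the projection, mirroring the treatment of the original matrix columns.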
[0028] Finally, the method ends at step 170 by returning matching results. In an exemplary embodiment, the results may be ranked in order of their distance, with the closest being the most similar.
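Steps 150 through 170 — projecting a query per equation (5), measuring the Euclidean distance to every document column of Vkᵀ, and ranking by distance — might be sketched as follows. The toy matrix, rank, and query values here are illustrative assumptions, and a full system would first apply the same term weighting used when building the matrix:

```python
import numpy as np

# Illustrative term-document matrix (terms x documents) and rank-k SVD.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Query vector over the same term vocabulary (human-language terms,
# visual-language terms, or both); the values are made up.
q = np.array([1., 1., 0., 0.])

# Equation (5): q_hat = q^T Uk Sk^-1.
q_hat = q @ Uk @ np.linalg.inv(Sk)

# Step 160: Euclidean distance from the query to each document column
# of Vk^T.  Step 170: rank ascending, so the closest document is the
# most similar result.
dists = np.linalg.norm(Vtk.T - q_hat, axis=1)
ranking = np.argsort(dists)
print(ranking)  # document indices, most similar first
```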
[0029] FIG. 6 is a block diagram depicting an exemplary embodiment of a computer 600 suitable for implementing the processes and methods described above in accordance with one or more aspects of the invention. The computer 600 includes a processor 601, a memory 603, various support circuits 604, and an I/O interface 602. The processor 601 may include one or more of any type of microprocessor known in the art. The support circuits 604 for the processor 601 include conventional cache, power supplies, clock circuits, data registers, I/O interfaces, and the like. The I/O interface 602 may be directly coupled to the memory 603 or coupled through the processor 601. The memory 603 may include one or more of the following: random access memory, read only memory, magneto-resistive read/write memory, optical read/write memory, cache memory, magnetic read/write memory, and the like, as well as signal-bearing media as described below.
[0030] The memory 603 stores processor-executable instructions and/or data that may be executed by and/or used by the processor 601 as described further below. These processor-executable instructions may comprise hardware, firmware, software, and the like, or some combination thereof. Notably, the processor-executable instructions may be configured to cause the processor to perform the method 100 of FIG. 1. Although one or more aspects of the invention are disclosed as being implemented as processor(s) executing a software program, those skilled in the art will appreciate that the invention may be implemented in hardware, software, or a combination of hardware and software. Such implementations may include a number of processors independently executing various programs and dedicated hardware, such as ASICs. The computer 600 may be programmed with an operating system, which may be OS/2, Java Virtual Machine, Linux, Solaris, Unix, Windows, Windows95, Windows98, Windows NT, Windows2000, WindowsME, or WindowsXP, among other known platforms. At least a portion of an operating system may be disposed in the memory 603.
[0031] An aspect of the invention is implemented as a program product for execution by a processor. The program(s) of the program product define functions of embodiments and can be contained on a variety of signal-bearing media (computer readable media), which include, but are not limited to: (i) information permanently stored on non-writable storage media (e.g., read-only memory devices within a computer, such as CD-ROM or DVD-ROM disks readable by a CD-ROM drive or a DVD drive); (ii) alterable information stored on writable storage media (e.g., floppy disks within a diskette drive, a hard-disk drive, or read/writable CDs or DVDs); or (iii) information conveyed to a computer by a communications medium, such as through a computer or telephone network, including wireless communications. The latter embodiment specifically includes information downloaded from the Internet and other networks. Such signal-bearing media, when carrying computer-readable instructions that direct functions of the invention, represent embodiments of the invention.
[0032] While various embodiments have been described above, it should be understood that they are presented by way of example only, and not limiting. For example, although the invention disclosed herein was discussed in connection with an image with human language and visual language annotations in the exemplary embodiments, one skilled in the art would recognize that the method and system disclosed herein can also be used in connection with other documents containing multiple mixed domain features. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

What is claimed is:
1. A method of creating an image database, comprising: creating a term-document matrix comprising at least two different domain features; applying latent semantic indexing to the term-document matrix; and adding a plurality of new vectors to a decomposed document matrix of the term-document matrix using a fold-in technique.
2. The method of claim 1, wherein the term-document matrix comprises a plurality of cross-language vectors of the at least two different domain features.
3. The method of claim 2, wherein each of the plurality of cross-language vectors comprises a combination of the at least two different domain features, wherein at least one of the at least two different domain features is a human language vector and another one of the at least two different domain features is a visual language vector.
4. The method of claim 3, wherein the human language vector represents annotations associated with a small set of annotated images that are generated from an entire image collection.
5. The method of claim 3, wherein the visual language vector represents a descriptor that describes an image's content as a set of discrete terms.
6. The method of claim 1, wherein the term-document matrix comprises a plurality of individual elements aij, wherein aij represents the frequency of a term i in a document j.
7. The method of claim 6, wherein each element aij is weighted such that: aij = L(i,j) x G(i), wherein L(i,j) represents the local weighting for term i in document j and G(i) is the global weighting for term i.
8. The method of claim 1, wherein latent semantic indexing further comprises applying Singular Value Decomposition (SVD) to the term-document matrix.
9. The method of claim 1, wherein the fold-in technique comprises adding the plurality of new vectors based only on a visual language vector associated with each one of the plurality of new vectors.
10. The method of claim 9, wherein the visual language vector is created based on values of a vocabulary vector that are closest to a plurality of visual language annotations of an image.
11. The method of claim 1, wherein the fold-in technique further comprises adding the plurality of new vectors by appending a new column for each one of the plurality of new vectors to the decomposed document matrix.
12. The method of claim 1, wherein the plurality of new vectors represents remaining un-annotated images in an entire image collection.
13. A method of image retrieval, comprising: creating a term-document matrix comprising at least two different domain features; applying latent semantic indexing to the term-document matrix; adding a plurality of new vectors to a decomposed document matrix of the term-document matrix using a fold-in technique; providing a query vector; comparing the query vector against each one of a plurality of document vectors of the decomposed document matrix; and returning a plurality of images that are similar to the query vector.
14. The method of claim 13, wherein the fold-in technique comprises adding the plurality of new vectors based only on a visual language vector associated with each one of the plurality of new vectors.
15. The method of claim 14, wherein the visual language vector is created based on values of a vocabulary vector that are closest to a plurality of visual language annotations of an image.
16. The method of claim 13, wherein the fold-in technique further comprises adding the plurality of new vectors by appending a new column for each one of the plurality of new vectors to the decomposed document matrix.
17. The method of claim 13, wherein the query vector comprises human language vectors, visual language vectors or both human and visual language vectors.
18. The method of claim 13, wherein the comparing step comprises calculating the Euclidean distance between the query vector and each one of the plurality of document vectors of the decomposed document matrix.
19. The method of claim 13, wherein the returning step ranks the plurality of images in order of a calculated distance between each one of the plurality of images and the query vector.
20. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform the steps of a method of image retrieval, comprising: creating a term-document matrix comprising human language vectors and visual language vectors; applying latent semantic indexing to the term-document matrix; adding a plurality of new vectors to a decomposed document matrix of the term-document matrix using a fold-in technique; providing a query vector; comparing the query vector against each one of a plurality of document vectors of the decomposed document matrix; and returning a plurality of images that are similar to the query vector.
21. Apparatus for image retrieval, comprising: means for creating a term-document matrix comprising at least two different domain features; means for applying latent semantic indexing to the term-document matrix; means for adding a plurality of new vectors to a decomposed document matrix of the term-document matrix using a fold-in technique; means for providing a query vector; means for comparing the query vector against each one of a plurality of document vectors of the decomposed document matrix; and means for returning a plurality of images that are similar to the query vector.
PCT/US2006/046394 2006-12-04 2006-12-04 Method and apparatus for improving image retrieval and search using latent semantic indexing WO2008069791A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2006/046394 WO2008069791A1 (en) 2006-12-04 2006-12-04 Method and apparatus for improving image retrieval and search using latent semantic indexing

Publications (1)

Publication Number Publication Date
WO2008069791A1 true WO2008069791A1 (en) 2008-06-12

Family

ID=39492501

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/046394 WO2008069791A1 (en) 2006-12-04 2006-12-04 Method and apparatus for improving image retrieval and search using latent semantic indexing

Country Status (1)

Country Link
WO (1) WO2008069791A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8239334B2 (en) 2008-12-24 2012-08-07 Microsoft Corporation Learning latent semantic space for ranking
US8244711B2 (en) 2009-09-28 2012-08-14 Chin Lung Fong System, method and apparatus for information retrieval and data representation
WO2014197684A1 (en) * 2013-06-05 2014-12-11 Digitalglobe, Inc. System and method for multiresolution and multitemporal image search
US9075846B2 (en) 2012-12-12 2015-07-07 King Fahd University Of Petroleum And Minerals Method for retrieval of arabic historical manuscripts
CN108763244A (en) * 2013-08-14 2018-11-06 谷歌有限责任公司 It searches for and annotates in image
CN109344407A (en) * 2018-10-29 2019-02-15 北京天融信网络安全技术有限公司 Semantic-based document fingerprint construction method, storage medium and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020102021A1 (en) * 2000-12-15 2002-08-01 Pass Gregory S. Representing an image with a posterized joint histogram
US20040205461A1 (en) * 2001-12-28 2004-10-14 International Business Machines Corporation System and method for hierarchical segmentation with latent semantic indexing in scale space
US20040267740A1 (en) * 2000-10-30 2004-12-30 Microsoft Corporation Image retrieval systems and methods with semantic and feature based relevance feedback


Similar Documents

Publication Publication Date Title
US8065313B2 (en) Method and apparatus for automatically annotating images
US9298682B2 (en) Annotating images
US8126274B2 (en) Visual language modeling for image classification
US8150170B2 (en) Statistical approach to large-scale image annotation
Zhu et al. Theory of keyblock-based image retrieval
JP4295062B2 (en) Image search method and apparatus using iterative matching
US9195738B2 (en) Tokenization platform
US8577882B2 (en) Method and system for searching multilingual documents
US20090112830A1 (en) System and methods for searching images in presentations
US7580910B2 (en) Perturbing latent semantic indexing spaces
Xu et al. Attribute hashing for zero-shot image retrieval
US9805035B2 (en) Systems and methods for multimedia image clustering
JP2014533868A (en) Image search
US7555428B1 (en) System and method for identifying compounds through iterative analysis
WO2008069791A1 (en) Method and apparatus for improving image retrieval and search using latent semantic indexing
Poullot et al. Z-grid-based probabilistic retrieval for scaling up content-based copy detection
Foncubierta-Rodríguez et al. Medical image retrieval using bag of meaningful visual words: unsupervised visual vocabulary pruning with PLSA
CN108304381B (en) Entity edge establishing method, device and equipment based on artificial intelligence and storage medium
US11494431B2 (en) Generating accurate and natural captions for figures
CN109902162B (en) Text similarity identification method based on digital fingerprints, storage medium and device
US20070016567A1 (en) Searching device and program product
Taghva et al. Address extraction using hidden markov models
JP2004046612A (en) Data matching method and device, data matching program, and computer readable recording medium
US11687514B2 (en) Multimodal table encoding for information retrieval systems
US20050060308A1 (en) System, method, and recording medium for coarse-to-fine descriptor propagation, mapping and/or classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06839006

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06839006

Country of ref document: EP

Kind code of ref document: A1