METHOD FOR INDEXING, SEARCHING FOR, AND RETRIEVING IMAGES ON A COMPUTER
Rafael A. Serrano, Mark S. Zeldis, Eric A. Serrano
CROSS-REFERENCE TO RELATED APPLICATION
The present application claims the benefit of U.S. provisional patent
application no. 60/227,870, filed on August 25, 2000, and entitled "Method and
System for Searching for and Retrieving Images on a Computer," which is hereby
incorporated by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to methods for indexing, searching
for, and retrieving digital images on a computer, including images contained on the
Internet.
2. Description of Related Art
Most currently available methods for indexing and searching for images on
a computer, including on the Internet, rely on text and are not image-based; they
require a text-based query. The methods for conducting image-based searches that
are available rely upon attributes of an image such as color and contour and are not
narrowly tailored. Thus, the existing methods are cumbersome, time-consuming,
and space-intensive, and they often turn up images that are not relevant and may
miss images that are being sought. As such, existing methods are not
comprehensive or thorough. Accordingly, there is a need for a method of indexing
and searching for digital images which is image-based and is quicker and more
accurate than the existing methods.
SUMMARY OF THE INVENTION
The present invention includes a method for indexing a digital image by
establishing a numerical identifier for the image.
An image is divided into a number of sections. The average pixel intensity
of each section is determined. The average pixel intensities of two sections are
compared, and it is determined whether the average pixel intensity of the first
section is greater than that of the second section. Depending on the outcome, a
bit is assigned to a "1" or a "0." The comparisons are repeated until the average
pixel intensity of each section has been sequentially and separately compared to
the average pixel intensity of at least one other section, with the bits assigned
consistently. An identifier or signature consisting of the series of all of the
bits is established. The bits are stored as a long integer.
The invention also includes a method for searching for an image on a
computer. In this method, a target signature is established for an image to be
searched for, where an indicator, instead of a "1" or a "0," is assigned if the intensities
are within a specified tolerance. Then the target signature is compared to a
database of signatures for other images and locations of the images. The indicators
are treated as both a "1" and a "0" for the comparison. A match is determined if
a signature is identical to the target signature, where each indicator is treated as
both a "1" and a "0." The method for searching can also be performed without
utilizing an existing database.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flowchart of the steps of the method for establishing a signature
for an image.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention includes a method for indexing a digital image, i.e.,
establishing a "signature" or numerical identifier for the image. Using the method,
a database or index may be created that contains at least one signature for an
image. The invention also includes a method for searching for an image on a
computer where the image to be searched for has been given a signature. The
invention can declare two images to be identical if they appear to be the same to
the human eye.
Fig. 1 is a flowchart of the steps of the method for establishing a signature
for an image. The method can be used for many types of images, including
graphics, photographs, and streaming media. The images may be in color or black
and white, and need not be of the highest quality. (In a preliminary step, if the
image is in color, it is converted to black and white.) In step 100, the image is
divided into a number of sections, where the number of sections is "n." This may
be satisfactorily done using a grid, such as an 8 by 8 grid having 64 squares. Other
ways of sectioning and other sizes of grids will work as well, including using
sections that are not formed by a grid or that are overlapping.
In step 110, for each section, the average pixel intensity ("I") of the portion
of the image falling within that section is determined.
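By way of illustration only, steps 100 and 110 can be sketched in Python as follows. The sketch assumes that the image has already been converted to a rectangular array of intensity values (a list of rows of numbers) and that the image is at least as many pixels high and wide as the grid has rows and columns. The function name section_averages and the row-by-row numbering of the sections are assumptions made for the example and are not part of the described method.

```python
def section_averages(pixels, rows=8, cols=8):
    """Return the average intensity of each grid section, numbered row by row.

    pixels is a list of rows, each row a list of intensity values.
    Assumes the image has at least `rows` pixel rows and `cols` pixel columns,
    so that no section is empty.
    """
    height, width = len(pixels), len(pixels[0])
    averages = []
    for r in range(rows):
        # Integer boundaries let the grid cover images whose size is not
        # an exact multiple of the grid size.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            total = sum(pixels[y][x]
                        for y in range(top, bottom)
                        for x in range(left, right))
            averages.append(total / ((bottom - top) * (right - left)))
    return averages
```

For an 8 by 8 grid the function returns a list of 64 averages, one per square.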
In step 120, two sections, "x" and "y," are selected, and the average pixel
intensity of the portion of the image contained in section "x" is compared to the
average pixel intensity of the portion of the image contained in section "y." If, in
step 130, the average pixel intensity of "x" is greater than the average pixel
intensity of "y," then, in step 140, the bit is assigned to a "1." If, in step 130, the
average pixel intensity of "x" is not greater than the average pixel intensity of "y,"
then, in step 150, the bit is assigned to a "0." The method can work just as well
if the assignment of bits is reversed; i.e., if the average pixel intensity of "x" is
greater than the average pixel intensity of "y," the bit is assigned to "0," and if the
average pixel intensity of "x" is not greater than the average pixel intensity of "y,"
the bit is assigned to "1." The bits must be assigned in a consistent manner,
however, such that all comparisons for all images are done in the same way.
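The comparison of steps 120 through 150 can be sketched in Python as follows. This is a minimal illustration, assuming the average intensities are held in a list and each pair (x, y) is given as two positions in that list counted from zero; the function name signature_bits is hypothetical.

```python
def signature_bits(averages, pairs):
    """Return one bit per (x, y) pair: 1 if I(x) > I(y), otherwise 0.

    Equal intensities give 0, because "x" is then not greater than "y".
    """
    return [1 if averages[x] > averages[y] else 0 for x, y in pairs]
```

For example, with averages of 5, 3, 3, and 9 and the pairs (0, 1), (1, 2), (2, 3), and (3, 0), the bits are 1, 0, 0, 1.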
If, in step 160, it is determined that the signature, or number of bits, is long
enough, then the process is complete. If, in step 160, it is determined that the
signature is not long enough, then the process is repeated from step 120, until the
average pixel intensity of each section has been sequentially and separately
compared to the average pixel intensity of at least one other section. The length
of the signature is selected in advance and must be at least one-half of the
number of sections n, because each comparison involves two sections and yields
one bit. For example, if the grid consists of 64 squares, and each
square is compared to two other squares, then the signature will consist of 64 bits,
a series of 64 ones and zeroes. If the grid consists of 16 sections and each section
is compared to one other section, then the signature will consist of eight bits. The
bits are stored as a long integer.
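The arithmetic of the two examples above, and the storing of the bits as a single integer, can be sketched in Python as follows. The function names are hypothetical, and placing the first bit in the most significant position is an assumption; any fixed order would serve if used consistently.

```python
def signature_length(n_sections, comparisons_per_section):
    """Number of bits when every section takes part in the given number
    of comparisons; each comparison involves two sections, one bit."""
    return n_sections * comparisons_per_section // 2


def pack_bits(bits):
    """Pack a sequence of 0/1 bits into one integer, first bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value
```

Here signature_length(64, 2) gives 64 and signature_length(16, 1) gives 8, in agreement with the examples. Python integers are of unlimited size; in a language whose long is a signed 64-bit type, the first bit of a 64-bit signature would occupy the sign bit.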
The selection, in step 120, of the two sections to be compared is preferably
conducted in a random (but consistent for all images) fashion. For example, "x"
would be section 1 and "y" would be section 47, then "x" would be section 47 and
"y" would be section 12, and so forth until the average pixel intensity of each
section has been separately compared to the average pixel intensities of two other
sections. Thus, section 47 has been compared to section 1 and to section 12.
Alternatively, the selection of the sections to be compared can be conducted in a
numerical fashion such that adjacent sections are compared. For example, "x"
would be section 1 and "y" would be section 2, then "x" would be section 2 and
"y" would be section 3, and so on. Thus, section 2 has been compared to section 1
and to section 3.
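Both ways of selecting sections can be sketched in Python as a chain in which the "y" of one comparison becomes the "x" of the next. The sketch makes two assumptions not stated above: the chain is closed by comparing the last section back to the first, so that n sections give n comparisons, and the random order is fixed by a seed (the value 2000 is arbitrary). The function names are hypothetical.

```python
import random


def chained_pairs(order):
    """Pair each section with the next one in `order`, wrapping at the end,
    so every section appears in exactly two comparisons."""
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]


def adjacent_chain(n):
    """Numerical selection: 1 to 2, 2 to 3, and so on."""
    return chained_pairs(list(range(1, n + 1)))


def shuffled_chain(n, seed=2000):
    """Random selection that is the same for every image: the same seed
    yields the same order each time the function is called."""
    order = list(range(1, n + 1))
    random.Random(seed).shuffle(order)
    return chained_pairs(order)
```

Because the result of a seeded shuffle is not guaranteed to be identical across Python versions, a practical system would more likely compute the pair list once and store it alongside the database.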
It is preferable to compare each section to two other sections as shown in the
above two examples; however, a comparison to one other section, three other
sections, or any other number of sections will also work. For example, if each
section is compared to one other section, and adjacent sections are selected, the
comparisons would be: section 1 to section 2, section 3 to section 4, section 5 to
section 6, and so forth. If the selection of the sections to be compared is done
randomly, the comparisons would be, for example: section 1 to section 47,
section 12 to section 50, section 24 to section 7, and so forth.
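The case in which each section is compared to only one other section can be sketched in Python as follows. The function name disjoint_pairs is hypothetical; the order of sections passed in may be numerical or a fixed random order.

```python
def disjoint_pairs(order):
    """Pair the 1st section with the 2nd, the 3rd with the 4th, and so on,
    so that each section takes part in exactly one comparison."""
    return [(order[i], order[i + 1]) for i in range(0, len(order) - 1, 2)]
```

Passing the sections 1 through 6 in numerical order gives the pairs (1, 2), (3, 4), (5, 6), and passing the order 1, 47, 12, 50, 24, 7 gives (1, 47), (12, 50), (24, 7), as in the examples above.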
The method of the invention can be used to create a database containing the
signatures of multiple images. Such a database can also include the locations where
each image can be found. The database could be organized by the signature, but
may be organized in other fashions as well.
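One possible organization of such a database, keyed by signature, can be sketched in Python as follows. The in-memory dictionary and the function name build_index are assumptions made for the example; the description above does not prescribe a storage format.

```python
from collections import defaultdict


def build_index(entries):
    """Map each signature to the list of locations where an image with
    that signature can be found.

    entries is an iterable of (signature, location) pairs.
    """
    index = defaultdict(list)
    for signature, location in entries:
        index[signature].append(location)
    return index
```

Several locations may share one signature, for example when copies of the same image appear at different URLs.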
The invention also includes a method for a user to conduct a search for an
image on a computer. The image to be searched for may be obtained from a
number of different sources, including the Internet or any other computer on a
network, scanned into a computer, taken from a CD, diskette, or digital camera,
or any other means by which an image can be imported into a computer. In
addition, if the image to be searched for is on a network, such as the Internet, it
need not be saved, but can be searched for by using its location information (e.g.,
URL). A search can be done using an existing database containing signatures of
images and the locations of the images or without using an existing database.
The following are the steps to conduct a search for an image utilizing a
database with signatures of images and the locations of the images. In the first
step, the user inputs the image to be searched for, called the target image. In the
second step, the target image is given a signature as described above, except that,
for all comparisons of the average pixel intensities of two sections, if the intensities
are within a specified tolerance of each other, an indicator, for example, an "*," is
assigned instead of a "1" or "0." Thus, the signature for the target image, called the
target signature, may include indicators as well as ones and zeros. In the third step,
the target signature is compared to the database of signatures and locations. The
comparison is preferably performed using a brute force search through the
database. Any indicators in the target signature are treated as both a "1" and a "0,"
so that more signatures can match. A signature is determined to be a "match" to the
target signature if it is identical to the target signature, where each indicator is
treated as both a "1" and a "0."
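The second and third steps can be sketched in Python as follows. The sketch assumes that signatures are held as strings of the characters "1", "0", and "*", that the tolerance is an absolute difference in average intensity, and that pairs are given as positions counted from zero. All function names are hypothetical.

```python
def target_signature(averages, pairs, tolerance):
    """Build a target signature, using "*" where two intensities are
    within `tolerance` of each other."""
    symbols = []
    for x, y in pairs:
        if abs(averages[x] - averages[y]) <= tolerance:
            symbols.append("*")
        elif averages[x] > averages[y]:
            symbols.append("1")
        else:
            symbols.append("0")
    return "".join(symbols)


def is_match(target, candidate):
    """True if candidate equals target wherever target is not "*"."""
    return len(target) == len(candidate) and all(
        t == "*" or t == c for t, c in zip(target, candidate))


def brute_force_search(target, database):
    """Check every (signature, location) entry and return the locations
    of those whose signature matches the target."""
    return [location for signature, location in database
            if is_match(target, signature)]
```

A target signature of "*101" therefore matches both "0101" and "1101" but not "1001".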
A search may also be conducted without using an existing database, instead
utilizing a crawler or "bot" that searches a computer or computers for images. The
computer may be a local computer not connected to a network or a computer
connected to a network, such as the Internet, an intranet, or other type of network.
In that case, the bot compares the target signature to the signature of each image
it encounters as it crawls through the network.
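The comparison performed by such a bot can be sketched in Python as follows. How images are discovered and retrieved is outside the sketch: the caller is assumed to supply the locations to visit, a function that computes the signature of the image at a location (returning None when no image can be read), and a function that decides whether a signature matches the target. All three parameter names are hypothetical.

```python
def crawl_and_match(locations, signature_of, matches):
    """Yield each location whose image signature matches the target.

    No database is consulted; each signature is computed when the
    image is encountered.
    """
    for location in locations:
        signature = signature_of(location)
        if signature is not None and matches(signature):
            yield location
```

Because the function is a generator, matches can be reported to the user while the crawl is still in progress.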
The method also includes providing a specified tolerance for the search
which can be preset or selected by the user. In other words, the search may be
limited to images having signatures within a certain tolerance (degree of difference)
of the target signature. For example, if the search is set for ten percent tolerance,
it will return only images having signatures that are ten percent or less different
from the target signature, and the results of the search will only include those
images. The search can also be required to return a minimum number of results (either
preset or set by the user). In that case, if the minimum number of results is not
obtained at a certain tolerance, the tolerance will increase until the minimum
number of results is obtained.
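A search with a tolerance and a minimum number of results can be sketched in Python as follows. The sketch assumes that the degree of difference between two signatures is the percentage of bit positions in which they differ, that signatures are held as integers of a known bit length, and that the tolerance is widened in fixed steps of five percentage points; none of these details is specified above, and the function names are hypothetical.

```python
def differing_bits(a, b):
    """Count the bit positions in which two integer signatures differ."""
    return bin(a ^ b).count("1")


def tolerant_search(target, database, length=64, tolerance_percent=10,
                    min_results=0, step_percent=5):
    """Return (locations, tolerance used).

    The tolerance is raised until at least min_results entries are found
    or it reaches 100 percent.
    """
    while True:
        hits = [location for signature, location in database
                if differing_bits(target, signature) * 100
                <= tolerance_percent * length]
        if len(hits) >= min_results or tolerance_percent >= 100:
            return hits, tolerance_percent
        tolerance_percent = min(100, tolerance_percent + step_percent)
```

With 64-bit signatures and a ten percent tolerance, a signature is returned if it differs from the target in six or fewer bits.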
The results of the search may be presented as a list, by location, of the
signatures in the database that match the target signature. These signatures
identify the images that match the target image. The list of matching images is
preferably shown in decreasing order of similarity, such that the closest match is
listed first. The user may then locate any of the matched images by its location
information (e.g., for an image on the Internet, by its URL).
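The ordering of the results can be sketched in Python as follows, assuming that similarity is measured by the number of bit positions in which a signature differs from the target signature and that signatures are held as integers. The function name rank_matches is hypothetical.

```python
def rank_matches(target, matches):
    """Return the locations of the matched images, closest match first.

    matches is a list of (signature, location) pairs. Entries that are
    equally close keep their original relative order.
    """
    ordered = sorted(matches,
                     key=lambda entry: bin(target ^ entry[0]).count("1"))
    return [location for _, location in ordered]
```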
The invention has been described above with reference to the preferred
embodiments. Those skilled in the art may envision other embodiments and
variations of the invention that fall within the scope of the claims.