WO2001008083A1 - Hyperimage system - Google Patents

Hyperimage system

Info

Publication number
WO2001008083A1
WO2001008083A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
grouping
signs
nodes
Prior art date
Application number
PCT/AU2000/000891
Other languages
French (fr)
Inventor
Ruben Gonzalez
Original Assignee
Griffith University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Griffith University filed Critical Griffith University
Priority to AU59552/00A priority Critical patent/AU762537B2/en
Publication of WO2001008083A1 publication Critical patent/WO2001008083A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/93Document management systems
    • G06F16/94Hypermedia
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour

Abstract

A method of hyperlinking images in a set of images by generating a structure tree for each image and matching attributes of an image to identify similar structure trees. The structure tree is generated by processing each image to identify nodes and compiling the nodes into the tree in the form of a searchable metadata file. The method generates direct hyperlinking between images without overlays.

Description

HYPERIMAGE SYSTEM
This invention relates to a method for processing images into a form suitable for hyperlinking. In particular, it relates to a process for generating hyperlinks between images in a set of images. The invention provides a hyper-image system that facilitates uninhibited navigation between images.
BACKGROUND TO THE INVENTION
The concept of hyperlinking is familiar to people who routinely access the World Wide Web (the Web). A hyperlink is an active redirection from one location to another, usually related, location. Hyperlinking is not restricted to the Web but is often found in, for example, large text documents and electronic encyclopedias. Hyperlinking is intrinsic to the creation of multimedia works. Hyperlinking has, in essence, been restricted to hypertext linking.
Although existing hypertext systems support images, they are normally only embedded in the body of the text in a non-interactive way. In some cases, particularly multimedia encyclopedias, images (static and video) are destinations that are not forward linkable. In some cases it is possible to link forward from an image by treating the image as an iconic button, but this does not provide hyperlinking from the image medium. This constraint is due to images being unstructured bit arrays that do not have internally referenceable components for the formation of hyperlinks.
The familiar hypertext mark-up language (HTML) is a system for linking text in a document to another location. Systems such as HTML provide extensions that allow for linking to and from images. However, this is achieved by production of a linkmap overlay. The overlay is manually generated and specifies hot regions serving as anchor points. While providing the semblance of image linking, the linking is not intrinsic to the image. For the production of true hyperlinked multimedia systems, there is a need for a hyperimage system that provides hyperlinking of images in a similar manner to existing hypertext systems.
It is also desirable for a hyperimage system to be able to process a set of images automatically. Automatic hypertext systems have been developed to address the heavy authoring load associated with producing a large multimedia document. Although existing systems are somewhat crude, generally requiring human intervention to achieve an aesthetically acceptable result, they nonetheless reduce the authoring overhead. In the case of images, however, the highly abstract nature of the information conveyed intensifies the role of the authoring system, since semantic interpretation of the data is required in addition to the extra handling complexity due to its multifarious nature. While this means that the development of an automatic authoring process is much more difficult for hyperimages, it also means that the introduction of automatic image hyperlinking is much more advantageous.
The process of authoring (or creating) hypermedia documents requires three distinct phases. These phases are node, link and presentation authoring. Node authoring involves preparing the raw or unstructured source information into the desired nodes. Each node typically encapsulates a single concept or semantic entity. Link authoring is the process of forming relationships between the nodes by creating links and anchors. Presentation authoring defines the rendition of the data, including such things as layout, font selection and formatting, both within individual nodes and for the overall system.
In the following description, a convention is adopted that an image can be described by a collection of objects, or nodes. Objects, or nodes, are a specific grouping of semiotic signs. Each sign is made up of sub-signs having no semantic value but which can be identified as homogeneous regions demarcated by either observed or implied luminance edges.
OBJECT OF THE INVENTION
It is an object of the invention to provide a method for processing an image into a hyperlinkable form. It is a further object of the invention to provide a process for generating hyperlinks between images in a set of images.
Other objects will be evident from the following discussion.
SUMMARY OF THE INVENTION
In one form, although it need not be the only, or indeed the broadest, form, the invention resides in a process for generating hyperlinks between images in a set of images including the steps of: processing each image to identify nodes within the image; characterising each node in terms of a discrete set of attributes; compiling a searchable metadata file of all sets of attributes for all images; searching the metadata file to identify a node having attributes most closely related to attributes of a selected node; and forming a link between the identified node and the selected node.
Suitably, the step of identifying nodes further includes the steps of: isolating elementary components in an image; local grouping of the components into signs; and global grouping of the signs into objects that collectively describe the image. The step of isolating elementary components may suitably be performed by a region growing technique in which pixels are added to a region according to a homogeneity criterion and edge information.
The step of local grouping preferably includes the steps of segmenting the components and edge analysis. Gestalt grouping principles are preferably employed.
The step of forming links between identified nodes and selected nodes may occur dynamically or statically.
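By way of illustration only, the following sketch outlines the claimed process over precomputed attribute vectors. All names and the unweighted mean-square-error distance are illustrative placeholders, not part of the disclosure; the detailed description below refines each step.

    from typing import Dict, List, Tuple

    Attrs = List[float]  # a node's discrete attribute vector (shape, colour, ...)

    def mse(a: Attrs, b: Attrs) -> float:
        # Unweighted placeholder for the weighted similarity measure
        # described in the detailed description.
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    def build_hyperweb(nodes: List[Tuple[int, Attrs]]) -> Dict[int, int]:
        """nodes: one (image_id, attribute vector) pair per identified node.
        Links each node to the most similar node in a different image."""
        links: Dict[int, int] = {}
        for i, (img_i, attrs_i) in enumerate(nodes):
            candidates = [j for j, (img_j, _) in enumerate(nodes) if img_j != img_i]
            if candidates:
                links[i] = min(candidates, key=lambda j: mse(attrs_i, nodes[j][1]))
        return links

    # Example: three nodes across two images.
    links = build_hyperweb([(0, [1.0, 0.2]), (1, [0.9, 0.25]), (1, [5.0, 3.0])])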
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred embodiments of the invention are described with reference to the following figures in which:
FIG 1 is a schematic of a first embodiment of a hyper-image system;
FIG 2 is a schematic of a second embodiment of a hyper-image system;
FIG 3 shows the steps involved in object demarcation;
FIG 4 shows an original set of images to be used for exemplifying the invention;
FIG 5 shows the effect of segmentation of one of the images of FIG 4;
FIG 6 shows the steps involved in third phase grouping;
FIG 7 shows the effect of segmentation and edge enhancement; and
FIG 8 shows the effect of segmentation and grouping.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The components in the hyper-image system are partitioned into three main sections. The first section deals with object metadata file population, the second with link creation and the third deals mainly with user interface and navigation support. Typically the object metadata file population occurs in batch mode, with the user supplying a large set of images to the system, which processes them automatically by identifying objects, extracting their attributes and structure and storing these in the object metadata file. When complete, users are free to surf through images by simply clicking anywhere within the displayed image area.
In a first operating mode, depicted in FIG 1, a set of images 1 is processed 2 to identify nodes (objects). The nodes are further analysed 3 to extract object attributes for populating the node metadata file 4. An associative linking process 5 is automatically performed according to the similarity measures described in a configuration file 6 that determines the relative weight to be accorded to each of the object's features, such as shape, colour and structure. This linking stage is performed in batch mode, forming a static hyperweb for the image data set. After its completion, authors are free to manually edit the links according to their needs. Hit testing and object matching is performed 7 to confirm the hyper-image linking for the user interface presentation 8. As mentioned above, the user 9 is then free to surf through the images 1 by clicking anywhere within the displayed image area. This process isolates the node demarcation from the presentation layer so that it is not dependent on the user interface.
In a second operating mode, displayed in FIG 2, the images 1 are processed 2 and analysed 3 to populate a node metadata file 4, in the manner described above. Hit testing 7 is performed on the displayed image to determine what object the user selected and its properties are retrieved from the metadata file 4 to search against. Once the search criteria have been retrieved, a full search is made of the metadata file 4 to identify the image that contains the most similar matching object. The image that contains that object is then loaded and displayed ready for the user to continue surfing.
Navigation in the second embodiment is significantly slower than in the first but affords the user the ability to reconfigure the attribute weightings for each search. In a further embodiment, right-clicking on any image object will bring up a list of possible link destinations (image objects) ranked in order of similarity to the current object. This allows users more control over navigation by providing multiple target destinations. This mode also permits users to start navigation from an object in an image that is supplied by the user at run-time and hence has not been registered with the system. In this case the system first demarcates the boundaries of the object clicked on by the user and extracts object attributes to be used to search against.
In order to process an image into a hyperlinked form, the significant nodes within an image must be identified. The nodes must be significant in terms of human vision rather than conventional machine vision. The aim is not to recognise an image, but rather to generate nodes that are likely to be significant to a human viewer so that when a viewer clicks a part of the image a sensible hyperlink occurs.
The most important perceptual cues in images are luminance boundaries. Special cells exist in the brain to detect luminance boundaries and to perform complex processing in the presence of occluded contours. One reason for their importance in perception is that they are used to define regions in the perceived image, the properties of which are often extrapolated from luminance edges via a filling-in mechanism. Hence, the basic building blocks, or sub-signs, in an image are homogeneous image regions that are demarcated by either observed or implied luminance edges. Signs are groups of these regions having specific properties in relation to each other. Semantic objects in an image are specific groupings of these signs.
Referring to FIG 3, it can be seen that the method of identifying nodes consists of three distinct phases. The first phase involves image segmentation by isolating elementary components from the data that are not in themselves significant. The data may be any suitable image data including captured video data. The second phase requires local grouping of the components into significant entities (signs). These signs are not necessarily meaningful of themselves and must undergo a third global grouping phase to produce objects with the capacity to be truly meaningful in a semantic sense. At no stage in the process is any meaning or interpretation attached to any image component. Further description will be made with reference to various images. The original images are shown in FIG 4. Later figures display one or more of these images after various stages of processing.
A region growing technique in the HVC colour space is preferred for the first phase. Other techniques that could be used include quad-tree split and merge, histogram splitting, and watershed algorithms. These techniques will be known to persons skilled in the art.
In the preferred embodiment, the region growing starts by selecting a pixel to which neighbouring pixels are joined until no more pixels can be added according to some homogeneity criterion. A suitable homogeneity criterion is that the intensity difference between the pixel and the closest region boundary pixel is below some threshold. The growing procedure is started from the top left corner of the image and the growing process is performed continuously until no more pixels can be added. When this occurs, a new region is then created by starting on the closest unassigned pixel. This continues until all pixels in the whole image have been assigned to a sub-sign.
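By way of illustration, a minimal single-channel sketch of this scan-order region growing follows. The simple absolute-intensity test and 4-connectivity are assumptions of the sketch; the patent grows regions in the HVC colour space using the colour distance given below.

    import numpy as np

    def region_grow(img: np.ndarray, threshold: float) -> np.ndarray:
        """Assign every pixel to a sub-sign by scan-order region growing.

        Single-channel sketch: a pixel joins a region when its intensity
        difference to the adjoining region-boundary pixel is below
        `threshold`; the patent uses the HVC colour distance instead.
        """
        h, w = img.shape
        labels = np.zeros((h, w), dtype=int)    # 0 = unassigned
        next_label = 1
        for sy in range(h):                     # scan from the top-left corner
            for sx in range(w):
                if labels[sy, sx]:
                    continue                    # already part of a sub-sign
                labels[sy, sx] = next_label     # start a new region on the
                stack = [(sy, sx)]              # closest unassigned pixel
                while stack:                    # grow until nothing can be added
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                                and abs(float(img[ny, nx]) - float(img[y, x]))
                                    < threshold):
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
                next_label += 1
        return labels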
The HVC colour space was used instead of luminance images alone because images with low luminance contrast are difficult to segment correctly unless chrominance or saturation variations are exploited. The RGB colour space provides no advantage over luminance alone since the RGB values are highly correlated and do not truly reflect the magnitude of perceived colour variations. The HVC colour space is well recognised for its successful correspondence to human colour perception. It represents colour in terms of hue (H), value or intensity (V) and chroma or saturation (C), which describes the colour purity. FIG 5 shows the effect of segmenting each of these components separately for the fourth image of FIG 4. FIGs 5a-f show respectively the hue component, value component, chroma component, hue segmentation, value segmentation and chroma segmentation.
The combined HVC colour segmentation method based on region growing uses a cylindrical colour distance measure in place of a simple intensity difference. The colour distance between two pixels i and j (which may be normalised appropriately) is given by the following, where v_i, c_i, h_i and v_j, c_j, h_j are the value, chroma and hue at pixel locations i and j:

d_{ij} = \sqrt{(v_i - v_j)^2 + (c_i \sin h_i - c_j \sin h_j)^2 + (c_i \cos h_i - c_j \cos h_j)^2}
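A direct transcription of this distance into code might read as follows (hue assumed to be expressed in radians):

    import math

    def hvc_distance(v1: float, c1: float, h1: float,
                     v2: float, c2: float, h2: float) -> float:
        """Cylindrical colour distance in HVC space (hue in radians)."""
        return math.sqrt((v1 - v2) ** 2
                         + (c1 * math.sin(h1) - c2 * math.sin(h2)) ** 2
                         + (c1 * math.cos(h1) - c2 * math.cos(h2)) ** 2)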
In one embodiment, the threshold is set automatically using a linear combination of the mean and standard deviation of the intensity gradient values in the source image according to the following equation:

T = k_0 + k_1 \mu + k_2 \sigma

where \mu is the mean, \sigma is the standard deviation, and the constants k_0, k_1 and k_2 are obtained experimentally to be 3.6, 0.36 and -0.143 respectively, using least square optimisation from the sample set of images shown in FIG 4.

The second phase groups the sub-signs according to local criteria and exploits the Gestalt grouping principles of similarity, proximity, good continuation and closure. These principles are applied by analysing the nature of a number of characteristics of neighbouring sub-signs and the boundaries between them. The second phase can be considered in two stages. The first grouping stage is a region labelling and connected component analysis process that simultaneously eliminates segmentation artifacts and performs additional texture-based grouping. For example, areas that are smaller than 20 pixels are considered to be insignificant and are merged into surrounding areas. Adjacent areas that are also less than 20 pixels are grouped into a textured region. The selection of 20 pixels is entirely arbitrary.
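A sketch combining the automatic threshold above with the first-stage small-region merge might look as follows. The use of np.gradient as the gradient operator is an assumption, and the grouping of adjacent small areas into textured regions is omitted.

    import numpy as np

    K0, K1, K2 = 3.6, 0.36, -0.143   # constants reported above
    MIN_AREA = 20                     # the (arbitrary) significance cut-off

    def auto_threshold(img: np.ndarray) -> float:
        """T = k0 + k1*mean + k2*std of the intensity gradient magnitudes."""
        gy, gx = np.gradient(img.astype(float))
        g = np.hypot(gx, gy)
        return K0 + K1 * g.mean() + K2 * g.std()

    def merge_small_regions(labels: np.ndarray) -> np.ndarray:
        """Absorb regions under MIN_AREA pixels into a larger 4-neighbour.
        (Grouping adjacent small areas into textured regions is omitted.)"""
        out = labels.copy()
        ids, counts = np.unique(out, return_counts=True)
        small = set(ids[counts < MIN_AREA].tolist())
        h, w = out.shape
        for y in range(h):
            for x in range(w):
                if out[y, x] in small:
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and out[ny, nx] not in small:
                            out[y, x] = out[ny, nx]
                            break
        return out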
In the second grouping stage, line information from edge detection is combined with the region information through Gestalt based grouping rules. The process is depicted in FIG 6. Firstly, regions are checked to see if they are separated by any edges. Regions that are separated by lines usually belong to different objects, hence they cannot be grouped together. However, regions that share a continuous line between them along a common side are candidates for grouping together. Regions in this formation are generally understood to be part of the same object. The final outcome is a robust set of signs consisting of homogeneous regions. The images after the first stage of segmentation are shown in FIG 7a and after the second stage in FIG 7b. FIG 8a shows the effect on the original images of FIG 4 of the first phase segmentation process. After second phase grouping, the images appear as in FIG 8b.
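The first of these rules, that an intervening edge blocks grouping, could be sketched as follows; the binary edge map is assumed to come from any conventional edge detector, and the good-continuation test is omitted.

    import numpy as np

    def may_group(labels: np.ndarray, edges: np.ndarray, a: int, b: int) -> bool:
        """Second-stage grouping test (first rule only): regions a and b may
        be grouped only when no detected edge pixel lies on their shared
        boundary. `edges` is a binary edge map (an assumed input)."""
        # Horizontally and vertically adjacent pixel pairs straddling a|b.
        right = (((labels[:, :-1] == a) & (labels[:, 1:] == b))
                 | ((labels[:, :-1] == b) & (labels[:, 1:] == a)))
        down = (((labels[:-1, :] == a) & (labels[1:, :] == b))
                | ((labels[:-1, :] == b) & (labels[1:, :] == a)))
        boundary = right.sum() + down.sum()
        edge_hits = ((right & (edges[:, :-1] | edges[:, 1:])).sum()
                     + (down & (edges[:-1, :] | edges[1:, :])).sum())
        # Regions separated by a line belong to different objects.
        return bool(boundary > 0 and edge_hits == 0)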
The third phase globally applies the remaining Gestalt grouping principles of common fate, relative size, symmetry and surroundedness, as well as good continuity, to the signs produced by the second phase. For example, regions that are completely surrounded by another region are often part of the surrounding object and may be grouped together. Regions that are symmetrical about a given axis can be considered to be components of a single object. The outcome of this stage is an image structure tree that describes the image in terms of its component objects made up of signs that are groups of sub-signs. The image structure tree is the data for the node metadata file.
The objects are composed of one or more signs, each represented by a node. The child nodes store the sub-signs, while the relationships between the child nodes are stored in the parent node. Each entry in the tree is labelled with characteristic features of the image object it identifies, such as average colour, colour variance, shape chain code, size and its relationship to other significant objects in the image. The result is that the original image can be reconstructed with a high degree of accuracy from the metadata file. The structure tree represents a generalised scene description of the original image.
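A minimal sketch of one tree entry, with illustrative field names drawn from the features listed above, might be:

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class StructureNode:
        """One entry of the image structure tree; the field names are
        illustrative stand-ins for the features listed above."""
        avg_colour: Tuple[float, float, float]          # (H, V, C)
        colour_variance: float
        chain_code: str                                  # shape chain code
        size: int                                        # area in pixels
        relationships: Dict[str, float] = field(default_factory=dict)
        children: List["StructureNode"] = field(default_factory=list)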
It will be appreciated that the structure tree permits the image to be analysed at various levels of information in terms of component objects. The objects may not be strongly semantically significant in themselves; they may just be a significant portion of a semantic object. For instance, a face can be identified as three smaller regions embedded in a larger oval shaped region with particular size and distance relationships between them.
Each of the objects defined in the image structure tree can now be linked with either other objects in the same tree or with objects in other images if these have been similarly processed. The associative links are predominantly formed by finding similar objects in other images with matching properties. The properties considered include average object colour and texture, object shape, relationships to other objects and object structural composition. Various weightings are applied to these properties in the matching process to produce different linking outcomes. Normally, shape is assigned the highest weighting since it plays a more significant role in cognitive similarity matching.
The shape of each object is defined using Fourier shape descriptors that are invariant to orientation, scale and translation. The process of obtaining the Fourier shape descriptor for an object involves first locating the centroid of the object, then mapping its shape using polar coordinates and finally applying a Fourier transform to the data. By way of example, a discrete cosine transform may be used in place of the Fourier transform due to its superior data compaction properties and because it is a purely real transform. Assuming each object is defined by a binary mask P(x, y) of M columns and N rows in size, the centroid C_{xy} of an object is found by finding the centroid in each of the x and y directions separately:
C_x = \frac{1}{A} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x \, P(x, y), \qquad C_y = \frac{1}{A} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} y \, P(x, y), \qquad A = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} P(x, y)
After the centroid is located, the distance of the shape boundary from the centroid is mapped out using polar coordinates at discrete angular intervals k to obtain n shape samples in total. The values are placed in an array b(k) and a one-dimensional discrete cosine transform (DCT) is applied to it. The resulting transform coefficients form the final shape vector. The DCT is defined as:
B(u) = \alpha(u) \sum_{k=0}^{n-1} b(k) \cos\left[\frac{(2k+1)u\pi}{2n}\right], \qquad \alpha(0) = \sqrt{1/n}, \quad \alpha(u) = \sqrt{2/n} \text{ for } u > 0
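A sketch of the whole descriptor pipeline follows, under the assumption that the boundary radius at each angle can be approximated by the farthest object pixel in that angular bin; scale normalisation by the DC coefficient is also an assumption, and orientation normalisation is omitted.

    import numpy as np
    from scipy.fft import dct

    def shape_descriptor(mask: np.ndarray, n: int = 64) -> np.ndarray:
        """Centroid -> polar boundary samples b(k) -> 1-D DCT shape vector."""
        ys, xs = np.nonzero(mask)               # pixels of the binary mask P(x, y)
        cx, cy = xs.mean(), ys.mean()           # centroid C_x, C_y
        angles = np.arctan2(ys - cy, xs - cx)
        radii = np.hypot(xs - cx, ys - cy)
        bins = ((angles + np.pi) / (2 * np.pi) * n).astype(int) % n
        b = np.zeros(n)
        for k in range(n):                      # n shape samples at discrete angles
            r = radii[bins == k]
            b[k] = r.max() if r.size else 0.0   # farthest pixel ~ boundary radius
        coeffs = dct(b, norm='ortho')           # the final shape vector
        return coeffs / max(abs(coeffs[0]), 1e-9)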
The average colour for each region is stored in the HVC colour components, as these allow associations that are more cognitively correct, based on the perceptual properties of hue, value and chroma. The definition of the HVC colour space is given in terms of the CIE Lab space, which is in turn defined in terms of the XYZ space, as follows:
V = L^*, \qquad C = \sqrt{a^{*2} + b^{*2}}, \qquad H = \tan^{-1}(b^*/a^*)

where

L^* = 116\,(Y/Y_n)^{1/3} - 16, \qquad a^* = 500\,[(X/X_n)^{1/3} - (Y/Y_n)^{1/3}], \qquad b^* = 200\,[(Y/Y_n)^{1/3} - (Z/Z_n)^{1/3}]

and (X_n, Y_n, Z_n) are the tristimulus values of the reference white.
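Using the standard cube-root form of the CIE Lab conversion and a D65 white point, both assumptions of this sketch rather than requirements of the disclosure, the mapping could be coded as:

    import math

    def xyz_to_hvc(X: float, Y: float, Z: float,
                   white=(95.047, 100.0, 108.883)) -> tuple:
        """XYZ -> CIE L*a*b* -> (H, V, C). The D65 white point and the
        cube-root branch of the Lab conversion are assumptions here."""
        Xn, Yn, Zn = white
        fx, fy, fz = (t ** (1.0 / 3.0) for t in (X / Xn, Y / Yn, Z / Zn))
        L = 116.0 * fy - 16.0
        a = 500.0 * (fx - fy)
        b = 200.0 * (fy - fz)
        return math.atan2(b, a), L, math.hypot(a, b)   # hue, value, chroma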
The similarity matching process is performed by selecting an initial image object and performing a weighted mean square error analysis between its attributes and the corresponding attributes of all other image objects stored in the system. The object that generates the lowest mean square error against the current object is selected for linking. To avoid circular references, a list of links that refer to the current object is maintained and checked against. If the matched object already has a link that refers to the current object then that link is rejected and the link is instead formed to the object with the second lowest mean square error, subject to the same condition. If no matches are found to be below a certain acceptance threshold then the object in question remains unlinked. The weighted mean square error criterion is defined for the original data set x and the comparison set \hat{x}, defined at each location j over a total of n values, as follows:
e = \frac{1}{n} \sum_{j=1}^{n} w_j \, (x_j - \hat{x}_j)^2
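A sketch of this linking pass, including the rejection of back-references, might read as follows; attribute vectors and weights are assumed to be equal-length lists, and all names are illustrative.

    def form_links(objects: dict, weights: list, threshold: float) -> dict:
        """Weighted-MSE linking pass with rejection of back-references.
        `objects` maps object id -> attribute vector (equal-length lists)."""
        def wmse(x, y):
            return sum(w * (p - q) ** 2 for w, p, q in zip(weights, x, y)) / len(x)

        links = {}
        for oid, attrs in objects.items():
            ranked = sorted((wmse(attrs, other), other_id)
                            for other_id, other in objects.items()
                            if other_id != oid)
            for err, cand in ranked:
                if err > threshold:
                    break                      # no acceptable match: unlinked
                if links.get(cand) != oid:     # avoid circular references
                    links[oid] = cand
                    break
        return links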
In one embodiment the system performs this matching using only low-level statistically defined features. This works reasonably well for objects that are simple and fairly homogeneous. In the case of complex objects formed out of reasonably dissimilar regions this approach is less effective. In these situations, when an object is not particularly homogeneous, the saliency of its statistical attributes is significantly reduced. In their stead, however, these compound objects tend to exhibit structural saliency. Hence the object structure can be used for similarity matching. In this case the matching involves finding objects that are defined similarly in terms of the relationships between their component regions. The similarity matching no longer involves comparing the values of individual absolute features but rather the relative values of a number of features. This is done by directly comparing the object structure trees that contain these relationships. At this level the associations supported by the system are no longer purely feature analytic but are well on their way to being semantic, based on the structural properties of the image content.
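A very rough sketch of such relative, tree-based matching follows; the scoring scheme is illustrative only and is not specified in the disclosure.

    def tree_similarity(a, b) -> float:
        """Illustrative sketch: score two structure-tree nodes by the ratio
        of their relative features, then recurse over their children.
        A node is (features, children), features a list of positive numbers."""
        (fa, ca), (fb, cb) = a, b
        pairs = [(x, y) for x, y in zip(fa, fb) if max(x, y) > 0]
        local = sum(min(x, y) / max(x, y) for x, y in pairs) / max(len(pairs), 1)
        if not ca or not cb:
            return local
        # Greedily pair each child of `a` with its best match among b's children.
        child = sum(max(tree_similarity(x, y) for y in cb) for x in ca) / len(ca)
        return 0.5 * (local + child)

    # e.g. tree_similarity(([1.0, 2.0], []), ([1.1, 1.8], [])) ~ 0.9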
Unlike automatic hypertext authoring systems, where key words may appear many hundreds of times, creating the need for context and relevance evaluation in the process of creating links, no two different images will contain identical objects or attributes. Hence it is extremely unlikely that a search will reveal two possible matches that have exactly the same degree of similarity with the reference object. The closest match on each occasion can therefore be used. In contrast to existing methods for defining referenceable components in images for linking, such as manually defining image maps and using object recognition, which are semantic in nature, the system does not require the interpretation or recognition of semantics in images and hence is completely generic and unconstrained. The node demarcation heavily exploits Gestalt principles and makes use of an initial segmentation phase followed by local and global grouping phases to identify semiotic sub-signs and signs. The associative linking can operate in either static or dynamic mode and is based on object similarity matching using shape, colour and relational or structural features to form associations among objects in the image sets. The navigation support system is capable of automatically providing a list of multiple alternative targets that the user may wish to pursue.
Throughout the specification the aim has been to describe the invention without limiting the invention to any specific combination of features.

Claims

1. A method for generating hyperlinks between images in a set of images including the steps of: processing each image to identify nodes within the image; characterising each node in terms of a discrete set of attributes; compiling a searchable metadata file of all sets of attributes for all images; searching the metadata file to identify a node having attributes most closely related to attributes of a selected node; and forming a link between the identified node and the selected node.
2. The method of claim 1 wherein the step of identifying nodes further includes the steps of: isolating elementary components in an image; local grouping of the components into signs; and global grouping of the signs into objects that collectively describe the image.
3. The method of claim 2 wherein the step of isolating elementary components is performed by a region growing technique in which pixels are added to a region according to a homogeneity criterion and edge information.
4. The method of claim 2 wherein the step of local grouping includes the steps of segmenting the components and edge analysis.
5. The method of claim 2 wherein Gestalt grouping principles are employed in the local grouping step.
6. The method of claim 2 wherein the local grouping step is performed in two stages including a first region labelling and connected component analysis stage; and a second Gestalt based grouping stage.
7. The method of claim 2 wherein the global grouping step applies Gestalt grouping principles including one or more of: common fate; relative size; symmetry; surroundedness; and continuity.
8. The method of claim 1 wherein said searchable metadata file consists of an image structure tree for each image in said set of images.
9. The method of claim 1 wherein the step of searching the metadata file is a similarity matching process.
10. The method of claim 9 wherein the similarity matching process is weighted in favour of features of shape.
11. The method of claim 1 wherein the step of forming links between identified nodes and selected nodes occurs dynamically.
12. The method of claim 1 wherein the step of forming links between identified nodes and selected nodes occurs statically.
13. A method of forming an image structure tree for an image in an image hyperlinking system that generates hyperlinks between images in a set of images, including the steps of: segmenting said image by isolating elementary components in said image; local grouping of the components into signs; global grouping the signs into objects; and arranging the objects, signs and components into an image structure tree to describe the image.
PCT/AU2000/000891 1999-07-27 2000-07-26 Hyperimage system WO2001008083A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU59552/00A AU762537B2 (en) 1999-07-27 2000-07-26 Hyperimage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPQ1854 1999-07-27
AUPQ1854A AUPQ185499A0 (en) 1999-07-27 1999-07-27 Hyperimage system

Publications (1)

Publication Number Publication Date
WO2001008083A1 true WO2001008083A1 (en) 2001-02-01

Family

ID=3816035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2000/000891 WO2001008083A1 (en) 1999-07-27 2000-07-26 Hyperimage system

Country Status (2)

Country Link
AU (1) AUPQ185499A0 (en)
WO (1) WO2001008083A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005004007A1 (en) * 2002-09-18 2005-01-13 Dmetrix, Inc. Method for referencing image data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5579471A (en) * 1992-11-09 1996-11-26 International Business Machines Corporation Image query system and method
US5710877A (en) * 1995-12-29 1998-01-20 Xerox Corporation User-directed interaction with an image structure map representation of an image
US5819288A (en) * 1996-10-16 1998-10-06 Microsoft Corporation Statistically based image group descriptor particularly suited for use in an image classification and retrieval system
US5915038A (en) * 1996-08-26 1999-06-22 Philips Electronics North America Corporation Using index keys extracted from JPEG-compressed images for image retrieval
US5983237A (en) * 1996-03-29 1999-11-09 Virage, Inc. Visual dictionary
US5987456A (en) * 1997-10-28 1999-11-16 University Of Masschusetts Image retrieval by syntactic characterization of appearance

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5579471A (en) * 1992-11-09 1996-11-26 International Business Machines Corporation Image query system and method
US5751286A (en) * 1992-11-09 1998-05-12 International Business Machines Corporation Image query system and method
US5710877A (en) * 1995-12-29 1998-01-20 Xerox Corporation User-directed interaction with an image structure map representation of an image
US5983237A (en) * 1996-03-29 1999-11-09 Virage, Inc. Visual dictionary
US5915038A (en) * 1996-08-26 1999-06-22 Philips Electronics North America Corporation Using index keys extracted from JPEG-compressed images for image retrieval
US5819288A (en) * 1996-10-16 1998-10-06 Microsoft Corporation Statistically based image group descriptor particularly suited for use in an image classification and retrieval system
US5987456A (en) * 1997-10-28 1999-11-16 University Of Masschusetts Image retrieval by syntactic characterization of appearance

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005004007A1 (en) * 2002-09-18 2005-01-13 Dmetrix, Inc. Method for referencing image data

Also Published As

Publication number Publication date
AUPQ185499A0 (en) 1999-08-19

Similar Documents

Publication Publication Date Title
US7031555B2 (en) Perceptual similarity image retrieval
US7801893B2 (en) Similarity detection and clustering of images
US6804684B2 (en) Method for associating semantic information with multiple images in an image database environment
JP3781696B2 (en) Image search method and search device
Gupta et al. Visual information retrieval
US6240423B1 (en) Method and system for image querying using region based and boundary based image matching
Tian et al. Image retrieval using wavelet-based salient points
US5987456A (en) Image retrieval by syntactic characterization of appearance
US6594386B1 (en) Method for computerized indexing and retrieval of digital images based on spatial color distribution
US6072904A (en) Fast image retrieval using multi-scale edge representation of images
Lewis et al. An integrated content and metadata based retrieval system for art
WO2000045342A1 (en) Method of searching multimedia data
Danish et al. A survey: content based image retrieval based on color, texture, shape & neuro fuzzy
US6882746B1 (en) Normalized bitmap representation of visual object's shape for search/query/filtering applications
KR20030005908A (en) apparatus and method for representative feature extraction of objects and, its application to content-based image retrieval
Jiang et al. An ontology-based approach to retrieve digitized art images
Castillo-Ortega et al. Describing images via linguistic features and hierarchical segmentation
Fauqueur et al. New image retrieval paradigm: logical composition of region categories
JP3510980B2 (en) Image retrieval apparatus, retrieval method, and program recording medium using straight line
Tao et al. Spatial color indexing using rotation, translation, and scale invariant anglograms
AU762537B2 (en) Hyperimage system
WO2001008083A1 (en) Hyperimage system
CN106548118A (en) The recognition and retrieval method and system of cinema projection content
Farooque Image indexing and retrieval
Wardhani et al. Automatic image structure analysis

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 59552/00

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 10048369

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
WWG Wipo information: grant in national office

Ref document number: 59552/00

Country of ref document: AU

NENP Non-entry into the national phase

Ref country code: JP