WO2001008083A1

WO2001008083A1 - Hyperimage system

Info

Publication number: WO2001008083A1
Application number: PCT/AU2000/000891
Authority: WO
Inventors: Ruben Gonzalez
Original assignee: Griffith University
Priority date: 1999-07-27
Filing date: 2000-07-26
Publication date: 2001-02-01
Also published as: AUPQ185499A0

Abstract

A method of hyperlinking images in a set of images by generating a structure tree for each image and matching attributes of an image to identify similar structure trees. The structure tree is generated by processing each image to identify nodes and compiling the nodes into the tree in the form of a searchable metadata file. The method generates direct hyperlinking between images without overlays.

Description

HYPERIMAGE SYSTEM

This invention relates to a method for processing images into a form suitable for hyperlinking. In particular, it relates to a process for generating hyperlinks between images in a set of images. The invention provides a hyper- image system that facilitates uninhibited navigation between images.

BACKGROUND TO THE INVENTION The concept of hyperlinking is familiar to people who routinely access the

World Wide Web (the Web). A hyperlink is an active redirection from one location to another location, usually related. Hyperlinking is not restricted to the Web but is often found in, for example, large text documents and electronic encyclopedias. Hyperlinking is intrinsic to the creation of multimedia works. Hyperlinking has been, in essence, restricted to hypertext linking.

Although existing hypertext systems support images, they are normally only embedded in the body of the text in a non-interactive way. In some cases, particularly multimedia encyclopedias, images (static and video) are destinations that are not forward linkable. In some cases it is possible to link forward from an image by treating the image as an iconic button, but this does not provide hyperlinking from the image medium. This constraint is due to images being unstructured bit arrays that do not have internally referenceable components for the formation of hyperlinks.

The familiar hypertext mark-up language (HTML) is a system for linking text in a document to another location. Systems such as HTML provide extensions that allow for linking to and from images. However, this is achieved by production of a linkmap overlay. The overlay is manually generated and specifies hot regions serving as anchor points. While providing the semblance of image linking, the linking is not intrinsic to the image. For the production of true hypeήinked multimedia systems, there is a need for a hyperimage system that provides hyperlinking of images in similar manner to existing hypertext systems.

Substitute Sheet (Rule 26) RO/AU It is also desirable for a hyperimage system to be able to process a set of images automatically. Automatic hypertext systems have been developed to address the heavy authoring load associated with producing a large multimedia document. Although existing systems are somewhat crude, generally requiring human intervention to achieve an aesthetically acceptable result, they nonetheless reduce the authoring overhead. In the case of images however, the highly abstract nature of the information conveyed intensifies the role of the authoring system, since semantic interpretation of the data is required in addition to the extra handling complexity due to its multifarious nature. While this means that the development of an automatic authoring process is much more difficult for hyperimages, it also means that the introduction of automatic image hyperlinking is much more advantageous.

The process of authoring (or creating) hypermedia documents requires three distinct phases. These phases are node, link and presentation authoring. Node authoring involves preparing the raw or unstructured source information into the desired nodes. Each node typically encapsulates a single concept or semantic entity. Link authoring is the process of forming relationships between the nodes by creating links and anchors. Presentation authoring defines the rendition' of the data including such things as layout, font selection and formatting both within individual nodes and for the overall system.

In the following description, a convention is adopted that an image can be described by a collection of objects, or nodes. Objects, or nodes, are a specific grouping of semiotic signs. Each sign is made up of sub-signs having no semantic value but which can be identified as homogeneous regions demarcated by either observed or implied luminance edges.

OBJECT OF THE INVENTION

It is an object of the invention to provide a method for processing an image into a hyperiinkable form. It is a further object of the invention to provide a process for generating hyperlinks between images in a set of images.

Other objects will be evident from the following discussion.

Substitute Sheet SUMMARY OF THE INVENTION

In one form, although it need not be the only, or indeed the broadest, form the invention resides in a process for generating hyperlinks between images in a set of images including the steps of: processing each image to identify nodes within the image; characterising each node in terms of a discrete set of attributes; compiling a searchable metadata file of all sets of attributes for all images; searching the metadata file to identify a node having attributes most closely related to attributes of a selected node; and forming a link between the identified node and the selected node.

Suitably, the step of identifying nodes further includes the steps of: isolating elementary components in an image; local grouping of the components into signs; and global grouping of the signs into objects that collectively describe the image. The step of isolating elementary components may suitably be performed by a region growing technique in which pixels are added to a region according to a homogeneity criterion and edge information.

The step of local grouping preferably includes the steps of segmenting the components and edge analysis. Gestalt grouping principles are preferably employed.

The step of forming links between identified nodes and selected nodes may occur dynamically or statically.

BRIEF DESCRIPTION OF THE DRAWINGS Preferred embodiments of the invention are described with reference to the following figures in which:

FIG 1 is a schematic of a first embodiment of a hyper-image system;

FIG 2 is a schematic of a second embodiment of a hyper-image system;

FIG 3 shows the steps involved in object demarcation; FIG 4 shows an original set of images to be used for exemplifying the invention;

FIG 5 shows the effect of segmentation of one of the images of FIG 4;

FIG 6 shows the steps involved in third phase grouping;

Substitute Shee Rule 26) RO/AU FIG 7 shows the effect of segmentation and edge enhancement; and FIG 8 shows the effect of segmentation and grouping.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT The components in the hyper-image system are partitioned into three main sections. The first section deals with object metadata file population, the second with link creation and the third deals mainly with user interface and navigation support. Typically the object metadata file population occurs in batch mode with the user supplying a large set of images to the system which processes them automatically by identifying objects and extracting their attributes and structure and storing these in the object metadata file. When complete, users are free to surf through images by simply clicking anywhere within the displayed image area.

In a first operating mode, depicted in FIG 1 , a set of images 1 are processed 2 to identify nodes (objects). The nodes are further analysed 3 to extract object attributes for populating the node metadata file 4. An associative linking process 5 is automatically performed according to the similarity measures described in a configuration file 6 that determines the relative weight to be accorded to each of the object's features such as shape, colour, structure etc. This linking stage is performed in batch mode, forming a static hyperweb for the image data set. After its completion authors are free to manually edit the links according to their needs. Hit testing and object matching is performed 7 to confirm the hyper-image linking for the user interface presentation 8. As mentioned above, the user 9 is then free to surf through the images 1 by clicking anywhere within the displayed image area. This process isolates the node demarcation from the presentation layer so that it is not dependent on the user interface.

In a second operating mode, displayed in FIG 2, the images 1 are processed 2 and analysed 3 to populate a node metadata file 4, in the manner described above. Hit testing 7 is performed on the displayed image to determine what object the user selected and its properties are retrieved from the metadata file 4 to search against. Once the search criteria have been retrieved, a full search is made of the metadata file 4 to identify the image that contains the most

Substitute Sheet Rule 26) RO/AU similar matching object. The image that contains that object is then loaded and displayed ready for the user to continue surfing.

Navigation in the second embodiment is significantly slower than the first but affords the user the ability to reconfigure the attribute weightings for each search. In a further embodiment, right clicking on any image object will bring up a list of possible link destinations (image objects) ranked in order of similarity to the current object. This allows users more control over navigation by providing multiple target destinations. This mode also permits users to start navigation from an object in an image that is supplied by the user at run-time and hence has not been registered with the system. In this case the system first demarcates the boundaries of the object clicked on by the user and extracts object attributes to be used to search against.

In order to process an image into a hyperlinked form, the significant nodes within an image must be identified. The nodes must be significant in terms of human vision rather than conventional machine vision. The aim is not to recognise an image, but rather to generate nodes that are likely to be significant to a human viewer so that when a viewer clicks a part of the image a sensible hyperlink occurs.

The most important perceptual cues in images are luminance boundaries. Special cells exist in the brain to detect luminance boundaries and to perform complex processing in the presence of occluded contours. One reason for their importance in perception is that they are used to define regions in the perceived image, the properties of which are often extrapolated from luminance edges via a filling-in mechanism. Hence, the basic building blocks, or sub-signs, in an image are homogenous image regions that are demarcated by either observed or implied luminance edges. Signs are groups of these regions having specific properties in relation to each other. Semantic objects in an image are specific groupings of these signs.

Referring to FIG 3, it can be seen that the method of identifying nodes consists of three distinct phases. The first phase involves image segmentation by isolating elementary components from the data that are not in themselves significant. The data may be any suitable image data including captured video data. The second phase requires local grouping of the components into

Substitute Sheet RO/AU significant entities (signs). These signs are not necessarily meaningful of themselves and must undergo a third global grouping phase to produce objects with the capacity to be truly meaningful in a semantic sense. At no stage in the process is any meaning or interpretation attached to any image component. Further description will be made with reference to various images. The original images are shown in FIG 4. Later figures display one or more of these images after various stages of processing.

A region growing technique in the HVC colour space is preferred for the first phase. Other techniques that could be used include quad-tree split and merge, histogram splitting, and watershed algorithms. These techniques will be known to persons skilled in the art.

In the preferred embodiment, the region growing starts by selecting a pixel to which neighbouring pixels are joined until no more pixels can be added according to some homogeneity criterion. A suitable homogeneity criteria is that the intensity difference between the pixel and the closest region boundary pixel is below some threshold. The growing procedure is started from the top left corner of the image and the growing process is performed continuously until no more pixels can be added. When this occurs, a new region is then created by starting on the closest unassigned pixel. This continues until all pixels in the whole image have been assigned to a sub-sign.

The HVC colour space was used instead of luminance images alone because images with low luminance contrast are difficult to segment correctly unless chrominance or saturation variations are exploited. The RGB colour space provides no advantages over the luminance alone since the RGB values are highly correlated and don't truly reflect the magnitude of perceived colour variations. The HVC colour space is well recognised for its successful correspondence to human colour perception. It represents colour in terms of hue (H), value or intensity (V) and chroma or saturation (C) that describes the colour purity. FIG 5 shows the effect of segmenting each of these components separately for the fourth image of FIG 4. FIGs 5a-f show respectively hue component, value component, chroma component, hue segmentation, value segmentation and chroma segmentation.

Substitute Shee (Rule 26) RO/AU The combined HVC colour segmentation method based on region growing uses a cylindrical colour distance measure in place of a simple intensity difference. The colour distance between two pixels / and j (which may be normalised appropriately) is given by the following (where v_it c„ h,, and v,-, c_;, h_Jt are the value, chroma and hue at pixel locations andy): d,_j = _TJ(V , - V J )² + (c, sin (h, )- c_y sin (/?_y ))² + (c, cos (h, )- c_y cos (h_j ))²

In one embodiment, the threshold is set automatically using a linear combination of the mean and standard deviation of the intensity gradient values in the source image according to the following equation: τ = ₀ + k_xμ + k₂σ where μ is the mean, σ is the standard deviation, and the constants ko, k., and k₂ are obtained experimentally to be 3.6, 0.36 and -0.143 respectively, using least square optimisation from the sample set of images shown in FIG 4. The second phase groups the sub-signs according to local criteria and exploits the Gestalt grouping principles of similarity, proximity, good continuation and closure. These principles are applied by analysing the nature of a number of characteristics of neighbouring sub-signs and the boundaries between them. The second phase can be considered in two stages. The first grouping stage is a region labelling and connected component analysis process that simultaneously eliminates segmentation artifacts and performs additional texture-based grouping. For example, areas that are smaller than 20 pixels are considered to be insignificant and are merged into surrounding areas. Adjacent areas that are also less than 20 pixels are grouped into a textured region. The selection of 20 pixels is entirely arbitrary.

In the second grouping stage, line information from edge detection is combined with the region information through Gestalt based grouping rules. The process is depicted in FIG 6. Firstly, regions are checked to see if they are separated by any edges. Regions that are separated by lines usually belong to different objects, hence they can not be grouped together. However regions that share a continuous line between them along a common side are

Substitute Sheet A candidates for grouping together. Regions in this formation are generally understood to be part of the same object. The final outcome is a robust set of signs consisting of homogeneous regions. The images after the first stage of segmentation are shown in FIG 7a and after the second stage in FIG 7b. FIG 8a shows the effect on the original images of FIG 4 of the first phase segmentation process. After second phase grouping, the images appear as in FIG 8b.

The third phase globally applies the remaining Gestalt grouping principles of common fate, relative size, symmetry and surroundedness, as well as good continuity, to the signs produced by the second phase. For example, regions that are completely surrounded by another region are often part of the surrounding object and may be grouped together. Regions that are symmetrical about a given axis can be considered to be components of a single object. The outcome of this stage is an image structure tree that describes the image in terms of its component objects made up of signs that are groups of sub-signs. The image structure tree is the data for the node metadata file.

The objects are composed of one or more signs that are represented by each node. The child nodes store the sub-signs while the relationships between the child nodes is stored in the parent node. Each entry in the tree is labelled with characteristic features of the image object it identifies, such as average colour, colour variance, shape chain code, size and its relationship to other significant objects in the image. The result being that the original image can be reconstructed with a high degree of accuracy from the metadata file. The structure tree represents a generalised scene description of the original image.

It will be appreciated that the structure tree permits the image to be analysed at various levels of information in terms of component objects. The objects may not be strongly semantically significant in themselves, they may just be a significant portion of a semantic object. For instance, a face can be identified as three smaller regions embedded in a larger oval shaped region with particular size and distance relationships between them.

Substitute Sheet le 26) RO/AU Each of the objects defined in the image structure tree can now be linked with either other objects in the same tree or with objects in other images if these have been similarly processed. The associative links are predominantly formed by finding similar objects in other images with matching properties. The properties considered include average object colour and texture, object shape, relationships to other objects and object structural composition. Various weightings are applied to these properties in the matching process to produce different linking outcomes. Normally, shape is assigned the highest weighting since it plays a more significant role in cognitive similarity matching.

The shape of each object is defined using Fourier shape descriptors that are invariant to orientation, scale and translation. The process of obtaining the Fourier shape descriptor for an object involves first locating the centroid of the object, then mapping its shape using polar coordinates and finally applying a Fourier transform to the data. By way of example, a discrete cosine transform may be used in place of the Fourier transform due to its superior data compaction properties and because it is a purely real transform. Assuming each object is defined by a binary mask, P(x, y) of M columns and N rows in size, the centroid C_xy of an object is found by finding the centroid in each of the x and y directions separately.

After the centroid is located then the distance of the shape boundary from the centroid is mapped out using polar coordinates at discrete angular intervals k to obtain n total shape samples. The values are placed in an array b(k) and a one dimensional discrete cosine transform (DCT) is applied to it. The resulting transform coefficients form the final shape vector. The DCT is defined as:

Substitute Sheet (Rule 26) RO/AU The average colour for each region is stored in the HVC colour components, as these allow associations that are more cognitively correct based on the perceptual properties of hue, value and chroma. The definition of the HVC colour space is given in terms of the CIE Lab space that is given in terms of the XYZ space as follows:

The similarity matching process is performed by selecting an initial image object and performing a weighted mean square error analysis between its attributes and the corresponding attributes of all other image objects stored in the system. The object that generates the lowest mean square error against the current object is selected for linking. To avoid circular references a list of links that refer to the current object is maintained and checked against. If the matched object already has a link that refers to the current object then that link is rejected and the link is instead formed to the object with the second lowest means square error subject to the same condition. If no matches are found to be below a certain acceptance threshold then the object in question remains unlinked. The weighted mean square error criterion is defined for the original data set x and the comparison set x defined at each location j over a total of n values as follows:

Substitute Sheet

^ ^)

In one embodiment the system performs this matching using only low- level statistically defined features. This works reasonably well for objects that are simple and fairly homogenous. In the case of complex object formed out of reasonably dissimilar regions this approach is less effective. In these situations when an object is not particularly homogenous the saliency of its statistical attributes are significantly reduced. In their stead however these compound objects tend to exhibit structural saliency. Hence the object structure can be used for similarity matching. In this case the matching involves finding objects that are defined similarly in terms of the relationships between their component regions. The similarity matching no longer involves comparing the values of individual absolute features but rather the relative values of a number of features. This is done by directly comparing the object structure trees of the object that contain these relationships. At this level the associations supported by the system are no longer purely feature analytic but are well on their way to being semantic, based on the structural properties of the image content.

Unlike automatic hypertext authoring systems where key words may appear many hundreds of times creating the need for context and relevance evaluation in the process of creating links, no two different images will contain identical objects or attributes. Hence it is extremely unlikely that a search will reveal two possible matches that have exactly the same degree of similarity with the reference object. The closest match on each occasion can therefore be used. In contrast to existing methods for defining referenceable components in images for linking, such as manually defining image maps and using object recognition, that are sematic in nature, the system does not require the interpretation or recognition of semantics in images and hence it is completely generic and unconstrained. The node demarcation heavily exploits Gestalt principles and makes use of an initial segmentation phase followed by local and global grouping phases to identify semiotic sub-signs

Substitute Sheet and signs. The associative linking can operate in either static or dynamic mode and is based on object similarity matching using shape, colour and relational or structural features to form associations among objects in the image sets. The navigation support system is capable of automatically providing a list of multiple alternative targets that the user may wish to pursue.

Throughout the specification the aim has been to describe the invention without limiting the invention to any specific combination of features.

Substitute Sheet

Claims

1. A method for generating hyperlinks between images in a set of images including the steps of: processing each image to identify nodes within the image; characterising each node in terms of a discrete set of attributes; compiling a searchable metadata file of all sets of attributes for all images; searching the metadata file to identify a node having attributes most closely related to attributes of a selected node; and forming a link between the identified node and the selected node.

2. The method of claim 1 wherein the step of identifying nodes further includes the steps of: isolating elementary components in an image; local grouping of the components into signs; and global grouping of the signs into objects that collectively describe the image.

3. The method of claim 2 wherein the step of isolating elementary components is performed by a region growing technique in which pixels are added to a region according to a homogeneity criterion and edge information.

4. The method of claim 2 wherein the step of local grouping includes the steps of segmenting the components and edge analysis.

5. The method of claim 2 wherein Gestalt grouping principles are employed in the local grouping step.

6. The method of claim 2 wherein the local grouping step is performed in two stages including a first region labelling and connected component analysis stage; and a second Gestalt based grouping stage.

7. The method of claim 2 wherein the global grouping step applies Gestalt grouping principals including one or more of: common fate; relative size; symmetry, surroundedness and continuity.

8. The method of claim 1 wherein said searchable metadata file consists of an image structure tree for each image in said set of images.

9. The method of claim 1 wherein the step of searching the metadata file is a similarity matching process.

stitute Sheet

10. The method of claim 9 wherein the similarity matching process is weighted in favour of features of shape.

11. The method of claim 1 wherein the step of forming links between identified nodes and selected nodes occurs dynamically.

12. The method of claim 1 wherein the step of forming links between identified nodes and selected nodes occurs statically.

13. A method of forming an image structure tree for an image in an image hyperlinking system that generates hyperlinks between images in a set of images, including the steps of: segmenting said image by isolating elementary components in said image; local grouping of the components into signs; global grouping the signs into objects; and arranging the objects, signs and components into an image structure tree to describe the image.

Substitute Sheet