WO2008039635A2 - Semantic image analysis - Google Patents

Semantic image analysis

Info

Publication number
WO2008039635A2
Authority
WO
WIPO (PCT)
Prior art keywords
image
semantic
feature data
data
analysis
Application number
PCT/US2007/077818
Other languages
French (fr)
Other versions
WO2008039635A3 (en)
Inventor
Jonathan S. Teh
Paola M. Hobson
Original Assignee
Motorola, Inc.
Application filed by Motorola, Inc.
Publication of WO2008039635A2
Publication of WO2008039635A3


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/16 Analogue secrecy systems; Analogue subscription systems
    • H04N7/173 Analogue secrecy systems; Analogue subscription systems with two-way working, e.g. subscriber sending a programme selection signal
    • H04N7/17309 Transmission or handling of upstream communications
    • H04N7/17318 Direct or substantially direct transmission and handling of requests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/4325 Content retrieval operation from a local storage medium, e.g. hard-disk by playing back content from the storage medium
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Definitions

  • the invention relates to semantic image analysis and in particular, but not exclusively, to knowledge based semantic labelling of digitally encoded images.
  • semantic labelling algorithms are extremely resource demanding and require very high processor and memory resources. Therefore, semantic labelling is in practice limited to high performance computer systems and even for such systems the image labelling is typically a very slow process thereby making it impractical for labelling of large image collections.
  • the research consortium aceMedia has been formed by a number of companies to develop and research algorithms and processes in the field of knowledge assisted multimedia management.
  • a knowledge-assisted analysis (KAA) module for semantic labelling has been developed.
  • the initial labelling has a duration of around 2 minutes for a relatively small image of 0.5 megapixel on an Intel Pentium 4 2.8GHz personal computer, with a memory usage of around 500MB.
  • an improved image analysis system would be advantageous and in particular a system allowing increased flexibility, efficient implementation, increased practicality, improved suitability for low complexity/resource devices and/or improved performance would be advantageous.
  • the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
  • a system for image analysis comprising:
  • a client device comprising: means for providing at least a first image, first analysing means for analysing the first image to generate image feature data, first transmitting means for transmitting the image feature data to a remote server for semantic image analysis, and means for receiving semantic image data from the remote server; and the remote server comprising: means for receiving the image feature data, second analysing means for generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and second transmitting means for transmitting the semantic image data to the client device.
  • the invention may allow an improved image analysis system.
  • the system may enable or facilitate image analysis on client devices with reduced capability, complexity and/or computational resource.
  • the client device may for example be a portable device such as a mobile phone or PDA.
  • the client device may be battery powered and the invention may allow increased battery life.
  • the invention may furthermore allow an efficient and/or low resource overhead, and in particular only the image feature data required for the semantic analysis may be communicated to the remote server thereby resulting in a low communication overhead.
  • the approach allows a distributed image analysis which may allow a practical implementation on a low resource client device.
  • performing only enough analysis to determine image feature data at the client device allows both low device resource usage as well as low communication resource usage.
  • in many embodiments, a much faster image analysis may be performed at a remote server with much higher computational power, which can therefore be used for the complex and time consuming analysis to generate semantic image data.
  • the image feature data may comprise a feature vector and/or visual descriptors for image segments of the image.
  • the semantic image data may comprise a semantic labelling of the image.
  • the approach may e.g. be applied to video sequences or to individual images such as pictures or photos.
  • the remote server furthermore comprises means for generating segmentation data for the image by combining image segments indicated by the image feature data and the second transmitting means is arranged to transmit the segmentation data to the client device.
  • the segmentation data may comprise a segmentation map and may include semantic labels for one or more of the segments.
  • the remote server comprises means for transmitting a subset indication of a subset of image feature data pertinent to the semantic analysis of at least one analysed image
  • the client device comprises subset means for determining the image feature data to transmit to the remote server for the first image in response to the subset indication
  • the image feature data pertinent to the semantic analysis of at least one analysed image may be data which meets an impact criterion for the image analysis.
  • the pertinent image data may be data which has been used in the image analysis of the analysed data or data which is required or desired for the image analysis.
  • the second analysing means is arranged to determine a relevance measure for different image feature data to the generated semantic image data and may determine the subset in response to the relevance measure.
  • a communication device comprising: means for providing at least a first image; analysing means for analysing the first image to generate image feature data; transmitting means for transmitting the image feature data to a remote server for semantic image analysis; means for receiving semantic image data determined in response to the image feature data from the remote server.
  • a method of image analysis comprising: a client device performing the steps of: providing at least a first image, analysing the first image to generate image feature data, transmitting the image feature data to a remote server for semantic image analysis, and receiving semantic image data from the remote server; and the remote server performing the steps of: receiving the image feature data, generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and transmitting the semantic image data to the client device.
  • FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention
  • FIG. 2 illustrates an example of a client device in accordance with some embodiments of the invention
  • FIG. 3 illustrates an example of an image analysis server in accordance with some embodiments of the invention
  • FIG. 4 illustrates exemplary images on which an image analysis may be performed
  • FIG. 5 illustrates an example of a method of image analysis in accordance with some embodiments of the invention.
  • image analysis of an image from the mobile device is distributed between the mobile device itself and a remote server.
  • the analysis stage of a semantic labelling algorithm is split between the mobile device and the remote server.
  • An image feature extraction algorithm is performed on the mobile device using an image (or video) descriptor toolbox and a knowledge assisted analysis and semantic labelling is performed on the remote server where more computing resources are available.
  • the image analysis of the remote server is then performed based only on the feature vectors, which have a smaller footprint in memory (typically 1-5% of the raw image size).
  • the server then returns the semantic labels and final segmentation map to the client device which stores it together with the image.
  • the system uses a separation or distribution of the image analysis process to generate descriptors at a client device for the purposes of a more complex analysis at a remote server. This allows the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful device of a communication network.
  • the annotation process is a tedious and laborious task, and is particularly difficult on a user interface constrained device such as a mobile handset.
  • the current approach allows a semantic annotation to be automatically performed for the user with the results being provided faster than if the analysis was performed on the mobile device using its limited computational power.
  • the semantic labelling can be provided faster and with lower computational and communication resource usage than systems transferring the image to be analysed to a remote server or other devices (e.g. by connecting the client device to a Personal Computer via a wireless connection). This may further allow the user to correct any mistakes made by the semantic annotation process while the context in which the picture was taken is still fresh in the user's mind.
  • FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention.
  • the communication system comprises a client device 101 for which one or more images or video sequences should be analysed.
  • the client device 101 is a wireless device which communicates with a base station 103 over a radio air interface.
  • the base station 103 is coupled to a communication network 105 which is further coupled to a remote server 107.
  • the communication system may for example be a cellular communication system, a wireless local area network (WLAN) or any other wireless communication network or combination of these.
  • the client device 101 is a mobile phone of a cellular communication system, such as a GSM or UMTS system.
  • FIG. 2 illustrates the client device 101 in more detail.
  • the client device 101 comprises an image generator 201 which provides an image to be analysed.
  • the image generator 201 may for example comprise a digital camera which can provide a digital image or video sequence.
  • the client device 101 may be a mobile camera phone.
  • the image to be analysed may simply be received from another source such as an internal storage or an external device.
  • the image generator 201 is coupled to an image processor 203 which analyses the image to generate image feature data.
  • the processing by the image processor 203 may comprise several steps.
  • the image processor 203 is arranged to segment the image into various image segments with corresponding characteristics.
  • the image segmentation may for example be based on colour, brightness level etc. The image segmentation will generally generate a number of image segments where the pixels of each segment tend to belong to the same object. For example, for an image of a beach landscape, the initial segmentation may for example generate a number of blue image segments which may be part of the sea, a number of bright image segments which may be part of the sun etc.
  • the image processing may be performed by using algorithms selected from a standardised image or video descriptor toolbox.
  • the initial segmentation may comprise a colour segmentation of up to 8 dominant colours in accordance with the MPEG-7 Dominant Colour descriptor standard.
  • a spatial segmentation can also be performed using motion information. This can be assisted by the motion estimation module present in the video encoding subsystem of the handset.
  • the initial segmentation tends to be an over-segmentation resulting in a potentially large number of image segments for each image object in the picture. For example, any image area corresponding to the sea will typically result in a large number of image segments with varying degrees of blue, green, grey etc. pixels.
  • following the initial segmentation, the image processor 203 proceeds to generate feature data for the segments.
  • image feature data in the form of an image vector describing visual characteristics of each segment may be generated.
  • the image feature data may for example characterise the colour, brightness level, shape and texture of each image segment. Specifically, the dominant colour can be determined for each segment in accordance with the MPEG-7 Dominant Colour descriptor standard. Other features such as texture can be represented with the MPEG-7 Homogeneous Texture descriptor standard.
  • the image feature data may not comprise individual characteristics for each segment but rather the initial segments may be grouped together and the image feature data may comprise indications of the characteristics of a group rather than the individual segments.
  • the image processor 203 is coupled to a transmit controller 205 which is further coupled to a transceiver 207.
  • the transmit controller 205 receives the image feature data from the image processor 203 and generates a suitable data message (or plurality of messages) for transmission in accordance with the communication standards of the communication system.
  • the data message is then fed to the transceiver 207 which transmits it to the base station 103 over the air interface.
  • the data message is addressed to the remote server 107 and is forwarded from the base station 103 to the remote server 107 via the communication network 105.
  • a typical 2 megapixel picture in compressed JPEG format (such as those taken with a typical high-end camera phone) tends to be between 100 and 350KB.
  • visual descriptors can be used to extract feature vectors that are represented in RDF (Resource Description Framework) format, which typically results in a size of around 8KB depending on the level of detail. Using suitable compression, this can be reduced by approximately 85%. This represents an overall metadata size of only around 0.3% to 1.2% of the original JPEG compressed image. Assuming an uplink bandwidth of 64kbps, the image would need between 12 and 45 seconds to be transmitted whereas the visual descriptors would only take around 0.15 seconds. The bandwidth saving also means cost savings in systems where the user pays for the communication resource used (such as in cellular communication systems). Where multiple images need to be automatically annotated, the upload time can be very impractical for complete images.
  • FIG. 3 illustrates the remote server 107 in more detail.
  • the remote server 107 comprises a network interface 301 which interfaces the remote server 107 to the communication network 105.
  • the network interface 301 receives the data message from the client device 101.
  • the network interface 301 is furthermore coupled to a network receive processor 303, to which the received data message is forwarded.
  • the network receive processor 303 extracts the image feature data from the received data message and feeds it to a semantic label processor 305 coupled to the network receive processor 303.
  • the semantic label processor 305 is arranged to perform a semantic analysis of the image based on the image feature data received from the client device 101. Specifically, the semantic label processor 305 can perform a semantic labelling analysis which is based only on the image feature data.
  • the visual descriptors and image characteristics of the segments generated by the image processor 203 are used by the semantic label processor 305 to generate semantic labels for the image.
  • the semantic label processor 305 may combine image segments into larger image segments corresponding to individual objects and may label the objects.
  • a number of image segments may be combined into a single image object which is then labelled "sea".
  • the semantic label processor 305 performs a knowledge assisted semantic analysis to generate the semantic labels.
  • the semantic label processor 305 examines the image feature data relating to the image being analysed, and assigns to each region a list of possible concepts along with a degree of confidence. Those concepts are used (along with the degrees and spatial information of the regions) for the construction of an RDF description that is the actual system's output: a semantic interpretation of the multimedia content.
  • the initial labelling can be made by comparing the low level features of the image regions with prototypical examples. This initial labelling is further improved by taking into account the semantic relevance of the labels, including whether they are contained in a domain ontology relating to the scene (e.g. a label of "bear" for an image region in a beach scene would be rejected), and by taking into account spatial context, i.e. how certain concepts are usually related in terms of their spatial arrangement (e.g. sky is usually depicted above the sea, and an aeroplane will usually be depicted within the sky and not within the sea). After discarding false labels there will be further use of the spatial knowledge to refine the regions, e.g. merging regions that all depict sky into one big sky region.
  • the semantic label processor 305 is coupled to a network transmit processor 307 which is coupled to the network interface 301.
  • the semantic label processor 305 provides the semantic label data to the network transmit processor 307, which generates a suitable data message and transmits this to the client device 101 using the network interface 301.
  • the client device 101 furthermore comprises a receive controller 209 which is coupled to the transceiver 207.
  • when the data message is received from the remote server 107, it is forwarded to the receive controller 209 which extracts the semantic label data.
  • the receive controller 209 is coupled to a store processor 211 which is furthermore coupled to the image generator 201.
  • the store processor 211 is arranged to store the semantic label data together with the image in an internal or external image store.
  • the automatically generated semantic labelling can then be used by the user or other applications to e.g. identify and search for images.
  • the user may furthermore be provided with an opportunity to modify or correct the generated semantic labelling data.
  • the semantic label processor 305 also generates larger image segments corresponding to image objects or regions.
  • the initial over-segmentation performed by the image processor 203 of the client device 101 is transformed into a semantically more significant segmentation.
  • the segmentation information may be transmitted to the client device 101 and specifically a segmentation map of the picture may be transmitted to the client device 101 and stored with the semantic labelling and the image.
  • the described approach enables or facilitates the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful remote server.
  • only a subset of the generated feature data is transmitted from the client device to the remote server.
  • a feedback system is implemented wherein the data used in the analysis of previously annotated images is used to determine an appropriate subset of feature data to transmit.
  • the remote server 107 also comprises an analysis subset processor 309 which can generate and transmit a subset indication of a subset of image feature data which is pertinent to the semantic analysis of the image being analysed.
  • the semantic label processor 305 also evaluates which image feature data was particularly relevant and which image feature data was of little significance.
  • the semantic label processor 305 may identify that only the colour descriptors had been used to generate high confidence region labels, and therefore additional descriptors such as motion or shape would not be useful in the decision making for further images of the same type of scene.
  • a descriptor that applies over a larger region is sufficient for high confidence labels and thus descriptors for smaller individual regions would not be needed.
  • the analysis subset processor 309 may in some embodiments receive a relevance indication for how relevant or important the different parameters of the image feature data were for the analysis. This relevance indication may then be used to determine whether the corresponding data should be indicated as pertinent or non-pertinent.
  • the relevance indications may be transmitted directly to the client device 101 and used to determine which image feature data to include for future images.
  • the analysis subset processor 309 is coupled to the network transmit processor 307 which is arranged to receive the indication of the relevant data and to include this in the data message transmitted to the client device 101.
  • when the client device 101 receives the data message from the remote server 107, it stores the indication of the feature data that was used for the analysis together with the semantic labelling.
  • the client device 101 furthermore comprises a subset processor 213 which is arranged to determine a subset of the generated image feature data which should be transmitted to the remote server for the image to be analysed, depending on the data that was used for previous images.
  • the feedback system allows a learning system wherein information of the relevant feature data for different images is gradually built up and is used to reduce the data transmitted when a new image is analysed. This can reduce the amount of time spent computing the feature data and can reduce the transmission time and communication resource usage.
  • the approach utilises the fact that the knowledge-assisted analysis may not require all possible visual descriptors of all segments in order to perform an accurate labelling, and that transmitting additional feature vectors yields diminishing returns in accuracy.
  • the subset processor 213 specifically identifies an image from the stored collection of already analysed pictures which is similar to the current image to be analysed. It will be appreciated that any suitable similarity criterion can be used. For example, the subset processor 213 can evaluate a distance criterion for the feature vectors on a global scale e.g. by plotting the feature vectors from the current image and already analysed images and computing local distance between them using the Euclidean distance or other metric.
  • the similarity criterion may also take into account locations associated with the images. For example, for each image a current location may be stored (e.g. obtained from a GPS receiver or input by a user) and the similarity criterion may require that the images are taken within a given distance of each other. Similarly, the similarity criterion may include a requirement that the images are taken within a given time interval of each other.
  • a general image characteristic such as a general brightness level of the images may be considered.
  • a user annotation of the images may be taken into account. For example, the user may manually include an annotation of the images when taken and this annotation may be stored with the image. The user annotation of the current image may then be compared to the user annotation of the stored images, and the similarity criterion may include e.g. a requirement of at least one user annotation being the same.
  • the client device 101 may first determine an initial image category for the image and the image feature data to transmit may be determined based on this category.
  • a rough image classification can initially be performed on the image. For instance, based on simple image characteristics such as brightness level and variations it can be determined whether the image is an indoor or outdoor image and/or whether flash has been used or not.
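By way of illustration only, such a rough pre-classification could be implemented as in the following Python sketch; the thresholds and the helper name are assumptions chosen for this example and are not taken from the patent.

    import numpy as np

    def rough_image_category(pixels: np.ndarray) -> str:
        """Roughly classify an image as outdoor, indoor or flash-lit indoor
        from global brightness statistics (illustrative thresholds only)."""
        luma = pixels.mean(axis=2)          # average the RGB channels to a brightness plane
        mean, spread = luma.mean(), luma.std()
        if mean > 140 and spread > 50:      # bright with strong variation: likely outdoor
            return "outdoor"
        if mean > 110:                      # bright but flat: likely a flash-lit indoor shot
            return "indoor-flash"
        return "indoor"

    # Example with a synthetic 100x100 RGB image
    image = np.random.randint(0, 256, (100, 100, 3)).astype(float)
    print(rough_image_category(image))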
  • the information of the image feature data that was used for the analysis is then used to select a subset of the image feature data to transmit for the current image. For example, if a standardised toolbox is used for the generation of the feature data, the subset processor 213 may select for transmission only the feature data generated by the algorithms whose output was also used when analysing the previous picture.
  • the remote server 107 reports to the client device 101 what feature data was used in its analysis for certain classes of images (or video clips) and the client device 101 stores this in its local data store. The client device 101 then compares the current image with previous images in its data store to determine a suitable set of visual descriptors to send.
  • FIG. 4 illustrates a frame selected from a house image.
  • the analysis computes 13 visual descriptors and a segmentation mask based on MPEG-7 visual descriptors. From experimentation, it is clear that not all of the visual descriptors are needed in order to annotate images and video clips and a subset can be used for certain content.
  • the house image required the Dominant Colour and Colour Space descriptors, as well as a segmentation mask, to arrive at the labels house, sky, mountain, and field.
  • An added advantage is that the user of the client device can also choose to sacrifice some accuracy by using a reduced subset of visual descriptors in order to reduce computation time and hence conserve battery life, for instance.
  • the image feature data comprises scalable metadata.
  • the visual descriptors may be represented by scalable metadata which provides an increasingly accurate description for an increasing data size.
  • a similar feedback system may be used to provide information from the remote server of the required accuracy of the scalable metadata. This information may then be used for new images by truncating the scalable metadata at a level indicated by the previous image, as sketched below.
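A minimal sketch of such truncation, assuming the descriptor coefficients are ordered from most to least significant and that the server has fed back a sufficient level for this image class (the names and values are illustrative):

    def truncate_scalable_descriptor(coefficients, level):
        """Keep only the first `level` coefficients of a scalable descriptor,
        assuming they are ordered from most to least significant."""
        return coefficients[:max(1, level)]

    # The server's feedback indicated that level 4 sufficed for similar images.
    descriptor = [0.91, 0.40, 0.22, 0.10, 0.05, 0.02, 0.01]
    compact = truncate_scalable_descriptor(descriptor, level=4)   # -> [0.91, 0.40, 0.22, 0.10]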
  • FIG. 5 illustrates a method of image analysis in accordance with some embodiments of the invention.
  • the method initiates in step 501 wherein a client device provides at least a first image.
  • Step 501 is followed by step 503 wherein the client device analyses the image to generate image feature data.
  • Step 503 is followed by step 505 wherein the client device transmits the image feature data to a remote server for semantic image analysis.
  • Step 505 is followed by step 507 wherein the remote server receives the image feature data from the client device.
  • Step 507 is followed by step 509 wherein the remote server generates the semantic image data by performing a semantic analysis for the image in response to the image feature data.
  • Step 509 is followed by step 511 wherein the remote server transmits the semantic image data to the client device.
  • Step 511 is followed by step 513 wherein the client device receives the semantic image data from the remote server.
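In-process, the method of FIG. 5 collapses to a simple request/response chain. The following Python sketch illustrates the flow only; the toy feature extraction and labelling stand in for the real segmentation, descriptor extraction and knowledge-assisted analysis, and none of the names are taken from the patent.

    def client_analyse(image):
        """Steps 501-503: provide the image and extract compact feature data.
        The 'descriptor' here is just a per-row average brightness."""
        return {"descriptors": [sum(row) / len(row) for row in image]}

    def server_semantic_analysis(feature_data):
        """Steps 507-509: generate semantic image data from the feature data alone."""
        labels = ["sky" if d > 128 else "sea" for d in feature_data["descriptors"]]
        return {"labels": labels}

    # Steps 505, 511 and 513 are the transmit/receive pairs between the devices;
    # run in a single process, the whole method reduces to one call chain.
    image = [[200, 210, 190], [30, 40, 50]]    # toy 2x3 brightness 'image'
    semantic_data = server_semantic_analysis(client_analyse(image))
    print(semantic_data)                       # {'labels': ['sky', 'sea']}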
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
  • the invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors.
  • the elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.

Abstract

A system for image analysis comprises a client device (101) with an image processor (203) which analyses an image to generate image feature data. A transmitter (205, 207) transmits the image feature data to a remote server (107) for semantic image analysis. The remote server (107) comprises a receiver (301, 303) which receives the image feature data. A semantic label processor (305) then generates the semantic image data by performing a semantic analysis for the image in response to the image feature data. A transmitter (307, 301) transmits the semantic image data to the client device (101) where it is received by a receiver (207, 209). The distributed analysis allows complex semantic image analysis to be performed efficiently for low complexity and limited resource devices while maintaining a low communication resource usage.

Description

SEMANTIC IMAGE ANALYSIS
Field of the invention
The invention relates to semantic image analysis and in particular, but not exclusively, to knowledge based semantic labelling of digitally encoded images.
Background of the Invention
As images are increasingly stored, distributed and processed in digitally encoded form, such as individual images or video images, the amount and variety of encoded images have increased substantially.
However, the increasing amount of image data has increased the need for and desirability of automated and technical processing of pictures or video sequences with little or no human input or involvement. For example, manual human analysis and indexing of images, such as photos, is frequently used when managing image collections. However, such operations are very cumbersome and time consuming in the human domain and there is a desire to increasingly perform such operations as automated or semi-automated processes in the technical domain.
Specifically, automatic semantic labelling of images is an area that has attracted significant research interest. Conventional image analysis algorithms have been developed to describe an image in terms of its colour histogram, edges, lines and texture type. The results of the image analyses have been used to segment images into multiple closed regions. More recent algorithms have combined these techniques with knowledge-assisted techniques that use ontologies and domain knowledge to combine regions in the typically oversegmented image and to assign semantic labels to these regions.
However, semantic labelling algorithms are extremely resource demanding and require very high processor and memory resources. Therefore, semantic labelling is in practice limited to high performance computer systems and even for such systems the image labelling is typically a very slow process thereby making it impractical for labelling of large image collections.
The research consortium aceMedia has been formed by a number of companies to develop and research algorithms and processes in the field of knowledge assisted multimedia management. Within the consortium, a knowledge-assisted analysis (KAA) module for semantic labelling has been developed. For this system, the initial labelling has a duration of around 2 minutes for a relatively small image of 0.5 megapixel on an Intel Pentium 4 2.8GHz personal computer, with a memory usage of around 500MB.
The complexity and resource demand renders conventional semantic image labelling impractical for lower resource systems. However, such automated annotation would be of particularly high importance for many low complexity, mobile and/or user interface constrained devices, such as mobile phones or Personal Digital Assistants.
Furthermore, even if the required computational resource is available the high processing need results in a high power consumption thereby making the semantic labelling suboptimal or impractical for many battery powered devices.
For example, many high-end mobile devices, such as mobile phones, today include 2 megapixel or higher resolution cameras and these lead to images of significant size (typically around 100-350KB for a 2 megapixel image). Semantic labelling of these images on the device, if at all possible, will result in extremely long processing times and a significant reduction in the battery life. Transmission of such an image or video clip to a server for processing is also time consuming.
Hence, an improved image analysis system would be advantageous and in particular a system allowing increased flexibility, efficient implementation, increased practicality, improved suitability for low complexity/resource devices and/or improved performance would be advantageous.
Summary of the Invention
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
According to a first aspect of the invention there is provided a system for image analysis, the system comprising:
a client device comprising: means for providing at least a first image, first analysing means for analysing the first image to generate image feature data, first transmitting means for transmitting the image feature data to a remote server for semantic image analysis, and means for receiving semantic image data from the remote server; and the remote server comprising: means for receiving the image feature data, second analysing means for generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and second transmitting means for transmitting the semantic image data to the client device.
The invention may allow an improved image analysis system. In particular, the system may enable or facilitate image analysis on client devices with reduced capability, complexity and/or computational resource. The client device may for example be a portable device such as a mobile phone or PDA. The client device may be battery powered and the invention may allow increased battery life. The invention may furthermore allow an efficient and/or low resource overhead, and in particular only the image feature data required for the semantic analysis may be communicated to the remote server thereby resulting in a low communication overhead.
The approach allows a distributed image analysis which may allow a practical implementation on a low resource client device. In particular, performing only enough analysis to determine image feature data at the client device allows both low device resource usage as well as low communication resource usage. In many embodiments, a much faster image analysis may be performed at a remote server with much higher computational power, which can therefore be used for the complex and time consuming analysis to generate semantic image data.
The image feature data may comprise a feature vector and/or visual descriptors for image segments of the image. The semantic image data may comprise a semantic labelling of the image.
The approach may e.g. be applied to video sequences or to individual images such as pictures or photos.
In accordance with an optional feature of the invention, the remote server furthermore comprises means for generating segmentation data for the image by combining image segments indicated by the image feature data and the second transmitting means is arranged to transmit the segmentation data to the client device.
This may allow an efficient and high performance image analysis. The segmentation data may comprise a segmentation map and may include semantic labels for one or more of the segments.
In accordance with an optional feature of the invention, the remote server comprises means for transmitting a subset indication of a subset of image feature data pertinent to the semantic analysis of at least one analysed image, and the client device comprises subset means for determining the image feature data to transmit to the remote server for the first image in response to the subset indication.
This may allow increased efficiency and may in particular allow reduced communication resource usage while allowing a high performance image analysis. The image feature data pertinent to the semantic analysis of at least one analysed image may be data which meets an impact criterion for the image analysis. For example, the pertinent image data may be data which has been used in the image analysis of the analysed data or data which is required or desired for the image analysis. In some embodiments, the second analysing means is arranged to determine a relevance measure for different image feature data to the generated semantic image data and may determine the subset in response to the relevance measure.
According to another aspect of the invention, there is provided a communication device comprising: means for providing at least a first image; analysing means for analysing the first image to generate image feature data; transmitting means for transmitting the image feature data to a remote server for semantic image analysis; means for receiving semantic image data determined in response to the image feature data from the remote server.
According to another aspect of the invention, there is provided a method of image analysis, the method comprising: a client device performing the steps of: providing at least a first image, analysing the first image to generate image feature data, transmitting the image feature data to a remote server for semantic image analysis, and receiving semantic image data from the remote server; and the remote server performing the steps of: receiving the image feature data, generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and transmitting the semantic image data to the client device.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Brief Description of the Drawings
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which
FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention;
FIG. 2 illustrates an example of a client device in accordance with some embodiments of the invention;
FIG. 3 illustrates an example of an image analysis server in accordance with some embodiments of the invention;
FIG. 4 illustrates exemplary images on which an image analysis may be performed; and
FIG. 5 illustrates an example of a method of image analysis in accordance with some embodiments of the invention.
Detailed Description of Some Embodiments of the Invention
The following description focuses on embodiments of the invention applicable to an application wherein images of a mobile phone are analysed. However, it will be appreciated that the invention is not limited to this application but may be applied to many other systems and devices.
In the described examples, image analysis of an image from the mobile device is distributed between the mobile device itself and a remote server. In particular, the analysis stage of a semantic labelling algorithm is split between the mobile device and the remote server. An image feature extraction algorithm is performed on the mobile device using an image (or video) descriptor toolbox, and a knowledge assisted analysis and semantic labelling is performed on the remote server where more computing resources are available. Thus, rather than transmitting the whole image to the remote server, only the feature vectors are sent. The image analysis of the remote server is then performed based only on the feature vectors, which have a smaller footprint in memory (typically 1-5% of the raw image size). The server then returns the semantic labels and final segmentation map to the client device which stores it together with the image.
Thus, the system uses a separation or distribution of the image analysis process to generate descriptors at a client device for the purposes of a more complex analysis at a remote server. The approach allows the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful device of a communication network.
Thus, whereas users prefer to annotate images shortly after they have been taken, the annotation process is a tedious and laborious task, and is particularly difficult on a user interface constrained device such as a mobile handset. The current approach allows a semantic annotation to be automatically performed for the user with the results being provided faster than if the analysis was performed on the mobile device using its limited computational power. Furthermore, as some analysis steps are performed on the client device to generate compact image feature data suitable for automatic semantic labelling, the semantic labelling can be provided faster and with lower computational and communication resource usage than systems transferring the image to be analysed to a remote server or other devices (e.g. by connecting the client device to a Personal Computer via a wireless connection). This may further allow the user to correct any mistakes made by the semantic annotation process while the context in which the picture was taken is still fresh in the user's mind.
FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention.
The communication system comprises a client device 101 for which one or more images or video sequences should be analysed. In the example, the client device 101 is a wireless device which communicates with a base station 103 over a radio air interface. The base station 103 is coupled to a communication network 105 which is further coupled to a remote server 107.
The communication system may for example be a cellular communication system, a wireless local area network (WLAN) or any other wireless communication network or combination of these. In the specific example, the client device 101 is a mobile phone of a cellular communication system, such as a GSM or UMTS system.
FIG. 2 illustrates the client device 101 in more detail.
The client device 101 comprises an image generator 201 which provides an image to be analysed. The image generator 201 may for example comprise a digital camera which can provide a digital image or video sequence. Specifically, the client device 101 may be a mobile camera phone. In other embodiments, the image to be analysed may simply be received from another source such as an internal storage or an external device.
The image generator 201 is coupled to an image processor 203 which analyses the image to generate image feature data. The processing by the image processor 203 may comprise several steps.
In the specific example, the image processor 203 is arranged to segment the image into various image segments with corresponding characteristics. The image segmentation may for example be based on colour, brightness level etc. The image segmentation will generally generate a number of image segments where the pixels of each segment tend to belong to the same object. For example, for an image of a beach landscape, the initial segmentation may for example generate a number of blue image segments which may be part of the sea, a number of bright image segments which may be part of the sun etc.
Specifically, the image processing may be performed by using algorithms selected from a standardised image or video descriptor toolbox. For example, the initial segmentation may comprise a colour segmentation of up to 8 dominant colours in accordance with the MPEG-7 Dominant Colour descriptor standard.
Furthermore, for a video sequence a spatial segmentation can also be performed using motion information. This can be assisted by the motion estimation module present in the video encoding subsystem of the handset.
The initial segmentation tends to be an over-segmentation resulting in a potentially large number of image segments for each image object in the picture. For example, any image area corresponding to the sea will typically result in a large number of image segments with varying degrees of blue, green, grey etc. pixels.
Following the initial segmentation, the image processor 203 proceeds to generate feature data for the segments. Thus image feature data in the form of an image vector describing visual characteristics of each segment may be generated. The image feature data may for example characterise the colour, brightness level, shape and texture of each image segment. Specifically, the dominant colour can be determined for each segment in accordance with the MPEG-7 Dominant Colour descriptor standard. Other features such as texture can be represented with the MPEG-7 Homogeneous Texture descriptor standard.
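By way of illustration, dominant colours in the spirit of (but not conforming to) the MPEG-7 Dominant Colour descriptor can be approximated with a simple k-means clustering over a segment's pixels; the clustering choice, parameter values and function name below are assumptions for this sketch.

    import numpy as np

    def dominant_colours(segment_pixels, k=8, iters=10):
        """Approximate the up-to-k dominant colours of one image segment by
        k-means over its (N, 3) RGB pixels; returns colours and their fractions,
        sorted by decreasing fraction."""
        rng = np.random.default_rng(0)
        k = min(k, len(segment_pixels))
        centres = segment_pixels[rng.choice(len(segment_pixels), k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign every pixel to its nearest centre, then recompute the centres.
            dists = np.linalg.norm(segment_pixels[:, None, :] - centres[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            for c in range(k):
                if (assign == c).any():
                    centres[c] = segment_pixels[assign == c].mean(axis=0)
        fractions = np.bincount(assign, minlength=k) / len(segment_pixels)
        order = fractions.argsort()[::-1]
        return centres[order], fractions[order]

    # Example: one segment given as an (N, 3) array of RGB values
    pixels = np.random.randint(0, 256, (500, 3))
    colours, fractions = dominant_colours(pixels, k=4)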
In some embodiments, the image feature data may not comprise individual characteristics for each segment but rather the initial segments may be grouped together and the image feature data may comprise indications of the characteristics of a group rather than the individual segments.
The image processor 203 is coupled to a transmit controller 205 which is further coupled to a transceiver 207. The transmit controller 205 receives the image feature data from the image processor 203 and generates a suitable data message (or plurality of messages) for transmission in accordance with the communication standards of the communication system. The data message is then fed to the transceiver 207 which transmits it to the base station 103 over the air interface. The data message is addressed to the remote server 107 and is forwarded from the base station 103 to the remote server 107 via the communication network 105.
By generating and transmitting only the image feature data rather than the full image, a much faster and less resource demanding system is achieved. For example, a typical 2 megapixel picture in compressed JPEG format (such as those taken with a typical high-end camera phone) tends to be between 100 and 350KB. The visual descriptors can be used to extract feature vectors that are represented in RDF (Resource Description Framework) format, which typically results in a size of around 8KB depending on the level of detail. Using suitable compression, this can be reduced by approximately 85%. This represents an overall metadata size of only around 0.3% to 1.2% of the original JPEG compressed image. Assuming an uplink bandwidth of 64kbps, the image would need between 12 and 45 seconds to be transmitted whereas the visual descriptors would only take around 0.15 seconds. The bandwidth saving also means cost savings in systems where the user pays for the communication resource used (such as in cellular communication systems). Where multiple images need to be automatically annotated, the upload time can be very impractical for complete images.
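The figures quoted above can be verified with a few lines of arithmetic (Python used purely as a calculator; the 64kbps uplink and the 85% compression ratio are taken from the text):

    image_kb = (100, 350)               # typical 2 megapixel JPEG size range
    descriptor_kb = 8 * (1 - 0.85)      # ~8KB of RDF descriptors compressed by ~85% -> 1.2KB

    overhead_pct = [100 * descriptor_kb / s for s in image_kb]   # ~1.2% and ~0.34%
    uplink_kbps = 64
    image_seconds = [s * 8 / uplink_kbps for s in image_kb]      # 12.5s and ~43.8s
    descriptor_seconds = descriptor_kb * 8 / uplink_kbps         # ~0.15s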
FIG. 3 illustrates the remote server 107 in more detail. The remote server 107 comprises a network interface 301 which interfaces the remote server 107 to the communication network 105. The network interface 301 receives the data message from the client device 101.
The network interface 301 is furthermore coupled to a network receive processor 303, to which the received data message is forwarded. The network receive processor 303 extracts the image feature data from the received data message and feeds it to a semantic label processor 305 coupled to the network receive processor 303.
The semantic label processor 305 is arranged to perform a semantic analysis of the image based on the image feature data received from the client device 101. Specifically, the semantic label processor 305 can perform a semantic labelling analysis which is based only on the image feature data. Thus, in the example, the visual descriptors and image characteristics of the segments generated by the image processor 203 are used by the semantic label processor 305 to generate semantic labels for the image. For example, the semantic label processor 305 may combine image segments into larger image segments corresponding to individual objects and may label the objects. Thus, in the beach picture example, a number of image segments may be combined into a single image object which is then labelled "sea".
It will be appreciated that any suitable algorithm or approach to perform the semantic analysis of the image can be used without detracting from the invention.
In a specific example, the semantic label processor 305 performs a knowledge assisted semantic analysis to generate the semantic labels.
Specifically, the semantic label processor 305 examines the image feature data relating to the image being analysed, and assigns to each region a list of possible concepts along with a degree of confidence. Those concepts are used (along with the degrees and spatial information of the regions) for the construction of an RDF description that is the actual system's output: a semantic interpretation of the multimedia content. By comparison of the low level features of the image regions with prototypical examples, the initial labelling can be made. This initial labelling is further improved by taking into account the semantic relevance of the labels, including whether they are contained in a domain ontology relating to the scene (e.g. a label of "bear" for an image region in a beach scene would be rejected), and by taking into account spatial context, i.e. how certain concepts are usually related in terms of their spatial arrangement (e.g. sky is usually depicted above the sea, and an aeroplane will usually be depicted within the sky and not within the sea). After discarding false labels there will be further use of the spatial knowledge to refine the regions, e.g. merging regions that all depict sky into one big sky region. More information may be found in the aceMedia annual public report for 2005, available at www.acemedia.org, and also in "Knowledge-Assisted Video Analysis Using a Genetic Algorithm", N. Voisine, S. Dasiopoulou, V. Mezaris, E. Spyrou, T. Athanasiadis, I. Kompatsiaris, Y. Avrithis, M. G. Strintzis, Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Montreux, Switzerland, April 13-15, 2005, and "Using a Multimedia Ontology Infrastructure for Semantic Annotation of Multimedia Content", Thanos Athanasiadis, Vassilis Tzouvaras, Kosmas Petridis, Frederic Precioso, Yannis Avrithis and Yiannis Kompatsiaris, 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) at the 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, Nov. 2005.
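The confidence, ontology and spatial-context reasoning described above can be caricatured in a few lines; the toy ontology, the single spatial rule and all names in the following Python sketch are illustrative assumptions, not the aceMedia algorithm.

    # Toy domain ontology for a beach scene, plus one spatial rule: sky lies above sea.
    BEACH_ONTOLOGY = {"sky", "sea", "sand", "person"}

    def label_regions(candidates):
        """candidates: region id -> (vertical position 0.0=top..1.0=bottom,
        list of (concept, confidence)). Returns region id -> accepted label."""
        labels = {}
        for region, (y_pos, concepts) in candidates.items():
            # Reject concepts outside the scene's domain ontology (e.g. "bear").
            valid = [(c, conf) for c, conf in concepts if c in BEACH_ONTOLOGY]
            scored = []
            for c, conf in valid:
                if c == "sky":
                    conf *= 1.0 - y_pos    # penalise "sky" low in the frame
                elif c == "sea":
                    conf *= y_pos          # penalise "sea" high in the frame
                scored.append((conf, c))
            if scored:
                labels[region] = max(scored)[1]    # keep the highest adjusted confidence
        return labels

    regions = {1: (0.1, [("sky", 0.7), ("sea", 0.6)]),
               2: (0.8, [("sea", 0.5), ("bear", 0.9)])}
    print(label_regions(regions))    # {1: 'sky', 2: 'sea'}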
The semantic label processor 305 is coupled to a network transmit processor 307 which is coupled to the network interface 301. The semantic label processor 305 provides the semantic label data to the network transmit processor 307 which generates a suitable data message and transmits this to the client device 101 using the network interface 301.
The client device 101 furthermore comprises a receive controller 209 which is coupled to the transceiver 207. When the data message is received from the remote server 107, it is forwarded to the receive controller 209 which extracts the semantic label data. The receive controller 209 is coupled to a store processor 211 which is furthermore coupled to the image generator 201. The store processor 211 is arranged to store the semantic label data together with the image in an internal or external image store. The automatically generated semantic labelling can then be used by the user or other applications to e.g. identify and search for images. In some embodiments, the user may furthermore be provided with an opportunity to modify or correct the generated semantic labelling data.
In the example, the semantic label processor 305 also generates larger image segments corresponding to image objects or regions. Thus, the initial over-segmentation performed by the image processor 203 of the client device 101 is transformed into a semantically more significant segmentation. In some embodiments, the segmentation information may be transmitted to the client device 101; specifically, a segmentation map of the picture may be transmitted to the client device 101 and stored with the semantic labelling and the image.
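A minimal sketch of this last step, assuming the over-segmentation is given as a per-pixel segment-id map and the labels as a dictionary (both representations are assumptions for illustration):

    import numpy as np

    def semantic_segmentation_map(segment_map, labels):
        """Collapse an over-segmented map (pixel -> segment id) into a semantic
        map (pixel -> label id) by giving all segments sharing a label one id."""
        label_ids = {name: i for i, name in enumerate(sorted(set(labels.values())))}
        out = np.zeros_like(segment_map)
        for seg_id, name in labels.items():
            out[segment_map == seg_id] = label_ids[name]
        return out

    # Three initial segments, of which segments 0 and 1 are both labelled "sky".
    seg = np.array([[0, 0, 1],
                    [2, 2, 2]])
    merged = semantic_segmentation_map(seg, {0: "sky", 1: "sky", 2: "sea"})
    # merged -> [[1, 1, 1], [0, 0, 0]]   ("sea" = 0, "sky" = 1)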
Thus, the described approach enables or facilitates the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful remote server.
In some embodiments, only a subset of the generated feature data is transmitted from the client device to the remote server. Specifically, a feedback system is implemented wherein the data used in the analysis of previously annotated images is used to determine an appropriate subset of feature data to transmit.
In this example, the remote server 107 also comprises an analysis subset processor 309 which can generate and transmit a subset indication of a subset of image feature data which is pertinent to the semantic analysis of the image being analysed. Thus, when an image is semantically analysed, the semantic label processor 305 also evaluates which image feature data was particularly relevant and which image feature data was of little significance.
For example, the semantic label processor 305 may identify that only the colour descriptors had been used to generate high confidence region labels, and therefore additional descriptors such as motion or shape would not be useful in the decision making for further images of the same type of scene. In addition, for some scenes a descriptor that applies over a larger region is sufficient for high confidence labels and thus descriptors for smaller individual regions would not be needed.
The analysis subset processor 309 may in some embodiments receive a relevance indication of how relevant or important the different parameters of the image feature data were for the analysis. This relevance indication may then be used to determine whether the corresponding data should be indicated as pertinent or non-pertinent. In some embodiments, the relevance indications may be transmitted directly to the client device 101 and used to determine which image feature data to include for future images.
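By way of illustration only, the derivation of a subset indication from such per-descriptor relevance indications might be sketched as follows; the score representation, the function name and the cut-off value are assumptions made for this example and are not features taken from the embodiment itself:

    def derive_subset_indication(relevance_scores, cutoff=0.5):
        # relevance_scores: mapping of descriptor name -> relevance in the
        # range 0..1 as reported by the semantic label processor 305 for
        # the analysed image.
        # Returns the names of the descriptors considered pertinent, which
        # the network transmit processor 307 would include in the data
        # message to the client device 101.
        return {name for name, score in relevance_scores.items()
                if score >= cutoff}

    # Example: the colour descriptors were decisive, motion was not.
    pertinent = derive_subset_indication(
        {"DominantColor": 0.9, "ColorSpace": 0.7, "Motion": 0.1})
    # pertinent == {"DominantColor", "ColorSpace"}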
The analysis subset processor 309 is coupled to the network transmit processor 307 which is arranged to receive the indication of the relevant data and to include this in the data message transmitted to the client device 101.
When the client device 101 receives the data message from the remote server 107, it stores the indication of the feature data that was used for the analysis together with the semantic labelling. The client device 101 furthermore comprises a subset processor 213 which is arranged to determine a subset of the generated image feature data which should be transmitted for the image to be analysed, depending on the data that was used for previous images.
Thus, the feedback system enables a learning system wherein information about the relevant feature data for different images is gradually built up and used to reduce the data transmitted when a new image is analysed. This can reduce the amount of time spent computing the feature data and can reduce the transmission time and communication resource usage. The approach utilises the fact that the knowledge-assisted analysis may not require all possible visual descriptors of all segments in order
to perform an accurate labelling and that transmitting additional feature vectors yields diminishing returns in accuracy.
The subset processor 213 specifically identifies an image from the stored collection of already analysed pictures which is similar to the current image to be analysed. It will be appreciated that any suitable similarity criterion can be used. For example, the subset processor 213 can evaluate a distance criterion for the feature vectors, e.g. by comparing the feature vectors of the current image with those of already analysed images and computing the distance between them using the Euclidean distance or another suitable metric.
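As a minimal sketch of such a distance evaluation, assuming that each set of descriptors has already been flattened into a numerical feature vector of fixed length (the flat representation and the function names being assumptions of the example):

    import math

    def euclidean_distance(vec_a, vec_b):
        # Straight-line distance between two equal-length feature vectors.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

    def most_similar_analysed_image(current_vector, analysed_images):
        # analysed_images: list of (image_id, feature_vector) pairs for
        # pictures that have already been semantically annotated.
        # Returns the identifier of the closest stored image, or None if
        # the store is empty.
        best_id, best_distance = None, float("inf")
        for image_id, vector in analysed_images:
            distance = euclidean_distance(current_vector, vector)
            if distance < best_distance:
                best_id, best_distance = image_id, distance
        return best_id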
Additionally or alternatively, the similarity criterion may also take into account locations associated with the images. For example, for each image a current location may be stored (e.g. obtained from a GPS receiver or input by a user) and the similarity criterion may require that the images are taken within a given distance of each other. Similarly, the similarity criterion may include a requirement that the images are taken within a given time interval of each other.
As another example, a general image characteristic such as a general brightness level of the images may be considered. Also, in some embodiments, a user annotation of the images may be taken into account. For example, the user may manually include an annotation of the images when taken and this annotation may be stored with the image. The user annotation of the current image may then be compared to user annotation of the stored images and
the similarity criterion may include e.g. a requirement of at least one user annotation being the same.
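A sketch combining several of these optional requirements is given below; the field names, the distance limit and the time limit are illustrative assumptions, and any of the individual tests may be omitted or replaced:

    def satisfies_similarity_criterion(current, stored,
                                       max_distance_m=500.0,
                                       max_time_gap_s=86400.0):
        # current/stored: dictionaries which may hold a 'location' as an
        # (x, y) position in metres, a 'timestamp' in seconds and a set of
        # user 'annotations'. A missing field simply skips that test.
        if "location" in current and "location" in stored:
            dx = current["location"][0] - stored["location"][0]
            dy = current["location"][1] - stored["location"][1]
            if (dx * dx + dy * dy) ** 0.5 > max_distance_m:
                return False
        if "timestamp" in current and "timestamp" in stored:
            if abs(current["timestamp"] - stored["timestamp"]) > max_time_gap_s:
                return False
        if "annotations" in current and "annotations" in stored:
            # Require at least one user annotation in common.
            if not current["annotations"] & stored["annotations"]:
                return False
        return True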
In some embodiments, the client device 101 may first determine an initial image category for the image and the image feature data to transmit may be determined based on this category. Thus, a rough image classification can initially be performed on the image. For instance, based on simple image characteristics such as brightness level and variations it can be determined whether the image is an indoor or outdoor image and/or whether flash has been used or not.
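Purely as an illustrative sketch, such a rough pre-classification based on mean brightness could take the following form; the greyscale representation and the threshold value are assumptions chosen for the example rather than parameters of the embodiment:

    def initial_image_category(grey_pixels, flash_used,
                               outdoor_threshold=110):
        # grey_pixels: sequence of greyscale values in the range 0..255.
        # flash_used: taken e.g. from the camera's capture parameters.
        mean_brightness = sum(grey_pixels) / len(grey_pixels)
        base = "outdoor" if mean_brightness > outdoor_threshold else "indoor"
        return base + ("_flash" if flash_used else "")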
When a similar image has been found, the information of the image feature data that was used for the analysis of that image is used to select a subset of the image feature data to transmit for the current image. For example, if a standardised toolbox is used for the generation of the feature data, the subset processor 213 may select for transmission only the feature data generated by those algorithms whose output was also used when analysing the previous picture.
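A minimal sketch of this selection is shown below, assuming the toolbox output is available as a mapping from descriptor name to descriptor data (an assumption of the example). For content resembling the house image of FIG. 4 discussed below, the reported set might, for example, be limited to the colour descriptors and the segmentation mask.

    def select_descriptor_subset(all_descriptors, used_for_similar_image):
        # all_descriptors: descriptor name -> descriptor data produced by
        # the feature extraction toolbox for the current image.
        # used_for_similar_image: descriptor names reported by the remote
        # server 107 as having been used when analysing the similar,
        # previously annotated image.
        return {name: data for name, data in all_descriptors.items()
                if name in used_for_similar_image}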
Thus, in the approach, the remote server 107 reports to the client device 101 what feature data was used in its analysis for certain classes of images (or video clips) and the client device 101 stores this in its local data store. The client device 101 then compares the current image with previous images in its data store to determine a suitable set of visual descriptors to send.
As a specific example, FIG. 4 illustrates a frame selected from a house image. The toolbox used by aceMedia
computes 13 visual descriptors and a segmentation mask based on the MPEG-7 visual descriptors. From experimentation, it is clear that not all of the visual descriptors are needed in order to annotate images and video clips and that a subset can be used for certain content.
Specifically, in experiments, the house image required the Dominant Colour and Colour Space descriptors, as well as a segmentation mask, to arrive at the labels house, sky, mountain, and field.
Thus, as shown in the example, only certain descriptors are needed for certain image annotation problems. For all content resembling the house image, it is not necessary to send the complete set of descriptors to the remote server 107 but rather efficient automated annotation can be achieved by sending only the same small set of descriptors that were successfully used to label the specific example frame.
An added advantage is that the user of the client device can also choose to sacrifice some accuracy by using a reduced subset of visual descriptors in order to reduce computation time and hence conserve battery life, for instance.
In some embodiments, the image feature data comprises scalable metadata. Specifically, the visual descriptors may be represented by scalable metadata which provides an increasingly accurate description for an increasing data size. In this case, a similar feedback system may be used to provide information from the remote server of the required accuracy of the scalable metadata. This
information may then be used for new images by truncating the scalable metadata at the level indicated for the previous image.
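A minimal sketch of such a truncation is given below, under the assumption (made for the example) that the scalable descriptor is an ordered byte sequence whose successive layers progressively refine the description:

    def truncate_scalable_descriptor(descriptor_bytes, layer_sizes,
                                     accuracy_level):
        # layer_sizes: bytes contributed by each refinement layer, ordered
        # from coarsest to finest.
        # accuracy_level: number of layers the remote server indicated as
        # sufficient for images of this type.
        keep = sum(layer_sizes[:accuracy_level])
        return descriptor_bytes[:keep]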
It will be appreciated that although the previous description has focussed on the semantic labelling of individual images, the described approach is equally applicable to images from video sequences. In such cases, individual frames may be analysed individually but in most embodiments the semantic analysis will also take into account the correlation in time of the images. For example, motion data may be included in the semantic analysis .
FIG. 5 illustrates a method of image analysis in accordance with some embodiments of the invention.
The method initiates in step 501 wherein a client device provides at least a first image.
Step 501 is followed by step 503 wherein the client device analyses the image to generate image feature data.
Step 503 is followed by step 505 wherein the client device transmits the image feature data to a remote server for semantic image analysis.
Step 505 is followed by step 507 wherein the remote server receives the image feature data from the client device.
Step 507 is followed by step 509 wherein the remote server generates the semantic image data by performing a
semantic analysis for the image in response to the image feature data.
Step 509 is followed by step 511 wherein the remote server transmits the semantic image data to the client device.
Step 511 is followed by step 513 wherein the client device receives the semantic image data from the remote server.
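The exchange of steps 501 to 513 can be summarised by the following in-process sketch; the single trivial rule in the knowledge base and the direct function calls standing in for the network transport are assumptions made purely for illustration:

    class RemoteServer:
        # Toy stand-in for the remote server; a real deployment would
        # receive the feature data over a network interface (steps 505/507).
        def __init__(self, knowledge_base):
            self.knowledge_base = knowledge_base

        def analyse(self, feature_data):
            # Step 509: derive semantic image data from the feature data.
            return [label for predicate, label in self.knowledge_base
                    if predicate(feature_data)]

    def annotate(feature_data, server):
        # Steps 505 to 513 collapsed into a single call and return.
        return server.analyse(feature_data)

    # Example with one simplistic rule in the knowledge base.
    kb = [(lambda f: f.get("dominant_colour") == "blue", "sky")]
    print(annotate({"dominant_colour": "blue"}, RemoteServer(kb)))  # ['sky']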
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented
in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate.
Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in
a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order.

Claims

1. A system for image analysis, the system comprising:
a client device comprising:
means for providing at least a first image,
first analysing means for analysing the first image to generate image feature data,
first transmitting means for transmitting the image feature data to a remote server for semantic image analysis, and
means for receiving semantic image data from the remote server; and
the remote server comprising:
means for receiving the image feature data,
second analysing means for generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and
second transmitting means for transmitting the semantic image data to the client device.
2. The system of claim 1 wherein the first analysing means is arranged to segment the image into image segments and to generate the image feature data as image characteristics of the image segments.
3. The system of claim 2 wherein the remote server furthermore comprises means for generating segmentation data for the image by combining image segments indicated by the image feature data and the second transmitting means is arranged to transmit the segmentation data to the client device.
4. The system of claim 1 wherein the remote server comprises means for transmitting a subset indication of a subset of image feature data pertinent to the semantic analysis of at least one analysed image, and the client device comprises subset means for determining the image feature data to transmit to the remote server for the first image in response to the subset indication.
5. The system of claim 4 wherein the subset means is arranged to select a first analysed image in response to a similarity criterion for the analysed image and the first image and to determine the image feature data to transmit for the first image in response to the subset indication for the first analysed image.
6. The system of claim 5 wherein the subset means is arranged to determine a distance indication between image feature data of the first image and image feature data of the first analysed image and to evaluate the similarity criterion in response to the distance indication.
7. The system of claim 5 wherein the subset means is arranged to evaluate the similarity criterion taking into account at least one parameter selected from the group consisting of:
- a location associated with the first image;
- a brightness level for the first image;
- a user annotation of the first image; and
- a time characteristic associated with the first image.
8. The system of claim 4 wherein the client device comprises means for determining an initial image category for the first image and the subset means is arranged to determine the image feature data to transmit in response to the initial image category.
10. The system of claim 1 wherein the image feature data comprises scalable metadata.
11. The system of claim 10 wherein the remote server comprises means for transmitting a first indication of metadata pertinent to the semantic analysis of at least one analysed image, and the first analysing means is arranged to determine a scalability level for the scalable metadata in response to the first indication.
12. The system of claim 1 wherein the first analysing means is arranged to determine the image feature data in response to visual descriptors for image segments of the first image.
13. The system of claim 1 wherein the semantic image data comprises semantic labelling data.
14. The system of claim 1 wherein the semantic analysis is a knowledge-assisted semantic analysis.
15. The system of claim 1 wherein the first image is an image of a video sequence.
16. The system of claim 1 wherein the client device is a mobile communication unit.
17. A communication device comprising:
means for providing at least a first image;
analysing means for analysing the first image to generate image feature data;
transmitting means for transmitting the image feature data to a remote server for semantic image analysis; and
means for receiving semantic image data determined in response to the image feature data from the remote server.
18. A method of image analysis, the method comprising:
a client device performing the steps of:
providing at least a first image,
analysing the first image to generate image feature data,
transmitting the image feature data to a remote server for semantic image analysis, and
receiving semantic image data from the remote server; and
the remote server performing the steps of:
receiving the image feature data,
generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and
transmitting the semantic image data to the client device.
PCT/US2007/077818 2006-09-27 2007-09-07 Semantic image analysis WO2008039635A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0619039.1 2006-09-27
GB0619039A GB2442255B (en) 2006-09-27 2006-09-27 Semantic image analysis

Publications (2)

Publication Number Publication Date
WO2008039635A2 true WO2008039635A2 (en) 2008-04-03
WO2008039635A3 WO2008039635A3 (en) 2009-04-16

Family

ID=37434767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/077818 WO2008039635A2 (en) 2006-09-27 2007-09-07 Semantic image analysis

Country Status (2)

Country Link
GB (1) GB2442255B (en)
WO (1) WO2008039635A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2477793A (en) * 2010-02-15 2011-08-17 Sony Corp A method of creating a stereoscopic image in a client device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2399983A (en) * 2003-03-24 2004-09-29 Canon Kk Picture storage and retrieval system for telecommunication system
FR2878392A1 (en) * 2004-10-25 2006-05-26 Cit Alcatel METHOD OF EXCHANGING INFORMATION BETWEEN A MOBILE TERMINAL AND A SERVER

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020122596A1 (en) * 2001-01-02 2002-09-05 Bradshaw David Benedict Hierarchical, probabilistic, localized, semantic image classifier
US20020188602A1 (en) * 2001-05-07 2002-12-12 Eastman Kodak Company Method for associating semantic information with multiple images in an image database environment
US20030123737A1 (en) * 2001-12-27 2003-07-03 Aleksandra Mojsilovic Perceptual method for browsing, searching, querying and visualizing collections of digital images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU, HAO ET AL.: 'Automatic browsing of large pictures on mobile devices', Proceedings of the eleventh ACM international conference on Multimedia (Session: Managing images), Berkeley, CA, USA, 14 August 2003, pages 148-155 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867192A (en) * 2012-09-04 2013-01-09 北京航空航天大学 Scene semantic shift method based on supervised geodesic propagation
US10055672B2 (en) 2015-03-11 2018-08-21 Microsoft Technology Licensing, Llc Methods and systems for low-energy image classification
US10268886B2 (en) 2015-03-11 2019-04-23 Microsoft Technology Licensing, Llc Context-awareness through biased on-device image classifiers

Also Published As

Publication number Publication date
GB2442255B (en) 2009-01-21
GB0619039D0 (en) 2006-11-08
GB2442255A (en) 2008-04-02
WO2008039635A3 (en) 2009-04-16

Similar Documents

Publication Publication Date Title
WO2021088510A1 (en) Video classification method and apparatus, computer, and readable storage medium
US11941883B2 (en) Video classification method, model training method, device, and storage medium
CN104754413B (en) Method and apparatus for identifying television signals and recommending information based on image search
US8781152B2 (en) Identifying visual media content captured by camera-enabled mobile device
US8185596B2 (en) Location-based communication method and system
US11206385B2 (en) Volumetric video-based augmentation with user-generated content
US8515933B2 (en) Video search method, video search system, and method thereof for establishing video database
CN102214222B (en) Presorting and interacting system and method for acquiring scene information through mobile phone
CN103581705A (en) Method and system for recognizing video program
CN112347941B (en) Motion video collection intelligent generation and distribution method based on 5G MEC
CN202998337U (en) Video program identification system
US7359633B2 (en) Adding metadata to pictures
GB2532194A (en) A method and an apparatus for automatic segmentation of an object
CN103020173A (en) Video image information searching method and system for mobile terminal and mobile terminal
WO2008039635A2 (en) Semantic image analysis
CN106250396B (en) Automatic image label generation system and method
CN102055932A (en) Method for searching television program and television set using same
CN104572830A (en) Method and method for processing recommended shooting information
CN111797266B (en) Image processing method and apparatus, storage medium, and electronic device
CN103533353B (en) A kind of near video coding system
CN107194003A (en) Photo frame display methods and device
CN114443900A (en) Video annotation method, client, server and system
CN112199547A (en) Image processing method and device, storage medium and electronic equipment
CN108804596B (en) Network information pushing method and device and server
CN113038254B (en) Video playing method, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07842020

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07842020

Country of ref document: EP

Kind code of ref document: A2