WO2008039635A2 - Semantic image analysis - Google Patents

Semantic image analysis

Info

Publication number
WO2008039635A2
Authority
WO
WIPO (PCT)
Prior art keywords
image
semantic
feature data
data
analysis
Application number
PCT/US2007/077818
Other languages
French (fr)
Other versions
WO2008039635A3 (en)
Inventor
Jonathan S. Teh
Paola M. Hobson
Original Assignee
Motorola, Inc.
Application filed by Motorola, Inc.
Publication of WO2008039635A2
Publication of WO2008039635A3


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/16 Analogue secrecy systems; Analogue subscription systems
    • H04N7/173 Analogue secrecy systems; Analogue subscription systems with two-way working, e.g. subscriber sending a programme selection signal
    • H04N7/17309 Transmission or handling of upstream communications
    • H04N7/17318 Direct or substantially direct transmission and handling of requests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/4325 Content retrieval operation from a local storage medium, e.g. hard-disk by playing back content from the storage medium
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Definitions

  • the invention relates to semantic image analysis and in particular, but not exclusively, to knowledge based semantic labelling of digitally encoded images.
  • semantic labelling algorithms are extremely resource demanding and require very high processor and memory resources. Therefore, semantic labelling is in practice limited to high performance computer systems and even for such systems the image labelling is typically a very slow process thereby making it impractical for labelling of large image collections.
  • the research consortium aceMedia has been formed by a number of companies to develop and research algorithms and processes in the field of knowledge assisted multimedia management.
  • a knowledge-assisted analysis (KAA) module for semantic labelling has been developed.
  • the initial labelling has a duration of around 2 minutes for a relatively small image of 0.5 megapixel on an Intel Pentium 4 2.8GHz personal computer, with a memory usage of around 500MB.
  • an improved image analysis system would be advantageous and in particular a system allowing increased flexibility, efficient implementation, increased practicality, improved suitability for low complexity/resource devices and/or improved performance would be advantageous.
  • the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
  • a system for image analysis comprising:
  • a client device comprising: means for providing at least a first image, first analysing means for analysing the first image to generate image feature data, first transmitting means for transmitting the image feature data to a remote server for semantic image analysis, and means for receiving semantic image data from the remote server; and the remote server comprising: means for receiving the image feature data, second analysing means for generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and second transmitting means for transmitting the semantic image data to the client device.
  • the invention may allow an improved image analysis system.
  • the system may enable or facilitate image analysis on client devices with reduced capability, complexity and/or computational resource.
  • the client device may for example be a portable device such as a mobile phone or PDA.
  • the client device may be battery powered and the invention may allow increased battery life.
  • the invention may furthermore allow an efficient and/or low resource overhead, and in particular only the image feature data required for the semantic analysis may be communicated to the remote server thereby resulting in a low communication overhead.
  • the approach allows a distributed image analysis which may allow a practical implementation on a low resource client device.
  • performing only enough analysis to determine image feature data at the client device allows both low device resource usage as well as low communication resource usage.
  • in many embodiments, a much faster image analysis may be performed at a remote server with much higher computational power, which can therefore be used for the complex and time consuming analysis to generate semantic image data.
  • the image feature data may comprise a feature vector and/or visual descriptors for image segments of the image.
  • the semantic image data may comprise a semantic labelling of the image.
  • the approach may e.g. be applied to video sequences or to individual images such as pictures or photos.
  • the remote server furthermore comprises means for generating segmentation data for the image by combining image segments indicated by the image feature data and the second transmitting means is arranged to transmit the segmentation data to the client device.
  • the segmentation data may comprise a segmentation map and may include semantic labels for one or more of the segments.
  • the remote server comprises means for transmitting a subset indication of a subset of image feature data pertinent to the semantic analysis of at least one analysed image
  • the client device comprises subset means for determining the image feature data to transmit to the remote server for the first image in response to the subset indication
  • the image feature data pertinent to the semantic analysis of at least one analysed image may be data which meets an impact criterion for the image analysis.
  • the pertinent image data may be data which has been used in the image analysis of the analysed data or data which is required or desired for the image analysis.
  • the second analysing means is arranged to determine a relevance measure for different image feature data to the generated semantic image data and may determine the subset in response to the relevance measure.
  • a communication device comprising: means for providing at least a first image; analysing means for analysing the first image to generate image feature data; transmitting means for transmitting the image feature data to a remote server for semantic image analysis; means for receiving semantic image data determined in response to the image feature data from the remote server.
  • a method of image analysis comprising: a client device performing the steps of: providing at least a first image, analysing the first image to generate image feature data, transmitting the image feature data to a remote server for semantic image analysis, and receiving semantic image data from the remote server; and the remote server performing the steps of: receiving the image feature data, generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and transmitting the semantic image data to the client device.
  • FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention
  • FIG. 2 illustrates an example of a client device in accordance with some embodiments of the invention
  • FIG. 3 illustrates an example of an image analysis server in accordance with some embodiments of the invention
  • FIG. 4 illustrates exemplary images on which an image analysis may be performed
  • FIG. 5 illustrates an example of a method of image analysis in accordance with some embodiments of the invention.
  • image analysis of an image from the mobile device is distributed between the mobile device itself and a remote server.
  • the analysis stage of a semantic labelling algorithm is split between the mobile device and the remote server.
  • An image feature extraction algorithm is performed on the mobile device using an image (or video) descriptor toolbox and a knowledge assisted analysis and semantic labelling is performed on the remote server where more computing resources are available.
  • the image analysis of the remote server is then performed based only on the feature vectors, which have a smaller footprint in memory (typically 1-5% of the raw image size).
  • the server then returns the semantic labels and final segmentation map to the client device which stores it together with the image.
  • the system uses a separation or distribution of the image analysis process to generate descriptors at a client device for the purposes of a more complex analysis at a remote server. This allows the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful device of a communication network.
  • the annotation process is a tedious and laborious task, and is particularly difficult on a user interface constrained device such as a mobile handset.
  • the current approach allows a semantic annotation to be automatically performed for the user with the results being provided faster than if the analysis was performed on the mobile device using its limited computational power.
  • the semantic labelling can be provided faster and with lower computational and communication resource usage than systems transferring the image to be analysed to a remote server or other devices (e.g. by connecting the client device to a Personal Computer via a wireless connection). This may further allow the user to correct any mistakes made by the semantic annotation process while the context in which the picture was taken is still fresh in the user's mind.
  • FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention.
  • the communication system comprises a client device 101 for which one or more images or video sequences should be analysed.
  • the client device 101 is a wireless device which communicates with a base station 103 over a radio air interface.
  • the base station 103 is coupled to a communication network 105 which is further coupled to a remote server 107.
  • the communication system may for example be a cellular communication system, a wireless local area network (WLAN) or any other wireless communication network or combination of these.
  • the client device 101 is a mobile phone of a cellular communication system, such as a GSM or UMTS system.
  • FIG. 2 illustrates the client device 101 in more detail.
  • the client device 101 comprises an image generator 201 which provides an image to be analysed.
  • the image generator 201 may for example comprise a digital camera which can provide a digital image or video sequence.
  • the client device 101 may be a mobile camera phone.
  • the image to be analysed may simply be received from another source such as an internal storage or an external device.
  • the image generator 201 is coupled to an image processor 203 which analyses the image to generate image feature data.
  • the processing by the image processor 203 may comprise several steps.
  • the image processor 203 is arranged to segment the image into various image segments with corresponding characteristics.
  • the image segmentation may for example be based on colour, brightness level etc. The image segmentation will generally generate a number of image segments where the pixels of each segment tend to belong to the same object. For example, for an image of a beach landscape, the initial segmentation may for example generate a number of blue image segments which may be part of the sea, a number of bright image segments which may be part of the sun etc.
  • the image processing may be performed by using algorithms selected from a standardised image or video descriptor toolbox.
  • the initial segmentation may comprise a colour segmentation of up to 8 dominant colours in accordance with the MPEG-7 Dominant Colour descriptor standard.
  • a spatial segmentation can also be performed using motion information. This can be assisted by the motion estimation module present in the video encoding subsystem of the handset.
  • the initial segmentation tends to be an over-segmentation resulting in a potentially large number of image segments for each image object in the picture. For example, any image area corresponding to the sea will typically result in a large number of image segments with varying degrees of blue, green, grey etc. pixels.
  • following the initial segmentation, the image processor 203 proceeds to generate feature data for the segments.
  • image feature data in the form of an image vector describing visual characteristics of each segment may be generated.
  • the image feature data may for example characterise the colour, brightness level, shape and texture of each image segment. Specifically, the dominant colour can be determined for each segment in accordance with the MPEG-7 Dominant Colour descriptor standard. Other features such as texture can be represented with the MPEG-7 Homogeneous Texture descriptor standard.
  • the image feature data may not comprise individual characteristics for each segment but rather the initial segments may be grouped together and the image feature data may comprise indications of the characteristics of a group rather than the individual segments.
  • the image processor 203 is coupled to a transmit controller 205 which is further coupled to a transceiver 207.
  • the transmit controller 205 receives the image feature data from the image processor 203 and generates a suitable data message (or plurality of messages) for transmission in accordance with the communication standards of the communication system.
  • the data message is then fed to the transceiver 207 which transmits it to the base station 103 over the air interface.
  • the data message is addressed to the remote server 107 and is forwarded from the base station 103 to the remote server 107 via the communication network 105.
  • a typical 2 megapixel picture in compressed JPEG format (such as those taken with a typical high-end camera phone) tends to be between 100 and 350KB.
  • visual descriptors can be used to extract feature vectors that are represented in RDF (Resource Description Framework) format, which typically results in a size of around 8KB depending on the level of detail. Using suitable compression, this can be reduced by approximately 85%. This represents an overall metadata size of only around 0.3% to 1.2% of the original JPEG compressed image. Assuming an uplink bandwidth of 64kbps, the image would need between 12 and 45 seconds to be transmitted whereas the visual descriptors would only take around 0.15 seconds. The bandwidth saving also means cost savings in systems where the user pays for the communication resource used (such as in cellular communication systems). Where multiple images need to be automatically annotated, the upload time can be very impractical for complete images.
  • FIG. 3 illustrates the remote server 107 in more detail.
  • the remote server 107 comprises a network interface 301 which interfaces the remote server 107 to the communication network 105.
  • the network interface 301 receives the data message from the client device 101.
  • the network interface 301 is furthermore coupled to a network receive processor 303, to which the received data message is forwarded.
  • the network receive processor 303 extracts the image feature data from the received data message and feeds it to a semantic label processor 305 coupled to the network receive processor 303.
  • the semantic label processor 305 is arranged to perform a semantic analysis of the image based on the image feature data received from the client device 101. Specifically, the semantic label processor 305 can perform a semantic labelling analysis which is based only on the image feature data.
  • the visual descriptors and image characteristics of the segments generated by the image processor 203 are used by the semantic label processor 305 to generate semantic labels for the image.
  • the semantic label processor 305 may combine image segments into larger image segments corresponding to individual objects and may label the objects.
  • a number of image segments may be combined into a single image object which is then labelled "sea".
  • the semantic label processor 305 performs a knowledge assisted semantic analysis to generate the semantic labels.
  • the semantic label processor 305 examines the image feature data relating to the image being analysed, and assigns to each region a list of possible concepts along with a degree of confidence. Those concepts are used (along with the degrees and spatial information of the regions) for the construction of an RDF description that is the actual system's output: a semantic interpretation of the multimedia content.
  • the initial labelling can be made by comparing the low level features of the image regions with prototypical examples. This initial labelling is further improved by taking into account the semantic relevance of the labels, including whether they are contained in a domain ontology relating to the scene (e.g. a label of "bear" for an image region in a beach scene would be rejected), and by taking into account spatial context, i.e. how certain concepts are usually related in terms of their spatial arrangement (e.g. sky is usually depicted above the sea, and an aeroplane will usually be depicted within the sky and not within the sea). After discarding false labels there will be further use of the spatial knowledge to refine the regions, e.g. merging regions that all depict sky into one big sky region.
  • the semantic label processor 305 is coupled to a network transmit processor 307 which is coupled to the network interface 301.
  • the semantic label processor 305 provides the semantic label data to the network transmit processor 307, which generates a suitable data message and transmits this to the client device 101 using the network interface 301.
  • the client device 101 furthermore comprises a receive controller 209 which is coupled to the transceiver 207.
  • when the data message is received from the remote server 107, it is forwarded to the receive controller 209 which extracts the semantic label data.
  • the receive controller 209 is coupled to a store processor 211 which is furthermore coupled to the image generator 201.
  • the store processor 211 is arranged to store the semantic label data together with the image in an internal or external image store.
  • the automatically generated semantic labelling can then be used by the user or other applications to e.g. identify and search for images.
  • the user may furthermore be provided with an opportunity to modify or correct the generated semantic labelling data.
  • the semantic label processor 305 also generates larger image segments corresponding to image objects or regions.
  • the initial over-segmentation performed by the image processor 203 of the client device 101 is transformed into a semantically more significant segmentation.
  • the segmentation information may be transmitted to the client device 101 and specifically a segmentation map of the picture may be transmitted to the client device 101 and stored with the semantic labelling and the image.
  • the described approach enables or facilitates the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful remote server.
  • only a subset of the generated feature data is transmitted from the client device to the remote server.
  • a feedback system is implemented wherein the data used in the analysis of previously annotated images is used to determine an appropriate subset of feature data to transmit.
  • the remote server 107 also comprises an analysis subset processor 309 which can generate and transmit a subset indication of a subset of image feature data which is pertinent to the semantic analysis of the image being analysed.
  • the semantic label processor 305 also evaluates which image feature data was particularly relevant and which image feature data was of little significance.
  • the semantic label processor 305 may identify that only the colour descriptors had been used to generate high confidence region labels, and therefore additional descriptors such as motion or shape would not be useful in the decision making for further images of the same type of scene.
  • a descriptor that applies over a larger region is sufficient for high confidence labels and thus descriptors for smaller individual regions would not be needed.
  • the analysis subset processor 309 may in some embodiments receive a relevance indication for how relevant or important the different parameters of the image feature data were for the analysis. This relevance indication may then be used to determine whether the corresponding data should be indicated as pertinent or non-pertinent.
  • the relevance indications may be transmitted directly to the client device 101 and used to determine which image feature data to include for future images.
  • the analysis subset processor 309 is coupled to the network transmit processor 307 which is arranged to receive the indication of the relevant data and to include this in the data message transmitted to the client device 101.
  • when the client device 101 receives the data message from the remote server 107, it stores the indication of the feature data that was used for the analysis together with the semantic labelling.
  • the client device 101 furthermore comprises a subset processor 213 which is arranged to determine a subset of the generated image feature data which should be transmitted to the remote server for the image to be analysed, depending on the data that was used for previous images.
  • the feedback system allows a learning system wherein information of the relevant feature data for different images is gradually built up and is used to reduce the data transmitted when a new image is analysed. This can reduce the amount of time spent computing the feature data and can reduce the transmission time and communication resource usage.
  • the approach utilises the fact that the knowledge-assisted analysis may not require all possible visual descriptors of all segments in order to perform an accurate labelling, and that transmitting additional feature vectors yields diminishing returns in accuracy.
  • the subset processor 213 specifically identifies an image from the stored collection of already analysed pictures which is similar to the current image to be analysed. It will be appreciated that any suitable similarity criterion can be used. For example, the subset processor 213 can evaluate a distance criterion for the feature vectors on a global scale e.g. by plotting the feature vectors from the current image and already analysed images and computing local distance between them using the Euclidean distance or other metric.
  • the similarity criterion may also take into account locations associated with the images. For example, for each image a current location may be stored (e.g. obtained from a GPS receiver or input by a user) and the similarity criterion may require that the images are taken within a given distance of each other. Similarly, the similarity criterion may include a requirement that the images are taken within a given time interval of each other.
  • a general image characteristic such as a general brightness level of the images may be considered.
  • a user annotation of the images may be taken into account. For example, the user may manually include an annotation of the images when taken and this annotation may be stored with the image. The user annotation of the current image may then be compared to the user annotation of the stored images, and the similarity criterion may include e.g. a requirement of at least one user annotation being the same.
  • the client device 101 may first determine an initial image category for the image and the image feature data to transmit may be determined based on this category.
  • a rough image classification can initially be performed on the image. For instance, based on simple image characteristics such as brightness level and variations it can be determined whether the image is an indoor or outdoor image and/or whether flash has been used or not.
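By way of illustration only, such a rough pre-classification could be implemented as in the following Python sketch; the thresholds and the helper name are assumptions chosen for this example and are not taken from the patent.

    import numpy as np

    def rough_image_category(pixels: np.ndarray) -> str:
        """Roughly classify an image as outdoor, indoor or flash-lit indoor
        from global brightness statistics (illustrative thresholds only)."""
        luma = pixels.mean(axis=2)          # average the RGB channels to a brightness plane
        mean, spread = luma.mean(), luma.std()
        if mean > 140 and spread > 50:      # bright with strong variation: likely outdoor
            return "outdoor"
        if mean > 110:                      # bright but flat: likely a flash-lit indoor shot
            return "indoor-flash"
        return "indoor"

    # Example with a synthetic 100x100 RGB image
    image = np.random.randint(0, 256, (100, 100, 3)).astype(float)
    print(rough_image_category(image))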
  • the information of the image feature data that was used for the analysis is then used to select a subset of the image feature data to transmit for the current image. For example, if a standardised toolbox is used for the generation of the feature data, the subset processor 213 may select for transmission only the feature data generated by the algorithms whose output was also used when analysing the previous picture.
  • the remote server 107 reports to the client device 101 what feature data was used in its analysis for certain classes of images (or video clips) and the client device 101 stores this in its local data store. The client device 101 then compares the current image with previous images in its data store to determine a suitable set of visual descriptors to send.
  • FIG. 4 illustrates a frame selected from a house image.
  • the analysis computes 13 visual descriptors and a segmentation mask based on MPEG-7 visual descriptors. From experimentation, it is clear that not all of the visual descriptors are needed in order to annotate images and video clips and a subset can be used for certain content.
  • the house image required the Dominant Colour and Colour Space descriptors, as well as a segmentation mask, to arrive at the labels house, sky, mountain, and field.
  • An added advantage is that the user of the client device can also choose to sacrifice some accuracy by using a reduced subset of visual descriptors in order to reduce computation time and hence conserve battery life, for instance.
  • the image feature data comprises scalable metadata.
  • the visual descriptors may be represented by scalable metadata which provides an increasingly accurate description for an increasing data size.
  • a similar feedback system may be used to provide information from the remote server of the required accuracy of the scalable metadata. This information may then be used for new images by truncating the scalable metadata at a level indicated by the previous image, as sketched below.
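A minimal sketch of such truncation, assuming the descriptor coefficients are ordered from most to least significant and that the server has fed back a sufficient level for this image class (the names and values are illustrative):

    def truncate_scalable_descriptor(coefficients, level):
        """Keep only the first `level` coefficients of a scalable descriptor,
        assuming they are ordered from most to least significant."""
        return coefficients[:max(1, level)]

    # The server's feedback indicated that level 4 sufficed for similar images.
    descriptor = [0.91, 0.40, 0.22, 0.10, 0.05, 0.02, 0.01]
    compact = truncate_scalable_descriptor(descriptor, level=4)   # -> [0.91, 0.40, 0.22, 0.10]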
  • FIG. 5 illustrates a method of image analysis in accordance with some embodiments of the invention.
  • the method initiates in step 501 wherein a client device provides at least a first image.
  • Step 501 is followed by step 503 wherein the client device analyses the image to generate image feature data.
  • Step 503 is followed by step 505 wherein the client device transmits the image feature data to a remote server for semantic image analysis.
  • Step 505 is followed by step 507 wherein the remote server receives the image feature data from the client device.
  • Step 507 is followed by step 509 wherein the remote server generates the semantic image data by performing a semantic analysis for the image in response to the image feature data.
  • Step 509 is followed by step 511 wherein the remote server transmits the semantic image data to the client device.
  • Step 511 is followed by step 513 wherein the client device receives the semantic image data from the remote server.
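In-process, the method of FIG. 5 collapses to a simple request/response chain. The following Python sketch illustrates the flow only; the toy feature extraction and labelling stand in for the real segmentation, descriptor extraction and knowledge-assisted analysis, and none of the names are taken from the patent.

    def client_analyse(image):
        """Steps 501-503: provide the image and extract compact feature data.
        The 'descriptor' here is just a per-row average brightness."""
        return {"descriptors": [sum(row) / len(row) for row in image]}

    def server_semantic_analysis(feature_data):
        """Steps 507-509: generate semantic image data from the feature data alone."""
        labels = ["sky" if d > 128 else "sea" for d in feature_data["descriptors"]]
        return {"labels": labels}

    # Steps 505, 511 and 513 are the transmit/receive pairs between the devices;
    # run in a single process, the whole method reduces to one call chain.
    image = [[200, 210, 190], [30, 40, 50]]    # toy 2x3 brightness 'image'
    semantic_data = server_semantic_analysis(client_analyse(image))
    print(semantic_data)                       # {'labels': ['sky', 'sea']}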
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
  • the invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors.
  • the elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.

Abstract

A system for image analysis comprises a client device (101) with an image processor (203) which analyses an image to generate image feature data. A transmitter (205, 207) transmits the image feature data to a remote server (107) for semantic image analysis. The remote server (107) comprises a receiver (301, 303) which receives the image feature data. A semantic label processor (305) then generates the semantic image data by performing a semantic analysis for the image in response to the image feature data. A transmitter (307, 301) transmits the semantic image data to the client device (101) where it is received by a receiver (207, 209). The distributed analysis allows complex semantic image analysis to be performed efficiently for low complexity and limited resource devices while maintaining a low communication resource usage.

Description

SEMANTIC IMAGE ANALYSIS
Field of the invention
The invention relates to semantic image analysis and in particular, but not exclusively, to knowledge based semantic labelling of digitally encoded images.
Background of the Invention
As images are increasingly stored, distributed and processed in digitally encoded form, such as individual images or video images, the amount and variety of encoded images have increased substantially.
However, the increasing amount of image data has increased the need for and desirability of automated and technical processing of pictures or video sequences with little or no human input or involvement. For example, manual human analysis and indexing of images, such as photos, is frequently used when managing image collections. However, such operations are very cumbersome and time consuming in the human domain and there is a desire to increasingly perform such operations as automated or semi-automated processes in the technical domain.
Specifically, automatic semantic labelling of images is an area that has attracted significant research interest. Conventional image analysis algorithms have been developed to describe an image in terms of its colour histogram, edges, lines and texture type. The results of the image analyses have been used to segment images into multiple closed regions. More recent algorithms have combined these techniques with knowledge-assisted techniques that use ontologies and domain knowledge to combine regions in the typically oversegmented image and to assign semantic labels to these regions.
However, semantic labelling algorithms are extremely resource demanding and require very high processor and memory resources. Therefore, semantic labelling is in practice limited to high performance computer systems and even for such systems the image labelling is typically a very slow process thereby making it impractical for labelling of large image collections.
The research consortium aceMedia has been formed by a number of companies to develop and research algorithms and processes in the field of knowledge assisted multimedia management. Within the consortium, a knowledge-assisted analysis (KAA) module for semantic labelling has been developed. For this system, the initial labelling has a duration of around 2 minutes for a relatively small image of 0.5 megapixel on an Intel Pentium 4 2.8GHz personal computer, with a memory usage of around 500MB.
The complexity and resource demand renders conventional semantic image labelling impractical for lower resource systems. However, such automated annotation would be of particularly high importance for many low complexity, mobile and/or user interface constrained devices, such as mobile phones or Personal Digital Assistants.
Furthermore, even if the required computational resource is available the high processing need results in a high power consumption thereby making the semantic labelling suboptimal or impractical for many battery powered devices.
For example, many high-end mobile devices, such as mobile phones, today include 2 megapixel or higher resolution cameras and these lead to images of significant size (typically around 100-350KB for a 2 megapixel image). Semantic labelling of these images on the device, if at all possible, will result in extremely long processing times and a significant reduction in the battery life. Transmission of such an image or video clip to a server for processing is also time consuming.
Hence, an improved image analysis system would be advantageous and in particular a system allowing increased flexibility, efficient implementation, increased practicality, improved suitability for low complexity/resource devices and/or improved performance would be advantageous.
Summary of the Invention
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
According to a first aspect of the invention there is provided a system for image analysis, the system comprising:
a client device comprising: means for providing at least a first image, first analysing means for analysing the first image to generate image feature data, first transmitting means for transmitting the image feature data to a remote server for semantic image analysis, and means for receiving semantic image data from the remote server; and the remote server comprising: means for receiving the image feature data, second analysing means for generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and second transmitting means for transmitting the semantic image data to the client device.
The invention may allow an improved image analysis system. In particular, the system may enable or facilitate image analysis on client devices with reduced capability, complexity and/or computational resource. The client device may for example be a portable device such as a mobile phone or PDA. The client device may be battery powered and the invention may allow increased battery life. The invention may furthermore allow an efficient and/or low resource overhead, and in particular only the image feature data required for the semantic analysis may be communicated to the remote server thereby resulting in a low communication overhead.
The approach allows a distributed image analysis which may allow a practical implementation on a low resource client device. In particular, performing only enough analysis to determine image feature data at the client device allows both low device resource usage as well as low communication resource usage. In many embodiments, a much faster image analysis may be performed at a remote server with much higher computational power, which can therefore be used for the complex and time consuming analysis to generate semantic image data.
The image feature data may comprise a feature vector and/or visual descriptors for image segments of the image. The semantic image data may comprise a semantic labelling of the image.
The approach may e.g. be applied to video sequences or to individual images such as pictures or photos.
In accordance with an optional feature of the invention, the remote server furthermore comprises means for generating segmentation data for the image by combining image segments indicated by the image feature data and the second transmitting means is arranged to transmit the segmentation data to the client device.
This may allow an efficient and high performance image analysis. The segmentation data may comprise a segmentation map and may include semantic labels for one or more of the segments.
In accordance with an optional feature of the invention, the remote server comprises means for transmitting a subset indication of a subset of image feature data pertinent to the semantic analysis of at least one analysed image, and the client device comprises subset means for determining the image feature data to transmit to the remote server for the first image in response to the subset indication.
This may allow increased efficiency and may in particular allow reduced communication resource usage while allowing a high performance image analysis. The image feature data pertinent to the semantic analysis of at least one analysed image may be data which meets an impact criterion for the image analysis. For example, the pertinent image data may be data which has been used in the image analysis of the analysed data or data which is required or desired for the image analysis. In some embodiments, the second analysing means is arranged to determine a relevance measure for different image feature data to the generated semantic image data and may determine the subset in response to the relevance measure.
According to another aspect of the invention, there is provided a communication device comprising: means for providing at least a first image; analysing means for analysing the first image to generate image feature data; transmitting means for transmitting the image feature data to a remote server for semantic image analysis; means for receiving semantic image data determined in response to the image feature data from the remote server.
According to another aspect of the invention, there is provided a method of image analysis, the method comprising: a client device performing the steps of: providing at least a first image, analysing the first image to generate image feature data, transmitting the image feature data to a remote server for semantic image analysis, and receiving semantic image data from the remote server; and the remote server performing the steps of: receiving the image feature data, generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and transmitting the semantic image data to the client device.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Brief Description of the Drawings
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which
FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention;
FIG. 2 illustrates an example of a client device in accordance with some embodiments of the invention;
FIG. 3 illustrates an example of an image analysis server in accordance with some embodiments of the invention;
FIG. 4 illustrates exemplary images on which an image analysis may be performed; and
FIG. 5 illustrates an example of a method of image analysis in accordance with some embodiments of the invention.
Detailed Description of Some Embodiments of the Invention
The following description focuses on embodiments of the invention applicable to an application wherein images of a mobile phone are analysed. However, it will be appreciated that the invention is not limited to this application but may be applied to many other systems and devices.
In the described examples, image analysis of an image from the mobile device is distributed between the mobile device itself and a remote server. In particular, the analysis stage of a semantic labelling algorithm is split between the mobile device and the remote server. An image feature extraction algorithm is performed on the mobile device using an image (or video) descriptor toolbox, and a knowledge assisted analysis and semantic labelling is performed on the remote server where more computing resources are available. Thus, rather than transmitting the whole image to the remote server, only the feature vectors are sent. The image analysis of the remote server is then performed based only on the feature vectors, which have a smaller footprint in memory (typically 1-5% of the raw image size). The server then returns the semantic labels and final segmentation map to the client device which stores it together with the image.
Thus, the system uses a separation or distribution of the image analysis process to generate descriptors at a client device for the purposes of a more complex analysis at a remote server. The approach allows the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful device of a communication network.
Thus, whereas users prefer to annotate images shortly after they have been taken, the annotation process is a tedious and laborious task, and is particularly difficult on a user interface constrained device such as a mobile handset. The current approach allows a semantic annotation to be automatically performed for the user with the results being provided faster than if the analysis was performed on the mobile device using its limited computational power. Furthermore, as some analysis steps are performed on the client device to generate compact image feature data suitable for automatic semantic labelling, the semantic labelling can be provided faster and with lower computational and communication resource usage than systems transferring the image to be analysed to a remote server or other devices (e.g. by connecting the client device to a Personal Computer via a wireless connection). This may further allow the user to correct any mistakes made by the semantic annotation process while the context in which the picture was taken is still fresh in the user's mind.
FIG. 1 illustrates an example of a communication system in accordance with some embodiments of the invention.
The communication system comprises a client device 101 for which one or more images or video sequences should be analysed. In the example, the client device 101 is a wireless device which communicates with a base station 103 over a radio air interface. The base station 103 is coupled to a communication network 105 which is further coupled to a remote server 107.
The communication system may for example be a cellular communication system, a wireless local area network (WLAN) or any other wireless communication network or combination of these. In the specific example, the client device 101 is a mobile phone of a cellular communication system, such as a GSM or UMTS system.
FIG. 2 illustrates the client device 101 in more detail.
The client device 101 comprises an image generator 201 which provides an image to be analysed. The image generator 201 may for example comprise a digital camera which can provide a digital image or video sequence. Specifically, the client device 101 may be a mobile camera phone. In other embodiments, the image to be analysed may simply be received from another source such as an internal storage or an external device.
The image generator 201 is coupled to an image processor 203 which analyses the image to generate image feature data. The processing by the image processor 203 may comprise several steps.
In the specific example, the image processor 203 is arranged to segment the image into various image segments with corresponding characteristics. The image segmentation may for example be based on colour, brightness level etc. The image segmentation will generally generate a number of image segments where the pixels of each segment tend to belong to the same object. For example, for an image of a beach landscape, the initial segmentation may for example generate a number of blue image segments which may be part of the sea, a number of bright image segments which may be part of the sun etc.
Specifically, the image processing may be performed by using algorithms selected from a standardised image or video descriptor toolbox. For example, the initial segmentation may comprise a colour segmentation of up to 8 dominant colours in accordance with the MPEG-7 Dominant Colour descriptor standard.
Furthermore, for a video sequence a spatial segmentation can also be performed using motion information. This can be assisted by the motion estimation module present in the video encoding subsystem of the handset.
The initial segmentation tends to be an over-segmentation resulting in a potentially large number of image segments for each image object in the picture. For example, any image area corresponding to the sea will typically result in a large number of image segments with varying degrees of blue, green, grey etc. pixels.
Following the initial segmentation, the image processor 203 proceeds to generate feature data for the segments. Thus image feature data in the form of an image vector describing visual characteristics of each segment may be generated. The image feature data may for example characterise the colour, brightness level, shape and texture of each image segment. Specifically, the dominant colour can be determined for each segment in accordance with the MPEG-7 Dominant Colour descriptor standard. Other features such as texture can be represented with the MPEG-7 Homogeneous Texture descriptor standard.
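By way of illustration, dominant colours in the spirit of (but not conforming to) the MPEG-7 Dominant Colour descriptor can be approximated with a simple k-means clustering over a segment's pixels; the clustering choice, parameter values and function name below are assumptions for this sketch.

    import numpy as np

    def dominant_colours(segment_pixels, k=8, iters=10):
        """Approximate the up-to-k dominant colours of one image segment by
        k-means over its (N, 3) RGB pixels; returns colours and their fractions,
        sorted by decreasing fraction."""
        rng = np.random.default_rng(0)
        k = min(k, len(segment_pixels))
        centres = segment_pixels[rng.choice(len(segment_pixels), k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign every pixel to its nearest centre, then recompute the centres.
            dists = np.linalg.norm(segment_pixels[:, None, :] - centres[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            for c in range(k):
                if (assign == c).any():
                    centres[c] = segment_pixels[assign == c].mean(axis=0)
        fractions = np.bincount(assign, minlength=k) / len(segment_pixels)
        order = fractions.argsort()[::-1]
        return centres[order], fractions[order]

    # Example: one segment given as an (N, 3) array of RGB values
    pixels = np.random.randint(0, 256, (500, 3))
    colours, fractions = dominant_colours(pixels, k=4)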
In some embodiments, the image feature data may not comprise individual characteristics for each segment but rather the initial segments may be grouped together and the image feature data may comprise indications of the characteristics of a group rather than the individual segments.
The image processor 203 is coupled to a transmit controller 205 which is further coupled to a transceiver 207. The transmit controller 205 receives the image feature data from the image processor 203 and generates a suitable data message (or plurality of messages) for transmission in accordance with the communication standards of the communication system. The data message is then fed to the transceiver 207 which transmits it to the base station 103 over the air interface. The data message is addressed to the remote server 107 and is forwarded from the base station 103 to the remote server 107 via the communication network 105.
By generating and transmitting only the image feature data rather than the full image, a much faster and less resource demanding system is achieved. For example, a typical 2 megapixel picture in compressed JPEG format (such as those taken with a typical high-end camera phone) tends to be between 100 and 350KB. The visual descriptors can be used to extract feature vectors that are represented in RDF (Resource Description Framework) format, which typically results in a size of around 8KB depending on the level of detail. Using suitable compression, this can be reduced by approximately 85%. This represents an overall metadata size of only around 0.3% to 1.2% of the original JPEG compressed image. Assuming an uplink bandwidth of 64kbps, the image would need between 12 and 45 seconds to be transmitted whereas the visual descriptors would only take around 0.15 seconds. The bandwidth saving also means cost savings in systems where the user pays for the communication resource used (such as in cellular communication systems). Where multiple images need to be automatically annotated, the upload time can be very impractical for complete images.
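The figures quoted above can be verified with a few lines of arithmetic (Python used purely as a calculator; the 64kbps uplink and the 85% compression ratio are taken from the text):

    image_kb = (100, 350)               # typical 2 megapixel JPEG size range
    descriptor_kb = 8 * (1 - 0.85)      # ~8KB of RDF descriptors compressed by ~85% -> 1.2KB

    overhead_pct = [100 * descriptor_kb / s for s in image_kb]   # ~1.2% and ~0.34%
    uplink_kbps = 64
    image_seconds = [s * 8 / uplink_kbps for s in image_kb]      # 12.5s and ~43.8s
    descriptor_seconds = descriptor_kb * 8 / uplink_kbps         # ~0.15s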
FIG. 3 illustrates the remote server 107 in more detail. The remote server 107 comprises a network interface 301 which interfaces the remote server 107 to the communication network 105. The network interface 301 receives the data message from the client device 101.
The network interface 301 is furthermore coupled to a network receive processor 303, to which the received data message is forwarded. The network receive processor 303 extracts the image feature data from the received data message and feeds it to a semantic label processor 305 coupled to the network receive processor 303.
The semantic label processor 305 is arranged to perform a semantic analysis of the image based on the image feature data received from the client device 101. Specifically, the semantic label processor 305 can perform a semantic labelling analysis which is based only on the image feature data. Thus, in the example, the visual descriptors and image characteristics of the segments generated by the image processor 203 are used by the semantic label processor 305 to generate semantic labels for the image. For example, the semantic label processor 305 may combine image segments into larger image segments corresponding to individual objects and may label the objects. Thus, in the beach picture example, a number of image segments may be combined into a single image object which is then labelled "sea".
It will be appreciated that any suitable algorithm or approach to perform the semantic analysis of the image can be used without detracting from the invention.
In a specific example, the semantic label processor 305 performs a knowledge assisted semantic analysis to generate the semantic labels.
Specifically, the semantic label processor 305 examines the image feature data relating to the image being analysed, and assigns to each region a list of possible concepts along with a degree of confidence. Those concepts are used (along with the degrees and spatial information of the regions) for the construction of an RDF description that is the actual system's output: a semantic interpretation of the multimedia content. By comparison of the low level features of the image regions with prototypical examples, the initial labelling can be made. This initial labelling is further improved by taking into account the semantic relevance of the labels, including whether they are contained in a domain ontology relating to the scene (e.g. a label of "bear" for an image region in a beach scene would be rejected), and by taking into account spatial context, i.e. how certain concepts are usually related in terms of their spatial arrangement (e.g. sky is usually depicted above the sea, and an aeroplane will usually be depicted within the sky and not within the sea). After discarding false labels there will be further use of the spatial knowledge to refine the regions, e.g. merging regions that all depict sky into one big sky region. More information may be found in the aceMedia annual public report for 2005, available at www.acemedia.org, and also in "Knowledge-Assisted Video Analysis Using a Genetic Algorithm", N. Voisine, S. Dasiopoulou, V. Mezaris, E. Spyrou, T. Athanasiadis, I. Kompatsiaris, Y. Avrithis, M. G. Strintzis, Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Montreux, Switzerland, April 13-15, 2005, and "Using a Multimedia Ontology Infrastructure for Semantic Annotation of Multimedia Content", Thanos Athanasiadis, Vassilis Tzouvaras, Kosmas Petridis, Frederic Precioso, Yannis Avrithis and Yiannis Kompatsiaris, 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) at the 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, Nov. 2005.
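The confidence, ontology and spatial-context reasoning described above can be caricatured in a few lines; the toy ontology, the single spatial rule and all names in the following Python sketch are illustrative assumptions, not the aceMedia algorithm.

    # Toy domain ontology for a beach scene, plus one spatial rule: sky lies above sea.
    BEACH_ONTOLOGY = {"sky", "sea", "sand", "person"}

    def label_regions(candidates):
        """candidates: region id -> (vertical position 0.0=top..1.0=bottom,
        list of (concept, confidence)). Returns region id -> accepted label."""
        labels = {}
        for region, (y_pos, concepts) in candidates.items():
            # Reject concepts outside the scene's domain ontology (e.g. "bear").
            valid = [(c, conf) for c, conf in concepts if c in BEACH_ONTOLOGY]
            scored = []
            for c, conf in valid:
                if c == "sky":
                    conf *= 1.0 - y_pos    # penalise "sky" low in the frame
                elif c == "sea":
                    conf *= y_pos          # penalise "sea" high in the frame
                scored.append((conf, c))
            if scored:
                labels[region] = max(scored)[1]    # keep the highest adjusted confidence
        return labels

    regions = {1: (0.1, [("sky", 0.7), ("sea", 0.6)]),
               2: (0.8, [("sea", 0.5), ("bear", 0.9)])}
    print(label_regions(regions))    # {1: 'sky', 2: 'sea'}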
The semantic label processor 305 is coupled to a network transmit processor 307 which is coupled to the network interface 301. The semantic label processor 305 provides the semantic label data to the network transmit processor 307 which generates a suitable data message and transmits this to the client device 101 using the network interface 301.
The client device 101 furthermore comprises a receive controller 209 which is coupled to the transceiver 207. When the data message is received from the remote server 107, it is forwarded to the receive controller 209 which extracts the semantic label data. The receive controller 209 is coupled to a store processor 211 which is furthermore coupled to the image generator 201. The store processor 211 is arranged to store the semantic label data together with the image in an internal or external image store. The automatically generated semantic labelling can then be used by the user or other applications to e.g. identify and search for images. In some embodiments, the user may furthermore be provided with an opportunity to modify or correct the generated semantic labelling data.
In the example, the semantic label processor 305 also generates larger image segments corresponding to image objects or regions. Thus, the initial over-segmentation performed by the image processor 203 of the client device 101 is transformed into a semantically more significant segmentation. In some embodiments, the segmentation information may be transmitted to the client device 101; specifically, a segmentation map of the picture may be transmitted to the client device 101 and stored with the semantic labelling and the image.
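A minimal sketch of this last step, assuming the over-segmentation is given as a per-pixel segment-id map and the labels as a dictionary (both representations are assumptions for illustration):

    import numpy as np

    def semantic_segmentation_map(segment_map, labels):
        """Collapse an over-segmented map (pixel -> segment id) into a semantic
        map (pixel -> label id) by giving all segments sharing a label one id."""
        label_ids = {name: i for i, name in enumerate(sorted(set(labels.values())))}
        out = np.zeros_like(segment_map)
        for seg_id, name in labels.items():
            out[segment_map == seg_id] = label_ids[name]
        return out

    # Three initial segments, of which segments 0 and 1 are both labelled "sky".
    seg = np.array([[0, 0, 1],
                    [2, 2, 2]])
    merged = semantic_segmentation_map(seg, {0: "sky", 1: "sky", 2: "sea"})
    # merged -> [[1, 1, 1], [0, 0, 0]]   ("sea" = 0, "sky" = 1)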
Thus, the described approach enables or facilitates the use of automatic semantic annotation of images on a resource-constrained device by distributing the image analysis process such that the more complex tasks are executed on a more powerful remote server.
In some embodiments, only a subset of the generated feature data is transmitted from the client device to the remote server. Specifically, a feedback system is implemented wherein the data used in the analysis of previously annotated images is used to determine an appropriate subset of feature data to transmit.
In this example, the remote server 107 also comprises an analysis subset processor 309 which can generate and transmit a subset indication of a subset of image feature data which is pertinent to the semantic analysis of the image being analysed. Thus, when an image is semantically analysed, the semantic label processor 305 also evaluates which image feature data was particularly relevant and which image feature data was of little significance.
For example, the semantic label processor 305 may identify that only the colour descriptors had been used to generate high confidence region labels, and therefore additional descriptors such as motion or shape would not be useful in the decision making for further images of the same type of scene. In addition, for some scenes a descriptor that applies over a larger region is sufficient for high confidence labels and thus descriptors for smaller individual regions would not be needed.
The analysis subset processor 309 may in some embodiments receive a relevance indication of how relevant or important the different parameters of the image feature data were for the analysis. This relevance indication may then be used to determine whether the corresponding data should be indicated as pertinent or non-pertinent. In some embodiments, the relevance indications may be transmitted directly to the client device 101 and used to determine which image feature data to include for future images.
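By way of illustration only, the derivation of a subset indication from such per-descriptor relevance indications might be sketched as follows; the score representation, the function name and the cut-off value are assumptions made for this example and are not features taken from the embodiment itself:

    def derive_subset_indication(relevance_scores, cutoff=0.5):
        # relevance_scores: mapping of descriptor name -> relevance in the
        # range 0..1 as reported by the semantic label processor 305 for
        # the analysed image.
        # Returns the names of the descriptors considered pertinent, which
        # the network transmit processor 307 would include in the data
        # message to the client device 101.
        return {name for name, score in relevance_scores.items()
                if score >= cutoff}

    # Example: the colour descriptors were decisive, motion was not.
    pertinent = derive_subset_indication(
        {"DominantColor": 0.9, "ColorSpace": 0.7, "Motion": 0.1})
    # pertinent == {"DominantColor", "ColorSpace"}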
The analysis subset processor 309 is coupled to the network transmit processor 307 which is arranged to receive the indication of the relevant data and to include this in the data message transmitted to the client device 101.
When the client device 101 receives the data message from the remote server 107, it stores the indication of the feature data that was used for the analysis together with the semantic labelling. The client device 101 furthermore comprises a subset processor 213 which is arranged to determine a subset of the generated image feature data which should be transmitted for the image to be analysed, depending on the data that was used for previous images.
Thus, the feedback system enables a learning system wherein information about the relevant feature data for different images is gradually built up and used to reduce the data transmitted when a new image is analysed. This can reduce the amount of time spent computing the feature data and can reduce the transmission time and communication resource usage. The approach utilises the fact that the knowledge-assisted analysis may not require all possible visual descriptors of all segments in order
to perform an accurate labelling and that transmitting additional feature vectors yields diminishing returns in accuracy.
The subset processor 213 specifically identifies an image from the stored collection of already analysed pictures which is similar to the current image to be analysed. It will be appreciated that any suitable similarity criterion can be used. For example, the subset processor 213 can evaluate a distance criterion for the feature vectors, e.g. by comparing the feature vectors of the current image with those of already analysed images and computing the distance between them using the Euclidean distance or another suitable metric.
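As a minimal sketch of such a distance evaluation, assuming that each set of descriptors has already been flattened into a numerical feature vector of fixed length (the flat representation and the function names being assumptions of the example):

    import math

    def euclidean_distance(vec_a, vec_b):
        # Straight-line distance between two equal-length feature vectors.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

    def most_similar_analysed_image(current_vector, analysed_images):
        # analysed_images: list of (image_id, feature_vector) pairs for
        # pictures that have already been semantically annotated.
        # Returns the identifier of the closest stored image, or None if
        # the store is empty.
        best_id, best_distance = None, float("inf")
        for image_id, vector in analysed_images:
            distance = euclidean_distance(current_vector, vector)
            if distance < best_distance:
                best_id, best_distance = image_id, distance
        return best_id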
Additionally or alternatively, the similarity criterion may also take into account locations associated with the images. For example, for each image a current location may be stored (e.g. obtained from a GPS receiver or input by a user) and the similarity criterion may require that the images are taken within a given distance of each other. Similarly, the similarity criterion may include a requirement that the images are taken within a given time interval of each other.
As another example, a general image characteristic such as a general brightness level of the images may be considered. Also, in some embodiments, a user annotation of the images may be taken into account. For example, the user may manually include an annotation of the images when taken and this annotation may be stored with the image. The user annotation of the current image may then be compared to user annotation of the stored images and
the similarity criterion may include e.g. a requirement of at least one user annotation being the same.
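A sketch combining several of these optional requirements is given below; the field names, the distance limit and the time limit are illustrative assumptions, and any of the individual tests may be omitted or replaced:

    def satisfies_similarity_criterion(current, stored,
                                       max_distance_m=500.0,
                                       max_time_gap_s=86400.0):
        # current/stored: dictionaries which may hold a 'location' as an
        # (x, y) position in metres, a 'timestamp' in seconds and a set of
        # user 'annotations'. A missing field simply skips that test.
        if "location" in current and "location" in stored:
            dx = current["location"][0] - stored["location"][0]
            dy = current["location"][1] - stored["location"][1]
            if (dx * dx + dy * dy) ** 0.5 > max_distance_m:
                return False
        if "timestamp" in current and "timestamp" in stored:
            if abs(current["timestamp"] - stored["timestamp"]) > max_time_gap_s:
                return False
        if "annotations" in current and "annotations" in stored:
            # Require at least one user annotation in common.
            if not current["annotations"] & stored["annotations"]:
                return False
        return True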
In some embodiments, the client device 101 may first determine an initial image category for the image and the image feature data to transmit may be determined based on this category. Thus, a rough image classification can initially be performed on the image. For instance, based on simple image characteristics such as brightness level and variations it can be determined whether the image is an indoor or outdoor image and/or whether flash has been used or not.
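Purely as an illustrative sketch, such a rough pre-classification based on mean brightness could take the following form; the greyscale representation and the threshold value are assumptions chosen for the example rather than parameters of the embodiment:

    def initial_image_category(grey_pixels, flash_used,
                               outdoor_threshold=110):
        # grey_pixels: sequence of greyscale values in the range 0..255.
        # flash_used: taken e.g. from the camera's capture parameters.
        mean_brightness = sum(grey_pixels) / len(grey_pixels)
        base = "outdoor" if mean_brightness > outdoor_threshold else "indoor"
        return base + ("_flash" if flash_used else "")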
When a similar image has been found, the information of the image feature data that was used for the analysis of that image is used to select a subset of the image feature data to transmit for the current image. For example, if a standardised toolbox is used for the generation of the feature data, the subset processor 213 may select for transmission only the feature data generated by those algorithms whose output was also used when analysing the previous picture.
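A minimal sketch of this selection is shown below, assuming the toolbox output is available as a mapping from descriptor name to descriptor data (an assumption of the example). For content resembling the house image of FIG. 4 discussed below, the reported set might, for example, be limited to the colour descriptors and the segmentation mask.

    def select_descriptor_subset(all_descriptors, used_for_similar_image):
        # all_descriptors: descriptor name -> descriptor data produced by
        # the feature extraction toolbox for the current image.
        # used_for_similar_image: descriptor names reported by the remote
        # server 107 as having been used when analysing the similar,
        # previously annotated image.
        return {name: data for name, data in all_descriptors.items()
                if name in used_for_similar_image}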
Thus, in the approach, the remote server 107 reports to the client device 101 what feature data was used in its analysis for certain classes of images (or video clips) and the client device 101 stores this in its local data store. The client device 101 then compares the current image with previous images in its data store to determine a suitable set of visual descriptors to send.
As a specific example, FIG. 4 illustrates a frame selected from a house image. The toolbox used by aceMedia
computes 13 visual descriptors and a segmentation mask based on the MPEG-7 visual descriptors. From experimentation, it is clear that not all of the visual descriptors are needed in order to annotate images and video clips and that a subset can be used for certain content.
Specifically, in experiments, the house image required the Dominant Colour and Colour Space descriptors, as well as a segmentation mask, to arrive at the labels house, sky, mountain, and field.
Thus, as shown in the example, only certain descriptors are needed for certain image annotation problems. For all content resembling the house image, it is not necessary to send the complete set of descriptors to the remote server 107 but rather efficient automated annotation can be achieved by sending only the same small set of descriptors that were successfully used to label the specific example frame.
An added advantage is that the user of the client device can also choose to sacrifice some accuracy by using a reduced subset of visual descriptors in order to reduce computation time and hence conserve battery life, for instance.
In some embodiments, the image feature data comprises scalable metadata. Specifically, the visual descriptors may be represented by scalable metadata which provides an increasingly accurate description for an increasing data size. In this case, a similar feedback system may be used to provide information from the remote server of the required accuracy of the scalable metadata. This
information may then be used for new images by truncating the scalable metadata at the level indicated for the previous image.
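A minimal sketch of such a truncation is given below, under the assumption (made for the example) that the scalable descriptor is an ordered byte sequence whose successive layers progressively refine the description:

    def truncate_scalable_descriptor(descriptor_bytes, layer_sizes,
                                     accuracy_level):
        # layer_sizes: bytes contributed by each refinement layer, ordered
        # from coarsest to finest.
        # accuracy_level: number of layers the remote server indicated as
        # sufficient for images of this type.
        keep = sum(layer_sizes[:accuracy_level])
        return descriptor_bytes[:keep]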
It will be appreciated that although the previous description has focussed on the semantic labelling of individual images, the described approach is equally applicable to images from video sequences. In such cases, individual frames may be analysed individually but in most embodiments the semantic analysis will also take into account the correlation in time of the images. For example, motion data may be included in the semantic analysis .
FIG. 5 illustrates a method of image analysis in accordance with some embodiments of the invention.
The method initiates in step 501 wherein a client device provides at least a first image.
Step 501 is followed by step 503 wherein the client device analyses the image to generate image feature data.
Step 503 is followed by step 505 wherein the client device transmits the image feature data to a remote server for semantic image analysis.
Step 505 is followed by step 507 wherein the remote server receives the image feature data from the client device.
Step 507 is followed by step 509 wherein the remote server generates the semantic image data by performing a
semantic analysis for the image in response to the image feature data.
Step 509 is followed by step 511 wherein the remote server transmits the semantic image data to the client device.
Step 511 is followed by step 513 wherein the client device receives the semantic image data from the remote server.
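The exchange of steps 501 to 513 can be summarised by the following in-process sketch; the single trivial rule in the knowledge base and the direct function calls standing in for the network transport are assumptions made purely for illustration:

    class RemoteServer:
        # Toy stand-in for the remote server; a real deployment would
        # receive the feature data over a network interface (steps 505/507).
        def __init__(self, knowledge_base):
            self.knowledge_base = knowledge_base

        def analyse(self, feature_data):
            # Step 509: derive semantic image data from the feature data.
            return [label for predicate, label in self.knowledge_base
                    if predicate(feature_data)]

    def annotate(feature_data, server):
        # Steps 505 to 513 collapsed into a single call and return.
        return server.analyse(feature_data)

    # Example with one simplistic rule in the knowledge base.
    kb = [(lambda f: f.get("dominant_colour") == "blue", "sky")]
    print(annotate({"dominant_colour": "blue"}, RemoteServer(kb)))  # ['sky']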
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented
in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate.
Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in
a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order.

Claims

1. A system for image analysis, the system comprising:
a client device comprising:
means for providing at least a first image,
first analysing means for analysing the first image to generate image feature data,
first transmitting means for transmitting the image feature data to a remote server for semantic image analysis, and
means for receiving semantic image data from the remote server; and
the remote server comprising:
means for receiving the image feature data,
second analysing means for generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and
second transmitting means for transmitting the semantic image data to the client device.
2. The system of claim 1 wherein the first analysing means is arranged to segment the image into image segments and to generate the image feature data as image characteristics of the image segments.
3. The system of claim 2 wherein the remote server furthermore comprises means for generating segmentation data for the image by combining image segments indicated by the image feature data and the second transmitting means is arranged to transmit the segmentation data to the client device.
4. The system of claim 1 wherein the remote server comprises means for transmitting a subset indication of a subset of image feature data pertinent to the semantic analysis of at least one analysed image, and the client device comprises subset means for determining the image feature data to transmit to the remote server for the first image in response to the subset indication.
5. The system of claim 4 wherein the subset means is arranged to select a first analysed image in response to a similarity criterion for the analysed image and the first image and to determine the image feature data to transmit for the first image in response to the subset indication for the first analysed image.
6. The system of claim 5 wherein the subset means is arranged to determine a distance indication between image feature data of the first image and image feature data of the first analysed image and to evaluate the similarity criterion in response to the distance indication.
7. The system of claim 5 wherein the subset means is arranged to evaluate the similarity criterion taking into account at least one parameter selected from the group consisting of:
- a location associated with the first image;
- a brightness level for the first image;
- a user annotation of the first image; and
- a time characteristic associated with the first image.
8. The system of claim 4 wherein the client device comprises means for determining an initial image category for the first image and the subset means is arranged to determine the image feature data to transmit in response to the initial image category.
10. The system of claim 1 wherein the image feature data comprises scalable metadata.
11. The system of claim 10 wherein the remote server comprises means for transmitting a first indication of metadata pertinent to the semantic analysis of at least one analysed image, and the first analysing means is arranged to determine a scalability level for the scalable metadata in response to the first indication.
12. The system of claim 1 wherein the first analysing means is arranged to determine the image feature data in response to visual descriptors for image segments of the first image.
13. The system of claim 1 wherein the semantic image data comprises semantic labelling data.
14. The system of claim 1 wherein the semantic analysis is a knowledge-assisted semantic analysis.
15. The system of claim 1 wherein the first image is an image of a video sequence.
16. The system of claim 1 wherein the client device is a mobile communication unit.
17. A communication device comprising:
means for providing at least a first image;
analysing means for analysing the first image to generate image feature data;
transmitting means for transmitting the image feature data to a remote server for semantic image analysis; and
means for receiving semantic image data determined in response to the image feature data from the remote server.
18. A method of image analysis, the method comprising:
a client device performing the steps of:
providing at least a first image,
analysing the first image to generate image feature data,
transmitting the image feature data to a remote server for semantic image analysis, and
receiving semantic image data from the remote server; and
the remote server performing the steps of:
receiving the image feature data,
generating the semantic image data by performing a semantic analysis for the first image in response to the image feature data, and
transmitting the semantic image data to the client device.
PCT/US2007/077818 2006-09-27 2007-09-07 Semantic image analysis WO2008039635A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0619039.1 2006-09-27
GB0619039A GB2442255B (en) 2006-09-27 2006-09-27 Semantic image analysis

Publications (2)

Publication Number Publication Date
WO2008039635A2 true WO2008039635A2 (en) 2008-04-03
WO2008039635A3 WO2008039635A3 (en) 2009-04-16

Family

ID=37434767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/077818 WO2008039635A2 (en) 2006-09-27 2007-09-07 Semantic image analysis

Country Status (2)

Country Link
GB (1) GB2442255B (en)
WO (1) WO2008039635A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2477793A (en) * 2010-02-15 2011-08-17 Sony Corp A method of creating a stereoscopic image in a client device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2399983A (en) * 2003-03-24 2004-09-29 Canon Kk Picture storage and retrieval system for telecommunication system
FR2878392A1 (en) * 2004-10-25 2006-05-26 Cit Alcatel METHOD OF EXCHANGING INFORMATION BETWEEN A MOBILE TERMINAL AND A SERVER

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020122596A1 (en) * 2001-01-02 2002-09-05 Bradshaw David Benedict Hierarchical, probabilistic, localized, semantic image classifier
US20020188602A1 (en) * 2001-05-07 2002-12-12 Eastman Kodak Company Method for associating semantic information with multiple images in an image database environment
US20030123737A1 (en) * 2001-12-27 2003-07-03 Aleksandra Mojsilovic Perceptual method for browsing, searching, querying and visualizing collections of digital images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU, HAO ET AL.: 'Automatic browsing of large pictures on mobile devices', Proceedings of the eleventh ACM international conference on Multimedia (Session: Managing images), Berkeley, CA, USA, 14 August 2003, pages 148-155 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867192A (en) * 2012-09-04 2013-01-09 北京航空航天大学 Scene semantic shift method based on supervised geodesic propagation
US10055672B2 (en) 2015-03-11 2018-08-21 Microsoft Technology Licensing, Llc Methods and systems for low-energy image classification
US10268886B2 (en) 2015-03-11 2019-04-23 Microsoft Technology Licensing, Llc Context-awareness through biased on-device image classifiers

Also Published As

Publication number Publication date
GB2442255B (en) 2009-01-21
GB0619039D0 (en) 2006-11-08
GB2442255A (en) 2008-04-02
WO2008039635A3 (en) 2009-04-16

Similar Documents

Publication Publication Date Title
WO2021088510A1 (en) Video classification method and apparatus, computer, and readable storage medium
US11941883B2 (en) Video classification method, model training method, device, and storage medium
CN104754413B (en) Method and apparatus for identifying television signals and recommending information based on image search
US8781152B2 (en) Identifying visual media content captured by camera-enabled mobile device
US8185596B2 (en) Location-based communication method and system
US11206385B2 (en) Volumetric video-based augmentation with user-generated content
US8515933B2 (en) Video search method, video search system, and method thereof for establishing video database
CN102214222B (en) Presorting and interacting system and method for acquiring scene information through mobile phone
CN103581705A (en) Method and system for recognizing video program
CN112347941B (en) Motion video collection intelligent generation and distribution method based on 5G MEC
CN202998337U (en) Video program identification system
US7359633B2 (en) Adding metadata to pictures
GB2532194A (en) A method and an apparatus for automatic segmentation of an object
CN103020173A (en) Video image information searching method and system for mobile terminal and mobile terminal
WO2008039635A2 (en) Semantic image analysis
CN106250396B (en) Automatic image label generation system and method
CN102055932A (en) Method for searching television program and television set using same
CN104572830A (en) Method and method for processing recommended shooting information
CN111797266B (en) Image processing method and apparatus, storage medium, and electronic device
CN103533353B (en) A kind of near video coding system
CN107194003A (en) Photo frame display methods and device
CN114443900A (en) Video annotation method, client, server and system
CN112199547A (en) Image processing method and device, storage medium and electronic equipment
CN108804596B (en) Network information pushing method and device and server
CN113038254B (en) Video playing method, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07842020

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07842020

Country of ref document: EP

Kind code of ref document: A2