WO2016095487A1 - Human-computer interaction-based method for parsing high-level semantics of image


Info

Publication number
WO2016095487A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
human
computer interaction
semantics
semantic
Prior art date
Application number
PCT/CN2015/082908
Other languages
French (fr)
Chinese (zh)
Inventor
林格
罗甜
罗笑南
Original Assignee
中山大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中山大学
Publication of WO2016095487A1 publication Critical patent/WO2016095487A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/35 - Categorising the entire scene, e.g. birthday party or wedding scene

Abstract

A human-computer interaction-based method for parsing the high-level semantics of an image, comprising: scanning a source image with a portable scanning device; identifying targets in the source image; filtering and parsing the content of the source image and extracting effective knowledge; and organizing the semantics so as to deliver the image content to a user in speech form. The method is aimed at visually impaired people and people with weak self-learning ability: with nothing more than a simple scan, and without relying on the visual system, the computer describes the image, helping these disadvantaged groups experience a different world; the process can also serve as part of their entertainment. The method is simple to operate and highly portable.

Description

A Method for High-Level Semantic Parsing of Images Based on Human-Computer Interaction
Technical Field
The present invention relates to the technical field of human-computer interaction, and in particular to a method for high-level semantic parsing of images based on human-computer interaction.
Background Art
With the spread of the Internet and the rapid development of storage technology, multimedia technology and database technology, the demands placed on image applications keep growing. The physics community holds that the three kinds of information peculiar to human beings are language, symbols and images; the spread of information depends to a large extent on vision, at least 80% of external information is obtained through visual perception, and vision is the most important sense for humans and animals. The semantic information contained in an image is very rich, but not every group of people has normal visual function or good comprehension, so parsing images automatically with the help of a computer is a meaningful and challenging task. Obtaining accurate semantic parsing and expression ultimately requires the computer to annotate images automatically.
Research on image semantics has mainly focused on classification and retrieval based on the semantics of each image layer, the extraction of low-level semantic features, and the description of mid-level object semantics. From the 1990s onwards, content-based image retrieval (CBIR) became a research hotspot and a key technology in major research projects such as multimedia databases and digital libraries. CBIR overcomes the limitations of text-based image retrieval to a certain extent: it matches images by computing the similarity between visual features (such as color, texture and shape) and replaces text-based retrieval with visual query methods, achieving retrieval over visual content features such as color, texture, shape and region, and a leap to the "query by image" retrieval mode. Content-based image retrieval combines knowledge from image understanding, pattern recognition, information technology and other fields, and is a synthesis of several high technologies. Some researchers have concentrated on extracting and representing low-level visual features of images and have achieved certain results. In practical applications, however, the retrieval results of traditional CBIR systems are often unsatisfactory and cannot meet people's need to retrieve images by semantics. This is mainly because users usually think of the desired images only in terms of high-level concepts (such as vacation, city or portrait) concerning the objects, events and expressed emotions the images describe; what the user needs is a query over image semantics, not over the low-level visual features of the image. The "meaning" of an image referred to here is its high-level semantic feature, which embodies people's understanding of the image content; this understanding is judged according to human cognitive knowledge and cannot be obtained directly from the low-level features of the image. This gives rise to the "semantic gap" problem in content-based image retrieval systems, i.e., the huge difference between human understanding of image content and the visual features extracted automatically by the computer.
Since the beginning of the 21st century, image retrieval research has revolved around image semantics. Its aim is to bring the computer's ability to retrieve images up to the level of human understanding, to provide a natural and concise query method closer to the user's comprehension, and to improve the precision of image retrieval. Semantic-based image retrieval (SBIR) starts from the semantic features of images and studies how to map the low-level visual features of an image to its high-level semantics, and how to describe those high-level semantics. With the introduction in September 2001 and gradual refinement of the MPEG-7 "Multimedia Content Description Interface" standard, digital images gain uniform visual feature description parameters and a description definition language for expressing complex semantic relationships, which helps semantic-based image retrieval technology make breakthroughs and move toward practical, general use. Automatic semantic annotation of images is the key link in semantic-based image retrieval and has become a research hotspot in image retrieval. Automatic semantic annotation adds keywords to an image to represent its semantic content; it converts the visual features of an image into annotation keywords, inheriting the high efficiency of keyword retrieval while avoiding the time and labor cost of manual annotation. Such algorithms generally have two stages: first, statistical learning is performed on the set of low-level features of all images annotated with the same semantic, yielding a trained model for that semantic class; second, for an image to be annotated, the same low-level features are extracted and, according to the trained models of the semantic classes, the probability that the image belongs to each semantic is obtained, so that the probability of occurrence of every semantic concept, i.e., text keyword, in the image to be annotated can be computed. The semantic probabilities are then ranked, and the several keywords with the highest probability are chosen as the semantic labels of the image. As a hotspot of image retrieval research, automatic semantic annotation of images has broad application prospects, including medical image classification, the construction and management of digital libraries, the retrieval and management of digital photos, video retrieval, and satellite remote sensing image processing.
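To make the two-stage annotation scheme just described concrete, the following is a minimal sketch assuming feature vectors have already been extracted elsewhere; the choice of one diagonal-covariance Gaussian mixture per keyword is illustrative rather than prescribed by the literature surveyed here.

```python
# Two-stage automatic annotation sketch: fit one generative model per
# semantic keyword on training features, then rank keywords for a new
# image by model likelihood and keep the top few as labels.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_keyword_models(features_by_keyword, n_components=3):
    """features_by_keyword: dict mapping keyword -> (n_images, dim) feature array."""
    models = {}
    for word, feats in features_by_keyword.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(feats)
        models[word] = gmm
    return models

def annotate(image_features, models, top_k=5):
    """Rank keywords for one image by average log-likelihood under each model."""
    scores = {w: m.score(image_features.reshape(1, -1)) for w, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```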
In image semantic description, image content has the hierarchical containment relationship "pixel - region - target - scene", and the essence of semantic description is a process of vocabulary encoding and annotation using reasonable word formation. This process is closely related to the description of each layer of image content: pixel and region information is driven by low- and mid-level data, and labeling pixels (regions) according to the similarity of structured data can provide effective low-level entity correspondences for high-level semantic encoding. The mid-level categorization characteristics of targets and scenes also have obvious encoding properties; each category can be regarded as a simple semantic description, providing a good prototype description for extending multi-semantic analysis.
Describing the different attributes of an image, such as low-level features like color, texture, edges or shape, has become an important topic in computer vision, and recognizing such information in an image provides useful information in most practical applications. However, this is by no means the level at which humans communicate with the visual world, nor the kind of description that should be offered to visually impaired groups. What is needed is not only to recognize many individual targets in a scene, but also to distinguish different environments and to perceive the complex activities and social relationships taking place. This is the high-level semantic recognition of image understanding; Figure 1 is a schematic diagram of the image understanding process.
Human-computer interaction (HCI) is the study of the interaction between systems and their users; it is the platform through which people and computer systems communicate, the interface for man-machine dialogue. Human-centered, natural and efficient interaction is the main goal of the new generation of human-computer interaction technology. The development of human-computer interaction technology has gone through three stages. The third-generation interface, the multi-modal user interface, builds on the multimedia interface and adopts new technologies such as speech recognition, gaze tracking and gesture input, so that users can interact through multiple modalities or channels in a natural, parallel and collaborative way; by integrating precise and imprecise information from multiple channels, the system quickly captures the user's intention, effectively improving the naturalness and efficiency of human-computer interaction.
According to the development of image annotation methods, the methods in the current literature for bridging the "semantic gap" can be roughly divided into three categories by their emphasis: machine-learning-based methods, relevance-feedback-based methods, and ontology-based methods.
(1) Machine-learning-based methods
Current automatic semantic annotation of images using machine learning and statistical models can be broadly divided into supervised and unsupervised semantic annotation. Supervised classification first learns an image semantic classifier by training on a given set of semantically annotated sample images, and then uses the classifier to assign unlabeled or uncategorized images to a semantic class. The most commonly used supervised learning techniques are Bayesian classifiers and support vector machines (SVM). Unsupervised semantic annotation clusters the images (or image regions) in a library into meaningful sets according to image content, so that images within the same cluster are as similar as possible while images in different clusters are as dissimilar as possible; a statistical method then attaches a class label to each cluster to obtain the semantic information of each image cluster. Simply put, its goal is to organize or cluster the input data reasonably and effectively. This approach places low demands on manually annotated training sets, and both the training data and the semantic concepts are scalable. Strictly speaking, however, pure image clustering cannot produce an explicit semantic label for a new image; it must be combined with other techniques to achieve automatic semantic annotation, fully exploit its efficiency, and reach high retrieval precision.
(2) Relevance-feedback-based methods
The basic idea of relevance feedback (RF) is that, during retrieval, the user adjusts the existing query with weights according to previous retrieval results so as to give the retrieval system more, and more direct, information, enabling the system to better satisfy the user's needs. Simply put, feedback is an interactive process between the user and the retrieval system: based on the user's evaluation of the current retrieval results, the system adjusts the user's initial query and the parameters of the matching model, thereby optimizing the retrieval results. Relevance feedback is in essence a learning process whose method resembles human learning; it is a valuable approach to studying semantic mapping and can achieve good retrieval results at both the visual feature level and the semantic level. It works with few samples and under strong real-time requirements, but it may suffer from overly long retrieval times and oscillating results.
(3) Object-ontology-based methods
Ontologies are widely used in text information retrieval but got a late start in image retrieval. An ontology is a recognized conceptual representation of the objects (actual objects and logical objects) of a specific domain and of their relationships. It holds that the different objects in an image can be defined by sets of simple descriptors; for example, "sky" can be defined as an "upper, uniform, blue" region. By discretizing low-level features such as color, position, size and shape and mapping them to these simple semantics, object semantics can finally be obtained. For image libraries of a fairly uniform type, ontology-based methods give good results, but for large image databases this approach does not work well. Figure 2 shows an example of current computer-automated annotation.
At present, most researchers in computer vision concentrate on target recognition and target classification, and many models have also been proposed for classifying scene environments, but there is very little research on recognizing events in a static image. Moreover, content-based retrieval and image annotation are mostly carried out separately, without coherently combining these tasks. How a computer should describe an image and organize that description in language to feed back to the user is therefore of great research value.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of the prior art: the method for high-level semantic parsing of images based on human-computer interaction proposed by the present invention can help such disadvantaged groups experience another, different world, and can also serve as part of their entertainment.
In order to solve the above problems, the present invention provides a method for high-level semantic parsing of images based on human-computer interaction, comprising:
scanning a source image with a portable scanning device;
identifying targets in the source image;
filtering and parsing the content of the source image, and extracting effective knowledge;
organizing the semantics and delivering the image content to the user in speech form.
The scanning of a source image with a portable scanning device comprises:
scanning the source image with an ARM-based portable scanning device.
The identifying of targets in the source image comprises:
extracting image features with SIFT local feature extraction, combined with HOG features and GIST global features, so that image information can be captured more comprehensively.
The filtering and parsing of the content of the source image and extraction of effective knowledge comprises:
extracting effective knowledge with a bag-of-words image classification method.
The bag-of-words image classification method comprises:
detecting feature points by image segmentation, random sampling or similar means;
extracting local features from the image and generating descriptors;
clustering the descriptors of these feature points, each cluster center being one visual word;
counting the frequency of occurrence of each visual word into a visual word histogram.
The organizing of the semantics and delivering of the image content to the user in speech form comprises:
delivering the image content to the user in speech form using latent semantic extraction technology.
By implementing the embodiments of the present invention, which is aimed mainly at visually impaired groups and groups with weak self-learning ability, nothing more than a simple scan is needed: without relying on the visual system, the computer describes the image, helping such disadvantaged groups experience another, different world; it can also serve as part of their entertainment. The operation is simple and the portability good.
Brief Description of the Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1 is a flow chart of an image processing procedure in the prior art;
Figure 2 is an example of automatic image annotation in the prior art;
Figure 3 is a flow chart of the human-computer interaction-based method for high-level semantic parsing of images in an embodiment of the present invention;
Figure 4 is a schematic structural diagram of the scanning device in an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
For any image (color or black-and-white), the present invention performs an overall scan with a hand-held portable scanning device so that the source image information is entered into the system; the system recognizes the targets in the image, filters and parses its content, extracts effective knowledge, organizes the semantics, and delivers the image content to the user in speech form. For example, given an image of boating on the water, the system recognizes targets such as a person, a boat, a lake, a fishing rod, the sky and trees, performs target analysis and organizes the image semantics, and finally outputs through a speech device: a person is fishing on the lake. The main purpose of the invention is to help visually impaired patients (people with amblyopia, blind people, etc.), illiterate elderly people and preschool children identify image content effectively without human assistance, so that these groups can learn about an outside world they cannot otherwise reach. This human-computer interaction-based high-level semantic parsing system has good compatibility and portability and is convenient to operate. The workflow of the system is shown in Figure 3 and sketched in code below.
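As an illustration of this workflow, the following is a minimal sketch that strings the four stages together. The helpers scan_image, detect_objects, filter_knowledge and compose_sentence are hypothetical placeholders for components the text describes but does not name, and pyttsx3 merely stands in for whatever speech-output module the device actually uses.

```python
# Sketch of the scan -> recognize -> parse -> speak pipeline (Figure 3).
# scan_image, detect_objects, filter_knowledge and compose_sentence are
# hypothetical placeholders; pyttsx3 is an off-the-shelf text-to-speech engine.
import pyttsx3

def describe_image(device):
    image = scan_image(device)          # source image from the portable scanner
    objects = detect_objects(image)     # e.g. person, boat, lake, fishing rod
    facts = filter_knowledge(objects)   # keep the salient targets and relations
    sentence = compose_sentence(facts)  # e.g. "A person is fishing on the lake"
    engine = pyttsx3.init()
    engine.say(sentence)                # deliver the description as speech
    engine.runAndWait()
```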
(1) ARM-based portable scanning device (hardware)
The hardware layer consists mainly of the system core, the scanning part and the human-machine interface; in addition, some expansion interfaces are reserved to extend its functions and suit a variety of applications. The microprocessor is the widely used Samsung S3C2410X chip, whose core is an ARM9TDMI core with a 16 KB data cache and a 16 KB instruction cache running at 203 MHz. Storage consists of 64 MB of NAND flash and 64 MB of SDRAM. The scanning part uses an SDIO palm-sized scanning card; based on micro-linear CMOS imaging technology, this SDIO ISC scanning card can scan all mainstream linear barcodes. The human-machine interface uses Samsung's LTV350QV-F05 3.5-inch TFT touch screen together with a touch panel, providing both display and keyboard functions and helping to reduce the size of the device. The Ethernet port is used for data transmission and download. USB, RS232 and other interfaces are reserved to facilitate functional expansion of the device.
(2) Feature extraction technology
Since SIFT features are invariant to illumination, scale and the like, image feature extraction uses SIFT local feature extraction, combined with HOG features and GIST global features, so that image information can be captured more comprehensively.
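A minimal sketch of this combined extraction using OpenCV is given below; note that GIST has no built-in OpenCV implementation, so it is assumed to come from a third-party library and is not computed here.

```python
# Combined feature extraction: SIFT local descriptors plus a global HOG
# descriptor via OpenCV. GIST is assumed to come from a third-party
# implementation and is omitted from this sketch.
import cv2

def extract_features(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, sift_desc = sift.detectAndCompute(gray, None)  # local, scale/illumination invariant
    hog = cv2.HOGDescriptor()                                 # default 64x128 detection window
    hog_desc = hog.compute(cv2.resize(gray, (64, 128)))       # global shape statistics
    return sift_desc, hog_desc.ravel()
```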
(3) Description of the bag-of-words (BoW) model
With the wide application of local features in computer vision, image classification and recognition methods based on local features have also received broader attention. Because the number of feature points detected per image during local feature extraction is not uniform, machine training is difficult to get started; moreover, such methods match on feature points, and their heavy computational cost makes them unable to keep up with ever-larger image databases. To overcome these problems, Li Fei-Fei and other scholars at Stanford University first applied the bag-of-words model as a feature representation in computer image processing. The bag-of-words image classification method not only solves the problem of non-uniform local features well, but is also simple to represent and fast to train and classify, and has therefore developed greatly. Inspired by text retrieval methods, the bag-of-words model has attracted increasing attention from scholars at home and abroad because of its high performance, and it has been widely applied in image classification and retrieval.
The main steps of bag-of-words model generation are as follows (a code sketch follows the list):
① Detect feature points by image segmentation, random sampling or similar means.
② Extract local features (SIFT) from the image and generate descriptors.
③ Cluster the descriptors of these feature points (typically with K-means) to form a visual vocabulary, each cluster center being one visual word.
④ Count the frequency of occurrence of each visual word into a visual word histogram.
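The steps above map almost line for line onto standard library calls. The following is a minimal sketch, assuming SIFT descriptors have already been extracted as in the previous snippet; the vocabulary size k = 500 is an illustrative choice, not a value taken from the text.

```python
# Bag-of-visual-words: cluster pooled SIFT descriptors into a visual
# vocabulary with K-means, then histogram each image's descriptors
# over that vocabulary (steps 3 and 4 above).
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, k=500):
    """all_descriptors: (n_descriptors, 128) SIFT descriptors pooled over training images."""
    return KMeans(n_clusters=k, n_init=4).fit(all_descriptors)

def bow_histogram(image_descriptors, vocabulary):
    words = vocabulary.predict(image_descriptors)           # nearest visual word per descriptor
    hist = np.bincount(words, minlength=vocabulary.n_clusters)
    return hist / max(hist.sum(), 1)                        # normalized word frequencies
```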
(4) Latent semantic extraction technology
Many applications of natural language processing (NLP) need to explore the meaning hidden behind characters and words; simple literal matching hardly works, and the key lies in handling synonymy and polysemy. Latent semantic analysis (LSA) provides a partial solution: it uses singular value decomposition (SVD) to map the high-dimensional word-document co-occurrence matrix into a low-dimensional latent semantic space, so that words that look unrelated on the surface reveal deep connections. Probabilistic latent semantic analysis (PLSA), a variant of LSA, has a more solid mathematical foundation and an easy-to-use generative data model, and it has been shown to provide better lexical matching for information extraction. Given a document collection $D = \{d_1, d_2, \ldots, d_M\}$, a word collection $W = \{w_1, w_2, \ldots, w_N\}$ and a document-word co-occurrence frequency matrix $\mathbf{N} \equiv (n_{ij})$, let $n(d_i, w_j)$ denote the frequency with which word $w_j$ occurs in document $d_i$. Let $Z = \{z_1, z_2, \ldots, z_K\}$ denote the set of latent semantics, where $K$ is a manually specified constant. PLSA assumes that "document-word" pairs are conditionally independent and that the distributions of the latent semantics over documents and over words are also conditionally independent. Under these assumptions, the "document-word" probability can be expressed by the following formula:
$$P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k)\, P(z_k \mid d_i) \quad (1)$$

In equation (1), $P(w_j \mid z_k)$ is the distribution of the latent semantic over words, which can also be interpreted as the contribution of the word to the latent semantic, and $P(z_k \mid d_i)$ denotes the distribution of latent semantics within the document, i.e., the probability that the document carries the corresponding latent semantic. Following the maximum-likelihood principle, probabilistic latent semantic analysis computes the parameters of PLSA by maximizing the following log-likelihood function:

$$\mathcal{L} = \sum_{i=1}^{M} \sum_{j=1}^{N} n(d_i, w_j) \log P(d_i, w_j) \quad (2)$$

In a model with hidden variables, the standard procedure for maximum-likelihood estimation is the expectation-maximization (EM) algorithm, which alternates between two steps until convergence.

In the E-step, the posterior probability of the hidden variables is computed with the current estimates of the parameters:

$$P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k)\, P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l)\, P(z_l \mid d_i)} \quad (3)$$

In the M-step, the expectations of the previous step are used to maximize the current parameter estimates:

$$P(w_j \mid z_k) = \frac{\sum_{i=1}^{M} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{\sum_{m=1}^{N} \sum_{i=1}^{M} n(d_i, w_m)\, P(z_k \mid d_i, w_m)} \quad (4)$$

$$P(z_k \mid d_i) = \frac{\sum_{j=1}^{N} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{n(d_i)} \quad (5)$$

$$n(d_i) = \sum_{j=1}^{N} n(d_i, w_j) \quad (6)$$
Compared with the SVD decomposition used in latent semantic analysis, the EM algorithm has a linear convergence rate and is simple to implement, and it drives the likelihood function to a local optimum.
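The EM iteration of equations (3) to (6) can be sketched directly in NumPy. This is a minimal illustration, assuming small vocabularies: the dense (documents x words x topics) posterior array below is a simplification, not an optimized implementation.

```python
# PLSA via EM. counts is the (docs x words) co-occurrence matrix,
# K the number of latent semantics. Pw_z holds P(w|z), Pz_d holds P(z|d).
import numpy as np

def plsa(counts, K, iters=100, seed=0):
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    Pw_z = rng.random((K, N)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)  # P(w|z)
    Pz_d = rng.random((M, K)); Pz_d /= Pz_d.sum(axis=1, keepdims=True)  # P(z|d)
    for _ in range(iters):
        # E-step, eq. (3): posterior P(z|d,w), shape (M, N, K)
        post = Pz_d[:, None, :] * Pw_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        weighted = counts[:, :, None] * post            # n(d,w) * P(z|d,w)
        # M-step, eq. (4): re-estimate P(w|z)
        Pw_z = weighted.sum(axis=0).T
        Pw_z /= Pw_z.sum(axis=1, keepdims=True) + 1e-12
        # M-step, eqs. (5)-(6): re-estimate P(z|d), dividing by n(d)
        Pz_d = weighted.sum(axis=1)
        Pz_d /= counts.sum(axis=1, keepdims=True) + 1e-12
    return Pw_z, Pz_d
```

For the folding-in used in the inference stage described below, the same loop is run with the P(w|z) update of equation (4) skipped.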
Having constructed the BoW description of the image regions, PLSA can be used to discover the latent semantics of the regions. Each region of an image is treated as a separate document, denoted $d$; the visual words are treated as the words of the document, denoted $w$; the region latent semantics of the image are denoted $z$; and $n(d_i, w_j)$ denotes the frequency with which visual word $w_j$ occurs in region $d_i$.
Region latent semantic extraction based on the PLSA method can be divided into two steps:
Learning stage: PLSA is applied for training to the set of all image regions generated from the training images, iterating equations (3), (4), (5) and (6) with the EM algorithm until convergence, which yields $P(w_j \mid z_k)$. This is in fact the region latent semantic model: it describes how the visual words are distributed when a latent semantic appears in an image region.
Inference stage: for all regions of a test image, $P(w_j \mid z_k)$ is kept fixed, and equations (3), (5) and (6) are likewise iterated with the EM algorithm until convergence, yielding for each block region its $P(z_k \mid d_i)$, i.e., the probability that the block region carries latent semantic $z_k$.
Suppose the number of region latent semantics is defined as $T$, and that an $L$-level spatial pyramid partition yields $N = (4^L - 1)/3$ regions. For each block region $d_i$ we then obtain a $T$-dimensional feature vector $f_{d_i} = \big(P(z_1 \mid d_i), \ldots, P(z_T \mid d_i)\big)$. Since the spatial distribution of the regions' latent semantics also contributes to image scene classification, the $T$-dimensional feature vectors of all block regions of the image are finally concatenated into a single vector $F = (f_{d_1}, \ldots, f_{d_N})$. This is the region latent semantic feature of the image defined here. Once the region latent semantic features have been obtained, an SVM classifier model can be built to classify images into scenes.
The present invention is aimed mainly at visually impaired groups and groups with weak self-learning ability: with nothing more than a simple scan, and without relying on the visual system, the computer describes the image, helping such disadvantaged groups experience another, different world; it can also serve as part of their entertainment. The operation is simple and the portability good.
Those of ordinary skill in the art will understand that all or part of the steps of the various methods of the above embodiments can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium, which can include a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, and the like.
The human-computer interaction-based method for high-level semantic parsing of images provided by the embodiments of the present invention has been described in detail above. Specific examples are used herein to set out the principles and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and to the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (6)

  1. A method for high-level semantic parsing of images based on human-computer interaction, characterized in that it comprises:
    scanning a source image with a portable scanning device;
    identifying targets in the source image;
    filtering and parsing the content of the source image, and extracting effective knowledge;
    organizing the semantics and delivering the image content to the user in speech form.
  2. The method for high-level semantic parsing of images based on human-computer interaction according to claim 1, characterized in that the scanning of a source image with a portable scanning device comprises:
    scanning the source image with an ARM-based portable scanning device.
  3. The method for high-level semantic parsing of images based on human-computer interaction according to claim 2, characterized in that the identifying of targets in the source image comprises:
    extracting image features with SIFT local feature extraction, combined with HOG features and GIST global features, so that image information can be captured more comprehensively.
  4. The method for high-level semantic parsing of images based on human-computer interaction according to claim 3, characterized in that the filtering and parsing of the content of the source image and extraction of effective knowledge comprises:
    extracting effective knowledge with a bag-of-words image classification method.
  5. The method for high-level semantic parsing of images based on human-computer interaction according to claim 4, characterized in that the bag-of-words image classification method comprises:
    detecting feature points by image segmentation, random sampling or similar means;
    extracting local features from the image and generating descriptors;
    clustering the descriptors of these feature points, each cluster center being one visual word;
    counting the frequency of occurrence of each visual word into a visual word histogram.
  6. The method for high-level semantic parsing of images based on human-computer interaction according to claim 5, characterized in that the organizing of the semantics and delivering of the image content to the user in speech form comprises:
    delivering the image content to the user in speech form using latent semantic extraction technology.
PCT/CN2015/082908 2014-12-17 2015-06-30 Human-computer interaction-based method for parsing high-level semantics of image WO2016095487A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410790684.X 2014-12-17
CN201410790684.XA CN104484666A (en) 2014-12-17 2014-12-17 Advanced image semantic parsing method based on human-computer interaction

Publications (1)

Publication Number Publication Date
WO2016095487A1

Family

ID=52759207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/082908 WO2016095487A1 (en) 2014-12-17 2015-06-30 Human-computer interaction-based method for parsing high-level semantics of image

Country Status (2)

Country Link
CN (1) CN104484666A (en)
WO (1) WO2016095487A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191379A (en) * 2018-07-26 2019-01-11 北京纵目安驰智能科技有限公司 A kind of semanteme marking method of panoramic mosaic, system, terminal and storage medium
CN109857884A (en) * 2018-12-20 2019-06-07 郑州轻工业学院 A kind of automated graphics semantic description method
CN109902714A (en) * 2019-01-18 2019-06-18 重庆邮电大学 A kind of multi-modality medical image search method based on more figure regularization depth Hash
CN110119701A (en) * 2019-04-30 2019-08-13 东莞恒创智能科技有限公司 The coal mine fully-mechanized mining working unsafe acts recognition methods of view-based access control model relationship detection
CN112001380A (en) * 2020-07-13 2020-11-27 上海翎腾智能科技有限公司 Method and system for recognizing Chinese meaning phrases based on artificial intelligence realistic scene
CN112650852A (en) * 2021-01-06 2021-04-13 广东泰迪智能科技股份有限公司 Event merging method based on named entity and AP clustering
CN113986431A (en) * 2021-10-27 2022-01-28 武汉戴维南科技有限公司 Visual debugging method and system for automatic robot production line

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484666A (en) * 2014-12-17 2015-04-01 中山大学 Advanced image semantic parsing method based on human-computer interaction
TWI553494B (en) * 2015-11-04 2016-10-11 創意引晴股份有限公司 Multi-modal fusion based Intelligent fault-tolerant video content recognition system and recognition method
CN105426447B (en) * 2015-11-09 2019-02-01 北京工业大学 A kind of related feedback method based on the learning machine that transfinites
US11514244B2 (en) 2015-11-11 2022-11-29 Adobe Inc. Structured knowledge modeling and extraction from images
CN105740402B (en) 2016-01-28 2018-01-02 百度在线网络技术(北京)有限公司 The acquisition methods and device of the semantic label of digital picture
CN106777125B (en) * 2016-12-16 2020-10-23 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image description generation method based on neural network and image attention point
CN109040693B (en) * 2018-08-31 2020-11-10 上海赛特斯信息科技股份有限公司 Intelligent alarm system and method
CN109275027A (en) * 2018-09-26 2019-01-25 Tcl海外电子(惠州)有限公司 Speech output method, electronic playback devices and the storage medium of video
CN110046271B (en) * 2019-03-22 2021-06-22 中国科学院西安光学精密机械研究所 Remote sensing image description method based on voice guidance
CN110399519B (en) * 2019-07-29 2021-06-18 吉林大学 Extensible multi-semantic image correlation feedback method
JP7467999B2 (en) * 2020-03-10 2024-04-16 セイコーエプソン株式会社 Scan system, program, and method for generating scan data for a scan system
CN115187996B (en) * 2022-09-09 2023-01-06 中电科新型智慧城市研究院有限公司 Semantic recognition method and device, terminal equipment and storage medium
CN116758591B (en) * 2023-08-18 2023-11-21 厦门瑞为信息技术有限公司 Station special passenger recognition and interaction system and method based on image semantic recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020018594A1 (en) * 2000-07-06 2002-02-14 Mitsubishi Electric Research Laboratories, Inc. Method and system for high-level structure analysis and event detection in domain specific videos
CN103077625A (en) * 2013-01-30 2013-05-01 中国盲文出版社 Blind electronic reader and blind assistance reading method
CN103745200A (en) * 2014-01-02 2014-04-23 哈尔滨工程大学 Facial image identification method based on word bag model
CN104142995A (en) * 2014-07-30 2014-11-12 中国科学院自动化研究所 Social event recognition method based on visual attributes
CN104484666A (en) * 2014-12-17 2015-04-01 中山大学 Advanced image semantic parsing method based on human-computer interaction

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001125897A (en) * 1999-10-28 2001-05-11 Sony Corp Device and method for learning language
CN102054178B (en) * 2011-01-20 2016-08-17 北京联合大学 A kind of image of Chinese Painting recognition methods based on local semantic concept
CN102831482A (en) * 2012-08-01 2012-12-19 浙江兴旺宝明通网络有限公司 Heuristic inquiry system based on intelligent question answering aimed at pump valve industry
CN203433526U (en) * 2013-01-22 2014-02-12 华东师范大学 Two-dimensional code electronic reader and application system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020018594A1 (en) * 2000-07-06 2002-02-14 Mitsubishi Electric Research Laboratories, Inc. Method and system for high-level structure analysis and event detection in domain specific videos
CN103077625A (en) * 2013-01-30 2013-05-01 中国盲文出版社 Blind electronic reader and blind assistance reading method
CN103745200A (en) * 2014-01-02 2014-04-23 哈尔滨工程大学 Facial image identification method based on word bag model
CN104142995A (en) * 2014-07-30 2014-11-12 中国科学院自动化研究所 Social event recognition method based on visual attributes
CN104484666A (en) * 2014-12-17 2015-04-01 中山大学 Advanced image semantic parsing method based on human-computer interaction

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191379A (en) * 2018-07-26 2019-01-11 北京纵目安驰智能科技有限公司 A kind of semanteme marking method of panoramic mosaic, system, terminal and storage medium
CN109857884A (en) * 2018-12-20 2019-06-07 郑州轻工业学院 A kind of automated graphics semantic description method
CN109857884B (en) * 2018-12-20 2023-02-07 郑州轻工业学院 Automatic image semantic description method
CN109902714A (en) * 2019-01-18 2019-06-18 重庆邮电大学 A kind of multi-modality medical image search method based on more figure regularization depth Hash
CN110119701A (en) * 2019-04-30 2019-08-13 东莞恒创智能科技有限公司 The coal mine fully-mechanized mining working unsafe acts recognition methods of view-based access control model relationship detection
CN110119701B (en) * 2019-04-30 2023-04-07 东莞恒创智能科技有限公司 Visual relationship detection-based coal mine fully mechanized coal mining face unsafe behavior identification method
CN112001380A (en) * 2020-07-13 2020-11-27 上海翎腾智能科技有限公司 Method and system for recognizing Chinese meaning phrases based on artificial intelligence realistic scene
CN112001380B (en) * 2020-07-13 2024-03-26 上海翎腾智能科技有限公司 Recognition method and system for Chinese meaning phrase based on artificial intelligence reality scene
CN112650852A (en) * 2021-01-06 2021-04-13 广东泰迪智能科技股份有限公司 Event merging method based on named entity and AP clustering
CN113986431A (en) * 2021-10-27 2022-01-28 武汉戴维南科技有限公司 Visual debugging method and system for automatic robot production line
CN113986431B (en) * 2021-10-27 2024-02-02 武汉戴维南科技有限公司 Visual debugging method and system for automatic robot production line

Also Published As

Publication number Publication date
CN104484666A (en) 2015-04-01

Similar Documents

Publication Publication Date Title
WO2016095487A1 (en) Human-computer interaction-based method for parsing high-level semantics of image
Miech et al. Learning a text-video embedding from incomplete and heterogeneous data
Datta et al. Content-based image retrieval: approaches and trends of the new age
Su et al. Improving image classification using semantic attributes
Liu et al. Image retagging using collaborative tag propagation
Wang et al. Combining global, regional and contextual features for automatic image annotation
Lan et al. Image retrieval with structured object queries using latent ranking svm
Wang et al. Active SVM-based relevance feedback using multiple classifiers ensemble and features reweighting
Alemu et al. Image retrieval in multimedia databases: A survey
Zhang et al. Sentiment analysis on microblogging by integrating text and image features
Mishra et al. Image mining in the context of content based image retrieval: a perspective
Moumtzidou et al. ITI-CERTH participation to TRECVID 2012.
Sharma et al. Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey
Feng et al. Graph-based multi-space semantic correlation propagation for video retrieval
Liu et al. LIRIS-Imagine at ImageCLEF 2011 Photo Annotation Task.
Abdulbaqi et al. A sketch based image retrieval: a review of literature
Tang et al. An efficient concept detection system via sparse ensemble learning
Seddati et al. DeepSketch2Image: deep convolutional neural networks for partial sketch recognition and image retrieval
Lin et al. Image auto-annotation via tag-dependent random search over range-constrained visual neighbours
Li et al. Automatic image annotation with continuous PLSA
Sousa et al. Geometric matching for clip-art drawing retrieval
Deljooi et al. A Novel Semantic Statistical Model for Automatic Image Annotation Using the Relationship between the Regions Based on Multi-Criteria Decision Making.
Zheng et al. Discovering discriminative patches for free-hand sketch analysis
Zhao et al. Relevance topic model for unstructured social group activity recognition
Bhandari et al. Ontology based image recognition: A review

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15868999

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15868999

Country of ref document: EP

Kind code of ref document: A1