CN104317867A - System for carrying out entity clustering on web pictures returned by search engine - Google Patents

System for carrying out entity clustering on web pictures returned by search engine

Info

Publication number
CN104317867A
Authority
CN
China
Prior art keywords
concept
picture
context
cluster
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410554684.XA
Other languages
Chinese (zh)
Other versions
CN104317867B (en)
Inventor
朱其立
赵凯祺
蔡智源
隋清宇
魏恩勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201410554684.XA
Publication of CN104317867A
Application granted
Publication of CN104317867B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention relates to a system for performing entity clustering on web pictures returned by a search engine. The system comprises an offline system and an online system. The offline system preprocesses the source webpages in which the pictures are located. The online system receives a query, submits it to the search engine, and receives multiple pages of returned picture results; for each page of results, it looks up the conceptualized metadata and text of the source webpages and extracts a query context and a picture context from the conceptualized text. The online system then performs three-layer clustering using, in turn, the metadata, the context, and the expanded context obtained by concept expansion of the context, and automatically labels each class with relevant descriptive concepts so that the entity of each class can be understood. The three-layer clustering algorithm has the same time complexity as an ordinary hierarchical clustering algorithm; by subdividing the features, the input to each layer (that is, the output of the previous layer) is made more precise, which effectively improves the clustering result and yields accurate descriptive concepts.

Description

System for entity clustering of web pictures returned by a search engine
Technical field
The present invention relates to natural language processing and text mining in the field of computer technology, and in particular to a system for performing entity clustering on web pictures returned by a search engine.
Background
With the spread of the Internet and the growing number of web pictures, web picture search has become an everyday activity for Internet users. Current image search engines mainly return pictures relevant to the query keyword, and these pictures often cover several distinct entities that share the same name. To find the desired picture, a user has to browse through every returned picture. Organizing search results by entity, so that they are easier to read, has therefore become one direction for improving image search engines.
Image clustering is a method for automatically distinguishing different entities. In earlier work, D. Cai et al. (see Cai, D., He, X., Ma, W.Y., Wen, J.R., Zhang, H.: Organizing WWW images based on the analysis of page layout and web link structure. ICME 2004) extracted the context of web pictures by vision-based page segmentation and clustered pictures using this context together with web link information. However, because vision-based segmentation is unstable and the context contains noisy data, clustering precision is severely limited. Z. Fu et al. (see Fu, Z., Ip, H.H.S., Lu, H., Lu, Z.: Multi-modal constraint propagation for heterogeneous image clustering. MultiMedia 2011) proposed a framework that combines multiple modalities, such as image tags and visual features, and clusters images by propagating class constraints across multiple graphs. Because current visual feature extraction is not precise enough, this framework propagates the errors contained in the visual features. Moreover, the method has to propagate constraints across multiple graphs, which makes clustering inefficient and unsuitable for clustering online picture search results. Finally, current image clustering methods cannot provide descriptive concepts with which to label each class.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a system for performing entity clustering on web pictures returned by a search engine, so that picture search results are better organized by entity, each entity class has high precision, and different entities are clearly distinguished. The invention divides the overall framework into an online part and an offline part, which greatly reduces the time overhead of online clustering.
To achieve the above object, the present invention adopts the following technical solution:
A system for performing entity clustering on web pictures returned by a search engine comprises two parts, an offline system and an online system, wherein:
The offline system preprocesses the source webpages in which the pictures are located. Preprocessing comprises extracting webpage metadata and conceptualizing the original webpage text and the metadata into a set of weighted concepts (a concept vector). The conceptualized metadata and webpage content are made available for lookup by the online system.
The online system receives a query, submits it to the search engine, and receives multiple pages of returned picture results. For each page of results, it looks up the conceptualized metadata and text of the source webpages and extracts from the conceptualized text the context of the query keyword (the query context) and the picture context. The online system then performs three-layer clustering using, in turn, the metadata, the context, and the expanded context obtained by expanding the context with concepts from Wikipedia, and automatically labels each class with relevant descriptive concepts so that the entity of each class can be understood.
The offline system performs metadata extraction, which comprises extracting the effective terms in the URL and extracting the picture ALT attribute. To extract the effective terms of a URL, a binary classifier classifies terms as effective or invalid and returns the effective ones. The picture ALT attribute can be obtained directly from the HTML source code.
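The metadata extraction step can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not describe the binary classifier, so `is_effective` is a hypothetical rule-based stand-in for it, and the names `url_tokens`, `effective_terms`, and `AltCollector` are invented for this sketch.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


def url_tokens(url):
    """Split the path of a URL into lowercase candidate terms."""
    path = urlparse(url).path
    # Drop a trailing file extension such as ".jpg" before splitting.
    path = re.sub(r"\.[A-Za-z0-9]{1,5}$", "", path)
    return [t.lower() for t in re.split(r"[/_\-.+%]+", path) if t]


def is_effective(token):
    """Hypothetical stand-in for the trained binary classifier:
    keep purely alphabetic tokens of at least two letters."""
    return token.isalpha() and len(token) >= 2


def effective_terms(url):
    """Return only the URL terms that the (stand-in) classifier accepts."""
    return [t for t in url_tokens(url) if is_effective(t)]


class AltCollector(HTMLParser):
    """Collect the non-empty ALT attributes of <img> tags from HTML source."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)


print(effective_terms("http://domain.com/53C316-C2oJ5/mr_bean.jpg"))

collector = AltCollector()
collector.feed('<p><img src="a.jpg" alt="Mr. Bean"><img src="b.jpg"></p>')
print(collector.alts)
```

A real system would replace `is_effective` with a classifier trained on labelled URL terms; the rule used here only separates identifier-like tokens from word-like ones.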
The offline system comprises a conceptualization module, which conceptualizes the picture metadata and the original webpage text. Conceptualization maps the words in the metadata and text onto Wikipedia concepts, so that the metadata and text are converted into sets of weighted concepts that can be used to compute similarity in the clustering algorithm. The weight of each concept is the importance of that concept to the picture, defined as follows:
$$\mathrm{CF\text{-}IDF}(c, d) = \mathrm{CF}(c, d) \times \log \frac{|D|}{\mathrm{DF}(c)}$$
where CF-IDF(c, d) is the importance of concept c to picture d. It is the product of two parts: the frequency CF(c, d) with which the concept occurs in the picture context, and the inverse context frequency, which is inversely related to the number of contexts DF(c) in which the concept occurs; D is the set of all picture contexts.
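The weighting can be illustrated with a minimal sketch. It assumes that CF(c, d) is a raw occurrence count and that the logarithm is natural; the source fixes neither. The function name `cf_idf` and the toy contexts are hypothetical.

```python
import math
from collections import Counter


def cf_idf(contexts):
    """Map each picture to its weighted concept vector.

    `contexts` maps a picture id to the list of concepts in its context.
    """
    n_contexts = len(contexts)          # |D|
    df = Counter()                      # DF(c): contexts containing c
    for concepts in contexts.values():
        df.update(set(concepts))
    vectors = {}
    for pic, concepts in contexts.items():
        cf = Counter(concepts)          # CF(c, d): occurrences of c in d
        vectors[pic] = {c: cf[c] * math.log(n_contexts / df[c]) for c in cf}
    return vectors


# Hypothetical toy contexts for four pictures.
contexts = {
    "p1": ["Mr. Bean", "Mr. Bean", "Comedy"],
    "p2": ["Mr. Bean", "Rowan Atkinson"],
    "p3": ["Bean (legume)", "Cooking"],
    "p4": ["Comedy"],
}
for pic, vec in cf_idf(contexts).items():
    print(pic, vec)
```

A concept that occurs in every context receives weight 0 under this definition, which is the intended effect of the inverse context frequency: concepts that do not discriminate between pictures contribute nothing to similarity.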
The online system comprises a text context extraction module, which extracts context information from the conceptualized original webpage text. This comprises extraction of the picture context and extraction of the query context. Both are cut out by a fixed-size window, for example the 50 concepts before and after the picture or the query keyword. The extracted text context forms a concept vector used to compute picture similarity.
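The window-based extraction might look like the sketch below. It assumes the conceptualized page is available as an ordered list of concept names and that the position of the picture (or of each query-keyword mention) is known as an index into that list; the `<img>` marker and the function name `context_vector` are hypothetical.

```python
from collections import Counter


def context_vector(concepts, anchors, size=50):
    """Count the concepts lying within `size` positions of any anchor.

    `concepts` is the page text as an ordered list of concept names;
    `anchors` are the indices of the picture or of query-keyword mentions.
    Anchor positions themselves are not counted.
    """
    covered = set()
    for pos in anchors:
        start = max(0, pos - size)
        end = min(len(concepts), pos + size + 1)
        covered.update(i for i in range(start, end) if i not in anchors)
    return Counter(concepts[i] for i in covered)


# Hypothetical conceptualized page with the picture at index 2.
page = ["Legume", "Cooking", "<img>", "Mr. Bean", "Comedy", "Rowan Atkinson"]
print(context_vector(page, anchors=[2], size=1))
print(context_vector(page, anchors=[2]))
```

Collecting positions in a set before counting means that overlapping windows around several keyword mentions do not count the same occurrence twice.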
The online system comprises a three-layer clustering algorithm module, which consists of three modules: metadata clustering, text context clustering, and context concept expansion clustering, wherein:
First-layer clustering performs agglomerative hierarchical clustering on the concept vectors obtained by conceptualizing the metadata, which yields clusters with high within-class precision, and merges the concept vectors of all pictures in each class to form the concept vector of the class.
The agglomerative hierarchical clustering algorithm uses the conceptualization of a class to compute similarity between classes. The conceptualization of a class is obtained by adding up the concept vectors of the pictures in the class and removing the concepts with low values, which gives a high-precision class concept vector. It is defined by the following formula:
$$V(C)\{c\} = \sum_{d \in C} \mathrm{CF\text{-}IDF}(c, d)$$
where c is a concept, C is a class, d is a picture in the class, and CF-IDF(c, d) is the importance of the concept to the picture.
Second-layer clustering adds the concept vector of the conceptualized context to the concept vector of each picture, updates the concept vectors of all classes obtained from first-layer clustering, and performs further agglomerative hierarchical clustering on these classes.
Third-layer clustering replaces the vector of each picture with its expanded concept vector, updates the concept vectors of all classes obtained from second-layer clustering, and performs further agglomerative hierarchical clustering on these concept vectors.
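The three layers can be sketched as repeated agglomerative merging over progressively richer picture vectors. The sketch below rests on several assumptions that the source does not state: cosine similarity between class vectors, a fixed merge threshold of 0.3, and unit concept weights in the toy input, which is modelled on the "bean" example of the embodiment. The pairwise search is deliberately naive and not optimized.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_vector(members, picture_vectors, min_weight=0.0):
    """V(C){c}: sum the members' concept vectors, then drop low-valued concepts."""
    total = Counter()
    for pic in members:
        total.update(picture_vectors[pic])  # adds weights concept by concept
    return {c: w for c, w in total.items() if w > min_weight}


def agglomerate(classes, picture_vectors, threshold):
    """Merge the most similar pair of classes until no pair reaches threshold."""
    classes = [set(c) for c in classes]
    while len(classes) > 1:
        vecs = [class_vector(c, picture_vectors) for c in classes]
        best, pair = threshold, None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                sim = cosine(vecs[i], vecs[j])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        classes[i] |= classes[j]
        del classes[j]
    return classes


def three_layer(metadata, context, expanded, threshold=0.3):
    """Return the classes obtained after each of the three layers."""
    pics = list(metadata)
    # Layer 1: metadata concept vectors only.
    vectors = {p: dict(metadata[p]) for p in pics}
    layer1 = agglomerate([{p} for p in pics], vectors, threshold)
    # Layer 2: add the context concepts to every picture vector.
    for p in pics:
        merged = Counter(vectors[p])
        merged.update(context[p])
        vectors[p] = dict(merged)
    layer2 = agglomerate(layer1, vectors, threshold)
    # Layer 3: replace every picture vector with its expanded vector.
    layer3 = agglomerate(layer2, expanded, threshold)
    return layer1, layer2, layer3


# Toy input; all weights are assumed to be 1.
metadata = {"p1": {"Mr. Bean": 1}, "p2": {"Mr. Bean": 1}, "p3": {}, "p4": {}}
context = {"p1": {"Rowan Atkinson": 1}, "p2": {"Rowan Atkinson": 1},
           "p3": {"Rowan Atkinson": 1, "Comedy": 1}, "p4": {"Blackadder": 1}}
expanded = {"p1": {"Mr. Bean": 1, "Rowan Atkinson": 1, "Blackadder": 1},
            "p2": {"Mr. Bean": 1, "Rowan Atkinson": 1, "Blackadder": 1},
            "p3": {"Rowan Atkinson": 1, "Comedy": 1, "Blackadder": 1},
            "p4": {"Blackadder": 1, "Rowan Atkinson": 1}}
for layer in three_layer(metadata, context, expanded):
    print(layer)
```

Each layer starts from the classes produced by the previous one, so pictures merged on high-confidence metadata are never split again; later layers can only merge classes further.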
The vectors are expanded using the concept description pages of Wikipedia: related concepts are added to the concept vector of each picture, and the concept vector of each class is updated. The update is defined by the following formula:
$$V'(C)\{c\} = \sum_{c_i \in V_C} \left( V(C)\{c_i\} \times \mathrm{CF\text{-}IDF}(c, d_{c_i}) \right)$$
where CF-IDF(c, d_{c_i}) is the importance of concept c to the Wikipedia description page d_{c_i} of concept c_i, V_C is the set of all concepts in the current class concept vector, and c_i is a concept in the current class concept vector. The context expansion process filters out noisy data by selecting the k concepts with the largest values.
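The expansion formula can be sketched as follows. The sketch assumes that the CF-IDF weights of concepts on each Wikipedia description page have been precomputed into a lookup table (the hypothetical `page_weights`), and it applies the top-k filter to the expanded vector; the source does not say at exactly which point the k concepts are selected. All numbers are made up for illustration.

```python
def expand(class_vec, page_weights, k):
    """Compute V'(C) from V(C) and keep the k highest-valued concepts.

    `page_weights[ci][c]` plays the role of CF-IDF(c, d_ci): the importance
    of concept c to the description page of concept ci.
    """
    expanded = {}
    for ci, weight in class_vec.items():
        for c, importance in page_weights.get(ci, {}).items():
            expanded[c] = expanded.get(c, 0.0) + weight * importance
    # Sort by descending value; break ties by name so the result is stable.
    top = sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


# Hypothetical class vector and description-page weights.
class_vec = {"Mr. Bean": 2.0, "Rowan Atkinson": 1.0}
page_weights = {
    "Mr. Bean": {"Mr. Bean": 1.0, "Rowan Atkinson": 0.5, "Comedy": 0.25},
    "Rowan Atkinson": {"Rowan Atkinson": 1.0, "Blackadder": 0.5, "Mr. Bean": 0.5},
}
print(expand(class_vec, page_weights, k=3))
```

Note that under this formula an original concept survives only if it also appears on some description page, typically its own; whether the original weights are additionally retained is not specified in the source.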
After three-layer clustering, the resulting class concept vectors are used to label each picture class with relevant descriptive concepts: the several concepts with the highest values in the concept vector of each class are chosen to describe the entity that the class represents.
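Labelling then reduces to ranking a class concept vector, as in this small sketch. The function name `describe`, the default of three labels, and the example weights are assumptions made for illustration.

```python
def describe(class_vec, n=3):
    """Return the n highest-valued concepts of a class concept vector."""
    ranked = sorted(class_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:n]]


# Hypothetical class concept vector.
class_vec = {"Mr. Bean": 2.5, "Rowan Atkinson": 2.0, "Comedy": 0.5, "Blackadder": 0.5}
print(describe(class_vec, n=2))
```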
The technical problems solved by the present invention include:
1. Extracting image context information and representing it as a vector in concept space, which provides features for computing image similarity.
2. Because some images have too little context information, the invention provides a mechanism for expanding context information, in which the concept vector of the context is expanded using Wikipedia or another knowledge base.
3. Different features are relevant to a picture to different degrees, and features with higher relevance have higher confidence. To make effective use of features of differing relevance and improve clustering precision, the invention expands the concept vector of each picture layer by layer and clusters at each layer.
The technical features of the present invention are described below by comparison with related prior art found by searching.
Related search result 1:
Application number (patent): 2012101444570, title: Method and device for picture clustering
This patent document clusters pictures twice using visual features, namely global features and local features; the second clustering splits the clusters produced by the first.
Comparison of technical points:
1. That patent clusters pictures according to picture content, that is, visual features, whereas the present invention clusters using the contextual features of pictures.
2. The second clustering in that patent splits large classes into small ones, whereas the present invention merges small classes into large ones and uses each expanded concept vector to screen features and filter out noisy data.
3. The concept vector representation adopted by the present invention makes it possible to label each class with descriptive concepts, whereas clustering based on picture content cannot provide conceptual descriptions.
Related search result 2:
Application number (patent): 2013106111554, title: Massive image retrieval system based on clustered compact features
This patent document clusters the images in an image library using local image features. At search time, it first retrieves picture clusters by the query keyword and then returns the corresponding images.
Comparison of technical points:
1. That patent generates clustered compact features from the local features of pictures and clusters pictures with them, whereas the present invention clusters using the contextual features of pictures.
2. That patent uses image clustering to speed up retrieval, whereas the present invention clusters and conceptualizes the search results in order to present them separated by class.
Related search result 3:
Application number (patent): 201210545637X, title: Balanced image clustering method based on hierarchical clustering
This patent document uses picture clustering to reduce the number of pictures that must be traversed during search. The clustering is based on high-dimensional image feature data.
Comparison of technical points:
1. That patent clusters pictures according to their high-dimensional features, whereas the present invention clusters using the contextual features of pictures.
2. That patent uses image clustering to reduce the pictures traversed during retrieval, and its clustering method is hierarchical clustering; the present invention is based on three different contextual features and improves clustering precision through three-layer clustering.
Related search result 4:
Application number (patent): 201210163641X, title: Image clustering method
This patent obtains the time data and position data of images from the capture device and clusters using time, position, and speed data as features.
Comparison of technical points:
1. That patent mainly clusters captured photographs, whereas the present invention clusters web pictures. Captured photographs have no context information, and web pictures are not necessarily photographs; most have no capture time or position. The features of the two are different.
2. That patent clusters based on event sequences, whereas the present invention is based on concept vectors, which can also be used to generate descriptive concepts.
Related search result 5:
Application number (patent): 2009801523973, title: Arranging images into pages using content-based filtering and theme-based clustering
This patent clusters pictures captured by a device by theme, based on picture content (visual features), and maps the clustering results to corresponding photo albums.
Comparison of technical points:
1. That patent clusters using the visual features of pictures, whereas the present invention clusters using the context of web pictures.
2. That patent lays pictures out on different pages, whereas the present invention provides users with classified search results and corresponding descriptive concepts.
Related search result 6:
Application number (patent): 2010105171639, title: Image clustering method and system
This patent builds a directed graph of images by parameter estimation and clusters images by partitioning the directed graph. Partitioning yields multiple subgraphs, and the images of each subgraph form one class.
Comparison of technical points:
1. That patent clusters using a graph, representing the image library as a directed graph. The present invention forms picture classes by merging pictures from small to large, and each clustering layer considers a different contextual feature of the pictures.
Related search result 7:
Application number (patent): 2005800393866, title: Image clustering method and system
This patent clusters images by event using time-point features, and its clustering algorithm clusters at different layers according to different time ranges.
Comparison of technical points:
1. The layers in that patent's multi-level clustering are different time ranges, whereas the layers of the present invention are defined by different features.
2. That patent clusters according to event sequences, whereas the present invention distinguishes picture classes by entity.
Compared with the prior art, the present invention creatively uses three different features with a corresponding three-layer clustering algorithm to cluster pictures and provides concept labels for each class, so that picture search results are better organized by entity, each entity class has high precision, and different entities are clearly distinguished. The invention divides the overall framework into an online part and an offline part, which greatly reduces the time overhead of online clustering.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the drawings:
Fig. 1 shows the system framework of the present invention;
Fig. 2 shows an example of the three-layer clustering algorithm of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the drawings. The embodiment is implemented on the basis of the technical solution of the invention, and a detailed implementation and concrete operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.
The task of this embodiment is, for the query keyword "bean" entered by a user, to obtain the picture search results of the search engine, cluster the different instances of "bean" in the results so as to distinguish different entities, and provide distinct concept labels for each different "bean".
As shown in Fig. 1, the metadata extraction module of the offline system extracts metadata context from all original webpages relevant to "bean" in this embodiment. Suppose, for example, that the URL of a webpage is:
“http://domain.com/53C316-C2oJ5/mr_bean.jpg”
The metadata extraction module splits the URL into words at delimiters and uses the binary classifier to detect the effective terms, for example "mr bean". The conceptualization module of the offline system conceptualizes the metadata and the related webpages of "bean" to obtain metadata concept vectors and text concept vectors.
After the user's query keyword "bean" is received, the text context extraction module of the online system locates the picture and the query keyword "bean" in the conceptualized text and extracts the 50 concepts before and after each as the text context concept vector. Using the metadata concept vectors and text context concept vectors, the online system performs three-layer clustering.
As shown in Fig. 2, the three-layer clustering module of the online system first computes picture similarity from the metadata concept vectors and performs agglomerative hierarchical clustering (the concept vectors of picture 1 and picture 2 both contain the concept "Mr. Bean", while no effective metadata concept was found for picture 3 or picture 4). In agglomerative hierarchical clustering, the similarity between classes is computed from the concept vectors of the classes. The system computes the class concept vectors from the result of first-layer clustering; for example, picture 1 and picture 2 form one class whose concept vector contains the concept "Mr. Bean".
Second-layer clustering clusters further on the basis of first-layer clustering by extending the concept vectors of the pictures. In Fig. 2, for example, the concept vector of the class formed by picture 1 and picture 2 gains the concept "Rowan Atkinson", the concept vector of picture 3 gains "Rowan Atkinson" and "Comedy", and picture 4 gains "Blackadder". Because the extended vectors share more concepts, the online system merges some similar classes in the second round of hierarchical clustering and obtains larger classes. In Fig. 2, pictures 1, 2, and 3 form a new class whose concept vector is extended to "Mr. Bean", "Rowan Atkinson", and "Comedy".
Third-layer clustering first expands the vector of each class or picture using Wikipedia. In Fig. 2, for example, "Blackadder" is added to the concept vector of the class made up of pictures 1, 2, and 3, and "Rowan Atkinson" is added to picture 4. After the Wikipedia-based expansion, the class vectors are more similar to one another. Through the third round of hierarchical clustering, the online system further merges classes that were previously left unmerged because they carried too little information. For example, with its expanded vector, picture 4 in Fig. 2 can be merged into the class containing pictures 1, 2, and 3.
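As a rough worked illustration of why picture 4 merges only at the third layer, assume that every concept has weight 1 and that class similarity is the cosine of the two concept vectors; neither assumption is stated in the source. Before expansion, the class of pictures 1, 2, and 3 holds {Mr. Bean, Rowan Atkinson, Comedy} and picture 4 holds {Blackadder}, so the two vectors share no concept:

$$\mathrm{sim} = \frac{0}{\sqrt{3}\,\sqrt{1}} = 0$$

After expansion, the class holds {Mr. Bean, Rowan Atkinson, Comedy, Blackadder} and picture 4 holds {Blackadder, Rowan Atkinson}, so two concepts are shared:

$$\mathrm{sim}' = \frac{2}{\sqrt{4}\,\sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.71$$

Under these assumptions, any merge threshold between 0 and about 0.71 keeps picture 4 separate after the second layer and merges it after the third.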
After the three-layer clustering algorithm finishes, the online system separates the different classes and presents all entities and their pictures to the user. Each entity is described by the several most representative concepts (those with the largest values) in its concept vector. For example, the class in Fig. 2 can be described with concepts such as "Mr. Bean", "Rowan Atkinson", "Comedy", and "Blackadder", which describe pictures of the British comedian known as Mr. Bean.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (8)

1. A system for performing entity clustering on web pictures returned by a search engine, characterized in that it comprises an offline system and an online system, wherein:
the offline system is configured to preprocess the source webpages in which the pictures are located, including extracting webpage metadata and conceptualizing the original webpage text and the metadata into a set of weighted concepts, that is, a concept vector, the conceptualized metadata and webpage content being made available for lookup by the online system;
the online system is configured to receive a query, submit it to the search engine, and receive multiple pages of returned picture results; for each page of results, to look up the conceptualized metadata and text of the source webpages and extract from the conceptualized text the context of the query keyword and the picture context; and to perform three-layer clustering using, respectively, the metadata, the context, and the expanded context obtained by concept expansion of the context, and to automatically label each class with relevant descriptive concepts so that the entity of each class can be understood.
2. The system for performing entity clustering on web pictures returned by a search engine according to claim 1, characterized in that the offline system performs metadata extraction, comprising extraction of the effective terms in the URL and of the picture ALT attribute, wherein the effective terms of the URL are extracted by using a binary classifier to classify terms as effective or invalid and returning the effective terms.
3. The system for performing entity clustering on web pictures returned by a search engine according to claim 1, characterized in that the offline system comprises a conceptualization module for conceptually expanding the context; text passed through the conceptualization module is converted into a set of weighted concepts, and the weight of each concept is the importance of that concept to the picture, defined as follows:
$$\mathrm{CF\text{-}IDF}(c, d) = \mathrm{CF}(c, d) \times \log \frac{|D|}{\mathrm{DF}(c)}$$
where CF-IDF(c, d) is the importance of concept c to picture d and is the product of two parts: the frequency CF(c, d) with which the concept occurs in the picture context, and the inverse context frequency, which is inversely related to the number of contexts DF(c) in which the concept occurs; D is the set of all picture contexts.
4. The system for performing entity clustering on web pictures returned by a search engine according to claim 1, characterized in that the online system comprises a text context extraction module which, for the input query keyword, extracts its conceptualized query context and picture context.
5. The system for performing entity clustering on web pictures returned by a search engine according to claim 4, characterized in that the online system comprises a three-layer clustering algorithm module which, using the three kinds of features extracted (metadata, context, and expanded context), performs clustering at three levels, proceeding from the metadata, which has the highest confidence, to the context and then to the expanded context, wherein:
first-layer clustering performs agglomerative hierarchical clustering on the concept vectors obtained by conceptualizing the metadata, which yields clusters with high within-class precision, and merges the concept vectors of all pictures in each class to form the concept vector of the class;
second-layer clustering adds the concept vector of the conceptualized context to the concept vector of each picture, updates the concept vectors of all classes obtained from first-layer clustering, and performs further agglomerative hierarchical clustering on these classes;
third-layer clustering replaces the vector of each picture with its expanded concept vector, updates the concept vectors of all classes obtained from second-layer clustering, and performs further agglomerative hierarchical clustering on these concept vectors.
6. The system for performing entity clustering on web pictures returned by a search engine according to claim 5, characterized in that the agglomerative hierarchical clustering algorithm used computes similarity between classes from the conceptualization of each class; the conceptualization of a class is obtained by adding up the concept vectors of the pictures in the class and removing the concepts with low values, which gives a high-precision class concept vector, and is defined by the following formula:
$$V(C)\{c\} = \sum_{d \in C} \mathrm{CF\text{-}IDF}(c, d)$$
where c is a concept, C is a class, d is a picture in the class, and CF-IDF(c, d) is the importance of the concept to the picture.
7. The system for performing entity clustering on web pictures returned by a search engine according to claim 5, characterized in that third-layer clustering expands the context using Wikipedia, replaces the concept vector of each picture with its expanded concept vector, and updates the concept vector of each class, the update being defined by the following formula:
$$V'(C)\{c\} = \sum_{c_i \in V_C} \left( V(C)\{c_i\} \times \mathrm{CF\text{-}IDF}(c, d_{c_i}) \right)$$
where CF-IDF(c, d_{c_i}) is the importance of concept c to the Wikipedia description page d_{c_i} of concept c_i, V_C is the set of all concepts in the current class concept vector, and c_i is a concept in the current class concept vector; the context expansion process filters out noisy data by selecting the k concepts with the largest values.
8. The system for performing entity clustering on web pictures returned by a search engine according to claim 1, characterized in that the class concept vectors obtained after the three-layer clustering are used to label each picture class with relevant descriptive concepts, the several concepts with the highest values in the concept vector of each class being chosen to describe the entity that the class represents.
CN201410554684.XA 2014-10-17 2014-10-17 System for entity clustering of web pictures returned by a search engine Expired - Fee Related CN104317867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410554684.XA CN104317867B (en) 2014-10-17 2014-10-17 System for entity clustering of web pictures returned by a search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410554684.XA CN104317867B (en) 2014-10-17 2014-10-17 System for entity clustering of web pictures returned by a search engine

Publications (2)

Publication Number Publication Date
CN104317867A true CN104317867A (en) 2015-01-28
CN104317867B CN104317867B (en) 2018-02-09

Family

ID=52373099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410554684.XA Expired - Fee Related CN104317867B (en) 2014-10-17 2014-10-17 System for entity clustering of web pictures returned by a search engine

Country Status (1)

Country Link
CN (1) CN104317867B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279264A (en) * 2015-10-26 2016-01-27 深圳市智搜信息技术有限公司 Semantic relevancy calculation method of document
CN105426925A (en) * 2015-12-28 2016-03-23 联想(北京)有限公司 Image marking method and electronic equipment
CN106844336A (en) * 2016-12-26 2017-06-13 博彦科技股份有限公司 Data model processing method and processing device
CN107408156A (en) * 2015-03-09 2017-11-28 皇家飞利浦有限公司 For carrying out semantic search and the system and method for extracting related notion from clinical document
CN108780462A (en) * 2016-03-13 2018-11-09 科尔蒂卡有限公司 System and method for being clustered to multimedia content element
CN109919175A (en) * 2019-01-16 2019-06-21 浙江大学 A kind of more classification methods of entity of combination attribute information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090094020A1 (en) * 2007-10-05 2009-04-09 Fujitsu Limited Recommending Terms To Specify Ontology Space
CN101751439A (en) * 2008-12-17 2010-06-23 中国科学院自动化研究所 Image retrieval method based on hierarchical clustering
CN102902821A (en) * 2012-11-01 2013-01-30 北京邮电大学 Methods for labeling and searching advanced semantics of imagse based on network hot topics and device
CN103577537A (en) * 2013-09-24 2014-02-12 上海交通大学 Image sharing website picture-oriented multi-pairing similarity determining method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107408156A (en) * 2015-03-09 2017-11-28 皇家飞利浦有限公司 System and method for semantic search and extraction of related concepts from clinical documents
CN105279264A (en) * 2015-10-26 2016-01-27 深圳市智搜信息技术有限公司 Semantic relevancy calculation method of document
CN105279264B (en) * 2015-10-26 2018-07-03 深圳市智搜信息技术有限公司 Semantic relevancy calculation method of document
CN105426925A (en) * 2015-12-28 2016-03-23 联想(北京)有限公司 Image marking method and electronic equipment
CN105426925B (en) * 2015-12-28 2019-03-08 联想(北京)有限公司 Image labeling method and electronic equipment
CN108780462A (en) * 2016-03-13 2018-11-09 科尔蒂卡有限公司 System and method for clustering multimedia content elements
CN106844336A (en) * 2016-12-26 2017-06-13 博彦科技股份有限公司 Data model processing method and processing device
CN109919175A (en) * 2019-01-16 2019-06-21 浙江大学 Entity multi-classification method combining attribute information

Also Published As

Publication number Publication date
CN104317867B (en) 2018-02-09

Similar Documents

Publication Publication Date Title
US9183281B2 (en) Context-based document unit recommendation for sensemaking tasks
CN104317867A (en) System for carrying out entity clustering on web pictures returned by search engine
Hindle et al. Clustering web video search results based on integration of multiple features
EP1426882A2 (en) Information storage and retrieval
WO2008073784A1 (en) Web site structure analysis
GB2395808A (en) Information retrieval
CN109815386B (en) User portrait-based construction method and device and storage medium
Papadopoulos et al. Image clustering through community detection on hybrid image similarity graphs
Trevisiol et al. Retrieving geo-location of videos with a divide & conquer hierarchical multimodal approach
Nesi et al. Ge(o)Lo(cator): Geographic information extraction from unstructured text data and Web documents
Ruocco et al. A scalable algorithm for extraction and clustering of event-related pictures
Li et al. Improving relevance judgment of web search results with image excerpts
Sergieh et al. Geo-based automatic image annotation
WO2023057988A1 (en) Generation and use of content briefs for network content authoring
Li et al. Word2image: towards visual interpreting of words
Rome et al. Towards a formal concept analysis approach to exploring communities on the world wide web
Gkoufas et al. Combining textual and visual information for image retrieval in the medical domain
CN106168947A (en) Related entity mining method and system
Shchekotykhin et al. AllRight: automatic ontology instantiation from tabular web documents
CN105279172A (en) Video matching method and device
Cheung et al. A shape-based searching system for industrial components
Kelm et al. Multimodal geo-tagging in social media websites using hierarchical spatial segmentation
Naseer et al. Wrapper Extraction and Integration using GNN
Zhou et al. Automatic image annotation by using relevant keywords extracted from auxiliary text documents
Doulaverakis et al. Exploiting visual similarities for ontology alignment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180209

Termination date: 20201017