CN104317867A - System for carrying out entity clustering on web pictures returned by search engine - Google Patents
- Publication number
- CN104317867A (application CN201410554684.XA)
- Authority
- CN
- China
- Prior art keywords
- concept
- picture
- context
- cluster
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention relates to a system for entity clustering of the web pictures returned by a search engine. The system comprises an offline system and an online system. The offline system preprocesses the source web pages that contain the pictures. The online system receives a query, submits it to the search engine, and receives multiple pages of picture results; for each page of results it looks up the conceptualized metadata and text of the source web pages and extracts the query context and the picture context from the conceptualized text. The online system then performs three-layer clustering on the metadata, the context, and the context after concept expansion, and automatically labels each class with relevant descriptive concepts so that the entity behind each class can be identified. The three-layer clustering algorithm has the same time complexity as an ordinary hierarchical clustering algorithm. Because the features are subdivided, the input to each layer (that is, the output of the previous layer) is more precise, which improves the clustering quality and yields accurate descriptive concepts.
Description
Technical field
The present invention relates to natural language processing and text mining in the field of computer technology, and in particular to a system for entity clustering of the web pictures returned by a search engine.
Background art
With the spread of the Internet and the growing number of web pictures, web picture search has gradually become an everyday activity for Internet users. Current picture search engines mainly return pictures related to the query keyword, and these pictures often cover several different entities that share the same name. To find the desired picture, a user has to browse through each returned picture. Distinguishing search results by entity, so that they are easier to read, has therefore become a direction for improving picture search engines.
Image clustering is a way to distinguish different entities automatically. In earlier work, D. Cai et al. (see Cai, D., He, X., Ma, W.Y., Wen, J.R., Zhang, H.: Organizing www images based on the analysis of page layout and web link structure. ICME 2004) used vision-based page segmentation to extract the context of web pictures and clustered the pictures using this context together with web link information. However, because vision-based segmentation is unstable and the context contains noise, the clustering precision is severely limited. Z. Fu et al. (see Fu, Z., Ip, H.H.S., Lu, H., Lu, Z.: Multi-modal constraint propagation for heterogeneous image clustering. MultiMedia 2011) proposed a framework that combines multiple modalities, such as image tags and visual features, and clusters images by propagating class constraints across multiple graphs. Because current visual feature extraction is not precise enough, this framework propagates the errors contained in the visual features. The method also has to propagate constraints across multiple graphs, which makes clustering inefficient and unsuitable for clustering online picture search results. In addition, current image clustering methods cannot provide descriptive concepts to label each class.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a system for entity clustering of the web pictures returned by a search engine, so that picture search results are better organized by entity, each entity class has high precision, and different entities are clearly distinguished. The invention divides the framework into an online part and an offline part, which greatly reduces the time overhead of online clustering.
To achieve the above object, the invention adopts the following technical solution:
A system for entity clustering of the web pictures returned by a search engine comprises an offline system and an online system, wherein:
The offline system preprocesses the source web pages of all pictures: it extracts the web page metadata and conceptualizes the original web page text and the metadata into a set of weighted concepts (a concept vector). The conceptualized metadata and web page content are made available for the online system to query.
The online system receives a query, submits it to the search engine, and receives the multiple pages of picture results that are returned. For each page of results, it looks up the conceptualized metadata and text of the source web pages and extracts, from the conceptualized text, the context of the query keyword (the query context) and the picture context. The online system then performs three-layer clustering using, in turn, the metadata, the context, and the extended context obtained by expanding the context's concepts through Wikipedia, and automatically labels each class with relevant descriptive concepts so that the entity of each class can be identified.
The offline system performs metadata extraction, which covers the valid terms in the picture URL and the picture ALT attribute. To extract the valid terms of a URL, a binary classifier classifies the terms as valid or invalid, and only the valid terms are returned. The picture ALT attribute can be obtained directly from the HTML source code.
The offline system includes a conceptualization module that conceptualizes the metadata of each picture and the original web page text. Conceptualization maps the words in the metadata and the text onto Wikipedia concepts, so that the metadata and the text become sets of weighted concepts that the clustering algorithm can use to compute similarity. The weight of each concept is the importance of that concept to the picture, defined as follows:
Here CF-IDF(c, d), the importance of concept c to picture d, is the product of two factors: the frequency CF(c, d) with which the concept occurs in the picture's context, and the inverse context frequency, which is inversely proportional to the number of contexts DF(c) in which the concept has occurred.
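As a minimal sketch of this weighting, the Python below computes CF-IDF for a toy collection of picture contexts. The text states only that the inverse context frequency is inversely proportional to DF(c); the logarithmic form log(|D| / DF(c)), borrowed from conventional TF-IDF, is an assumption, as are the names `cf_idf_vectors` and `contexts`.

```python
import math
from collections import Counter

def cf_idf_vectors(contexts):
    """Return one {concept: weight} dict per picture context.

    contexts: list of lists of concept names, one list per picture.
    Assumption: inverse context frequency = log(|D| / DF(c)).
    """
    num_contexts = len(contexts)
    # DF(c): number of contexts in which concept c occurs at least once
    df = Counter()
    for ctx in contexts:
        df.update(set(ctx))
    vectors = []
    for ctx in contexts:
        cf = Counter(ctx)  # CF(c, d): occurrences of c in this context
        vectors.append(
            {c: n * math.log(num_contexts / df[c]) for c, n in cf.items()}
        )
    return vectors

# Toy contexts (illustrative only)
contexts = [
    ["Mr. Bean", "Rowan Atkinson", "Comedy"],
    ["Mr. Bean", "Mr. Bean", "Comedy"],
    ["Bean", "Legume", "Comedy"],
]
vectors = cf_idf_vectors(contexts)
```

Under this assumed form, a concept that occurs in every context (here "Comedy") receives weight zero, so it cannot help to separate entities.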
The online system includes a text context extraction module that extracts context information from the conceptualized original web page text, covering both the picture context and the query context. Each is cut out by a fixed-size window, for example the 50 concepts before and after the picture or the query keyword. The extracted text context forms a concept vector that is used to compute picture similarity.
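The fixed-size window can be sketched as a simple slice over the page's concept sequence. The function name `extract_context`, the choice to exclude the anchor position itself, and the assumption that the anchor index (the position of the picture or query keyword) is already known are illustrative and not taken from the source.

```python
def extract_context(concepts, anchor_index, window=50):
    """Return the concepts within `window` positions before and after the anchor.

    concepts: the conceptualized page text as an ordered list of concept names.
    anchor_index: position of the picture or query keyword (assumed known).
    """
    start = max(0, anchor_index - window)
    end = min(len(concepts), anchor_index + window + 1)
    # Exclude the anchor itself; keep its neighbours on both sides
    return concepts[start:anchor_index] + concepts[anchor_index + 1:end]

# A hypothetical page of 200 concepts with the anchor at position 120
page = [f"concept_{i}" for i in range(200)]
ctx = extract_context(page, anchor_index=120)
```

Near the start or end of a page the window is simply truncated, so a context may hold fewer than 100 concepts.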
The online system includes a three-layer clustering module made up of three sub-modules: metadata clustering, text context clustering, and context concept expansion clustering, wherein:
First-layer clustering performs agglomerative hierarchical clustering on the concept vectors obtained by conceptualizing the metadata, which yields classes with high within-class precision; the concept vectors of all pictures in each class are then merged to form the concept vector of the class.
The agglomerative hierarchical clustering algorithm uses the conceptualization of each class to compute the similarity between classes. A class is conceptualized by summing the concept vectors of the pictures in the class and removing the concepts with low values, which yields a high-precision class concept vector. The conceptualization of a class is defined by the following formula:
Here c is a concept, C is a class, d is a picture in the class, and CF-IDF(c, d) is the importance of the concept to the picture.
Second-layer clustering adds the concept vector of the conceptualized context to the concept vector of each picture, updates the concept vectors of all classes obtained from the first layer, and performs agglomerative hierarchical clustering on these classes.
Third-layer clustering replaces the vector of each picture with its expanded concept vector, updates the concept vectors of all classes obtained from the second layer, and performs agglomerative hierarchical clustering on these concept vectors.
The vectors are expanded using the concept description pages of Wikipedia: related concepts are added to the concept vector of each picture, and the concept vector of each class is updated. The update is defined by the following formula:
Here the weighting term is the importance of concept c to the Wikipedia description page of concept c_i, and c_i is a concept in the current class concept vector. To filter out noise, the context expansion keeps only the k concepts with the largest values.
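One possible reading of the expansion step is sketched below. The scoring rule (the weight of each existing concept c_i multiplied by the importance of c to the description page of c_i, summed over c_i) is an assumption, since the formula itself is not reproduced in this text; `page_importance` is a hypothetical lookup table standing in for values that would be computed from Wikipedia description pages.

```python
def expand_vector(vector, page_importance, k=3):
    """Add related concepts from description pages, keeping the k largest.

    vector: {concept: weight} for a picture or class.
    page_importance: {c_i: {c: importance of c to the description page of c_i}}.
    Assumption: a related concept's score is the sum, over concepts c_i already
    in the vector, of weight(c_i) * importance(c, page of c_i).
    """
    scores = {}
    for c_i, weight in vector.items():
        for c, importance in page_importance.get(c_i, {}).items():
            scores[c] = scores.get(c, 0.0) + weight * importance
    # Keep only the k highest-scoring concepts to filter out noise
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    expanded = dict(vector)
    for c, score in top:
        expanded[c] = expanded.get(c, 0.0) + score
    return expanded

page_importance = {  # hypothetical values, not taken from Wikipedia
    "Mr. Bean": {"Rowan Atkinson": 0.9, "Blackadder": 0.4, "Teddy bear": 0.1},
    "Comedy": {"Rowan Atkinson": 0.2, "Satire": 0.05},
}
expanded = expand_vector({"Mr. Bean": 1.0, "Comedy": 0.5}, page_importance, k=2)
```

With k = 2, only the two strongest related concepts are added and the weaker candidates are discarded as noise.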
The class concept vectors obtained after the three-layer clustering are used to label each picture class with relevant descriptive concepts: the few highest-valued concepts in each class's concept vector are chosen to describe the entity that the class represents.
The technical problems solved by the invention include:
1. Extracting the context information of an image and representing it as a vector in concept space, which provides features for computing image similarity.
2. Because some images have too little context information, the invention provides a mechanism for extending the context information: the concept vector of the context is expanded through Wikipedia or another knowledge base.
3. Different features are correlated with a picture to different degrees, and a more strongly correlated feature deserves higher confidence. To exploit features of different correlation effectively and improve clustering precision, the invention expands the concept vectors of the pictures layer by layer and clusters at each layer.
The technical features of the invention are explained below by comparison with related prior art found by search.
Related search result 1:
Application number (patent): 2012101444570, title: A method and device for picture clustering
This patent document clusters pictures twice using visual features, including global features and local features; the second clustering splits the classes produced by the first.
Comparison of technical points:
1. That patent clusters pictures by their content, that is, by visual features, whereas the present invention clusters using the contextual features of the pictures.
2. The second clustering of that patent splits large classes into small ones, whereas the present invention merges small classes into large ones and uses each expansion of the concept vectors to screen features and filter out noise.
3. The concept vector representation adopted by the present invention makes it possible to label each class with descriptive concepts, whereas clustering based on picture content cannot provide concept descriptions.
Related search result 2:
Application number (patent): 2013106111554, title: A massive image retrieval system based on clustered compact features
This patent document clusters the images in an image library by their local features. At search time, it first retrieves the picture clusters with the query keyword and then returns the corresponding images.
Comparison of technical points:
1. That patent generates clustered compact features from the local features of the pictures and clusters on them, whereas the present invention clusters using the contextual features of the pictures.
2. That patent uses image clustering to speed up retrieval, whereas the present invention clusters and conceptualizes the search results in order to present them separated by class.
Related search result 3:
Application number (patent): 201210545637X, title: A balanced image clustering method based on hierarchical clustering
This patent document uses picture clustering to reduce the number of pictures that must be traversed during a search. The clustering is based on high-dimensional image feature data.
Comparison of technical points:
1. That patent clusters pictures by their high-dimensional features, whereas the present invention clusters using the contextual features of the pictures.
2. That patent uses image clustering to reduce the pictures that must be traversed during retrieval, and its clustering method is plain hierarchical clustering, whereas the present invention is based on three different contextual features and raises clustering precision through three-layer clustering.
Related search result 4:
Application number (patent): 201210163641X, title: Image clustering method
This patent obtains the time data and position data of images from the capture device and clusters using time, position, and speed data as features.
Comparison of technical points:
1. That patent mainly clusters captured photographs, whereas the present invention clusters web pictures. Captured photographs have no context information, while web pictures are not necessarily photographs and most have no capture time or position. The features of the two are different.
2. That patent clusters based on event sequences, whereas the present invention is based on concept vectors, which can also be used to generate descriptive concepts.
Related search result 5:
Application number (patent): 2009801523973, title: Arranging images into pages using content-based filtering and theme-based clustering
This patent clusters pictures captured by a device by theme, based on their content (that is, visual features), and maps the clustering result into corresponding photo albums.
Comparison of technical points:
1. That patent clusters using the visual features of the pictures, whereas the present invention clusters using the context of web pictures.
2. That patent lays pictures out on different pages, whereas the present invention gives the user classified search results together with the corresponding descriptive concepts.
Related search result 6:
Application number (patent): 2010105171639, title: Image clustering method and system
This patent builds a directed graph of images by parameter estimation and clusters the images by partitioning the directed graph. The partition forms multiple subgraphs, and the images of each subgraph are assigned to one class.
Comparison of technical points:
1. That patent clusters by means of a graph, representing the image library as a directed graph. The present invention forms picture classes by merging pictures from small to large, and each clustering layer considers a different contextual feature of the images.
Related search result 7:
Application number (patent): 2005800393866, title: Image clustering method and system
This patent clusters images by event using time-point features; its clustering algorithm clusters at different layers according to different time ranges.
Comparison of technical points:
1. The layers in that patent's multi-level clustering are different time ranges, whereas the layers of the present invention are defined by different features.
2. That patent clusters according to event sequences, whereas the present invention distinguishes picture classes according to different entities.
Compared with the prior art, the present invention uses three different kinds of features, with a corresponding three-layer clustering algorithm, to cluster pictures and provides concept labels for each class, so that picture search results are better organized by entity, each entity class has high precision, and different entities are clearly distinguished. The invention divides the framework into an online part and an offline part, which greatly reduces the time overhead of online clustering.
Brief description of the drawings
Other features, objects, and advantages of the invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 shows the system framework of the invention;
Fig. 2 shows an example of the three-layer clustering algorithm of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings. The embodiment is implemented on the basis of the technical solution of the invention, and a detailed implementation and concrete operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.
The task of this embodiment is, for the query keyword "bean" entered by a user, to obtain the search engine's picture search results, cluster the different instances of "bean" in the results so as to distinguish the different entities, and label each distinct "bean" with its top descriptive concepts.
As shown in Fig. 1, the metadata extraction module of the offline system extracts metadata from all original web pages related to "bean" in this embodiment. For example, a URL might be:
“http://domain.com/53C316-C2oJ5/mr_bean.jpg”
The metadata extraction module splits the URL into words at separator characters and uses the binary classifier to detect the valid terms, for example "mr bean". The conceptualization module of the offline system then conceptualizes the metadata and the related web pages for "bean", producing metadata concept vectors and text concept vectors.
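The URL step of this example can be sketched as follows. The source does not describe the binary classifier, so `looks_valid` is a hand-written stand-in rule (purely alphabetic terms of at least two letters), not the classifier of the invention; `url_terms` is likewise a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

def url_terms(url):
    """Split a URL path into lowercase candidate terms at separator characters."""
    path = urlparse(url).path
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)  # drop the file extension
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]

def looks_valid(term):
    """Stand-in for the binary classifier: alphabetic terms of length >= 2."""
    return term.isalpha() and len(term) >= 2

url = "http://domain.com/53C316-C2oJ5/mr_bean.jpg"
valid = [t for t in url_terms(url) if looks_valid(t)]
print(" ".join(valid))  # prints "mr bean"
```

A trained classifier would replace `looks_valid`; the heuristic here only shows where the valid/invalid decision sits in the pipeline.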
After the user's query keyword "bean" is received, the text context extraction module of the online system locates the picture and the query keyword "bean" in the conceptualized text and extracts the 50 concepts before and after each as the text context concept vector. Using the metadata concept vectors and the text context concept vectors, the online system performs three-layer clustering.
As shown in Fig. 2, the three-layer clustering module of the online system first computes picture similarity from the metadata concept vectors and performs agglomerative hierarchical clustering (the concept vectors of picture 1 and picture 2 both contain the concept "Mr. Bean", while no valid metadata concept was found for picture 3 or picture 4). In agglomerative hierarchical clustering, the similarity between classes is computed from the concept vectors of the classes. The system computes the class concept vectors from the result of the first-layer clustering; for example, picture 1 and picture 2 form one class whose concept vector contains the concept "Mr. Bean".
Second-layer clustering builds on the first layer by expanding the concept vectors of the pictures and clustering further. In Fig. 2, the concept vector of the class formed by picture 1 and picture 2 gains the concept "Rowan Atkinson", the concept vector of picture 3 gains "Rowan Atkinson" and "Comedy", and picture 4 gains "Blackadder". Because the expanded vectors share more concepts, the second round of hierarchical clustering merges some similar classes into larger ones. In Fig. 2, pictures 1, 2, and 3 form a new class whose concept vector grows to "Mr. Bean", "Rowan Atkinson", and "Comedy".
Third-layer clustering first expands the vector of each class or picture using Wikipedia. In Fig. 2, "Blackadder" is added to the concept vector of the class made up of pictures 1, 2, and 3, and "Rowan Atkinson" is added to picture 4. After the Wikipedia-based expansion, the class vectors are more similar to one another, and the third round of hierarchical clustering merges classes that could not be merged earlier for lack of information. In Fig. 2, the expanded vectors allow picture 4 to be merged into the class containing pictures 1, 2, and 3.
After the three-layer clustering algorithm finishes, the online system separates the different classes and presents all entities and their pictures to the user. Each entity is described by the few most representative (highest-valued) concepts in its concept vector. For example, the class in Fig. 2 can be described by concepts such as "Mr. Bean", "Rowan Atkinson", "Comedy", and "Blackadder", which identify pictures of the British comedian known as Mr. Bean.
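The Fig. 2 walk-through can be replayed with a small sketch. The unit weights, the merge threshold of 0.3, cosine similarity, and the helper names (`agglomerate`, `add_features`) are all assumptions chosen so that the toy data behaves as the text describes; they are not values from the source.

```python
import math

def cosine(u, v):
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def add(u, v):
    out = dict(u)
    for c, w in v.items():
        out[c] = out.get(c, 0.0) + w
    return out

def agglomerate(classes, threshold=0.3):
    """Repeatedly merge the most similar pair of classes at or above threshold.

    classes: list of (frozenset of picture ids, class concept vector).
    """
    classes = list(classes)
    while len(classes) > 1:
        best, pair = threshold, None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                sim = cosine(classes[i][1], classes[j][1])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        merged = (classes[i][0] | classes[j][0], add(classes[i][1], classes[j][1]))
        classes = [c for n, c in enumerate(classes) if n not in pair] + [merged]
    return classes

def add_features(classes, features):
    """Add each picture's next-layer features to the vector of its class."""
    out = []
    for members, vec in classes:
        for pic in sorted(members):
            vec = add(vec, features.get(pic, {}))
        out.append((members, vec))
    return out

# Toy features per picture, loosely following Fig. 2 (weights are assumed)
metadata = {1: {"Mr. Bean": 1.0}, 2: {"Mr. Bean": 1.0}, 3: {}, 4: {}}
context = {1: {"Rowan Atkinson": 1.0}, 2: {"Rowan Atkinson": 1.0},
           3: {"Rowan Atkinson": 1.0, "Comedy": 1.0}, 4: {"Blackadder": 1.0}}
expansion = {1: {"Blackadder": 1.0}, 4: {"Rowan Atkinson": 1.0}}

layer1 = agglomerate([(frozenset({p}), v) for p, v in metadata.items()])
layer2 = agglomerate(add_features(layer1, context))
layer3 = agglomerate(add_features(layer2, expansion))

# Label each final class with its highest-valued concepts
labels = [sorted(vec, key=vec.get, reverse=True)[:3] for _, vec in layer3]
```

With these toy inputs, layer 1 groups pictures 1 and 2, layer 2 adds picture 3, and layer 3 absorbs picture 4, mirroring the progression described for Fig. 2. Because each layer starts from the classes of the previous one, later and noisier features can only merge classes, never split the high-precision ones formed from metadata.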
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention.
Claims (8)
1. A system for entity clustering of the web pictures returned by a search engine, characterized in that it comprises an offline system and an online system, wherein:
the offline system preprocesses the source web pages of all pictures, which includes extracting the web page metadata and conceptualizing the original web page text and the metadata into a set of weighted concepts, that is, a concept vector; the conceptualized metadata and web page content are made available for the online system to query;
the online system receives a query, submits it to the search engine, and receives the multiple pages of picture results that are returned; for each page of results it looks up the conceptualized metadata and text of the source web pages and extracts, from the conceptualized text, the context of the query keyword and the picture context; the online system performs three-layer clustering using, in turn, the metadata, the context, and the extended context obtained by concept expansion of the context, and automatically labels each class with relevant descriptive concepts so that the entity of each class can be identified.
2. The system for entity clustering of the web pictures returned by a search engine according to claim 1, characterized in that the offline system performs metadata extraction, which covers the valid terms in the URL and the picture ALT attribute, wherein, to extract the valid terms of the URL, a binary classifier classifies the terms as valid or invalid and the valid terms are returned.
3. The system for entity clustering of the web pictures returned by a search engine according to claim 1, characterized in that the offline system includes a conceptualization module used for concept expansion of the context; the conceptualization module converts text into a set of weighted concepts, and the weight of each concept is the importance of that concept to the picture, defined as follows:
where CF-IDF(c, d), the importance of concept c to picture d, is the product of two factors: the frequency CF(c, d) with which the concept occurs in the picture's context, and the inverse context frequency, which is inversely proportional to the number of contexts DF(c) in which the concept has occurred; D is the set of the contexts of all pictures.
4. The system for entity clustering of the web pictures returned by a search engine according to claim 1, characterized in that the online system includes a text context extraction module which, for the query keyword entered, extracts the conceptualized query context and picture context.
5. The system for entity clustering of the web pictures returned by a search engine according to claim 4, characterized in that the online system includes a three-layer clustering module which, using three kinds of features (the extracted metadata, the context, and the expanded context), performs clustering at three levels, starting from the metadata, which has the highest confidence, then the context, then the expanded context, wherein:
first-layer clustering performs agglomerative hierarchical clustering on the concept vectors obtained by conceptualizing the metadata, which yields classes with high within-class precision, and merges the concept vectors of all pictures in each class to form the concept vector of the class;
second-layer clustering adds the concept vector of the conceptualized context to the concept vector of each picture, updates the concept vectors of all classes obtained from the first layer, and performs agglomerative hierarchical clustering on these classes;
third-layer clustering replaces the vector of each picture with its expanded concept vector, updates the concept vectors of all classes obtained from the second layer, and performs agglomerative hierarchical clustering on these concept vectors.
6. The system for entity clustering of the web pictures returned by a search engine according to claim 5, characterized in that the agglomerative hierarchical clustering algorithm uses the conceptualization of each class to compute the similarity between classes; a class is conceptualized by summing the concept vectors of the pictures in the class and removing the concepts with low values, which yields a high-precision class concept vector; the conceptualization of a class is defined by the following formula:
where c is a concept, C is a class, d is a picture in the class, and CF-IDF(c, d) is the importance of the concept to the picture.
7. The system for entity clustering of the web pictures returned by a search engine according to claim 5, characterized in that the third-layer clustering expands the context through Wikipedia, replaces the concept vector of each picture with the expanded concept vector, and updates the concept vector of each class; the update is defined by the following formula:
where the weighting term is the importance of concept c to the Wikipedia description page of concept c_i, V_c is the set of all concepts in the current class concept vector, and c_i is a concept in the current class concept vector; to filter out noise, the context expansion keeps only the k concepts with the largest values.
8. The system for entity clustering of the web pictures returned by a search engine according to claim 1, characterized in that the class concept vectors obtained after the three-layer clustering are used to label each picture class with relevant descriptive concepts, the few highest-valued concepts in each class's concept vector being chosen to describe the entity that the class represents.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410554684.XA CN104317867B (en) | 2014-10-17 | 2014-10-17 | System for carrying out entity clustering on web pictures returned by search engine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104317867A true CN104317867A (en) | 2015-01-28 |
CN104317867B CN104317867B (en) | 2018-02-09 |
Family
ID=52373099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410554684.XA Expired - Fee Related CN104317867B (en) | System for carrying out entity clustering on web pictures returned by search engine | 2014-10-17 | 2014-10-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104317867B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105279264A (en) * | 2015-10-26 | 2016-01-27 | 深圳市智搜信息技术有限公司 | Semantic relevancy calculation method of document |
CN105426925A (en) * | 2015-12-28 | 2016-03-23 | 联想(北京)有限公司 | Image marking method and electronic equipment |
CN106844336A (en) * | 2016-12-26 | 2017-06-13 | 博彦科技股份有限公司 | Data model processing method and processing device |
CN107408156A (en) * | 2015-03-09 | 2017-11-28 | 皇家飞利浦有限公司 | For carrying out semantic search and the system and method for extracting related notion from clinical document |
CN108780462A (en) * | 2016-03-13 | 2018-11-09 | 科尔蒂卡有限公司 | System and method for being clustered to multimedia content element |
CN109919175A (en) * | 2019-01-16 | 2019-06-21 | 浙江大学 | A kind of more classification methods of entity of combination attribute information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090094020A1 (en) * | 2007-10-05 | 2009-04-09 | Fujitsu Limited | Recommending Terms To Specify Ontology Space |
CN101751439A (en) * | 2008-12-17 | 2010-06-23 | 中国科学院自动化研究所 | Image retrieval method based on hierarchical clustering |
CN102902821A (en) * | 2012-11-01 | 2013-01-30 | 北京邮电大学 | Methods for labeling and searching advanced semantics of imagse based on network hot topics and device |
CN103577537A (en) * | 2013-09-24 | 2014-02-12 | 上海交通大学 | Image sharing website picture-oriented multi-pairing similarity determining method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107408156A (en) * | 2015-03-09 | 2017-11-28 | 皇家飞利浦有限公司 | For carrying out semantic search and the system and method for extracting related notion from clinical document |
CN105279264A (en) * | 2015-10-26 | 2016-01-27 | 深圳市智搜信息技术有限公司 | Semantic relevancy calculation method of document |
CN105279264B (en) * | 2015-10-26 | 2018-07-03 | 深圳市智搜信息技术有限公司 | A kind of semantic relevancy computational methods of document |
CN105426925A (en) * | 2015-12-28 | 2016-03-23 | 联想(北京)有限公司 | Image marking method and electronic equipment |
CN105426925B (en) * | 2015-12-28 | 2019-03-08 | 联想(北京)有限公司 | Image labeling method and electronic equipment |
CN108780462A (en) * | 2016-03-13 | 2018-11-09 | 科尔蒂卡有限公司 | System and method for being clustered to multimedia content element |
CN106844336A (en) * | 2016-12-26 | 2017-06-13 | 博彦科技股份有限公司 | Data model processing method and processing device |
CN109919175A (en) * | 2019-01-16 | 2019-06-21 | 浙江大学 | A kind of more classification methods of entity of combination attribute information |
Also Published As
Publication number | Publication date |
---|---|
CN104317867B (en) | 2018-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9183281B2 (en) | Context-based document unit recommendation for sensemaking tasks | |
CN104317867A (en) | System for carrying out entity clustering on web pictures returned by search engine | |
Hindle et al. | Clustering web video search results based on integration of multiple features | |
EP1426882A2 (en) | Information storage and retrieval | |
WO2008073784A1 (en) | Web site structure analysis | |
GB2395808A (en) | Information retrieval | |
CN109815386B (en) | User portrait-based construction method and device and storage medium | |
Papadopoulos et al. | Image clustering through community detection on hybrid image similarity graphs | |
Trevisiol et al. | Retrieving geo-location of videos with a divide & conquer hierarchical multimodal approach | |
Nesi et al. | Ge (o) Lo (cator): Geographic information extraction from unstructured text data and Web documents | |
Ruocco et al. | A scalable algorithm for extraction and clustering of event-related pictures | |
Li et al. | Improving relevance judgment of web search results with image excerpts | |
Sergieh et al. | Geo-based automatic image annotation | |
WO2023057988A1 (en) | Generation and use of content briefs for network content authoring | |
Li et al. | Word2image: towards visual interpreting of words | |
Rome et al. | Towards a formal concept analysis approach to exploring communities on the world wide web | |
Gkoufas et al. | Suppl 1: Combining textual and visual information for image retrieval in the medical domain | |
CN106168947A (en) | A kind of related entities method for digging and system | |
Shchekotykhin et al. | AllRight: automatic ontology instantiation from tabular web documents | |
CN105279172A (en) | Video matching method and device | |
Cheung et al. | A shape-based searching system for industrial components | |
Kelm et al. | Multimodal geo-tagging in social media websites using hierarchical spatial segmentation | |
Naseer et al. | Wrapper Extraction and Integration using GNN | |
Zhou et al. | Automatic image annotation by using relevant keywords extracted from auxiliary text documents | |
Doulaverakis et al. | Exploiting visual similarities for ontology alignment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180209; Termination date: 20201017 |