CN104424296A - Query word classifying method and query word classifying device - Google Patents

Query word classifying method and query word classifying device

Info

Publication number
CN104424296A
Authority
CN
China
Prior art keywords
query word
industry classification
industry
classification
leaf node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310392966.XA
Other languages
Chinese (zh)
Other versions
CN104424296B (en)
Inventor
丘鹏
史可可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201310392966.XA (CN104424296B)
Publication of CN104424296A
Priority to HK15105059.0A (HK1204685A1)
Application granted
Publication of CN104424296B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention provides a query word classifying method and a query word classifying device. The method comprises: acquiring an input query word; generating a leaf industry category distribution vector corresponding to the query word, the vector comprising the industry category of at least one leaf node corresponding to the query word; determining the industry category of the leaf node to which the query word belongs according to the category word vectors of the industry categories of the leaf nodes corresponding to the query word; and determining the industry category of the query word according to the determined leaf-node industry category and a preset mapping relation among the industry categories of all levels. Each industry category of the nodes at every level corresponds to a category word vector, and each category word vector comprises a plurality of category words. With the method and the device, industry query results are more accurate and query efficiency is higher.

Description

Query word classification method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a query word classification method and device.
Background
E-commerce uses computer, network, and telecommunication technology to make the entire business process electronic, digital, and networked. As e-commerce has developed, more and more specialized e-commerce websites have emerged. On today's specialized e-commerce websites, how to segment the market vertically by industry is a focus of the B2B (Business To Business) market.
On the one hand, however, users habitually do not navigate to the industry market pages on their own initiative, so those pages receive little traffic, their business does not develop, and effective industry search cannot be carried out. On the other hand, existing search engine technology returns only the direct results for the query word a user enters and performs no industry division. For example, when a user enters "alcohol", content related to "alcohol" is returned directly, with no industry division based on the query word "alcohol". The engine therefore cannot tell whether the user wants medicinal alcohol, edible alcohol, or industrial alcohol, and the user has to refine the query further. The result is inaccurate queries and low search efficiency.
It can be seen that the prior art cannot determine, from the user's query word, which industry the user wants to query, so industry queries cannot be carried out effectively, queries are not accurate enough, and search efficiency is low.
Summary of the invention
The invention provides a query word classification method and device to solve the problem that the prior art cannot determine, from the user's query word, which industry the user wants to query, so that industry queries cannot be carried out effectively, queries are not accurate enough, and search efficiency is low.
To solve the above problem, the invention discloses a query word classification method, comprising: obtaining an input query word and generating the leaf industry category distribution vector corresponding to the query word, wherein the leaf industry category distribution vector contains the industry category of at least one leaf node corresponding to the query word, each industry category of the nodes at every level corresponds to a category word vector, and each category word vector comprises a plurality of category words; determining, according to the category word vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs; and determining, according to the determined leaf-node industry category and the preset mapping relations among the industry categories of all levels, the industry category in which the query word is located.
To solve the above problem, the invention also discloses a query word classification device, comprising: a first generation module, configured to obtain an input query word and generate the leaf industry category distribution vector corresponding to the query word, wherein the leaf industry category distribution vector contains the industry category of at least one leaf node corresponding to the query word, each industry category of the nodes at every level corresponds to a category word vector, and each category word vector comprises a plurality of category words; a determination module, configured to determine, according to the category word vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs; and a first classification module, configured to determine, according to the determined leaf-node industry category and the preset mapping relations among the industry categories of all levels, the industry category in which the query word is located.
Compared with the prior art, the present invention has the following advantages:
The invention generates a leaf industry category distribution vector corresponding to the query word. The vector includes the industry categories of one or more leaf nodes corresponding to the query word, and every industry category at every level has a category word vector comprising a plurality of category words. Based on the relation between the query word and the category word vectors of the leaf-node industry categories, the industry category of the leaf node to which the query word precisely belongs is determined first; the industry category in which the query word is located is then determined according to the preset mapping relations among the industry categories of all levels. These mapping relations are divided into multiple levels from high to low: the bottom level consists of the industry categories of the leaf nodes and the top level is the industry category of the root node. Between them there may be industry categories of one or more intermediate-level nodes, or there may be none. Once the leaf-node industry category to which the query word should belong has been determined accurately, the industry category in which the query word is located can be determined from the mapping relations in any appropriate way, and the query word is then classified into that industry category. With the invention, the industry the user wants to query can be determined from the query word, and information corresponding to the query word within that industry can be provided so that the user can query accurately within the industry. This achieves effective industry queries and improves the user's search efficiency and search experience. Moreover, locating the industry category of the query word precisely from the leaf node makes industry query results more accurate and search efficiency higher.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a query word classification method according to Embodiment one of the present invention;
Fig. 2 is a flow chart of the steps of a query word classification method according to Embodiment two of the present invention;
Fig. 3 is a flow chart of the steps of a query word classification method according to Embodiment three of the present invention;
Fig. 4 is a schematic flow chart of generating the correspondence between query words and industry leaf categories in the embodiment shown in Fig. 3;
Fig. 5 is a schematic diagram of the mapping relations among industry categories of all levels in the embodiment shown in Fig. 3;
Fig. 6 is a flow chart of a specific implementation of the embodiment shown in Fig. 3;
Fig. 7 is a structural block diagram of a query word classification device according to Embodiment four of the present invention;
Fig. 8 is a schematic diagram of a system architecture that uses the query word classification device for industry-specific identification according to Embodiment five of the present invention;
Fig. 9 is a schematic diagram of an industry-specific search pullover in the embodiment shown in Fig. 8.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
Fig. 1 shows a flow chart of the steps of a query word classification method according to Embodiment one of the present invention.
The query word classification method of this embodiment comprises the following steps:
Step S102: obtain the input query word and generate the leaf industry category distribution vector corresponding to the query word.
The leaf industry category distribution vector corresponding to the query word contains the industry category of at least one leaf node corresponding to that query word. Industry categories may have nodes at multiple levels, and each level may have multiple industry categories. Each industry category of the nodes at every level corresponds to a category word vector, and each category word vector comprises a plurality of category words.
When multi-level industry categories are represented as a tree, the leaf nodes of the tree are the lowest-level nodes, and the industry categories at that level are the leaf-node industry categories. A query word may correspond to the industry category of one leaf node or to the industry categories of several leaf nodes; these one or more leaf-node industry categories form the leaf industry category distribution vector corresponding to the query word.
When determining the leaf-node industry categories corresponding to a query word, those skilled in the art may, according to actual conditions and with reference to existing query word processing methods, process the query word entered by the user and map it to the corresponding leaf-node industry categories. If the query word entered by the user is already well formed, it can be mapped directly to the corresponding leaf-node industry categories without any processing.
Step S104: according to the category word vectors of the industry categories of the leaf nodes corresponding to the query word, determine the industry category of the leaf node to which the query word belongs.
As noted above, a query word may correspond to the industry categories of multiple leaf nodes. Mapping the query word precisely onto the industry category of one leaf node, or of a set small number of leaf nodes, increases the accuracy of industry query and classification on the one hand and, on the other, improves their efficiency because the amount of data to be queried and classified afterwards is reduced.
In this embodiment, the query word is matched against, and processed with, the category words in the category word vector of each leaf-node industry category, so that the industry category of the leaf node to which the query word should belong can be determined accurately. For example, the leaf-node industry category may be determined from the matching degree or similarity between the query word and the category word vector; alternatively, weights may be assigned to the category words in the category word vector, and the leaf-node industry category to which the query word belongs may be determined from the weights of the category words that match the query word.
Step S106: according to the determined leaf-node industry category and the preset mapping relations among the industry categories of all levels, determine the industry category in which the query word is located.
Industry categories may have multiple levels with correspondences between adjacent levels; these correspondences form the mapping relations among the industry categories of all levels.
Each leaf-node industry category in the leaf industry category distribution vector has one or more levels of corresponding upper-level industry categories. After the leaf-node industry category to which the query word belongs has been determined, the industry category in which the query word is located can be determined from the upper-level industry categories of that leaf node. For example, suppose the leaf-node industry category to which the query word belongs is determined to be "one-piece dress". The immediate upper-level industry category of "one-piece dress" is "women's wear", and the upper-level industry category of "women's wear" is "clothes". Therefore, when the user enters the query word "white dress" and the leaf-node industry category to which it belongs is determined to be "one-piece dress", one possible approach is to map by default to the root-node industry category, so that the industry category in which "white dress" is located is determined to be "clothes". The method is not limited to this. In practice, those skilled in the art can formulate suitable rules according to actual conditions to determine the industry category in which the query word is located; for example, a gender-related rule could determine that the industry category in which "white dress" is located is "women's wear".
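The default "map to the root" behaviour described for step S106 can be sketched as a walk over a child-to-parent table. This is a minimal illustration only: the `PARENT` table and the function names are assumptions made for the example, and the category labels follow the "one-piece dress" example in the text.

```python
# Hypothetical child -> parent mapping among industry categories
# (labels taken from the example in the text; the table itself is assumed).
PARENT = {
    "one-piece dress": "women's wear",
    "women's wear": "clothes",
}


def ancestors(leaf, parent=PARENT):
    """Return the chain of categories from the leaf up to the root."""
    chain = [leaf]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def root_category(leaf, parent=PARENT):
    """Default rule: classify the query word into the root-node category."""
    return ancestors(leaf, parent)[-1]


print(ancestors("one-piece dress"))
print(root_category("one-piece dress"))
```

A different rule, such as the gender-related rule mentioned above, would simply stop at another element of the chain returned by `ancestors`.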
This embodiment generates a leaf industry category distribution vector corresponding to the query word. The vector includes the industry categories of one or more leaf nodes corresponding to the query word, and every industry category at every level has a category word vector comprising a plurality of category words. Based on the relation between the query word and the category word vectors of the leaf-node industry categories, the industry category of the leaf node to which the query word precisely belongs is determined first; the industry category in which the query word is located is then determined according to the preset mapping relations among the industry categories of all levels. These mapping relations are divided into multiple levels from high to low: the bottom level consists of the industry categories of the leaf nodes and the top level is the industry category of the root node. Between them there may be industry categories of one or more intermediate-level nodes, or there may be none. Once the leaf-node industry category to which the query word should belong has been determined accurately, the industry category in which the query word is located can be determined from the mapping relations in any appropriate way, and the query word is then classified into that industry category. With this embodiment, the industry the user wants to query can be determined from the query word, and information corresponding to the query word within that industry can be provided so that the user can query accurately within the industry. This achieves effective industry queries and improves the user's search efficiency and search experience. Moreover, locating the industry category of the query word precisely from the leaf node makes industry query results more accurate and search efficiency higher.
Embodiment two
Fig. 2 shows a flow chart of the steps of a query word classification method according to Embodiment two of the present invention.
The query word classification method of this embodiment comprises the following steps:
Step S202: generate the correspondence between query words and leaf-node industry categories.
The correspondence between query words and leaf-node industry categories can be generated as follows: obtain the user's search click information, which includes the query word information entered by the user and the commodity information the user clicked in the query results; then, from the search click information and the correspondence between commodities and the categories they belong to, obtain the correspondence between query words and industry leaf categories.
In a specific implementation, the correspondence between query words and leaf-node industry categories can be generated in at least one of the following modes.
Mode one: obtain the user search click log, which contains the search click information of multiple users. The search click information includes the query word information entered by the user, the commodity information the user clicked in the query results, and the information on the category each commodity belongs to. Perform statistical analysis on the search click information in the log to obtain the correspondence between query word information and commodity information, and the correspondence between commodity information and commodity category information. From these two correspondences, generate the correspondence between query words and industry leaf categories.
Mode two: obtain the user's search click information, which includes the query word information entered by the user and the commodity information the user clicked in the query results. Obtain the commodity information and the commodity category information from the commodity information header, and determine the correspondence between commodity information and commodity category information. From the search click information and this correspondence, obtain the correspondence between query words and industry leaf categories.
It should be noted that the correspondence between query words and leaf-node industry categories can be produced by the implementer of the invention or provided by a third party, for example generated by a third-party generation tool or obtained by accessing data stored by a third party.
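Mode one above can be sketched as a join between click records and a commodity-to-category table, followed by counting. The sketch below is illustrative only: the record layout, the item identifiers, and the function name `build_query_to_leaf` are assumptions, not details given in the text.

```python
from collections import Counter, defaultdict


def build_query_to_leaf(clicks, item_to_leaf):
    """Count, per query word, how often clicks landed in each leaf category.

    clicks:       iterable of (query_word, item_id) pairs from a click log
    item_to_leaf: mapping item_id -> leaf-node industry category
    """
    counts = defaultdict(Counter)
    for query_word, item_id in clicks:
        leaf = item_to_leaf.get(item_id)
        if leaf is not None:  # skip clicks on items with unknown category
            counts[query_word][leaf] += 1
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE_CLICKS = [
    ("Korean-style chiffon skirt", "item-1"),
    ("Korean-style chiffon skirt", "item-2"),
    ("Korean-style chiffon skirt", "item-1"),
    ("Korean-style chiffon skirt", "item-9"),  # category unknown
]
SAMPLE_ITEM_TO_LEAF = {"item-1": "one-piece dress", "item-2": "princess skirt"}

query_to_leaf = build_query_to_leaf(SAMPLE_CLICKS, SAMPLE_ITEM_TO_LEAF)
print(dict(query_to_leaf["Korean-style chiffon skirt"]))
```

The per-query counters produced here are one possible input for selecting which leaf-node industry categories to keep for a query word.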
Step S204: obtain the input query word; according to the correspondence between query words and leaf-node industry categories, determine the industry category of at least one leaf node that corresponds to the query word and satisfies a set rule; and use the determined leaf-node industry categories to generate the leaf industry category distribution vector corresponding to the query word.
The set rule can be chosen by those skilled in the art according to actual conditions, so as to select one, several, or all of the leaf-node industry categories that correspond to the query word.
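The text leaves the set rule open. As one hedged possibility, the sketch below keeps every leaf-node industry category whose share of clicks for the query word reaches a minimum; the share-based rule, the `min_share` parameter, and the sample counts are all assumptions made for illustration.

```python
def leaf_distribution_vector(leaf_counts, min_share=0.1):
    """Select leaf categories for a query word under a hypothetical rule:
    keep a category if its share of all clicks is at least min_share.
    The result is ordered by descending count, then by name."""
    total = sum(leaf_counts.values())
    if total == 0:
        return []
    kept = [(leaf, n) for leaf, n in leaf_counts.items() if n / total >= min_share]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [leaf for leaf, _ in kept]


# Hypothetical click counts for one query word.
counts = {
    "one-piece dress": 60,
    "princess skirt": 25,
    "stock women's wear": 12,
    "socks": 3,
}
print(leaf_distribution_vector(counts))
```

Setting `min_share=0.0` keeps all categories, which corresponds to the "all" option mentioned in the text.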
It can be seen that the leaf industry category distribution vector contains the industry category of at least one leaf node corresponding to the query word. Across all node levels and all industry categories, each industry category of the nodes at every level corresponds to a category word vector, and each category word vector comprises a plurality of category words. Preferably, each category word in each category word vector can be assigned a weight.
For example, when the user enters "Korean-style chiffon skirt", the system maps it, according to the established correspondence between query words and leaf-node industry categories, to the industry categories of four leaf nodes: "one-piece dress", "stock women's wear", "princess skirt", and "Japanese and Korean women's wear". Each industry category corresponds to a category word vector. The "one-piece dress" category has category words such as "one-piece dress", "skirt", "chiffon skirt", and "Korean style", which form its category word vector. The "stock women's wear" category has category words such as "stock women's wear", "women's wear", "skirt", "one-piece dress", "chiffon skirt", "crock", "slacks", and "women's suit". The "princess skirt" category has category words such as "princess skirt", "one-piece dress", "skirt", and "chiffon skirt". The "Japanese and Korean women's wear" category has category words such as "Japanese and Korean women's wear", "skirt", "one-piece dress", and "chiffon skirt". The category word vector of each industry category is built by big-data statistical methods. From the above, the leaf industry category distribution vector corresponding to "Korean-style chiffon skirt" can be {one-piece dress, stock women's wear, princess skirt, Japanese and Korean women's wear}.
Preferably, each category word in a category word vector can also be assigned a weight, for example {one-piece dress, 0.5; skirt, 0.3; chiffon skirt, 0.6; Korean style, 0.2}. Weights make the industry categories more targeted and better aligned with users' common habits, so the industry category corresponding to a query word can be determined more quickly.
In addition, the query word may be an entry that the user fills in the search box to express their intent when using a search engine. Input is not limited to the search box, however; other input modes, such as clicking an entry or a link, also fall within the scope of "input" in the present invention.
Step S206: according to the category word vectors of the industry categories of the leaf nodes corresponding to the query word, determine the industry category of the leaf node to which the query word belongs.
One way is to preprocess the query word into at least one search keyword, match these search keywords against the category words in the category word vectors of the industry categories, and determine the industry category of the leaf node to which the query word belongs from the matching degree. For example, "Korean-style chiffon skirt" is preprocessed into two search keywords, "Korean style" and "chiffon skirt". Matching shows that these two keywords match the category words in the category word vector of "one-piece dress" best, so "one-piece dress" is determined as the industry category of the leaf node to which "Korean-style chiffon skirt" belongs.
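The unweighted matching just described can be sketched as follows. The text does not define "matching degree"; the sketch assumes it is the fraction of search keywords found in a category word vector. The category word sets reuse the example vectors given earlier (with the cleaned-up labels), the keywords are assumed to have been produced already by preprocessing, and the function names are hypothetical.

```python
# Category word vectors from the example, without weights.
LEAF_VECTORS = {
    "one-piece dress": {"one-piece dress", "skirt", "chiffon skirt", "Korean style"},
    "stock women's wear": {"stock women's wear", "women's wear", "skirt",
                           "one-piece dress", "chiffon skirt", "crock",
                           "slacks", "women's suit"},
    "princess skirt": {"princess skirt", "one-piece dress", "skirt", "chiffon skirt"},
    "Japanese and Korean women's wear": {"Japanese and Korean women's wear",
                                         "skirt", "one-piece dress", "chiffon skirt"},
}


def matching_degree(keywords, category_words):
    """Assumed definition: share of keywords present in the category word vector."""
    if not keywords:
        return 0.0
    hits = sum(1 for keyword in keywords if keyword in category_words)
    return hits / len(keywords)


def best_match(keywords, leaf_vectors=LEAF_VECTORS):
    """Return the leaf category with the highest matching degree."""
    return max(leaf_vectors, key=lambda c: matching_degree(keywords, leaf_vectors[c]))


keywords = ["Korean style", "chiffon skirt"]  # assumed output of preprocessing
print(best_match(keywords))
```

Only "one-piece dress" contains both keywords, so it wins; the other three categories contain "chiffon skirt" alone.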
Preferably, when the category words have weights, one feasible way is: preprocess the query word to generate at least one search keyword; use the search keywords to match the category word vector of each leaf-node industry category; from the weights of the successfully matched category words in each category word vector, determine the weight of each leaf-node industry category with respect to the query word; and determine the leaf-node industry category with the highest weight as the industry category of the leaf node to which the query word belongs. Taking "Korean-style chiffon skirt" again, suppose the category word weights of the category word vector of "one-piece dress" are {one-piece dress, 0.8; skirt, 0.3; chiffon skirt, 0.6; Korean style, 0.2}; those of "stock women's wear" are {stock women's wear, 0.9; women's wear, 0.8; skirt, 0.6; one-piece dress, 0.6; chiffon skirt, 0.3; crock, 0.4; slacks, 0.2; women's suit, 0.3}; those of "princess skirt" are {princess skirt, 0.9; one-piece dress, 0.7; skirt, 0.5; chiffon skirt, 0.4}; and those of "Japanese and Korean women's wear" are {Japanese and Korean women's wear, 0.9; skirt, 0.8; one-piece dress, 0.8; chiffon skirt, 0.6}. Taking the simplest approach, summing the weights, as an example: after "Korean-style chiffon skirt" is preprocessed into "Korean style" and "chiffon skirt" and matched against the category word vector of each leaf-node industry category, the matches are "one-piece dress" {chiffon skirt, 0.6; Korean style, 0.2}, "stock women's wear" {chiffon skirt, 0.3}, "princess skirt" {chiffon skirt, 0.4}, and "Japanese and Korean women's wear" {chiffon skirt, 0.6}. After summation, the weight of "one-piece dress" with respect to "Korean-style chiffon skirt" is 0.8, that of "stock women's wear" is 0.3, that of "princess skirt" is 0.4, and that of "Japanese and Korean women's wear" is 0.6. Comparing the weights, the industry category with the highest weight, "one-piece dress", can be determined as the industry category of the leaf node to which "Korean-style chiffon skirt" belongs. Note that the above is only a simple illustration of how weights may be processed. In practice, those skilled in the art can process the weights of the matched category words in whatever form their needs require; the present invention places no restriction on this.
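The weight-summing example can be reproduced with a short sketch. The weights are the ones supposed in the text; the dictionary layout, the cleaned-up English labels, and the function names are assumptions made for illustration, and the keywords are taken as already preprocessed.

```python
# Category word weights as supposed in the example.
WEIGHTED_VECTORS = {
    "one-piece dress": {"one-piece dress": 0.8, "skirt": 0.3,
                        "chiffon skirt": 0.6, "Korean style": 0.2},
    "stock women's wear": {"stock women's wear": 0.9, "women's wear": 0.8,
                           "skirt": 0.6, "one-piece dress": 0.6,
                           "chiffon skirt": 0.3, "crock": 0.4,
                           "slacks": 0.2, "women's suit": 0.3},
    "princess skirt": {"princess skirt": 0.9, "one-piece dress": 0.7,
                       "skirt": 0.5, "chiffon skirt": 0.4},
    "Japanese and Korean women's wear": {"Japanese and Korean women's wear": 0.9,
                                         "skirt": 0.8, "one-piece dress": 0.8,
                                         "chiffon skirt": 0.6},
}


def score_leaf_categories(keywords, vectors=WEIGHTED_VECTORS):
    """Sum the weights of the category words matched by the keywords."""
    return {
        category: sum(words.get(keyword, 0.0) for keyword in keywords)
        for category, words in vectors.items()
    }


def best_leaf_category(keywords, vectors=WEIGHTED_VECTORS):
    """Return the leaf category with the highest summed weight."""
    scores = score_leaf_categories(keywords, vectors)
    return max(scores, key=scores.get)


keywords = ["Korean style", "chiffon skirt"]
for category, score in score_leaf_categories(keywords).items():
    print(f"{category}: {score:.1f}")
print(best_leaf_category(keywords))
```

Summation is only the simplest aggregation; any other combination of the matched weights could be substituted inside `score_leaf_categories` without changing the rest of the sketch.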
Step S208: according to the determined leaf-node industry category and the preset mapping relations among the industry categories of all levels, determine the industry category in which the query word is located.
The simplest way is to move upward level by level from the leaf-node industry category until an industry category that satisfies a set condition is reached.
A more preferable way is as follows: according to the preset mapping relations among the industry categories of all levels, determine the upper-level industry category corresponding to the leaf-node industry category; obtain the weight of that upper-level industry category with respect to the query word; and judge whether this weight exceeds a second set threshold. If so, the upper-level industry category whose weight exceeds the second set threshold is determined as the industry category in which the query word is located. If not, the search continues upward until an industry category whose weight exceeds the second set threshold is found, or until the root-node industry category is reached. The weight of an upper-level industry category with respect to the query word can be obtained in the same way that step S206 determines the weight of a leaf-node industry category with respect to the query word. The second set threshold is chosen by those skilled in the art according to actual needs; the present invention places no restriction on it.
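The threshold-based upward search can be sketched as a loop over a child-to-parent table. In this illustration the weights of the upper-level categories are supplied as a precomputed dictionary; the table, the weights, and the function name are hypothetical.

```python
def category_by_threshold(leaf, parent, weight, threshold):
    """Walk upward from the leaf; return the first upper-level category whose
    weight for the query word exceeds the threshold, or the root if none does.

    parent: child -> parent mapping among industry categories
    weight: category -> weight with respect to the current query word
    """
    current = leaf
    while current in parent:
        current = parent[current]
        if weight.get(current, 0.0) > threshold:
            return current
    return current  # root reached (or the leaf has no upper level)


# Hypothetical hierarchy and weights for one query word.
parent = {"one-piece dress": "women's wear", "women's wear": "clothes"}
weight = {"women's wear": 0.7, "clothes": 0.9}

print(category_by_threshold("one-piece dress", parent, weight, threshold=0.5))
print(category_by_threshold("one-piece dress", parent, weight, threshold=0.8))
```

With these assumed values, the lower threshold stops at the intermediate category and the higher threshold continues to the root.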
Another way is to apply absolute value weighting according to the matching degree between the query word and the category name of each industry category, and then determine the industry category in which the query word is located from the result of the absolute value weighting and the preset mapping relations among the industry categories of all levels.
For example, if the user enters the query word "industrial alcohol", the related categories "industrial alcohol", "medicinal alcohol", and "edible alcohol" may be returned. The features of the original classification are mainly the statistical mapping between query words and categories and the text relevance between query words and categories; they do not consider how well the query word matches the category name. (Text relevance here means first performing Chinese word segmentation and part-of-speech tagging on the title attributes that users filled in for all commodities under a category, and obtaining by statistics a mapping from the category to a vector of segmented terms. One way to compute text relevance is the cosine similarity between the segmentation vector of the query word and the segmentation vector of the category.) For instance, the query word "industrial alcohol" matches the category "industrial alcohol" completely, whereas "medicinal alcohol" and "edible alcohol" match only partially. The "industrial alcohol" category can therefore be given a larger weight increase, and the partially matching "medicinal alcohol" and "edible alcohol" a correspondingly smaller one.
A kind of formula increasing weights is as follows:
ContextWeight = FactorContext * 2 * SameTW / (SumQW + SumCW)
Here, SameTW is the sum of the TermWeight weights of the segments shared by the query word and the category name; SumQW is the sum of the TermWeight weights of all segments of the query word; SumCW is the sum of the TermWeight weights of all segments of the category name; FactorContext is the weight given to the absolute match between query word and category name. FactorContext can be tuned according to the degree of match between query word and category, through the parameter adjustment process that optimizes the algorithm's effect.
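As a rough sketch, the weighting can be computed as below, reading the formula as FactorContext * 2 * SameTW / (SumQW + SumCW). The function name and sample weights are hypothetical, and taking the shared-segment weights from the query side is an assumption, since the text does not say which side's TermWeight is used.

```python
def context_weight(query_terms, category_terms, factor_context):
    """query_terms / category_terms map each segment to its TermWeight."""
    # Assumption: shared segments contribute their query-side weight.
    same_tw = sum(w for t, w in query_terms.items() if t in category_terms)
    sum_qw = sum(query_terms.values())
    sum_cw = sum(category_terms.values())
    return factor_context * 2 * same_tw / (sum_qw + sum_cw)


# Hypothetical segment weights for the "industrial alcohol" example.
query = {"industrial": 60, "alcohol": 150}
full_match = {"industrial": 60, "alcohol": 150}   # category "industrial alcohol"
partial_match = {"medicinal": 60, "alcohol": 150}  # category "medicinal alcohol"

print(context_weight(query, full_match, 1.0))
print(context_weight(query, partial_match, 1.0))
```

A complete name match reaches the full FactorContext, while a partial match receives a proportionally smaller boost.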
Yet another approach is to obtain the Chinese word segmentation annotation of the query word and, according to the preset mapping relations between industry categories at each level and that annotation, determine the industry category of the query word by set rules.
For example, the system in this embodiment uses AliWS, a word segmentation tool based on an e-commerce Chinese lexicon, and TermWeight, a part-of-speech tagging tool. AliWS can segment query words and product titles; for example, the query word "chiffon dress" is segmented into "chiffon / dress". TermWeight then tags each segment with its part of speech; in the same example, "chiffon (modifier) / dress (product word)". Through segmentation and part-of-speech tagging, the user's main intent can be analyzed: for example, the product word and main modifiers in the query word can be identified, along with information such as region words and product models. Segmenting product information such as titles and attributes helps optimize the algorithm model and improves how well and how precisely the user's intent is matched to the desired products. TermWeight also assigns a weight to each segment according to its importance. In "chiffon dress", for example, "dress" is the core product word: it expresses the main purpose of the user's search and is therefore more important. Its weight might be set to 150 and that of "chiffon" to 60 (these values are hypothetical; actual weights vary with the program's results and requirements).
Another approach is to calculate the entropy of the industry categories and determine the industry category of the query word according to the entropy and the preset mapping relations between industry categories at each level.
Entropy here is the information entropy of the industry categories at each level; different query words cover categories to different extents. For example: entropy of the industry categories corresponding to "clothing" > entropy of those corresponding to "men's wear" > entropy of those corresponding to "men's jeans". The entropy of each layer's industry categories can therefore serve as a feature for deciding at which level the query word is recognized.
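To make the entropy comparison concrete, the sketch below computes the Shannon entropy of a query word's category weight distribution. Treating the click weights as a probability distribution after normalization is an assumption; the function name and the sample distributions are hypothetical.

```python
import math


def category_entropy(weights):
    """Shannon entropy (in bits) of a category -> weight distribution."""
    total = sum(weights.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weights.values() if w > 0)


# Hypothetical click distributions: a broad query spreads over many leaf
# categories, a narrow query concentrates on one.
broad = {"dress": 25, "men's jeans": 25, "shirt": 25, "children's skirt": 25}
narrow = {"men's jeans": 100}

print(category_entropy(broad), category_entropy(narrow))
```

A broad query such as "clothing" yields high entropy and is better recognized at an upper level, while a narrow query yields entropy near zero and can be trusted at the leaf level.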
It should also be noted that the mapping relations between industry categories at each level may be implemented by the implementer of the present invention or provided by a third party, for example generated with a third-party generation tool or obtained by accessing data stored by a third party.
When the industry category determined through steps S202 to S208 meets the requirements, the query word can be classified directly into it. If it does not meet the requirements, for example because the weight of the industry category is below the second set threshold, the following additional scheme can be used to continue determining the industry category of the query word.
Step S210: judge whether the determined industry category of the query word meets the set standard; if it does, classify the query word into the determined industry category and perform step S214; if it does not, perform step S212.
The set standard is set appropriately by those skilled in the art according to actual conditions, and the present invention places no restriction on it.
Preferably, whether the determined industry category meets the set standard can be judged by checking whether the number of results recalled for that industry category meets the third set threshold, and/or whether the number of clicks for that industry category meets the fourth set threshold. The third and fourth set thresholds can be set appropriately by those skilled in the art according to actual conditions, and the present invention places no restriction on them.
Step S212: determine the industry category of the query word using the first recognition rule and/or the second recognition rule, then perform step S214.
The first recognition rule determines the industry category of the query word from leaf-node industry categories other than those already in the leaf industry category distribution vector; the second recognition rule determines the industry category of the query word from the weight of the query word in each industry category.
Preferably, when the first recognition rule is used, the industry category of the query word can be determined from the query word and the category term vectors of the leaf-node industry categories other than those in the leaf industry category distribution vector.
Still taking "Korean-style chiffon skirt" as the example, suppose that, besides {dress, stock women's wear, princess skirt, Japanese and Korean women's wear}, the category terms "Korean-style" and "chiffon skirt" also carry large weights in the category "foreign trade skirt", but that for some reason the system did not place this category in the leaf industry category vector above. When the industry category finally determined cannot meet the requirements, the first recognition rule can be used, for example by setting a recognition threshold, to help recall "foreign trade skirt" into the leaf industry category vector of the query word "Korean-style chiffon skirt". Then, taking "foreign trade skirt" as the determined leaf-node industry category, the industry category of the query word is finally determined according to the preset mapping relations between industry categories at each level.
Preferably, when the second recognition rule is used, the category term vector of each industry category can be used to determine level by level, from the root node's industry category down to the leaf nodes' industry categories, whether the weight of the query word in each industry category is greater than or equal to the first set threshold; the industry category whose weight is greater than or equal to the set threshold is taken as the industry category of the query word.
Still taking "Korean-style chiffon skirt" as the example, when the second recognition rule is used, the relation between each industry and its category terms is established first; for example, under the second-level industry category "women's wear", the weights of "Korean-style" and "chiffon skirt" are larger than under the second-level categories "men's wear" and "children's wear". A Bayes classifier is then used to compute, for the query word "Korean-style chiffon skirt", the probabilities of it being recognized as "women's wear", "men's wear" and "children's wear" respectively. A recognition threshold is set; when the probability of the query word for an industry category exceeds this threshold, a "bridge" is built and the query word is considered recognized as that industry category, which avoids recognition errors caused by click bias in the bottom-up procedure. For example, suppose the root-node industry category is "clothing" and the second-level nodes under it are "women's wear", "men's wear" and "children's wear", and the Bayes classifier gives the query word "Korean-style chiffon skirt" probabilities of 0.6, 0.9, 0.2 and 0.3 for "clothing", "women's wear", "men's wear" and "children's wear" respectively. With the recognition threshold set to 0.8, when the user enters "Korean-style chiffon skirt" the search proceeds downward from the root node; on reaching "women's wear", whose probability is 0.9, the condition is satisfied, so the industry category of "Korean-style chiffon skirt" is "women's wear". As another example, because the website has many women's wear offers, its transaction and click data are concentrated on women's wear. With the procedure of steps S202 to S208, a query word such as "short-sleeve T-shirt" must first be recognized as the leaf-node industry category "women's short-sleeve T-shirt" and only then, through the mapping relations between industry categories at each level, finally be recognized as "clothing". With the "bridge" from query word to industry category built above, "short-sleeve T-shirt" can skip the leaf-node industry category and be recognized directly as the first-level industry category "clothing", which avoids the category bias problem.
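The top-down search in the worked example can be sketched as follows. The probabilities are taken as given (the Bayes classifier itself is not implemented here), and the function name, the breadth-first traversal order and the data layout are assumptions made for illustration.

```python
from collections import deque


def top_down_recognize(root, children, prob, threshold):
    """Search level by level from the root and return the first category
    whose probability for the query word reaches the threshold."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if prob.get(node, 0.0) >= threshold:
            return node
        queue.extend(children.get(node, []))
    return None  # no category is confident enough


# Values from the "Korean-style chiffon skirt" example in the text.
children = {"clothing": ["women's wear", "men's wear", "children's wear"]}
prob = {"clothing": 0.6, "women's wear": 0.9,
        "men's wear": 0.2, "children's wear": 0.3}

print(top_down_recognize("clothing", children, prob, 0.8))
```

The root's probability of 0.6 is below the threshold, so the search descends and stops at "women's wear".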
In addition, when category terms are given weights, the weights of the category terms in each category term vector can also be adjusted by comparing the matching results between query words and the category terms in each vector against manually labeled training data, so that the weights better reflect actual classification results and subsequent query word classification becomes more accurate.
Step S214: display the query results under the industry category of the query word, using the set industry-specific search pullover or the set industry-specific search result display template.
This step is optional; some applications do not need to display query results.
Displaying query results through the set industry-specific search pullover or the set industry-specific search result display template, however, lets the user query within the right industry and also promotes the development of industry market pages.
In this embodiment, the industry the user wants to query is determined from the query word, which achieves effective industry queries and improves the user's search efficiency and search experience. In addition, starting from the leaf node allows the industry category of the query word to be located precisely, which makes industry query results more accurate and search more efficient.
Embodiment three
With reference to Fig. 3, a flow chart of the steps of a query word classification method according to Embodiment Three of the present invention is shown.
The query word classification method of this embodiment comprises the following steps:
Step S302: generate and save the correspondence between query words and leaf-node industry categories.
One approach is to generate and save the correspondence between query words and leaf-node industry categories from the search click information in user search click logs. In this approach, tracking points that record click behavior must be set up in the front end of the search page; for example, a parameter indicating that the click is a user search click can be added directly to the URLs of clicked product information and category navigation. The front-end search page can obtain the user information, query word information, product information and product category information for the page. Finally, this information is combined into data in a standard format and saved in a data warehouse as the user search click log. The format can be set by those skilled in the art to suit the database storage in use, so that the information can be stored normally in the data warehouse. For user search click logs with massive amounts of data, a distributed computing system can be used to handle the large-scale log data so that computing tasks finish on time.
Then, using the stored user search click logs, the OfferId (product identifier, one item of the product information) clicked after each user query is joined with the website's Offer (product supply) information table, which contains each Offer's display leaf category and publishing leaf category. The category of the product indicated by the identifier is obtained from the Offer information table, and statistics over the joined table yield the mapping from query words to leaf-node industry categories. Mapping relations generated in this way are also called mapping relations generated from click features.
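The join and counting step above can be sketched in memory as follows. In practice this would run on a distributed system over the data warehouse; the record layout, function name and sample data here are hypothetical.

```python
from collections import Counter, defaultdict


def build_query_to_leaf(click_log, offer_table):
    """Join (query, offer_id) click records with an offer_id -> leaf category
    table and count clicks per (query, leaf category)."""
    mapping = defaultdict(Counter)
    for query, offer_id in click_log:
        leaf = offer_table.get(offer_id)
        if leaf is not None:  # skip clicks on offers missing from the table
            mapping[query][leaf] += 1
    return mapping


# Hypothetical click log and Offer information table.
click_log = [("princess skirt", 1), ("princess skirt", 2),
             ("princess skirt", 3), ("jeans", 4), ("jeans", 99)]
offer_table = {1: "dress", 2: "dress", 3: "chiffon skirt", 4: "men's jeans"}

mapping = build_query_to_leaf(click_log, offer_table)
print(dict(mapping["princess skirt"]))
```

The click counts per leaf category can then be normalized to obtain the weighted distribution for each query word.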
Another approach is to generate and save the correspondence between query words and leaf-node industry categories from users' search click information together with product titles. For example, for the Alibaba website, the Alibaba Chinese word segmentation module can mine the important words in Offer titles, such as product and attribute words, and their weights, to build a mapping from industry categories to Terms (product, attribute, etc.); the mapping from query words to industry categories is then built according to the search click information. Mapping relations generated in this way are also called mapping relations generated from text relevance features.
A flow that uses the above two approaches to generate the correspondence between query words and leaf-node industry categories is shown in Fig. 4. The website's front-end pages receive search or click input from multiple users and generate search click data; the search click data is saved to the data warehouse; a distributed computing system obtains the users' search click data from the data warehouse and processes it on multiple computing nodes (three in the figure) to generate query word classification data, that is, the data of the correspondence between query words and leaf-node industry categories; a dictionary index online service is then built for subsequent use and lookup.
Step S304: generate and save the mapping relations between industry categories at each level.
For the industry category structure of e-commerce, a tree can be used to generate and save the mapping relations between industry categories at each level; this is also called an industry category tree. The bottom-level categories are called leaf-node industry categories, a category on the n-th layer counted from the top is called an n-th-level category, and the first-level category is the top-level category, that is, the root-node industry category. An industry category tree for e-commerce is shown in Fig. 5. As Fig. 5 shows, this tree has three layers: the top-level category "clothing"; the second-level categories "women's wear", "men's wear" and "children's wear"; and the bottom-level industry categories, which are "dress, women's sweater, chiffon skirt" under "women's wear", "men's jeans, shirt" under "men's wear", and "children's suspender trousers, children's skirt" under "children's wear". That is, "dress, women's sweater, chiffon skirt" map upward to "women's wear", "men's jeans, shirt" map upward to "men's wear", and "children's suspender trousers, children's skirt" map upward to "children's wear"; "women's wear", "men's wear" and "children's wear" in turn all map upward to "clothing".
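One simple way to hold the tree of Fig. 5 is a child-to-parent map, which makes the upward mapping a chain of lookups. The representation and function names below are an illustrative assumption, not the storage format used by the system.

```python
# Three-layer tree from Fig. 5, written as nested containers.
TREE = {
    "clothing": {
        "women's wear": ["dress", "women's sweater", "chiffon skirt"],
        "men's wear": ["men's jeans", "shirt"],
        "children's wear": ["children's suspender trousers", "children's skirt"],
    }
}


def build_parent_map(tree):
    """Flatten the nested tree into a child -> parent dictionary."""
    parent_of = {}
    for root, second_levels in tree.items():
        for second, leaves in second_levels.items():
            parent_of[second] = root
            for leaf in leaves:
                parent_of[leaf] = second
    return parent_of


def path_to_root(node, parent_of):
    """Return the chain of categories from the node up to the root."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


PARENT_OF = build_parent_map(TREE)
print(path_to_root("chiffon skirt", PARENT_OF))
```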
The industry categories to which products belong form multiple layers, and this data is maintained by professional website operations staff. The industry category tree is stable; adding, deleting or modifying category information requires unified operational management.
Through the industry category tree, the industry to which the industry category of a specific product belongs can be determined. For ambiguous or disputed cases of category attribution, data on the product information users click while browsing products by category can be used to determine statistically which industry a category belongs to.
In addition, each category in the industry category tree has a category term vector, and each category term in a category term vector can be given a weight.
Step S306: query word is classified.
In this embodiment, query word classification is implemented as the process of obtaining a weighted category distribution from several features and a classification algorithm.
One flow for classifying a query word is shown in Fig. 6 and comprises:
Step S3062: enter the query word.
In this embodiment, the query word is entered on a web page of the website front end, and the corresponding function module on the back-end server can be called over HTTP to carry out the subsequent classification. For example, the industry recognition program module deployed on Alibaba's Query Process platform is called over HTTP to carry out the subsequent query word classification.
Step S3064: determine the category terms, in the category term vectors, that match the query word.
Step S3066: generate the leaf industry category distribution vector corresponding to the query word.
In this embodiment, click statistics take the form "query word -> leaf industry category distribution"; in each leaf industry category distribution vector, the weight of each leaf-node industry category is the sum of its category term weights.
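Steps S3064 and S3066 can be sketched together as below. The sketch assumes that only the category terms matching the query word's segments contribute to a category's weight; the function name, the vectors and the weights are hypothetical.

```python
def leaf_distribution(query_terms, category_term_vectors):
    """Score each leaf category by the summed weights of its category terms
    that match the query word's segments; drop categories with no match."""
    dist = {}
    for category, term_weights in category_term_vectors.items():
        score = sum(term_weights.get(term, 0.0) for term in query_terms)
        if score > 0:
            dist[category] = score
    return dist


# Hypothetical category term vectors.
vectors = {
    "dress": {"Korean-style": 0.3, "chiffon skirt": 0.4},
    "princess skirt": {"chiffon skirt": 0.2},
    "men's jeans": {"denim": 0.9},
}

print(leaf_distribution(["Korean-style", "chiffon skirt"], vectors))
```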
Step S3068: determine the leaf-node industry category to which the query word belongs, and map upward according to the industry category tree.
Step S30610: determine the industry category of the query word.
To determine at which level of industry category the query word should finally be placed, two basic approaches can be used. Approach one: decide by set rules, based on the Chinese word segmentation annotation of the query word. Approach two: feed each layer's industry category weight vector into the parameter adjustment module to determine whether the query word can be trusted at the current layer.
For example, category and industry weights can be adjusted according to the Chinese part-of-speech tagging result. "Women's T-shirt" contains the gender word "women's", so in the final category and industry weights, the weights of "men's wear"-related categories that conflict with the query word can be lowered.
In addition, the following approaches can also be used:
Approach three: determine by the click entropy from query word to industry category; that is, calculate the entropy of the leaf industry category distribution vector of the query word as a feature for identifying the industry category level, and determine the industry category of the query word according to this entropy.
Approach four: determine by the text relevance weight between the query word and industry category names; that is, apply absolute-value weighting to the leaf industry category vector of the query word to raise the weight of categories whose names literally match the query word, and determine the industry category of the query word according to the weighted result.
Approach five: determine the industry category of a low-frequency query word, as follows:
A low-frequency query word is one that has been clicked fewer than a set number of times (for example 20) within a past period (for example 180 days).
When using a search engine, users often enter query words that have never appeared in the user search click logs and therefore lack click-based classification features. Moreover, for query words with fewer search clicks than the set number (for example 20), classification based purely on clicks cannot be fully trusted. For example, a user searches for "Korea Dongdaemun surplus slim-fit dress", and the search click log records that this user clicked 3 Offers under "Bohemian long dress". If only this click data is used as the classification feature and the query word is not expanded, many correct categories will fail to be recalled. The original query word therefore needs to be transformed.
Two example algorithms for transforming the query word follow:
Algorithm one: KNN Query, that is, the KNN (K-Nearest Neighbor) query word algorithm. KNN is K-nearest-neighbor classification: the K nearest neighbors of the object to be classified are found in the training data set, and their categories are used to guide the classification of that object. From users' search click logs on Offers, an inverted index between query words and Terms can be built, and the cosine distance between the segmentation term vector of the original query word and the vectors of other query words is used to obtain the neighbor classification data set. The classification results of a set number of the original query word's nearest neighbors, for example the top ten query words, are then combined and taken as the classification result of the original query word. Finally, this classification result is mapped onto the industry category tree to determine the industry category of the query word.
Here, Term denotes a Chinese word segment; segmenting a query word yields a vector of several Terms, each with a weight and a part of speech. The neighbor classification data set is a correspondence from a Chinese word segment Term to the related query word vectors for which user search click behavior has occurred. The vectors in this data set are also weighted.
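A compact sketch of the KNN idea follows: neighbors are ranked by cosine similarity of their weighted Term vectors and their category distributions are merged. Weighting each neighbor's contribution by its similarity is an assumption (the text only says the results are combined), and all names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_vec, known_queries, k=10):
    """known_queries: list of (term_vector, category_distribution) for query
    words that already have click-based classifications."""
    scored = [(cosine(query_vec, vec), dist) for vec, dist in known_queries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    merged = {}
    for sim, dist in scored[:k]:
        for category, weight in dist.items():
            merged[category] = merged.get(category, 0.0) + sim * weight
    return merged


# Hypothetical weighted Term vectors and known classifications.
known = [({"dress": 150, "slim-fit": 60}, {"dress": 1.0}),
         ({"jeans": 150}, {"men's jeans": 1.0})]
result = knn_classify({"Korean-style": 60, "dress": 150}, known, k=1)
print(max(result, key=result.get))
```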
Algorithm two: word-shrinking transformation. For example, the second-party natural language processing library TermWeight for Chinese word segmentation is used to assign a weight to each segment; the segment with the lowest weight is removed, and the transformed query word is checked to see whether it is a high-frequency word. If it is, the classification result of the transformed query word is used as the classification result of the original query word; otherwise, the segment with the next-lowest weight is removed, and this step is repeated until only one segment remains. Finally, the classification result is mapped onto the industry category tree to determine the industry category of the query word.
Here, the segments in the Chinese word segmentation vector of a query word carry weights assigned according to their importance. When a query word entered by the user has no past search click behavior, the lowest-weight segments can be removed one at a time and the remainder recombined into a new query word. This removal continues until the recombined query word appears in the user search click logs (that is, it has a corresponding category vector). The category vector of the new query word is then used as the category vector of the query word the user entered.
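The word-shrinking loop can be sketched as below. Joining the remaining segments with a space to form the new query word, and the lookup table of known query words, are assumptions for illustration; the weights are hypothetical.

```python
def shrink_query(term_weights, known_queries):
    """Repeatedly drop the lowest-weight segment until the remaining segments
    form a query word with a known category vector; return that vector."""
    terms = dict(term_weights)  # preserves the original segment order
    while len(terms) > 1:
        weakest = min(terms, key=terms.get)
        del terms[weakest]
        candidate = " ".join(terms)
        if candidate in known_queries:
            return known_queries[candidate]
    return None  # nothing matched even with a single segment left


# Hypothetical TermWeight output and click-log-derived category vectors.
segments = {"Korea": 30, "Dongdaemun": 20, "slim-fit": 60, "dress": 150}
known = {"slim-fit dress": {"dress": 0.8, "princess skirt": 0.2}}

print(shrink_query(segments, known))
```

Here "Dongdaemun" and then "Korea" are removed, after which "slim-fit dress" is found and its category vector is reused for the original query word.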
It should be noted that the two transformation algorithms above are only examples; in practice, those skilled in the art may adopt other suitable transformation algorithms according to actual conditions.
Approach six: use natural language processing, for example segmentation with Alibaba's Chinese word segmentation tool AliWS and attribute tagging and weighting with TermWeight. For example, after the query word is segmented, it can be transformed by methods such as KNN or word shrinking into a query word with dense data, provided the algorithm's effect is preserved, which increases the accuracy of query word classification. The industry category of this data-dense query word is then determined.
In this embodiment, a concrete form of approach one is used to determine the industry category of the query word. That is, the Chinese word segmentation annotation result of the query word is made concrete as the one or more levels of upper-level industry categories to which the query word's leaf-node industry category maps. The set rule is made concrete as judging whether, among the mapped industry categories at each level, there is one whose weight exceeds the set threshold weight (also called the peak). If there is, that industry category is taken as the industry category of the query word. Otherwise, another suitable rule is used to determine the industry category of the query word, such as judging whether there is an industry category whose entropy is below a set entropy threshold, or judging whether the difference between the weight of the highest-weighted category and that of the second-ranked category exceeds a specified weight difference threshold, or mapping all the way up to the root industry category. Each of these thresholds can be set appropriately by those skilled in the art according to actual conditions.
Here, the category vector of a query word is sorted by weight, so the second category is the one with the second-largest weight. The difference between the weights of the first and second categories can be used to judge whether the first category is a peak category (that is, whether the difference is large enough).
With this concrete form of approach one, this step in the present embodiment judges whether, among the mapped industry categories at each level, there is a category whose weight exceeds the peak weight; if so, that category is taken as the industry category of the query word; if not, another set rule is applied, such as the above rules that determine the industry category by the entropy threshold or the weight difference threshold, or the rule of mapping all the way to the root industry category.
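The rule chain can be sketched as follows: a peak check across all levels, then the top-two weight gap as a fallback, then the root. The order in which the fallback rules are tried is an assumption (the text lists them as alternatives), the entropy rule is omitted for brevity, and all names and numbers are hypothetical.

```python
def pick_category(level_dists, peak_threshold, gap_threshold, root):
    """level_dists: category -> weight dicts, ordered from the leaf level
    upward. Returns the category chosen by the first rule that fires."""
    ranked_levels = [sorted(d.items(), key=lambda kv: kv[1], reverse=True)
                     for d in level_dists if d]
    # Rule 1: a category whose weight exceeds the peak threshold.
    for ranked in ranked_levels:
        if ranked[0][1] > peak_threshold:
            return ranked[0][0]
    # Rule 2: the top category leads the runner-up by more than the gap.
    for ranked in ranked_levels:
        second = ranked[1][1] if len(ranked) > 1 else 0.0
        if ranked[0][1] - second > gap_threshold:
            return ranked[0][0]
    # Rule 3: fall back to the root industry category.
    return root


levels = [{"dress": 0.35, "chiffon skirt": 0.33, "lace skirt": 0.32},
          {"women's wear": 0.9, "children's wear": 0.1}]
print(pick_category(levels, 0.8, 0.5, "clothing"))
```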
Step S30612: according to the industry category of the query word, determine the multi-layer category structure and output the query result.
The query result can be output as the distribution of the multi-layer industry categories the query word is recognized as, together with an integer indicating at which layer of industry category the result is trusted. For example, four layers are returned: the leaf-node industry category, the second-level node industry category, the first-level node industry category, and the BU (Business Unit).
On the basis of the above query word classification, parameter adjustment can also be applied, for example with a parameter adjustment module, to tune the possible influence of each query word classification feature on the classification result and so optimize the algorithm's effect.
Below, the realization that parameter regulates is illustrated.
Parameter adjustment requires three preconditions: (1) manually labeled query word classification data whose distribution is similar to that of the actual search engine; (2) an automated program that can automatically revise the classification parameter configuration items; (3) a machine learning algorithm module that can handle linear regression problems.
Specifically,
Regarding the manually labeled query word classification data, operations staff can provide professional classification data so that the sampling of query words matches the distribution of the prediction target, that is, the real space. For example, 1000 sample data can be selected at random from historical search traffic logs, using the number of searches as the weight, to satisfy the machine learning prerequisite that sampling match the real space. The categories of each level to which each query word belongs are first labeled manually; for example, the query word "princess skirt" is classified from top to bottom as "consumer goods - clothing - women's wear - dress".
According to the business requirements of query word classification, parameter adjustment problems fall into two types. The first type concerns the category distribution at each layer (for example, in the click logs "princess skirt" can be statistically associated with a weighted category distribution vector: "dress", "chiffon skirt", "lace skirt" and so on): the proportional weights of several linear features (linear features are features that are pairwise uncorrelated) need to be tuned so that the algorithm's effect is optimal. Linear regression can solve this kind of mathematical problem. The second type is the adjustment of the recognition threshold CateThre of the categories at each level, which determines at which layer's peak category the query word is recognized.
Problem is regulated for the first parameter,
The automated program can compare the query word classification results against the manually labeled training data and compute a score for the current parameter adjustment. The score is an F-mean that emphasizes precision (that is, γ in the formula below is suitably greater than 1). γ can be adjusted according to business requirements (a human judgment, based on experience, of whether to favor recall or precision; for a given business requirement γ generally does not change, so it is not part of the machine learning in parameter learning).
Wherein, F mean value formula is as follows:
$$F_\gamma = \frac{(1+\gamma)\cdot \mathrm{precision}}{(\gamma^{2}\cdot \mathrm{precision}) + \mathrm{recall}}$$
In above-mentioned F mean value formula, each meaning of parameters is as follows:
Precision: the precision with which this system assigns query words to the correct category and level, that is, number of correctly classified samples / total number of samples;
Recall: the recall of this system's classification of the samples, that is, number of samples with a classification result (recognized as a category or an industry) / total number of samples;
γ: when γ is 1, precision and recall have the same weight in their influence on the F-mean. The larger γ is, the more weight precision has in the F-mean. This system emphasizes classification precision, so γ takes a value greater than 1.
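The precision and recall definitions above can be sketched as follows, using exactly the ratios given in the text (both are taken over the total number of labeled samples). The function name and the sample predictions are hypothetical.

```python
def precision_recall(predicted, labeled):
    """predicted: query word -> predicted category (None or missing when the
    system returned no classification); labeled: query word -> manually
    labeled category."""
    total = len(labeled)
    classified = sum(1 for q in labeled if predicted.get(q) is not None)
    correct = sum(1 for q, cat in labeled.items() if predicted.get(q) == cat)
    return correct / total, classified / total


labeled = {"princess skirt": "dress", "jeans": "men's jeans",
           "toy car": "toys", "kettle": "kitchenware"}
predicted = {"princess skirt": "dress", "jeans": "shirt", "toy car": None}

precision, recall = precision_recall(predicted, labeled)
print(precision, recall)
```

Of the four labeled samples, two received a classification and one of those is correct, so precision is 0.25 and recall is 0.5 under these definitions.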
During computation, for each set of feature weights output by parameter adjustment, the precision and recall of that set on the training samples can be obtained, and the set of parameters with the largest F-mean is finally retained.
The following describes the use of linear regression to solve the first type of parameter adjustment problem.
Linear regression is defined as:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, \dots, n$$
For a random sample $(Y_i, X_{i1}, \dots, X_{ip})$, $i = 1, \dots, n$, a linear regression model assumes that the relation between the regressand $Y_i$ and the regressors $X_{i1}, \dots, X_{ip}$ may be imperfect. An error term $\varepsilon_i$ (also a random variable) is added to capture any influence on $Y_i$ other than that of $X_{i1}, \dots, X_{ip}$. A multiple linear regression model is therefore expressed in the form above. It is important to distinguish the random variables from their observed values. Usually, the observed values or data (denoted with lowercase letters) comprise n values $(y_i, x_{i1}, \dots, x_{ip})$, $i = 1, \dots, n$.
There are p+1 parameters $\beta_0, \dots, \beta_p$ to be determined; to estimate them, the matrix notation $Y = X\beta + \varepsilon$ is very useful.
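Here, Y is a column vector containing the observations $Y_1, \dots, Y_n$, ε contains the unobserved random components $\varepsilon_1, \dots, \varepsilon_n$, and X is the observation matrix of the regressors.
In matrix notation, X is:
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}$$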
X generally includes a constant term.
If the columns of X are linearly dependent, the parameter vector β cannot be estimated by least squares unless β is constrained, for example by requiring that some of its elements sum to 0.
The primary goal of regression analysis is to estimate the model parameters so as to achieve the best fit to the data. Among the various criteria for determining a best fit, the least squares method performs very well. This estimate can be expressed as:
$$\hat{\beta} = (X^{T} X)^{-1} X^{T} y$$
An automatic learning program is implemented in code. $X_i$ is the i-th feature described above, $\beta_i$ is the weight of the i-th feature, and Y is the vector of F-mean evaluation results of the parameter combinations (the $y$ in the formula above).
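As a rough illustration of the least squares estimate, the sketch below solves the normal equations $(X^T X)\beta = X^T y$ by Gauss-Jordan elimination in plain Python. The function name and the sample data are hypothetical; a production system would more likely rely on a numerical library.

```python
from typing import List

Matrix = List[List[float]]


def least_squares(X: Matrix, y: List[float]) -> List[float]:
    """Return beta minimizing ||X beta - y||^2 by solving (X^T X) beta = X^T y."""
    n, p = len(X), len(X[0])
    # Augmented system [X^T X | X^T y], one row per parameter.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            # Linearly dependent columns: beta is not identifiable without constraints.
            raise ValueError("X^T X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data: a constant column plus one feature, with y = 1 + 2 * x.
beta = least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0])
```

The `ValueError` branch corresponds to the case noted in the text where the columns of X are linearly dependent.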
The second parameter adjustment problem is addressed as follows.
The second parameter adjustment problem is to adjust the recognition threshold CateThre of the classification at each level, in order to determine at which level's peak classification the query word is specifically recognized. Recognition proceeds bottom-up: the leaf classification is judged first, then the second level, then the first level, and finally the BU. For example, the query word "princess dress" yields the leaf industry classification distribution vector and weights ["one-piece dress": 0.5, "chiffon skirt": 0.2, "lace skirt": 0.2, "lace skirt": 0.1]. Given the peak-recognition threshold for leaf-node industry classifications LeafCateThre = 0.4, the industry classification "one-piece dress" is recognized. Looking one level up in the industry classification tree, given the second-level peak threshold SecondCateThre = 0.5, "women's wear" is recognized.
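The bottom-up recognition can be sketched roughly as below. Several points are assumptions rather than statements from the text: a parent's weight is taken to be the sum of its children's weights, the function names are hypothetical, and, because the example vector lists "lace skirt" twice, the last entry is given the placeholder name "other skirt".

```python
from typing import Dict, List, Optional


def peak(dist: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the highest-weight classification if its weight reaches the threshold."""
    if not dist:
        return None
    name = max(dist, key=lambda k: dist[k])
    return name if dist[name] >= threshold else None


def roll_up(dist: Dict[str, float], parent: Dict[str, str]) -> Dict[str, float]:
    """Aggregate child weights into their parents (assumed: simple summation)."""
    rolled: Dict[str, float] = {}
    for name, weight in dist.items():
        rolled[parent[name]] = rolled.get(parent[name], 0.0) + weight
    return rolled


def recognize_bottom_up(leaf_dist: Dict[str, float],
                        parents: List[Dict[str, str]],
                        thresholds: List[float]) -> List[Optional[str]]:
    """Judge the leaf level first, then each higher level in turn."""
    results, dist = [], leaf_dist
    for level, threshold in enumerate(thresholds):
        results.append(peak(dist, threshold))
        if level < len(parents):
            dist = roll_up(dist, parents[level])
    return results


leaf_dist = {"one-piece dress": 0.5, "chiffon skirt": 0.2,
             "lace skirt": 0.2, "other skirt": 0.1}  # last name is a placeholder
to_second = {name: "women's wear" for name in leaf_dist}
recognized = recognize_bottom_up(leaf_dist, [to_second], [0.4, 0.5])
```

With LeafCateThre = 0.4 and SecondCateThre = 0.5, the sketch recognizes "one-piece dress" at the leaf level and "women's wear" one level up, matching the example in the text.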
Therefore, the peak classification threshold CateThre of each layer has a great influence on the final effect of the algorithm. Enumeration can be used to approach the best algorithm effect. Empirically, CateThre lies in the range [0.3, 0.7]. Enumerating within this range with a step of 0.01 gives 40 different values of CateThre. The query word classification system of this embodiment has four layers of classification: LeafCate, SecondCate, FirstCate and BUCate. Combining them gives 40*40*40*40 = 2,560,000 peak classification parameter configurations. Using a distributed computing platform, all parameter configuration combinations can be computed quickly, and the combination with the largest F-mean is selected and kept for online use.
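The enumeration can be illustrated with the following sketch. It assumes the 40 candidates are 0.30, 0.31, ..., 0.69 (upper bound excluded) and takes the scoring function as a caller-supplied callable; both choices, and the function names, are assumptions made for illustration. The search runs on a single machine here, whereas the text describes a distributed computing platform.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

LEVELS = ("LeafCate", "SecondCate", "FirstCate", "BUCate")


def threshold_grid(low: float = 0.30, high: float = 0.70, step: float = 0.01) -> List[float]:
    """Candidate CateThre values: 0.30, 0.31, ..., 0.69 (40 values)."""
    count = round((high - low) / step)
    return [round(low + i * step, 2) for i in range(count)]


def best_configuration(grid: Sequence[float],
                       evaluate: Callable[[Tuple[float, ...]], float]) -> Tuple[float, ...]:
    """Enumerate one threshold per level and keep the configuration with the largest score."""
    return max(product(grid, repeat=len(LEVELS)), key=evaluate)


grid = threshold_grid()
total_configurations = len(grid) ** len(LEVELS)  # 40 ** 4 configurations
```

In practice `evaluate` would classify the labeled samples with the given thresholds and return the F-mean. Because each configuration is scored independently, the 2,560,000 evaluations can be split across workers.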
With the parameter adjustment described above, after multiple iterations, a set of algorithm parameters that is optimal over the sample space can be obtained.
It can be seen that, when the classification model has multiple parameters, adjusting their values through the above parameter adjustment brings the model closer to the actual distribution of the data and achieves the best algorithm effect.
Step S308: display the query word classification result using the set industry-specific search header or industry-specific search result display template.
When the industry-specific search header is used to display the query word classification result, the search page, after obtaining the industry classification information of the query word, calls the search header template of the corresponding industry and precisely selects the classification in the navigation bar. For example, after a user searches for "one-piece dress", the search page can jump to the "Apparel and Accessories" market search page, with "women's wear" selected and the leaf classifications beneath it expanded. The search results then contain only merchandise information of the "Apparel and Accessories – women's wear" class.
Because the header occupies a key position on the page and also filters the search results, industry-specific recognition requires very high accuracy, to avoid harming the user experience.
When the industry-specific search result display template is used to display the query word classification result, the way the search result list is displayed can be controlled. Users generally focus on different things in the search results of different industries. For example, for clothing search results, users are more inclined to browse pictures, so displaying the results in large-image mode gives a better search experience. Users in the industrial goods industry care more about attributes, price and place of origin, so the relevant attributes, price and place of origin of the products are displayed.
This embodiment achieves industry-specific search in e-commerce by using query word classification together with the industry classification tree and the classification-to-industry mapping. For a query word that corresponds to multiple layers of classification, Chinese word segmentation marking and the parameter adjustment module are used to determine which layer the query word specifically belongs to. The industry information is applied to applications such as the industry-specific search header and the industry-specific search result display mode. This embodiment provides a standard format for specifying the classification tree and the industry mapping, so it can easily be extended to other classification trees. To meet the different requirements that multiple classification trees and different businesses place on query word classification, the parameter adjustment module linearly blends the weights of multiple competing features and lets the machine automatically learn a set of parameters suited to the various requirements, which makes the algorithm code highly reusable.
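Selecting a display template per industry can be pictured as a simple lookup, as in the sketch below. The template fields, layout names and industry keys are hypothetical and only mirror the two examples given in the text (large images for clothing; attributes, price and place of origin for industrial goods).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ResultTemplate:
    layout: str               # how the result list is rendered
    fields: Tuple[str, ...]   # item fields emphasized in each result


# Assumed fallback for industries without a dedicated template.
DEFAULT_TEMPLATE = ResultTemplate("list", ("title", "price"))

TEMPLATES: Dict[str, ResultTemplate] = {
    "clothing": ResultTemplate("large_image", ("picture", "title", "price")),
    "industrial goods": ResultTemplate("list", ("attributes", "price", "place_of_origin")),
}


def template_for(industry: str) -> ResultTemplate:
    """Return the display template for the recognized industry."""
    return TEMPLATES.get(industry, DEFAULT_TEMPLATE)
```

Keeping the mapping in data rather than code means a new industry can be supported by adding one entry.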
Embodiment four
Referring to Fig. 7, a structural block diagram of a query word classifying device according to Embodiment 4 of the present invention is shown.
The query word classifying device of this embodiment comprises: a first generation module 402, configured to obtain an input query word and generate a leaf industry classification distribution vector corresponding to the query word, wherein the leaf industry classification distribution vector contains the industry classification of at least one leaf node corresponding to the query word, each industry classification of the nodes at every level corresponds to a class term vector, and each class term vector comprises multiple class terms; a determination module 404, configured to determine the industry classification of the leaf node to which the query word belongs according to the class term vectors of the industry classifications of the leaf nodes corresponding to the query word; and a first classification module 406, configured to determine the industry classification of the query word according to the determined industry classification of the leaf node and the preset mapping relations between industry classifications at the various levels.
Preferably, each class term in each class term vector is assigned a weight. The query word classifying device of this embodiment further comprises a second classification module 408, configured to judge, after the first classification module 406 has determined the industry classification of the query word according to the determined industry classification of the leaf node and the preset mapping relations between industry classifications at the various levels, whether the industry classification of the query word meets a set standard. If it does, the query word is classified into the determined industry classification. If it does not, the industry classification of the query word is determined using a first recognition rule and/or a second recognition rule, wherein the first recognition rule determines the industry classification of the query word according to the industry classifications of leaf nodes other than those in the leaf industry classification distribution vector, and the second recognition rule determines the industry classification of the query word according to the weight of the query word in each industry classification.
Preferably, when determining the industry classification of the query word using the first recognition rule and/or the second recognition rule, the second classification module 408: determines the industry classification of the query word according to the query word and the class term vectors of the industry classifications of leaf nodes other than those in the leaf industry classification distribution vector; and/or, according to the class term vector corresponding to each industry classification, determines level by level, from the industry classification of the root node to the industry classifications of the leaf nodes, whether the weight of the query word in each industry classification is greater than or equal to a first set threshold, and takes the industry classification whose weight is greater than or equal to that threshold as the industry classification of the query word.
Preferably, the determination module 404 comprises: a generation module 4042, configured to preprocess the query word and generate at least one search keyword; a matching module 4044, configured to match the at least one search keyword against the class term vector of the industry classification of each leaf node; a weight module 4046, configured to determine the weight of the industry classification of each leaf node with respect to the query word according to the weights in the class term vectors of the successfully matched leaf-node industry classifications; and a leaf node determination module 4048, configured to take the industry classification of the leaf node with the highest weight as the industry classification of the leaf node to which the query word belongs.
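The four sub-modules can be sketched as one short pipeline. This is only an illustration under stated assumptions: preprocessing is reduced to lowercasing and whitespace splitting (a real system would use Chinese word segmentation), a leaf's weight is taken to be the sum of the weights of its matched class terms, and all names and sample vectors are hypothetical.

```python
from typing import Dict, List, Optional

TermVectors = Dict[str, Dict[str, float]]  # leaf classification -> {class term: weight}


def search_keywords(query: str) -> List[str]:
    """Stand-in preprocessing: lowercase and split on whitespace."""
    return query.lower().split()


def leaf_weights(query: str, term_vectors: TermVectors) -> Dict[str, float]:
    """Weight of each leaf classification whose class terms match a search keyword."""
    keywords = set(search_keywords(query))
    scores: Dict[str, float] = {}
    for leaf, terms in term_vectors.items():
        matched = [weight for term, weight in terms.items() if term in keywords]
        if matched:
            scores[leaf] = sum(matched)  # assumed aggregation: sum of matched weights
    return scores


def best_leaf(query: str, term_vectors: TermVectors) -> Optional[str]:
    """Leaf classification with the highest weight, or None if nothing matched."""
    scores = leaf_weights(query, term_vectors)
    return max(scores, key=lambda k: scores[k]) if scores else None


vectors: TermVectors = {
    "one-piece dress": {"dress": 0.6, "princess": 0.3},
    "chiffon skirt": {"chiffon": 0.7, "skirt": 0.2},
}
```

For the query "princess dress", only the class terms of "one-piece dress" match, so that leaf is selected.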
Preferably, the first classification module 406 comprises: a higher-level classification module 4062, configured to determine, according to the preset mapping relations between industry classifications at the various levels, the upper-level industry classification corresponding to the industry classification of the leaf node; a higher-level weight module 4064, configured to obtain the weight, with respect to the query word, of the upper-level industry classification corresponding to the industry classification of the leaf node; a judgment module 4066, configured to judge whether the weight of the upper-level industry classification with respect to the query word exceeds a second set threshold; and an execution module 4068, configured to, if the judgment result of the judgment module 4066 is yes, take the upper-level industry classification whose weight exceeds the second set threshold as the industry classification of the query word, and, if the judgment result is no, continue searching upward until an industry classification whose weight exceeds the second set threshold is found or the industry classification of the root node is reached.
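The upward search performed by modules 4062 to 4068 can be sketched as a loop over a parent map. The data layout (a child-to-parent dictionary and a per-classification weight dictionary), the function name and the sample values are assumptions for illustration.

```python
from typing import Dict


def classify_upward(leaf: str, parent: Dict[str, str],
                    weight: Dict[str, float], threshold: float) -> str:
    """Walk up from the leaf's parent until a weight exceeds the threshold or the root is hit."""
    node = parent.get(leaf)
    if node is None:
        return leaf  # the leaf has no upper level in the mapping
    # Continue upward while the current level does not exceed the threshold
    # and a higher level still exists.
    while weight.get(node, 0.0) <= threshold and node in parent:
        node = parent[node]
    return node


# Hypothetical tree and weights for one query word.
parent = {
    "one-piece dress": "women's wear",
    "women's wear": "apparel and accessories",
    "apparel and accessories": "consumer BU",
}
weight = {"women's wear": 0.45, "apparel and accessories": 0.8, "consumer BU": 1.0}
```

With a second set threshold of 0.5, "women's wear" (0.45) is skipped and "apparel and accessories" (0.8) is returned; if no level exceeds the threshold, the root classification is returned.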
Preferably, the query word classifying device of this embodiment further comprises a correspondence generation module 410, configured to generate the correspondence between the query word and the industry classifications of the leaf nodes, including: obtaining the search click information of users, wherein the search click information includes the query word information entered by a user and the merchandise information the user clicked in the query results; and obtaining the correspondence between the query word and the industry classifications of the leaf nodes according to the search click information and the correspondence between merchandise and the classification to which the merchandise belongs.
Preferably, when obtaining the search click information of users, the correspondence generation module 410 obtains a user search click log, wherein the user search click log contains the search click information of multiple users, and the search click information includes the query word information entered by a user, the merchandise information the user clicked in the query results, and the classification information of the merchandise. When obtaining the correspondence between the query word and the industry classifications of the leaf nodes according to the search click information and the correspondence between merchandise and the classification to which the merchandise belongs, the correspondence generation module 410 performs statistical analysis on the search click information in the user search click log to obtain the correspondence between the query word information and the merchandise information and the correspondence between the merchandise information and the classification information of the merchandise, and then generates the correspondence between the query word and the industry classifications of the leaf nodes from these two correspondences.
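A minimal sketch of deriving the query-to-leaf correspondence from a click log follows. The log layout (pairs of query word and clicked item identifier), the item-to-classification dictionary and the normalization of click counts into weights are assumptions. The text only states that statistical analysis is performed.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def query_leaf_distribution(click_log: Iterable[Tuple[str, str]],
                            item_classification: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Map each query word to the share of its clicks falling in each leaf classification."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query, item_id in click_log:
        classification = item_classification.get(item_id)
        if classification is not None:  # skip clicks on items with unknown classification
            counts[query][classification] += 1
    return {
        query: {c: n / sum(counter.values()) for c, n in counter.items()}
        for query, counter in counts.items()
    }


# Hypothetical log entries: (query word, clicked item id).
log = [("princess dress", "i1"), ("princess dress", "i2"),
       ("princess dress", "i1"), ("princess dress", "i3"), ("drill", "i9")]
items = {"i1": "one-piece dress", "i2": "chiffon skirt", "i3": "one-piece dress"}
distribution = query_leaf_distribution(log, items)
```

The per-query dictionary produced here plays the role of the leaf industry classification distribution vector used elsewhere in the description.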
Preferably, when obtaining the correspondence between the query word and the industry classifications of the leaf nodes according to the search click information and the correspondence between merchandise and the classification to which the merchandise belongs, the correspondence generation module 410 obtains the merchandise information and the classification information of the merchandise from the merchandise information title, determines the correspondence between the merchandise information and the classification information of the merchandise, and obtains the correspondence between the query word and the industry classifications of the leaf nodes according to the search click information and that correspondence.
Preferably, the first generation module 402 is configured to obtain the input query word; determine, according to the correspondence between the query word and the industry classifications of the leaf nodes, the industry classification of at least one leaf node that corresponds to the query word and satisfies a set rule; and generate the leaf industry classification distribution vector corresponding to the query word from the determined leaf-node industry classifications.
Preferably, the query word classifying device of this embodiment further comprises an adjusting module 412, configured to adjust the weights of the class terms in each class term vector according to the results of matching the query word against the class terms in each class term vector, compared with the manually labeled training data.
Preferably, when judging whether the industry classification of the query word meets the set standard, the second classification module 408 judges whether the number of results recalled for the industry classification of the query word satisfies a third set threshold, and/or judges whether the number of clicks for the industry classification of the query word satisfies a fourth set threshold.
The query word classifying device of this embodiment is used to implement the corresponding query word classifying methods of the foregoing method embodiments and has the beneficial effects of those method embodiments, which are not repeated here.
Embodiment five
Referring to Fig. 8, a schematic diagram of a system architecture that uses the query word classifying device for industry-specific recognition according to Embodiment 5 of the present invention is shown.
The system architecture of this embodiment mainly comprises: a web search front end SearchWeb, a query word analysis platform QueryProcess, and an industry-specific web page module.
The query word analysis platform QueryProcess contains an industry-specific recognition module (with all the functions of the query word classifying device of Embodiment 4), so that industry-specific recognition is offered as a module on the platform for services to call. The industry-specific web pages provided by the industry-specific web page module (including the industry-specific search header and/or the set industry-specific search result display template) differ from traditional e-commerce web pages: the search page is displayed in a more professional way and can respond quickly to industry needs.
For a user search, SearchWeb first obtains the query word information and then calls the industry-specific recognition module of the QueryProcess platform. Based on the query word classification data described earlier, the industry-specific recognition module computes the relevant industry classification and the industry to which the query word belongs, that is, the industry classification of the query word, and returns it to SearchWeb. SearchWeb then passes the industry-specific data of the query word as a search parameter when calling the search engine and the industry-specific web page module. The search engine and the industry-specific web page module can use this industry-specific parameter to complete a series of applications according to downstream needs.
For example, the industry-specific web page can render an industry-specific search header page, as shown in Fig. 9. After SearchWeb obtains the industry information, it calls the search header template of the corresponding industry and precisely selects the classification in the navigation bar. In Fig. 9, for example, after a user searches for "one-piece dress", SearchWeb can jump to the "Apparel and Accessories" market search page, with "women's wear" selected and the leaf classifications beneath it expanded. The search results then contain only merchandise information of the "Apparel and Accessories – women's wear" class. Because the header occupies a key position on the page and also filters the search results, industry-specific recognition requires very high accuracy, to avoid harming the user experience.
As another example, the industry-specific web page can render the search result list in an industry-specific display mode. Users generally focus on different things in the search results of different industries. For clothing search results, users are more inclined to browse pictures, so displaying the results in large-image mode gives a better search experience. Users in the industrial goods industry care more about attributes, price and place of origin, so displaying the relevant attributes, price and place of origin of the merchandise gives a better search experience.
This embodiment achieves industry-specific recognition of the query word entered by a user. The industry the user wants to search can be determined from the query word, and information corresponding to the query word within that industry can then be provided so that the user can search accurately within the industry, which improves the user's search experience. This approach also effectively promotes the development of industry market pages.
With the solution of the present invention, for each user search the recognized industry information can be supplied to the web front end and to the ranking side of the search engine, to determine the display mode of the merchandise list page and the navigation at the top of the search page, and to serve as a ranking feature for the search engine. By classifying the query word the user enters in the search box, the user is directed to the corresponding industry market. An embodiment of the present invention provides a structure for hierarchical query word classification based on the click graph data of search logs, and an algorithm that computes, from the click data, the leaf industry classification distribution vector for every query word and then uses the industry classification structure data of the e-commerce website to assign the query word to the correct industry market. The present invention further provides a standard format for specifying the classification tree and the industry mapping, so it can easily be extended to other classification trees. To meet the different requirements that multiple classification trees and different businesses place on query word classification, the parameter adjustment module linearly blends the weights of multiple competing features and lets the machine automatically learn a set of parameters suited to the various requirements, which makes the algorithm code highly reusable.
The present invention achieves industry-specific search in e-commerce by using query word classification together with the classification tree and the classification-to-industry mapping. For a query word that corresponds to multiple layers of classification, Chinese word segmentation marking and the parameter adjustment module are used to determine which layer the query word specifically belongs to. The industry information is applied to applications such as the industry-specific search header and the industry-specific search result display mode. However, the invention is not limited to this. Based on the description of the embodiments, those skilled in the art can adopt corresponding equivalent or alternative means to implement the solution of the present invention, for example: using other query word classification with lower accuracy; not using the industry classification tree data and the classification-to-industry mapping, but instead determining the display mode and the like for each leaf industry classification in applications such as the industry page; or using the classifications returned by search classification navigation.
Query word classification, that is, Query Classification, is the algorithmic basis of this system. It is a term from computer science that denotes the process of mapping a query word to given classifications (in this system, e-commerce classifications). Each assigned classification also carries a weight, which can be understood as a probability.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Because the device and system embodiments are basically similar to the method embodiments, they are described relatively briefly. For the relevant parts, see the description of the method embodiments.
The embodiments of the present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The embodiments of the present invention can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The embodiments can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices. In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory. The memory may include volatile memory in computer-readable media, random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The query word classifying method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementation and the scope of application may change according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (20)

1. A query word classifying method, characterized by comprising:
obtaining an input query word and generating a leaf industry classification distribution vector corresponding to the query word, wherein the leaf industry classification distribution vector contains the industry classification of at least one leaf node corresponding to the query word, each industry classification of the nodes at every level corresponds to a class term vector, and each class term vector comprises multiple class terms;
determining the industry classification of the leaf node to which the query word belongs according to the class term vectors of the industry classifications of the leaf nodes corresponding to the query word; and
determining the industry classification of the query word according to the determined industry classification of the leaf node and preset mapping relations between industry classifications at the various levels.
2. The method according to claim 1, characterized in that each class term in each class term vector is assigned a weight;
after the step of determining the industry classification of the query word according to the determined industry classification of the leaf node and the preset mapping relations between industry classifications at the various levels, the method further comprises:
judging whether the industry classification of the query word meets a set standard;
if so, classifying the query word into the determined industry classification; if not, determining the industry classification of the query word using a first recognition rule and/or a second recognition rule, wherein the first recognition rule determines the industry classification of the query word according to the industry classifications of leaf nodes other than those in the leaf industry classification distribution vector, and the second recognition rule determines the industry classification of the query word according to the weight of the query word in each industry classification.
3. The method according to claim 2, characterized in that the step of determining the industry classification of the query word using the first recognition rule and/or the second recognition rule comprises:
determining the industry classification of the query word according to the query word and the class term vectors of the industry classifications of leaf nodes other than those in the leaf industry classification distribution vector;
and/or,
according to the class term vector corresponding to each industry classification, determining level by level, from the industry classification of the root node to the industry classifications of the leaf nodes, whether the weight of the query word in each industry classification is greater than or equal to a first set threshold, and taking the industry classification whose weight is greater than or equal to the first set threshold as the industry classification of the query word.
4. The method according to claim 2 or 3, characterized in that the step of determining the industry classification of the leaf node to which the query word belongs according to the class term vectors of the industry classifications of the leaf nodes corresponding to the query word comprises:
preprocessing the query word to generate at least one search keyword;
matching the at least one search keyword against the class term vector of the industry classification of each leaf node;
determining the weight of the industry classification of each leaf node with respect to the query word according to the weights in the class term vectors of the successfully matched leaf-node industry classifications; and
taking the industry classification of the leaf node with the highest weight as the industry classification of the leaf node to which the query word belongs.
5. The method according to claim 4, characterized in that the step of determining the industry classification of the query word according to the determined industry classification of the leaf node and the preset mapping relations between industry classifications at the various levels comprises:
determining, according to the preset mapping relations between industry classifications at the various levels, the upper-level industry classification corresponding to the industry classification of the leaf node;
obtaining the weight, with respect to the query word, of the upper-level industry classification corresponding to the industry classification of the leaf node;
judging whether the weight of the upper-level industry classification with respect to the query word exceeds a second set threshold;
if so, taking the upper-level industry classification whose weight exceeds the second set threshold as the industry classification of the query word; if not, continuing to search upward until an industry classification whose weight exceeds the second set threshold is found or the industry classification of the root node is reached.
6. The method according to claim 1, characterized in that the correspondence between the query word and the industry classifications of the leaf nodes is generated in the following manner:
obtaining the search click information of users, wherein the search click information includes the query word information entered by a user and the merchandise information the user clicked in the query results;
obtaining the correspondence between the query word and the industry classifications of the leaf nodes according to the search click information and the correspondence between merchandise and the classification to which the merchandise belongs.
7. The method according to claim 6, characterized in that:
The step of obtaining the search click information of users comprises: obtaining a user search click log, wherein the user search click log contains the search click information of a plurality of users, and the search click information comprises: the query word information entered by a user, the commodity information clicked by the user in the query results, and the category information to which the commodity belongs;
The step of obtaining, according to the search click information and the correspondence between commodities and the categories to which the commodities belong, the correspondence between the query word and the industry classification of the leaf node comprises: performing statistical analysis on the search click information in the user search click log to obtain the correspondence between the query word information and the commodity information, and the correspondence between the commodity information and the category information to which the commodity belongs; and generating, according to these two correspondences, the correspondence between the query word and the industry classification of the leaf node.
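The statistical analysis of the click log in claims 6 and 7 can be sketched as a simple aggregation. The sketch assumes that each log record has been reduced to a (query word, clicked commodity ID) pair and that a separate dictionary gives the leaf industry classification of each commodity. The record layout and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_query_to_leaf(
    click_log: Iterable[Tuple[str, str]],
    commodity_to_leaf: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Count, per query word, the clicks that landed in each leaf classification."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query, commodity_id in click_log:
        leaf = commodity_to_leaf.get(commodity_id)
        if leaf is not None:  # skip clicks on commodities with no known category
            counts[query][leaf] += 1
    return {query: dict(counter) for query, counter in counts.items()}


# Illustrative records only.
CLICK_LOG = [
    ("apple", "item-1"),
    ("apple", "item-2"),
    ("apple", "item-1"),
    ("pear", "item-2"),
    ("pear", "item-9"),  # unknown commodity, ignored
]
COMMODITY_TO_LEAF = {"item-1": "mobile_phones", "item-2": "fresh_fruit"}
```

The result maps each query word to click counts per leaf industry classification, which is one possible concrete form of the correspondence that the claim describes.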
8. The method according to claim 6, characterized in that the step of obtaining, according to the search click information and the correspondence between commodities and the categories to which the commodities belong, the correspondence between the query word and the industry classification of the leaf node comprises:
Obtaining the commodity information and the category information to which the commodity belongs from the commodity information header, and determining the correspondence between the commodity information and the category information to which the commodity belongs; and obtaining, according to the search click information and the correspondence between the commodity information and the category information to which the commodity belongs, the correspondence between the query word and the industry classification of the leaf node.
9. The method according to any one of claims 6 to 8, characterized in that the step of obtaining the input query word and generating the leaf industry classification distribution vector corresponding to the query word comprises:
Obtaining the input query word, and determining, according to the correspondence between the query word and the industry classification of the leaf node, the industry classification of at least one leaf node that corresponds to the query word and satisfies a set rule;
Generating, using the determined industry classification of the at least one leaf node, the leaf industry classification distribution vector corresponding to the query word.
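Claim 9 leaves the "set rule" open. The sketch below assumes one possible rule: a leaf industry classification is kept only if its share of the query word's clicks reaches a minimum value. The rule, the default value and all names are hypothetical.

```python
from typing import Dict


def leaf_distribution_vector(
    query: str,
    query_to_leaf: Dict[str, Dict[str, int]],
    min_share: float = 0.1,
) -> Dict[str, float]:
    """Build a leaf classification distribution vector for one query word.

    Assumed rule: keep a leaf classification only if its fraction of the
    query word's clicks is at least min_share.
    """
    counts = query_to_leaf.get(query, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        leaf: n / total
        for leaf, n in counts.items()
        if n / total >= min_share
    }


# Illustrative correspondence only: click counts per leaf classification.
QUERY_TO_LEAF = {"apple": {"mobile_phones": 8, "fresh_fruit": 2}}
```

With `min_share=0.25`, only "mobile_phones" survives for the query word "apple"; with the default of 0.1, both leaf classifications remain in the vector.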
10. The method according to claim 2, characterized by further comprising:
Adjusting the weights of the class terms in each class term vector according to the results of matching the query word against the class terms in each class term vector, compared against manually annotated training data.
11. The method according to claim 2, characterized in that the step of judging whether the industry classification of the query word meets the established standard comprises:
Judging whether the recall count of the industry classification of the query word meets a third set threshold, and/or judging whether the click count of the industry classification of the query word meets a fourth set threshold.
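The check in claim 11 can be sketched as a small predicate. The sketch assumes that "meets" means greater than or equal to, and it reads "and/or" as follows: whichever thresholds are configured must all be satisfied. Both readings and all names are hypothetical.

```python
from typing import Optional


def meets_established_standard(
    recall_count: int,
    click_count: int,
    third_threshold: Optional[int] = None,
    fourth_threshold: Optional[int] = None,
) -> bool:
    """Check the recall count and/or click count of a classification.

    A threshold set to None is not applied. If no threshold is configured,
    the standard is treated as not met.
    """
    checks = []
    if third_threshold is not None:
        checks.append(recall_count >= third_threshold)
    if fourth_threshold is not None:
        checks.append(click_count >= fourth_threshold)
    return bool(checks) and all(checks)
```

For example, `meets_established_standard(120, 3, third_threshold=100)` applies only the recall criterion, while passing both thresholds requires both criteria to hold.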
12. A query word classifying device, characterized by comprising:
A first generation module, configured to obtain an input query word and generate a leaf industry classification distribution vector corresponding to the query word, wherein the leaf industry classification distribution vector contains the industry classification of at least one leaf node corresponding to the query word, each industry classification of the nodes at every level corresponds to one class term vector, and each class term vector comprises a plurality of class terms;
A determination module, configured to determine the industry classification of the leaf node to which the query word belongs according to the class term vector of the industry classification of each leaf node corresponding to the query word;
A first classification module, configured to determine the industry classification of the query word according to the determined industry classification of the leaf node and the preset mapping relations between industry classifications at each level.
13. The device according to claim 12, characterized in that each class term in each class term vector is assigned a weight;
The device further comprises: a second classification module, configured to judge, after the first classification module has determined the industry classification of the query word according to the determined industry classification of the leaf node and the preset mapping relations between industry classifications at each level, whether the industry classification of the query word meets an established standard; if it does, to classify the query word into the determined industry classification; if it does not, to determine the industry classification of the query word using a first recognition rule and/or a second recognition rule, wherein the first recognition rule determines the industry classification of the query word according to the industry classifications of leaf nodes other than those in the leaf industry classification distribution vector, and the second recognition rule determines the industry classification of the query word according to the weight of the query word in each industry classification.
14. The device according to claim 13, characterized in that the second classification module, when determining the industry classification of the query word using the first recognition rule and/or the second recognition rule:
Determines the industry classification of the query word according to the query word and the class term vectors of the industry classifications of leaf nodes other than those in the leaf industry classification distribution vector;
And/or,
Determines step by step, according to the class term vector corresponding to each industry classification and proceeding from the industry classification of the root node to the industry classifications of the leaf nodes, whether the weight of the query word in each industry classification is greater than or equal to a first set threshold, and determines the industry classification whose weight is greater than or equal to the first set threshold to be the industry classification of the query word.
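The second recognition rule can be read in more than one way. The sketch below takes one reading: starting below the root node, descend one level at a time into the child classification with the highest weight among those reaching the first set threshold, and stop when no child qualifies. The tie-breaking, the stopping behaviour and all names are assumptions rather than details stated in the claims.

```python
from typing import Dict, List, Optional


def descend_by_weight(
    root: str,
    children_of: Dict[str, List[str]],
    weight_of: Dict[str, float],
    first_threshold: float,
) -> Optional[str]:
    """Descend from the root while some child meets the threshold.

    Returns the deepest classification reached, or None if no classification
    directly under the root meets the threshold.
    """
    current = root
    best: Optional[str] = None
    while True:
        candidates = [
            child
            for child in children_of.get(current, [])
            if weight_of.get(child, 0.0) >= first_threshold
        ]
        if not candidates:
            return best
        # Assumption: follow the highest-weighted qualifying child.
        current = max(candidates, key=lambda c: weight_of.get(c, 0.0))
        best = current


# Illustrative hierarchy and weights only.
CHILDREN_OF = {
    "root": ["electronics", "food"],
    "electronics": ["phones", "laptops"],
}
QUERY_WEIGHTS = {"electronics": 0.7, "food": 0.2, "phones": 0.6, "laptops": 0.1}
```

With a threshold of 0.5 the descent ends at "phones"; raising the threshold to 0.65 stops one level higher, at "electronics".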
15. The device according to claim 13 or 14, characterized in that the determination module comprises:
A generation module, configured to preprocess the query word and generate at least one query keyword;
A matching module, configured to match the at least one query keyword against the class term vector of the industry classification of each leaf node;
A weight module, configured to determine, according to the weights in the successfully matched class term vectors of the industry classifications of the leaf nodes, the weight of the industry classification of each leaf node with respect to the query word;
A leaf node determination module, configured to determine the industry classification of the leaf node with the highest weight to be the industry classification of the leaf node to which the query word belongs.
16. The device according to claim 15, characterized in that the first classification module comprises:
An upper-level classification module, configured to determine, according to the preset mapping relations between industry classifications at each level, the upper-level industry classification corresponding to the industry classification of the leaf node;
An upper-level weight module, configured to obtain the weight, with respect to the query word, of the upper-level industry classification corresponding to the industry classification of the leaf node;
A judging module, configured to judge whether the weight of the upper-level industry classification with respect to the query word exceeds a second set threshold;
An execution module, configured to: if the judgment result of the judging module is yes, determine the upper-level industry classification whose weight exceeds the second set threshold to be the industry classification of the query word; if the judgment result of the judging module is no, continue to search upward until an industry classification whose weight exceeds the second set threshold is found, or until the industry classification of the root node is reached.
17. The device according to claim 12, characterized by further comprising:
A correspondence generation module, configured to generate the correspondence between the query word and the industry classification of the leaf node, including: obtaining search click information of users, wherein the search click information comprises: the query word information entered by a user and the commodity information clicked by the user in the query results; and obtaining, according to the search click information and the correspondence between commodities and the categories to which the commodities belong, the correspondence between the query word and the industry classification of the leaf node.
18. The device according to claim 17, characterized in that:
The correspondence generation module, when obtaining the search click information of users: obtains a user search click log, wherein the user search click log contains the search click information of a plurality of users, and the search click information comprises: the query word information entered by a user, the commodity information clicked by the user in the query results, and the category information to which the commodity belongs; the correspondence generation module, when obtaining the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between commodities and the categories to which the commodities belong: performs statistical analysis on the search click information in the user search click log to obtain the correspondence between the query word information and the commodity information, and the correspondence between the commodity information and the category information to which the commodity belongs; and generates, according to these two correspondences, the correspondence between the query word and the industry classification of the leaf node.
19. The device according to claim 17, characterized in that the correspondence generation module, when obtaining the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between commodities and the categories to which the commodities belong:
Obtains the commodity information and the category information to which the commodity belongs from the commodity information header, and determines the correspondence between the commodity information and the category information to which the commodity belongs; and obtains, according to the search click information and the correspondence between the commodity information and the category information to which the commodity belongs, the correspondence between the query word and the industry classification of the leaf node.
20. The device according to claim 13, characterized in that the second classification module, when judging whether the industry classification of the query word meets the established standard: judges whether the recall count of the industry classification of the query word meets a third set threshold, and/or judges whether the click count of the industry classification of the query word meets a fourth set threshold.
CN201310392966.XA 2013-09-02 2013-09-02 Query word classifying method and device Active CN104424296B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310392966.XA CN104424296B (en) 2013-09-02 2013-09-02 Query word classifying method and device
HK15105059.0A HK1204685A1 (en) 2013-09-02 2015-05-28 Method for classifying query word and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310392966.XA CN104424296B (en) 2013-09-02 2013-09-02 Query word classifying method and device

Publications (2)

Publication Number Publication Date
CN104424296A true CN104424296A (en) 2015-03-18
CN104424296B CN104424296B (en) 2018-07-31

Family

ID=52973275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310392966.XA Active CN104424296B (en) 2013-09-02 2013-09-02 Query word classifying method and device

Country Status (2)

Country Link
CN (1) CN104424296B (en)
HK (1) HK1204685A1 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468680A (en) * 2015-11-16 2016-04-06 中国建设银行股份有限公司 Data retrieval method and device
CN105894019A (en) * 2016-03-30 2016-08-24 北京京东尚科信息技术有限公司 Database data classification method and apparatus
US9767182B1 (en) 2016-10-28 2017-09-19 Searchmetrics Gmbh Classification of search queries
US9798820B1 (en) 2016-10-28 2017-10-24 Searchmetrics Gmbh Classification of keywords
CN107423362A (en) * 2017-06-20 2017-12-01 阿里巴巴集团控股有限公司 Industry determines method, Method of Get Remote Object and device, client, server
CN108206020A (en) * 2016-12-16 2018-06-26 北京智能管家科技有限公司 A kind of audio recognition method, device and terminal device
CN108388555A (en) * 2018-02-01 2018-08-10 口碑(上海)信息技术有限公司 Commodity De-weight method based on category of employment and device
CN110019822A (en) * 2019-04-16 2019-07-16 中国科学技术大学 A kind of few sample relationship classification method and system
CN110322093A (en) * 2018-03-30 2019-10-11 阿里巴巴集团控股有限公司 Information processing method, information display method, device and calculating equipment
CN110580291A (en) * 2019-07-29 2019-12-17 用友网络科技股份有限公司 Intelligent searching method based on ERP customer service knowledge graph and computer equipment
CN106971000B (en) * 2017-04-12 2020-04-28 北京焦点新干线信息技术有限公司 Searching method and device
CN111159552A (en) * 2019-12-30 2020-05-15 北京每日优鲜电子商务有限公司 Commodity searching method, commodity searching device, server and storage medium
CN111190933A (en) * 2019-12-27 2020-05-22 江苏艾佳家居用品有限公司 E-commerce category deep query method based on interval number
CN111191022A (en) * 2019-12-27 2020-05-22 苏宁云计算有限公司 Method and device for generating short titles of commodities
CN111339297A (en) * 2020-02-21 2020-06-26 广州天懋信息系统股份有限公司 Network asset anomaly detection method, system, medium, and device
CN112241493A (en) * 2020-10-28 2021-01-19 浙江集享电子商务有限公司 Commodity retrieval method and device, computer equipment and storage medium
CN113553820A (en) * 2020-04-24 2021-10-26 腾讯科技(深圳)有限公司 Information processing method, equipment and computer readable storage medium
CN114860892A (en) * 2022-07-06 2022-08-05 腾讯科技(深圳)有限公司 Hierarchical category prediction method, device, equipment and medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112447300B (en) * 2020-11-27 2024-02-09 平安科技(深圳)有限公司 Medical query method and device based on graph neural network, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050131919A1 (en) * 1999-08-19 2005-06-16 Brookler David B. Efficient storage and access in a database management system
CN101639831A (en) * 2008-07-29 2010-02-03 华为技术有限公司 Search method, search device and search system
US20110078130A1 (en) * 2004-10-06 2011-03-31 Shopzilla, Inc. Word Deletion for Searches
CN102053983A (en) * 2009-11-02 2011-05-11 阿里巴巴集团控股有限公司 Method, system and device for querying vertical search
CN102841946A (en) * 2012-08-24 2012-12-26 北京国政通科技有限公司 Commodity data retrieval sequencing and commodity recommendation method and system
CN103020066A (en) * 2011-09-21 2013-04-03 北京百度网讯科技有限公司 Method and device for recognizing search demand

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050131919A1 (en) * 1999-08-19 2005-06-16 Brookler David B. Efficient storage and access in a database management system
US20110078130A1 (en) * 2004-10-06 2011-03-31 Shopzilla, Inc. Word Deletion for Searches
CN101639831A (en) * 2008-07-29 2010-02-03 华为技术有限公司 Search method, search device and search system
CN102053983A (en) * 2009-11-02 2011-05-11 阿里巴巴集团控股有限公司 Method, system and device for querying vertical search
CN103020066A (en) * 2011-09-21 2013-04-03 北京百度网讯科技有限公司 Method and device for recognizing search demand
CN102841946A (en) * 2012-08-24 2012-12-26 北京国政通科技有限公司 Commodity data retrieval sequencing and commodity recommendation method and system

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468680A (en) * 2015-11-16 2016-04-06 中国建设银行股份有限公司 Data retrieval method and device
CN105894019A (en) * 2016-03-30 2016-08-24 北京京东尚科信息技术有限公司 Database data classification method and apparatus
US9767182B1 (en) 2016-10-28 2017-09-19 Searchmetrics Gmbh Classification of search queries
US9798820B1 (en) 2016-10-28 2017-10-24 Searchmetrics Gmbh Classification of keywords
CN108206020A (en) * 2016-12-16 2018-06-26 北京智能管家科技有限公司 A kind of audio recognition method, device and terminal device
CN106971000B (en) * 2017-04-12 2020-04-28 北京焦点新干线信息技术有限公司 Searching method and device
CN107423362A (en) * 2017-06-20 2017-12-01 阿里巴巴集团控股有限公司 Industry determines method, Method of Get Remote Object and device, client, server
CN108388555A (en) * 2018-02-01 2018-08-10 口碑(上海)信息技术有限公司 Commodity De-weight method based on category of employment and device
CN110322093A (en) * 2018-03-30 2019-10-11 阿里巴巴集团控股有限公司 Information processing method, information display method, device and calculating equipment
CN110322093B (en) * 2018-03-30 2024-03-08 阿里巴巴集团控股有限公司 Information processing method, information display method, information processing device and computing equipment
CN110019822A (en) * 2019-04-16 2019-07-16 中国科学技术大学 A kind of few sample relationship classification method and system
CN110019822B (en) * 2019-04-16 2021-07-06 中国科学技术大学 Few-sample relation classification method and system
CN110580291A (en) * 2019-07-29 2019-12-17 用友网络科技股份有限公司 Intelligent searching method based on ERP customer service knowledge graph and computer equipment
CN111190933B (en) * 2019-12-27 2022-08-12 江苏艾佳家居用品有限公司 E-commerce category deep query method based on interval number
CN111190933A (en) * 2019-12-27 2020-05-22 江苏艾佳家居用品有限公司 E-commerce category deep query method based on interval number
CN111191022A (en) * 2019-12-27 2020-05-22 苏宁云计算有限公司 Method and device for generating short titles of commodities
CN111159552A (en) * 2019-12-30 2020-05-15 北京每日优鲜电子商务有限公司 Commodity searching method, commodity searching device, server and storage medium
CN111339297A (en) * 2020-02-21 2020-06-26 广州天懋信息系统股份有限公司 Network asset anomaly detection method, system, medium, and device
CN111339297B (en) * 2020-02-21 2023-04-25 广州天懋信息系统股份有限公司 Network asset anomaly detection method, system, medium and equipment
CN113553820A (en) * 2020-04-24 2021-10-26 腾讯科技(深圳)有限公司 Information processing method, equipment and computer readable storage medium
CN112241493A (en) * 2020-10-28 2021-01-19 浙江集享电子商务有限公司 Commodity retrieval method and device, computer equipment and storage medium
CN114860892A (en) * 2022-07-06 2022-08-05 腾讯科技(深圳)有限公司 Hierarchical category prediction method, device, equipment and medium
CN114860892B (en) * 2022-07-06 2022-09-06 腾讯科技(深圳)有限公司 Hierarchical category prediction method, device, equipment and medium

Also Published As

Publication number Publication date
CN104424296B (en) 2018-07-31
HK1204685A1 (en) 2015-11-27

Similar Documents

Publication Publication Date Title
CN104424296A (en) Query word classifying method and query word classifying device
CN107220365B (en) Accurate recommendation system and method based on collaborative filtering and association rule parallel processing
US20090327249A1 (en) Intellegent Data Search Engine
CN104239340B (en) Search result screening technique and device
CN109446341A (en) The construction method and device of knowledge mapping
CN108121737A (en) A kind of generation method, the device and system of business object attribute-bit
US20080104113A1 (en) Uniform resource locator scoring for targeted web crawling
CN108846097B (en) User interest tag representation method, article recommendation device and equipment
CN104834686A (en) Video recommendation method based on hybrid semantic matrix
CN103886048B (en) Cluster-based increment digital book recommendation method
CN111125086B (en) Method, device, storage medium and processor for acquiring data resources
CN102053983A (en) Method, system and device for querying vertical search
CN104978356B (en) A kind of recognition methods of synonym and device
CN102236663A (en) Query method, query system and query device based on vertical search
CN107632984A (en) A kind of cluster data table shows methods, devices and systems
CN104679771A (en) Individual data searching method and device
US20090049002A1 (en) System and method for selecting a training sample from a sample test
KR102371437B1 (en) Method and apparatus for recommending entity, electronic device and computer readable medium
CN104484431A (en) Multi-source individualized news webpage recommending method based on field body
CN106528648A (en) Distributed keyword approximate search method for RDF in combination with Redis memory database
CN107423396A (en) It is a kind of that method is recommended based on the Mashup of function implication relation and cluster
CN112070577A (en) Commodity recommendation method, system, equipment and medium
CN105677838A (en) User profile creating and personalized search ranking method and system based on user requirements
CN116882414B (en) Automatic comment generation method and related device based on large-scale language model
KR20210030808A (en) Estimating apparatus for market size, and control method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1204685; Country of ref document: HK)

GR01 Patent grant