WO2017201907A1 - Search term classification method and device - Google Patents

Search term classification method and device

Info

Publication number
WO2017201907A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
classifier
search term
data
search result
Prior art date
Application number
PCT/CN2016/097351
Other languages
French (fr)
Chinese (zh)
Inventor
马守玉
Original Assignee
百度在线网络技术(北京)有限公司
Priority date
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2017201907A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device.
  • computer readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM).
  • the computer readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, processing it in another suitable manner, and then stored in computer memory.
  • portions of the invention may be implemented in hardware, software, firmware or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), etc.
  • each functional unit in each embodiment of the present invention may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • the above mentioned storage medium may be a read only memory, a magnetic disk or an optical disk or the like.

Abstract

A search term classification method and device. The search term classification method comprises the following steps: receiving a search term inputted by a user, and acquiring, according to the search term, corresponding search results (S1); acquiring click data of the search results, and extracting corresponding data features according to the click data (S2); training with the data features to generate a classifier (S3); and classifying a search term according to the classifier (S4). The method and device of the present invention enrich search results, and improve variety and expandability of the search results.

Description

Search term classification method and device
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201610350036.1, entitled "Search term classification method and device", filed on May 24, 2016 by Baidu Online Network Technology (Beijing) Co., Ltd.
Technical field
The present invention relates to the field of computer technologies, and in particular to a search term classification method and device.
Background
With the rapid development of the Internet, mobile terminals such as smartphones have become increasingly widespread. When using a smartphone, users can install applications with various functions as needed.
At present, a user can search for a desired app (application) by entering a search term in an app store. However, the results obtained this way are related only to the search term itself, so they are narrow and lack variety.
Summary of the invention
The present invention aims to solve the above technical problem at least to some extent.
To this end, a first object of the present invention is to propose a search term classification method that can enrich search results and improve their diversity and extensibility.
A second object of the present invention is to propose a search term classification device.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a search term classification method, comprising: receiving a search term input by a user, and obtaining corresponding search results according to the search term; acquiring click data of the search results, and extracting corresponding data features according to the click data; training on the data features to generate a classifier; and classifying the search term according to the classifier.
The search term classification method of the embodiment of the present invention receives a search term input by a user, obtains corresponding search results according to the search term, acquires click data of the search results, extracts corresponding data features according to the click data, trains on the data features to generate a classifier, and classifies the search term according to the classifier. This enriches the search results and improves their diversity and extensibility.
An embodiment of the second aspect of the present invention proposes a search term classification device, comprising: an acquisition module, configured to receive a search term input by a user and obtain corresponding search results according to the search term; an extraction module, configured to acquire click data of the search results and extract corresponding data features according to the click data; a training module, configured to train on the data features to generate a classifier; and a classification module, configured to classify the search term according to the classifier.
The search term classification device of the embodiment of the present invention receives a search term input by a user, obtains corresponding search results according to the search term, acquires click data of the search results, extracts corresponding data features according to the click data, trains on the data features to generate a classifier, and classifies the search term according to the classifier. This enriches the search results and improves their diversity and extensibility.
An embodiment of the third aspect of the present invention provides an electronic device, comprising: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the search term classification method of the embodiment of the first aspect of the present invention.
An embodiment of the fourth aspect of the present invention provides a non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to perform the search term classification method of the embodiment of the first aspect of the present invention.
Additional aspects and advantages of the present invention will be set forth in part in the description below; in part they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a search term classification method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a search term classification device according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, it should be understood that the term "plurality" means two or more, and that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
A search term classification method and device according to embodiments of the present invention are described below with reference to the accompanying drawings.
A search term classification method comprises the following steps: receiving a search term input by a user, and obtaining corresponding search results according to the search term; acquiring click data of the search results, and extracting corresponding data features according to the click data; training on the data features to generate a classifier; and classifying the search term according to the classifier.
FIG. 1 is a flow chart of a search term classification method according to an embodiment of the present invention.
As shown in FIG. 1, the search term classification method according to an embodiment of the present invention includes the following steps.
S1: receive a search term input by a user, and obtain corresponding search results according to the search term.
For example, a user may open an app store and enter the search term "flowers" (鲜花) in its search bar. The app store is then searched according to the search term "flowers" to obtain apps related to "flowers", such as Flower Net (鲜花网) and Flower Lianliankan (鲜花连连看).
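Step S1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `CATALOG`, the `App` record, and the substring-matching rule in `search` are hypothetical, since the text does not say how the app store indexes or matches apps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    category: str
    keywords: tuple


# Hypothetical catalog; a real app store would query a search index instead.
CATALOG = [
    App("Flower Net", "Shopping", ("flowers", "delivery")),
    App("Flower Lianliankan", "Game", ("flowers", "puzzle")),
    App("Weather Now", "Tools", ("weather",)),
]


def search(term, catalog=CATALOG):
    """Return apps whose name or keywords contain the term (case-insensitive)."""
    needle = term.lower()
    return [
        app for app in catalog
        if needle in app.name.lower()
        or any(needle in keyword.lower() for keyword in app.keywords)
    ]


results = search("flowers")
print([app.name for app in results])
```

With this toy catalog, both flower-related apps are returned and the unrelated weather app is not.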
S2: acquire click data of the search results, and extract corresponding data features according to the click data.
After the search results are obtained, the user's click operations on the search results can be received and the click data corresponding to each click recorded. Corresponding data features can then be extracted from the click data. The data features may include the name, category, keywords, version, file size, download count, developer name, and so on of a search result. For example, if the clicked app is Flower Lianliankan, the extracted features may be the name "Flower Lianliankan", the category "Game", the version "6.0", the file size "30M", a download count of 500, and so on.
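The feature extraction in step S2 might look like the following sketch. The `AppRecord` and `ClickEvent` types and the `extract_features` helper are hypothetical names introduced for illustration; the keyword list and the developer name "Example Studio" are placeholders, because the text gives no values for them.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AppRecord:
    name: str
    category: str
    keywords: tuple
    version: str
    file_size_mb: int
    downloads: int
    developer: str


@dataclass(frozen=True)
class ClickEvent:
    search_term: str   # the term the user searched for
    app: AppRecord     # the search result the user clicked


def extract_features(event):
    """Flatten one click event into a feature dict keyed by feature name."""
    features = asdict(event.app)
    features["search_term"] = event.search_term
    return features


clicked = AppRecord(
    name="Flower Lianliankan",
    category="Game",
    keywords=("flowers", "puzzle"),   # placeholder keywords
    version="6.0",
    file_size_mb=30,
    downloads=500,
    developer="Example Studio",       # placeholder, not from the source
)
features = extract_features(ClickEvent("flowers", clicked))
print(features["name"], features["category"], features["version"])
```

Keeping the search term next to the app attributes is a design choice of this sketch: it lets a later training step pair each term with the category of the app that was clicked.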
S3: train on the data features to generate a classifier.
After the data features are extracted, a machine learning model can be trained on them to generate a classifier. The machine learning model may be a naive Bayes model, a support vector machine model, a neural network, or the like. For example, the app category alone can be used as the data feature to train a naive Bayes model and thereby generate a classifier. Alternatively, the app category and keywords can be used to train a support vector machine model and thereby generate a classifier.
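As one possible reading of the naive Bayes option in step S3, the sketch below trains a small multinomial naive Bayes model in pure Python on pairs of (search term, category of the clicked app). The class name `NaiveBayes`, the whitespace tokenization, the Laplace smoothing constant `alpha`, and the sample click log are all assumptions of this example; the text does not specify the training setup.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over search-term tokens, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # clicks per category
        self.token_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (search_term, clicked_app_category) pairs."""
        for term, category in samples:
            self.class_counts[category] += 1
            for token in term.lower().split():
                self.token_counts[category][token] += 1
                self.vocab.add(token)
        return self

    def log_scores(self, term):
        """Return log P(category) + sum of log P(token | category) per category."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for category, n in self.class_counts.items():
            denom = sum(self.token_counts[category].values()) + self.alpha * vocab_size
            score = math.log(n / total)
            for token in term.lower().split():
                score += math.log((self.token_counts[category][token] + self.alpha) / denom)
            scores[category] = score
        return scores

    def predict(self, term):
        scores = self.log_scores(term)
        return max(scores, key=scores.get)


# Hypothetical click log: each entry is (search term, category of the clicked app).
click_log = [
    ("flowers", "Game"),
    ("flowers", "Game"),
    ("flowers", "Shopping"),
    ("weather", "Tools"),
]
model = NaiveBayes().fit(click_log)
print(model.predict("flowers"))
```

A support vector machine or a neural network could be substituted for the model class here; only the `fit` and `predict` interface matters to the later classification step.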
S4: classify the search term according to the classifier.
Once the classifier has been trained, it can be used to classify search terms. For example, the classifier may classify the search term "flowers" into the game category.
When the user searches with the term "flowers" again, more apps in the game category can be recommended to the user, thereby enriching the search results.
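How a classification result might be used to enrich later searches can be sketched as follows. The `enrich_results` helper, the `limit` parameter, the toy catalog, and the `term_category` lookup standing in for the trained classifier's output are all hypothetical.

```python
def enrich_results(base_results, catalog, category, limit=3):
    """Append up to `limit` apps from `category` that are not already listed."""
    seen = {app["name"] for app in base_results}
    extras = [
        app for app in catalog
        if app["category"] == category and app["name"] not in seen
    ]
    return base_results + extras[:limit]


# Hypothetical catalog; the last two apps do not match "flowers" directly.
catalog = [
    {"name": "Flower Net", "category": "Shopping"},
    {"name": "Flower Lianliankan", "category": "Game"},
    {"name": "Fruit Match", "category": "Game"},
    {"name": "Tile Puzzle", "category": "Game"},
]
base = catalog[:2]                     # direct matches for "flowers"
term_category = {"flowers": "Game"}    # assumed output of the trained classifier

enriched = enrich_results(base, catalog, term_category["flowers"])
print([app["name"] for app in enriched])
```

The direct matches stay first and the category-based recommendations follow, so the original ranking is preserved while the list grows.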
Of course, a search term is not limited to a single category and may belong to several. Categories with more clicks can be recommended to the user first.
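Assigning several categories to one term and ordering them by click count could be done as in this sketch. The `rank_categories` function and its `min_share` cutoff are assumptions; the text only says that categories with more clicks may be recommended first.

```python
from collections import Counter


def rank_categories(click_log, term, min_share=0.2):
    """Return the categories clicked for `term`, most-clicked first.

    A category is kept only if it received at least `min_share` of the
    clicks recorded for that term (a hypothetical cutoff).
    """
    counts = Counter(category for t, category in click_log if t == term)
    total = sum(counts.values())
    if total == 0:
        return []
    return [cat for cat, n in counts.most_common() if n / total >= min_share]


# Hypothetical click log of (search term, clicked app category) pairs.
click_log = (
    [("flowers", "Game")] * 3
    + [("flowers", "Shopping")] * 2
    + [("weather", "Tools")]
)
print(rank_categories(click_log, "flowers"))
```

Here "flowers" belongs to both the game and shopping categories, with the game category ranked first because it received more clicks.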
The search term classification method of the embodiment of the present invention receives a search term input by a user, obtains corresponding search results according to the search term, acquires click data of the search results, extracts corresponding data features according to the click data, trains on the data features to generate a classifier, and classifies the search term according to the classifier. This enriches the search results and improves their diversity and extensibility.
To implement the above embodiment, the present invention also proposes a search term classification device.
A search term classification device comprises: an acquisition module, configured to receive a search term input by a user and obtain corresponding search results according to the search term; an extraction module, configured to acquire click data of the search results and extract corresponding data features according to the click data; a training module, configured to train on the data features to generate a classifier; and a classification module, configured to classify the search term according to the classifier.
FIG. 2 is a schematic structural diagram of a search term classification device according to an embodiment of the present invention.
As shown in FIG. 2, the search term classification device according to an embodiment of the present invention includes an acquisition module 110, an extraction module 120, a training module 130, and a classification module 140.
The acquisition module 110 is configured to receive a search term input by a user and obtain corresponding search results according to the search term. For example, a user may open an app store and enter the search term "flowers" (鲜花) in its search bar. The app store is then searched according to the search term "flowers" to obtain apps related to "flowers", such as Flower Net (鲜花网) and Flower Lianliankan (鲜花连连看).
The extraction module 120 is configured to acquire click data of the search results and extract corresponding data features according to the click data. After the search results are obtained, the user's click operations on the search results can be received and the click data corresponding to each click recorded. Corresponding data features can then be extracted from the click data. The data features may include the name, category, keywords, version, file size, download count, developer name, and so on of a search result. For example, if the clicked app is Flower Lianliankan, the extracted features may be the name "Flower Lianliankan", the category "Game", the version "6.0", the file size "30M", a download count of 500, and so on.
The training module 130 is configured to train on the data features to generate a classifier. After the data features are extracted, a machine learning model can be trained on them to generate a classifier. The machine learning model may be a naive Bayes model, a support vector machine model, a neural network, or the like. For example, the app category alone can be used as the data feature to train a naive Bayes model and thereby generate a classifier. Alternatively, the app category and keywords can be used to train a support vector machine model and thereby generate a classifier.
The classification module 140 is configured to classify the search term according to the classifier. Once the classifier has been trained, it can be used to classify search terms. For example, the classifier may classify the search term "flowers" into the game category.
When the user searches with the term "flowers" again, more apps in the game category can be recommended to the user, thereby enriching the search results.
Of course, a search term is not limited to a single category and may belong to several. Categories with more clicks can be recommended to the user first.
The search term classification device of the embodiment of the present invention receives a search term input by a user, obtains corresponding search results according to the search term, acquires click data of the search results, extracts corresponding data features according to the click data, trains on the data features to generate a classifier, and classifies the search term according to the classifier. This enriches the search results and improves their diversity and extensibility.
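The four-module structure of FIG. 2 can be sketched as four small Python classes wired in sequence. The class and method names are hypothetical, and the training module here uses a simple per-term click-count table as a stand-in for the naive Bayes, support vector machine, or neural network models named in the text.

```python
from collections import Counter, defaultdict


class AcquisitionModule:          # module 110
    def __init__(self, catalog):
        self.catalog = catalog

    def search(self, term):
        """Return catalog apps whose keyword list contains the term."""
        return [app for app in self.catalog if term.lower() in app["keywords"]]


class ExtractionModule:           # module 120
    def __init__(self):
        self.clicks = []

    def record_click(self, term, app):
        """Record one click and keep the features used for training."""
        self.clicks.append(
            {"search_term": term, "name": app["name"], "category": app["category"]}
        )

    def features(self):
        return list(self.clicks)


class TrainingModule:             # module 130
    def train(self, features):
        """Stand-in model: count clicked categories per search term."""
        table = defaultdict(Counter)
        for row in features:
            table[row["search_term"]][row["category"]] += 1
        return dict(table)


class ClassificationModule:       # module 140
    def __init__(self, classifier):
        self.classifier = classifier

    def classify(self, term):
        """Return the most-clicked category for the term, or None if unseen."""
        counts = self.classifier.get(term)
        return counts.most_common(1)[0][0] if counts else None


# Hypothetical data wiring the modules together.
catalog = [
    {"name": "Flower Net", "category": "Shopping", "keywords": ["flowers"]},
    {"name": "Flower Lianliankan", "category": "Game", "keywords": ["flowers"]},
]
acquisition = AcquisitionModule(catalog)
extraction = ExtractionModule()
results = acquisition.search("flowers")
extraction.record_click("flowers", results[1])   # user clicks the game twice
extraction.record_click("flowers", results[1])
extraction.record_click("flowers", results[0])   # and the shop once
classifier = TrainingModule().train(extraction.features())
classification = ClassificationModule(classifier)
print(classification.classify("flowers"))
```

Because each module exposes a single responsibility, the stand-in model in `TrainingModule` could be replaced without touching the other three classes.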
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Illustrative uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more, unless expressly and specifically defined otherwise.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention pertain.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions. They may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program that instructs the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it is to be understood that these embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (10)

  1. A search term classification method, comprising:
    receiving a search term input by a user, and obtaining corresponding search results according to the search term;
    acquiring click data of the search results, and extracting corresponding data features according to the click data;
    training on the data features to generate a classifier; and
    classifying the search term according to the classifier.
  2. The method of claim 1, wherein the data features comprise one or more of a name, a category, a keyword, a version, a file size, a download count, and a developer name of the search result.
  3. The method of claim 1, wherein training on the data features to generate a classifier comprises:
    training on the data features using a machine learning model to generate the classifier.
  4. The method of claim 3, wherein the machine learning model comprises one of a naive Bayes model, a support vector machine model, and a neural network.
  5. A search term classification device, comprising:
    an acquisition module, configured to receive a search term input by a user and obtain corresponding search results according to the search term;
    an extraction module, configured to acquire click data of the search results and extract corresponding data features according to the click data;
    a training module, configured to train on the data features to generate a classifier; and
    a classification module, configured to classify the search term according to the classifier.
  6. The device of claim 5, wherein the data features comprise one or more of a name, a category, a keyword, a version, a file size, a download count, and a developer name of the search result.
  7. The device of claim 5, wherein the training module is configured to:
    train on the data features using a machine learning model to generate the classifier.
  8. The device of claim 7, wherein the machine learning model comprises one of a naive Bayes model, a support vector machine model, and a neural network.
  9. An electronic device, characterized by comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors, perform the search term classification method of any one of claims 1-4.
  10. A non-volatile computer storage medium, characterized in that the computer storage medium stores one or more programs which, when executed by a device, cause the device to perform the search term classification method of any one of claims 1-4.
PCT/CN2016/097351 2016-05-24 2016-08-30 Search term classification method and device WO2017201907A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610350036.1 2016-05-24
CN201610350036.1A CN107423304A (en) 2016-05-24 2016-05-24 Term sorting technique and device

Publications (1)

Publication Number Publication Date
WO2017201907A1 true WO2017201907A1 (en) 2017-11-30

Family

ID=60410990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/097351 WO2017201907A1 (en) 2016-05-24 2016-08-30 Search term classification method and device

Country Status (2)

Country Link
CN (1) CN107423304A (en)
WO (1) WO2017201907A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875781A (en) * 2018-05-07 2018-11-23 腾讯科技(深圳)有限公司 A kind of labeling method, apparatus, electronic equipment and storage medium
CN110019808A (en) * 2017-12-28 2019-07-16 北京京东尚科信息技术有限公司 A kind of method and apparatus of predictive information attribute

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182175B (en) * 2017-12-29 2021-01-05 中国银联股份有限公司 Text quality index obtaining method and device
CN111177521A (en) * 2018-10-24 2020-05-19 北京搜狗科技发展有限公司 Method and device for determining query term classification model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551806A (en) * 2008-04-03 2009-10-07 北京搜狗科技发展有限公司 Personalized website navigation method and system
US20110314011A1 (en) * 2010-06-18 2011-12-22 Microsoft Corporation Automatically generating training data
CN103020066A (en) * 2011-09-21 2013-04-03 北京百度网讯科技有限公司 Method and device for recognizing search demand
CN104050240A (en) * 2014-05-26 2014-09-17 北京奇虎科技有限公司 Method and device for determining categorical attribute of search query word
CN104199822A (en) * 2014-07-11 2014-12-10 五八同城信息技术有限公司 Method and system for identifying demand classification corresponding to searching

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211368B (en) * 2007-12-25 2011-08-03 北京搜狗科技发展有限公司 Method for classifying search term, device and search engine system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019808A (en) * 2017-12-28 2019-07-16 北京京东尚科信息技术有限公司 A kind of method and apparatus of predictive information attribute
CN108875781A (en) * 2018-05-07 2018-11-23 腾讯科技(深圳)有限公司 A kind of labeling method, apparatus, electronic equipment and storage medium
CN108875781B (en) * 2018-05-07 2022-08-19 腾讯科技(深圳)有限公司 Label classification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107423304A (en) 2017-12-01

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16902893

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16902893

Country of ref document: EP

Kind code of ref document: A1