WO2016101737A1 - Search query method and apparatus - Google Patents

Search query method and apparatus

Info

Publication number
WO2016101737A1
Authority
WO
WIPO (PCT)
Prior art keywords
site
name
query keyword
search
query
Prior art date
Application number
PCT/CN2015/095018
Other languages
French (fr)
Chinese (zh)
Inventor
郭峰
李亚平
彭仁刚
秦吉胜
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority date
Filing date
Publication date
Application filed by 北京奇虎科技有限公司, 奇智软件(北京)有限公司
Publication of WO2016101737A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the present invention relates to the field of Internet technologies, and in particular, to a search query method and apparatus.
  • an existing search engine usually performs a search directly on the user's query keyword.
  • the query keyword is first segmented and a core word is selected from it, and the webpages containing the core word are recalled and ranked; alternatively, the query keyword is rewritten, for example by synonym rewriting or sentence-pattern rewriting.
  • after rewriting, the query keyword is converted into multiple query keywords, the results recalled by each of them are merged, and the merged results are ranked together.
  • search engines do not perform well in some cases.
  • when using a search engine, a user sometimes wants only the resource pages within a particular website, and therefore often adds a site name to the query keyword. For example, with the query "Gourd Baby 360 movie", the user's purpose is actually to find the "Gourd Baby" program on the "360 movie" site.
  • for such a query, an existing search engine has the following problems: 1) the query results contain results from sites the user does not need, such as the "Gourd Baby" program on sites other than "360 movie"; 2) site home pages, such as the "360 movie" home page, may be ranked higher because home pages tend to carry more weight, although these home pages are not the results the user expects; 3) too many results are recalled, which makes the query computationally expensive; 4) some cheating sites may be wrongly boosted and ranked near the top. Thus, a new search query scheme is needed to satisfy the user's need to obtain resources at a particular site.
  • the present invention has been made in order to provide a search query method and apparatus that overcomes the above problems or at least partially solves the above problems.
  • a search query method is provided, including: identifying whether a first query keyword input by a user conforms to a preset qualification rule for searching within a limited site; and, if the qualification rule is met, searching for the search result corresponding to the first query keyword under the domain name of the limited site.
  • a search query apparatus is provided, including: a first query keyword identification module, configured to identify whether a first query keyword input by a user conforms to a preset qualification rule for searching within a limited site; and a search module, configured to search, if the qualification rule is met, for the search result corresponding to the first query keyword under the domain name of the limited site.
  • a computer program is provided, comprising computer readable code which, when run on a computing device, causes the computing device to perform the search query method described above.
  • a computer readable medium wherein the computer program described above is stored.
  • the search query method and apparatus of the present invention have at least the following advantages:
  • the search query is performed under the domain name of the limited site; therefore, the results of the search query will not contain results from outside the limited site, nor will they contain the home pages of some sites; since the search is performed only on the limited site, it generates less computation and more easily avoids interference from cheat pages on sites other than the limited site.
  • FIG. 1 shows a flow chart of a search query method in accordance with one embodiment of the present invention
  • FIG. 2 shows a flow chart of a search query method in accordance with one embodiment of the present invention
  • FIG. 3 shows a flow chart of a search query method in accordance with one embodiment of the present invention
  • FIG. 4 shows a flow chart of a search query method in accordance with one embodiment of the present invention
  • FIG. 5 shows a flow chart of a search query method in accordance with one embodiment of the present invention
  • FIG. 6 shows a flow chart of a search query method in accordance with one embodiment of the present invention
  • Figure 7 shows a block diagram of a search query device in accordance with one embodiment of the present invention.
  • Figure 8 shows a block diagram of a search query device in accordance with one embodiment of the present invention.
  • Figure 9 shows a block diagram of a search query device in accordance with one embodiment of the present invention.
  • Figure 10 schematically shows a block diagram of a computing device for performing the method according to the invention
  • Fig. 11 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • an embodiment of the present invention provides a search query method, including:
  • Step 110 Identify whether the first query keyword input by the user meets a preset qualification rule for searching within the limited site.
  • the type of the rule is not limited.
  • for example, the rule may record the names of multiple sites; if the first query keyword contains one of these site names, it is determined that the search needs to be performed under that site.
  • Step 120 If the qualified rule suitable for searching within the limited site is met, search for the search result corresponding to the first query keyword under the domain name of the qualified site.
  • the result of the search query does not include results from outside the limited site, and does not include the home pages of some sites; since the search is performed only on the limited site, it generates a smaller amount of computation and more easily avoids interference from cheat pages on other sites.
  • for example, the qualification rule includes: if the query keyword input by the user contains "360 movie", the search is to be performed within the "360 movie" site.
  • when the user inputs "Gourd Baby 360 movie", the search engine chooses to search under the "360 movie" domain name "www.360kan.com", and the "Gourd Baby" program on the "360 movie" site is provided to the user as the search result.
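Steps 110 and 120 can be illustrated with a minimal Python sketch. It assumes the qualification rule is nothing more than a recorded table of site names, as in the example rule above. The table `SITE_DOMAINS`, the helper names, and the URL paths are hypothetical and are not part of the original disclosure.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical table of recorded site names and their domain names.
SITE_DOMAINS: Dict[str, str] = {"360 movie": "www.360kan.com"}


def match_limited_site(query: str,
                       site_domains: Dict[str, str] = SITE_DOMAINS) -> Optional[str]:
    """Step 110: return the limited site's domain if the query names a recorded site."""
    lowered = query.lower()
    for name, domain in site_domains.items():
        if name.lower() in lowered:
            return domain
    return None


def restrict_to_domain(urls: Iterable[str], domain: str) -> List[str]:
    """Step 120: keep only candidate results hosted under the limited site's domain."""
    return [url for url in urls if urlparse(url).netloc == domain]
```

A real engine would apply the domain restriction inside the index lookup rather than filtering afterwards; the post-filter here only keeps the sketch short.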
  • an embodiment of the present invention provides a search query method, including:
  • Step 210 Obtain a second query keyword corresponding to the URL from the preset search log.
  • the search log may be a log recorded according to the search engine behavior, and the second query keyword is a historical query keyword.
  • Step 220 Extract a site name from a second query keyword corresponding to the URL.
  • the extracted site names may be one or more.
  • Step 230 Perform a training according to the second query keyword and the site name to obtain a qualified rule, and obtain a name of the qualified site when the first query keyword meets the qualified rule.
  • the qualification rule can be mined from historical data; since the historical data reflects users' past search behavior, a qualification rule obtained from it is better suited to users. Training may also take into account whether the URL was clicked.
  • Step 240 Identify whether the first query keyword input by the user meets a preset qualification rule for searching within the limited site.
  • Step 250 If the qualified rule suitable for searching within the limited site is met, the domain name of the limited site is determined according to the name of the qualified site, and the search result corresponding to the first query keyword is searched under the domain name of the limited site. In this embodiment, as long as the name of the site is determined, the domain name of the site can be determined.
  • for example, when the user inputs "Gourd Baby 360 movie", the search engine chooses to search under the "360 movie" domain name "www.360kan.com", and provides the "Gourd Baby" program on the "360 movie" site to the user as the search result.
  • an embodiment of the present invention provides a search query method, including:
  • Step 310 Obtain a second query keyword corresponding to the URL from the preset search log.
  • Step 320 Extract a site name from a second query keyword corresponding to the URL.
  • Step 330 Perform a training according to the second query keyword and the site name to obtain a qualified rule, and obtain a name of the qualified site when the first query keyword meets the qualified rule.
  • Step 340 Extract the domain name from the URL.
  • Step 350 Establish a correspondence between the extracted domain name and the site name.
  • a domain name can be associated with multiple site names.
  • Step 360 Identify whether the first query keyword input by the user meets a preset qualification rule for searching within the limited site.
  • Step 370 If the qualification rule for searching within the limited site is met, look up the domain name of the limited site according to the name of the limited site and the correspondence, and search for the search result corresponding to the first query keyword under that domain name. According to the technical solution of this embodiment, the domain name of the limited site can be found quickly through the established correspondence.
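Steps 340 to 370 can be sketched as follows. The sketch assumes the site names have already been extracted from the logged queries and paired with the logged URLs. Choosing the most frequent domain for each site name is an illustrative assumption, since the text does not say how conflicts are resolved, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def build_correspondence(name_url_pairs):
    """Map each extracted site name to the domain it most often co-occurs with."""
    counts = defaultdict(Counter)
    for site_name, url in name_url_pairs:
        domain = urlparse(url).netloc  # Step 340: extract the domain name from the URL
        if domain:
            counts[site_name][domain] += 1
    # Step 350: record one domain per site name; several names may share a domain.
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def lookup_domain(site_name, correspondence):
    """Step 370: find the limited site's domain through the correspondence."""
    return correspondence.get(site_name)
```

Because the mapping is keyed by site name, two names such as "360 movie" and "360 film and television" can both resolve to the same domain, which matches the remark that a domain name can be associated with multiple site names.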
  • an embodiment of the present invention provides a search query method, including:
  • Step 410 Obtain a second query keyword corresponding to the URL from the preset search log.
  • Step 420 Extract a site name from a second query keyword corresponding to the URL.
  • Step 430 For each extracted site name, determine whether to retain the site name according to the number of clicks on the home page of each corresponding domain name when the site name appears in the second query keyword. According to the technical solution of this embodiment, since multiple site names may be extracted from the same query keyword, they need to be filtered; the higher the number of clicks on the home page of the corresponding domain name, the more closely the domain name and the site name are related, the more likely the site name is correct, and the more it should be retained.
  • Step 440 Perform a training according to the second query keyword and the site name to obtain a qualified rule, and obtain a name of the limited site when the first query keyword meets the qualified rule.
  • Step 450 Identify whether the first query keyword input by the user meets a preset definition rule for searching within the limited site.
  • Step 460 If the qualified rule suitable for searching within the limited site is met, the domain name of the limited site is determined according to the name of the qualified site, and the search result corresponding to the first query keyword is searched for under the domain name of the limited site.
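The filtering in Step 430 might look like the sketch below. It assumes the search log is available as (query, clicked URL) pairs and that a URL with an empty or "/" path and no query string counts as a home page. The threshold `min_home_clicks` and the function names are hypothetical; the text gives no concrete cut-off.

```python
from collections import Counter
from urllib.parse import urlparse


def is_home_page(url):
    """Treat a URL with an empty or root path and no query string as a home page."""
    parsed = urlparse(url)
    return parsed.path in ("", "/") and not parsed.query


def retain_site_names(candidates, click_log, min_home_clicks=2):
    """Keep candidate site names whose queries led to enough home-page clicks."""
    home_clicks = Counter()
    for query, clicked_url in click_log:
        if not is_home_page(clicked_url):
            continue
        for name in candidates:
            if name in query:
                home_clicks[name] += 1
    return [name for name in candidates if home_clicks[name] >= min_home_clicks]
```

For simplicity the sketch counts home-page clicks across all domains for a name; a fuller version would count them per domain, as the step describes.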
  • an embodiment of the present invention provides a search query method, including:
  • Step 510 Obtain a second query keyword corresponding to the URL from the preset search log.
  • Step 520 Extract the site name from the second query keyword corresponding to the URL.
  • Step 530 Perform a training according to the second query keyword and the site name to obtain a qualified rule, and obtain a name of the qualified site when the first query keyword meets the qualified rule.
  • Step 540 Identify whether the first query keyword input by the user meets a preset qualification rule for searching within the limited site.
  • Step 550 If the qualification rule for searching within the limited site is met, identify, from the first query keyword, the content corresponding to the name of the limited site.
  • the corresponding content may be identical to the name of the limited site, or may be a synonym, a pinyin form, or an English equivalent of that name.
  • Step 560 Under the domain name of the limited site, search according to the part of the first query keyword other than the corresponding content.
  • since one part of the query keyword input by the user specifies the site and the other part describes the requested resource, this embodiment separates the two parts so that the search can be performed accurately.
  • for example, a user inputs "Gourd Baby 360yingshi"; based on the qualification rule it is judged that the user wants to search on the "360 movie" site, and "360yingshi" is recognized as corresponding to "360 movie", where "yingshi" is the pinyin of "movie" (影视); the search engine then searches under the "360 movie" domain name "www.360kan.com" with "Gourd Baby" as the new query keyword.
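Steps 550 and 560 can be sketched as an alias lookup followed by removal of the matched text. The alias table `SITE_ALIASES` (site name plus a pinyin variant) and the function name are hypothetical, and simple case-insensitive substring matching stands in for whatever recognition the trained rule would actually perform.

```python
import re

# Hypothetical alias table: domain name -> forms that may denote the site in a query.
SITE_ALIASES = {"www.360kan.com": ["360 movie", "360yingshi"]}


def split_query(query, site_aliases=SITE_ALIASES):
    """Return (domain, remaining query) or (None, query) if no site is named."""
    for domain, aliases in site_aliases.items():
        # Try longer aliases first so the most specific form is removed.
        for alias in sorted(aliases, key=len, reverse=True):
            match = re.search(re.escape(alias), query, flags=re.IGNORECASE)
            if match:
                rest = query[:match.start()] + " " + query[match.end():]
                return domain, " ".join(rest.split())
    return None, query
```

The remaining text is what would be searched under the returned domain, so the site name itself no longer influences ranking.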
  • an embodiment of the present invention provides a search query method, including:
  • Step 610 Obtain a second query keyword corresponding to the URL from the preset search log.
  • Step 620 Extract a site name from a second query keyword corresponding to the URL.
  • Step 630 Perform a training according to the second query keyword and the site name to obtain a qualified rule, and obtain a name of the qualified site when the first query keyword meets the qualified rule.
  • Step 640 Identify whether the first query keyword input by the user meets a preset definition rule for searching within the limited site.
  • Step 650 If the qualification rule for searching within the limited site is met, segment the first query keyword into a plurality of words, and determine for each word whether it is content corresponding to the name of the limited site.
  • a word segmentation technique can be used to perform the segmentation. Before segmentation, a blacklist and a whitelist can also be set: the blacklist holds words that need to be blocked, and the whitelist holds words that return a fixed result. During segmentation, certain protected phrases, for example "how steel is made", must not be split; some words, such as some predicates, can also be filtered out.
  • Step 660 Under the domain name of the limited site, search according to the part of the first query keyword other than the corresponding content.
  • for example, a user inputs "Gourd Baby 360 movie"; based on the qualification rule it is judged that the user wants to search on the "360 movie" site; "Gourd Baby 360 movie" is segmented into "Gourd Baby" and "360 movie", of which "360 movie" corresponds to the site; the search engine then searches under the "360 movie" domain name "www.360kan.com" with "Gourd Baby" as the new query keyword.
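Steps 650 and 660 can be sketched with a toy segmenter. Whitespace tokenization with greedy longest-match over a phrase list is an assumption that stands in for a real Chinese word segmenter; the protected-phrase list, the blacklist (expected in lower case), and the function names are all hypothetical.

```python
def segment(query, phrases):
    """Greedy longest-match segmentation over whitespace tokens."""
    words = query.split()
    # Longest phrases first so that protected phrases are never split.
    phrase_tokens = sorted(
        (p.lower().split() for p in phrases if p.split()),
        key=len, reverse=True,
    )
    segments, i = [], 0
    while i < len(words):
        for tokens in phrase_tokens:
            n = len(tokens)
            if [w.lower() for w in words[i:i + n]] == tokens:
                segments.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here: emit the single word.
            segments.append(words[i])
            i += 1
    return segments


def rewrite_query(query, site_domains, protected=(), blacklist=frozenset()):
    """Split the query, pick out the site name, and return (domain, new query)."""
    names = {name.lower(): domain for name, domain in site_domains.items()}
    domain, kept = None, []
    for seg in segment(query, list(site_domains) + list(protected)):
        key = seg.lower()
        if key in names:
            domain = names[key]      # this word corresponds to the limited site
        elif key not in blacklist:
            kept.append(seg)         # blocked words are dropped, the rest is kept
    return domain, " ".join(kept)
```

With `site_domains={"360 movie": "www.360kan.com"}` and "gourd baby" listed as a protected phrase, the query "Gourd Baby 360 movie" is divided into the resource part and the site part as in the example above.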
  • an embodiment of the present invention provides a search query apparatus, including:
  • the first query keyword identification module 710 identifies whether the first query keyword input by the user meets a preset definition rule for searching within the limited site.
  • the type of the rule is not limited.
  • for example, the rule may record the names of multiple sites; if the first query keyword contains one of these site names, it is determined that the search needs to be performed under that site.
  • the search module 720 searches for a search result corresponding to the first query keyword under the domain name of the defined site if it meets a qualified rule suitable for searching within the defined site.
  • the result of the search query does not include results from outside the limited site, and does not include the home pages of some sites; since the search is performed only on the limited site, it generates a smaller amount of computation and more easily avoids interference from cheat pages on other sites.
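  • for example, the qualification rule includes: if the query keyword input by the user contains "360 movie", the search is to be performed within the "360 movie" site.
  • the search engine then searches under the "360 movie" domain name "www.360kan.com", and the "Gourd Baby" program on the "360 movie" site is provided to the user as the search result.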
  • an embodiment of the present invention provides a search query apparatus, including:
  • the second query keyword obtaining module 810 obtains a second query keyword corresponding to the URL from the preset search log.
  • the search log may be a log recorded according to the search engine behavior, and the second query keyword is a historical query keyword.
  • the site name extraction module 820 extracts the site name from the second query keyword corresponding to the URL.
  • the extracted site names may be one or more.
  • the training module 830 performs training according to the second query keyword and the site name to obtain a qualified rule, and obtains a name of the limited site when the first query keyword meets the qualified rule.
  • the qualification rule can be mined from historical data; since the historical data reflects users' past search behavior, a qualification rule obtained from it is better suited to users. Training may also take into account whether the URL was clicked.
  • the first query keyword identification module 840 identifies whether the first query keyword input by the user conforms to a preset definition rule for searching within the limited site.
  • the search module 850 determines the domain name of the limited site according to the name of the qualified site, and searches for the search result corresponding to the first query keyword under the domain name of the limited site. In this embodiment, as long as the name of the site is determined, the domain name of the site can be determined.
  • an embodiment of the present invention provides a search query apparatus, including:
  • the second query keyword obtaining module 910 obtains a second query keyword corresponding to the URL from the preset search log.
  • the site name extraction module 920 extracts the site name from the second query keyword corresponding to the URL.
  • the training module 930 performs training according to the second query keyword and the site name to obtain a qualified rule, and obtains a name of the limited site when the first query keyword meets the qualified rule.
  • the domain name extraction module 940 extracts the domain name from the URL.
  • the correspondence establishing module 950 establishes a correspondence between the extracted domain name and the site name.
  • a domain name can be associated with multiple site names.
  • the first query keyword identification module 960 identifies whether the first query keyword input by the user meets a preset definition rule for searching within the limited site.
  • the search module 970, if the qualification rule for searching within the limited site is met, looks up the domain name of the limited site according to the name of the limited site and the correspondence, and searches for the search result corresponding to the first query keyword under that domain name.
  • the domain name of the limited site can be quickly found through the established correspondence.
  • an embodiment of the present invention provides a search query apparatus, including:
  • the second query keyword obtaining module 810 obtains a second query keyword corresponding to the URL from the preset search log.
  • the site name extraction module 820 extracts the site name from the second query keyword corresponding to the URL.
  • the site name extraction module 820 further determines, for each extracted site name, whether to retain the site name according to the number of clicks on the home page of each corresponding domain name when the site name appears in the second query keyword. According to the technical solution of this embodiment, since multiple site names may be extracted from the same query keyword, they need to be filtered; the higher the number of clicks on the home page of the corresponding domain name, the more closely the domain name and the site name are related, the more likely the site name is correct, and the more it should be retained.
  • the training module 830 performs training according to the second query keyword and the site name to obtain a qualified rule, and obtains a name of the limited site when the first query keyword meets the qualified rule.
  • the first query keyword identification module 840 identifies whether the first query keyword input by the user conforms to a preset definition rule for searching within the limited site.
  • the search module 850 determines the domain name of the limited site according to the name of the qualified site, and searches for the search result corresponding to the first query keyword under the domain name of the limited site.
  • an embodiment of the present invention provides a search query apparatus, including:
  • the second query keyword obtaining module 810 obtains a second query keyword corresponding to the URL from the preset search log.
  • the site name extraction module 820 extracts the site name from the second query keyword corresponding to the URL.
  • the training module 830 performs training according to the second query keyword and the site name to obtain a qualified rule, and obtains a name of the limited site when the first query keyword meets the qualified rule.
  • the first query keyword identification module 840 identifies whether the first query keyword input by the user conforms to a preset definition rule for searching within the limited site.
  • the search module 850, if the qualification rule for searching within the limited site is met, identifies, from the first query keyword, the content corresponding to the name of the limited site.
  • the corresponding content may be identical to the name of the limited site, or may be a synonym, a pinyin form, or an English equivalent of that name.
  • the search module 850 searches, under the domain name of the limited site, according to the part of the first query keyword other than the corresponding content. According to the technical solution of this embodiment, since one part of the query keyword input by the user specifies the site and the other part describes the requested resource, the two parts are separated so that the search can be performed accurately.
  • for example, a user inputs "Gourd Baby 360yingshi"; based on the qualification rule it is judged that the user wants to search on the "360 movie" site, and "360yingshi" is recognized as corresponding to "360 movie", where "yingshi" is the pinyin of "movie" (影视); the search engine then searches under the "360 movie" domain name "www.360kan.com" with "Gourd Baby" as the new query keyword.
  • an embodiment of the present invention provides a search query apparatus, including:
  • the second query keyword obtaining module 810 obtains a second query keyword corresponding to the URL from the preset search log.
  • the site name extraction module 820 extracts the site name from the second query keyword corresponding to the URL.
  • the training module 830 performs training according to the second query keyword and the site name to obtain a qualified rule, and obtains a name of the limited site when the first query keyword meets the qualified rule.
  • the first query keyword identification module 840 identifies whether the first query keyword input by the user meets a preset qualification rule for searching within the limited site.
  • the search module 850, if the qualification rule for searching within the limited site is met, segments the first query keyword into a plurality of words, and determines for each word whether it is content corresponding to the name of the limited site.
  • the word segmentation technique can be used to perform word segmentation. Before the word segmentation, you can also set the blacklist and whitelist. In the blacklist, you can set some words that need to be blocked. In the whitelist, you can set some words that return fixed results. At the same time, in the word segmentation, it is also necessary to avoid segmentation of some protection words, for example, "how steel is made"; it is also possible to filter out some words, such as some predicates.
  • the search module 850 searches for the portion of the first query keyword other than the corresponding content under the domain name of the limited site.
  • for example, a user inputs "Gourd Baby 360 movie"; based on the qualification rule it is judged that the user wants to search on the "360 movie" site; "Gourd Baby 360 movie" is segmented into "Gourd Baby" and "360 movie", of which "360 movie" corresponds to the site; the search engine then searches under the "360 movie" domain name "www.360kan.com" with "Gourd Baby" as the new query keyword.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • all features disclosed in this specification (including the accompanying claims, the abstract and the drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor may be used in practice to implement some or all of the functionality of some or all of the components of the search query device in accordance with embodiments of the present invention.
  • the invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 10 schematically illustrates a block diagram of a computing device for performing the method in accordance with the present invention.
  • the computing device conventionally includes a processor 1010 and a computer program product or computer readable medium in the form of a memory 1020.
  • the memory 1020 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • the memory 1020 has a storage space 1030 for program code 1031 for performing any of the above method steps.
  • storage space 1030 for program code may include various program code 1031 for implementing various steps in the above methods, respectively.
  • the program code can be read from or written to one or more computer program products.
  • these computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • such a computer program product is typically a portable or fixed storage unit as described with reference to Figure 11.
  • the storage unit may have storage segments, storage spaces, and the like that are similarly arranged to memory 1020 in the computing device of FIG.
  • the program code can be compressed, for example, in an appropriate form.
  • the storage unit comprises computer readable code 1031' for performing the steps of the method according to the invention, i.e. code that can be read by a processor such as 1010, which, when executed by the computing device, causes the computing device to perform the various steps in the methods described above.
  • the present invention is applicable to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, and the like.
  • the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.

Abstract

Disclosed are a search query method and apparatus, which relate to the technical field of the Internet and mainly aim to meet a user's need to acquire resources at a limited site. The method comprises: identifying whether a first query keyword input by a user conforms to a preset qualification rule for searching within a limited site; and, if the first query keyword conforms to the qualification rule, searching under the domain name of the limited site for a search result corresponding to the first query keyword. According to the present invention, the result of a search query will not contain results from outside the limited site and will not contain the home pages of certain sites. Since the search is performed only on the limited site, the amount of computation generated by the search is small, and interference from spam pages on other sites can be more easily avoided.

Description

Search query method and apparatus
Technical field
The present invention relates to the field of Internet technologies, and in particular to a search query method and apparatus.
Background
For current search engines, accurately understanding user intent, improving the quality of search results, and improving the user's search experience are among the main goals.
Existing search engines usually run the search directly on the user's query keyword. Either the query keyword is first segmented into words, core words are selected from it, and web pages containing the core words are recalled and ranked; or the query keyword is rewritten (for example, by synonym rewriting or sentence-pattern rewriting) into multiple query keywords, the results recalled by each of these query keywords are merged, and the merged results are ranked together.
In some cases, existing search engines do not perform well. A user sometimes wants only the resource pages within a particular website, and therefore often adds the site name to the query keyword. For example, with the query “葫芦娃360影视”, the user actually intends to find the program “葫芦娃” (Calabash Brothers) on the site “360影视” (360 Video). An existing search engine then causes the following problems: 1) the results include sites the user does not want, for example the “葫芦娃” program on sites other than “360影视”; 2) site home pages, such as the home page of “360影视”, may rank near the top because home pages usually carry greater weight, although they are not the results the user expects; 3) too many results are recalled, so the query requires a large amount of computation; 4) some spam sites may be wrongly boosted and ranked near the top. A new search query scheme is therefore needed to meet a user's need to obtain resources from a particular site.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a search query method and apparatus that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a search query method is provided, comprising: identifying whether a first query keyword input by a user conforms to a preset restriction rule for searching within a restricted site; and, if the first query keyword conforms to the restriction rule, searching under the domain name of the restricted site for search results corresponding to the first query keyword.
According to another aspect of the present invention, a search query apparatus is provided, comprising: a first query keyword identification module, configured to identify whether a first query keyword input by a user conforms to a preset restriction rule for searching within a restricted site; and a search module, configured to search, if the first query keyword conforms to the restriction rule, under the domain name of the restricted site for search results corresponding to the first query keyword.
According to yet another aspect of the present invention, a computer program is provided, comprising computer-readable code which, when run on a computing device, causes the computing device to perform the search query method described above.
According to still another aspect of the present invention, a computer-readable medium is provided, in which the above computer program is stored.
From the above technical solutions, the search query method and apparatus of the present invention have at least the following advantages:
When a user's query keyword is received, it is first analyzed whether the user entered it in order to search for resources on a restricted site. Once it is determined that the user wants to search on a restricted site, the search can be performed under the domain name of that site. As a result, the search results contain neither results from sites other than the restricted site nor the home pages of certain sites. Because the search is performed only on the restricted site, the search requires less computation, and interference from spam pages on other sites is more easily avoided.
The above is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of a search query method according to an embodiment of the present invention;
FIG. 2 shows a flowchart of a search query method according to an embodiment of the present invention;
FIG. 3 shows a flowchart of a search query method according to an embodiment of the present invention;
FIG. 4 shows a flowchart of a search query method according to an embodiment of the present invention;
FIG. 5 shows a flowchart of a search query method according to an embodiment of the present invention;
FIG. 6 shows a flowchart of a search query method according to an embodiment of the present invention;
FIG. 7 shows a block diagram of a search query apparatus according to an embodiment of the present invention;
FIG. 8 shows a block diagram of a search query apparatus according to an embodiment of the present invention;
FIG. 9 shows a block diagram of a search query apparatus according to an embodiment of the present invention;
FIG. 10 schematically shows a block diagram of a computing device for performing the method according to the present invention; and
FIG. 11 schematically shows a storage unit for holding or carrying program code that implements the method according to the present invention.
Specific embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
As shown in FIG. 1, an embodiment of the present invention provides a search query method, comprising:
Step 110: identify whether the first query keyword input by the user conforms to a preset restriction rule for searching within a restricted site. This embodiment places no limit on the type of restriction rule. For example, the rule may be: record the names of a number of sites, and if the first query keyword contains one of the recorded site names, determine that the search should be performed on that site.
Step 120: if the first query keyword conforms to the restriction rule, search under the domain name of the restricted site for search results corresponding to the first query keyword. With the technical solution of this embodiment, the search results contain neither results from sites other than the restricted site nor the home pages of certain sites. Because the search is performed only on the restricted site, the search requires less computation, and interference from spam pages on other sites is more easily avoided.
For example, with reference to FIG. 1, suppose the restriction rule is: if the query keyword input by the user contains “360影视” (360 Video), the search is performed on the “360影视” site. A user enters the query “葫芦娃360影视”, which contains “360影视”. The search engine therefore searches under the domain name of “360影视”, “www.360kan.com”, and returns the “葫芦娃” (Calabash Brothers) program on the “360影视” site to the user as the search result.
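The simple rule in this example (a recorded site name appearing in the query) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `SITE_DOMAINS` and the functions `match_restricted_site` and `plan_search` are hypothetical names, and the actual search back end is left out; only the “360影视” entry comes from the text.

```python
from typing import Optional

# Hypothetical table of recorded site names and their domain names.
# Only the "360影视" entry is taken from the example in the text.
SITE_DOMAINS = {"360影视": "www.360kan.com"}


def match_restricted_site(query: str) -> Optional[str]:
    """Step 110: return the first recorded site name contained in the query."""
    for site_name in SITE_DOMAINS:
        if site_name in query:
            return site_name
    return None


def plan_search(query: str) -> dict:
    """Step 120: restrict the search to one domain when the rule matches."""
    site_name = match_restricted_site(query)
    if site_name is None:
        # No restriction rule matched: fall back to an ordinary search.
        return {"query": query, "domain": None}
    return {"query": query, "domain": SITE_DOMAINS[site_name]}
```

With this table, `plan_search("葫芦娃360影视")` selects the domain `www.360kan.com`, while a query without a recorded site name is left unrestricted.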
As shown in FIG. 2, an embodiment of the present invention provides a search query method, comprising:
Step 210: obtain, from a preset search log, the second query keyword corresponding to a URL. In this embodiment, the search log may be a log that records search engine activity, and the second query keyword is a historical query keyword.
Step 220: extract a site name from the second query keyword corresponding to the URL. In this embodiment, one or more site names may be extracted.
Step 230: train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule. In this embodiment, the restriction rule is mined from historical data; because historical data reflects users' past search behavior, a rule derived from it suits users better. Specifically, data on whether the URL was clicked may also be used in the training.
Step 240: identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
Step 250: if the first query keyword conforms to the restriction rule, determine the domain name of the restricted site from its name, and search under that domain name for search results corresponding to the first query keyword. In this embodiment, once the name of the site is determined, its domain name can be determined as well.
For example, with reference to FIG. 2, suppose the search log of the search engine contains the URL “www.360kan.com/jqm”. Its corresponding query keyword, “机器猫360影视” (“机器猫” being Doraemon), is obtained, and the site name “360影视” is extracted from it. The restriction rule can then be trained on “机器猫360影视” and “360影视”, for example using a decision tree. When a user enters the query “葫芦娃360影视”, the search engine determines that it conforms to the restriction rule and that the restricted site is named “360影视”. The search engine therefore searches under the domain name of “360影视”, “www.360kan.com”, and returns the “葫芦娃” program on the “360影视” site to the user as the search result.
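The log-mining steps (210 to 230) can be sketched as follows. This is a rough illustration, not the training procedure of the embodiment: the text mentions decision-tree training, whereas this sketch substitutes a much simpler click-rate threshold so that it needs no external library. `LogRecord`, `learn_site_names`, the list of candidate site names, and the threshold of 0.5 are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class LogRecord(NamedTuple):
    url: str       # result URL recorded in the search log
    query: str     # historical ("second") query keyword shown with that URL
    clicked: bool  # whether the user clicked the URL


def learn_site_names(log: Iterable[LogRecord],
                     candidates: Iterable[str],
                     min_click_rate: float = 0.5) -> Set[str]:
    """Keep candidate names whose queries usually led to a click.

    Stand-in for the training of step 230: a name is treated as a
    restriction signal when the click rate of the logged queries that
    contain it reaches min_click_rate.
    """
    candidates = list(candidates)
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for record in log:
        for name in candidates:
            if name in record.query:
                shown[name] += 1
                clicked[name] += int(record.clicked)
    return {name for name in shown
            if clicked[name] / shown[name] >= min_click_rate}
```

The returned set plays the role of the restriction rule: a new query that contains one of these names is treated as a request to search within the corresponding site.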
As shown in FIG. 3, an embodiment of the present invention provides a search query method, comprising:
Step 310: obtain, from a preset search log, the second query keyword corresponding to a URL.
Step 320: extract a site name from the second query keyword corresponding to the URL.
Step 330: train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule.
Step 340: extract the domain name from the URL.
Step 350: establish a correspondence between the extracted domain name and the site name. In this embodiment, one domain name may correspond to multiple site names.
Step 360: identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
Step 370: if the first query keyword conforms to the restriction rule, look up the domain name of the restricted site from its name and the correspondence, and search under that domain name for search results corresponding to the first query keyword. With the technical solution of this embodiment, the established correspondence allows the domain name of the restricted site to be found quickly.
For example, with reference to FIG. 3, suppose the search log of the search engine contains the URL “www.360kan.com/jqm”. The domain name “www.360kan.com” can be extracted from it, and a correspondence between “360影视” and “www.360kan.com” is established. When a user enters the query “葫芦娃360影视”, the search engine determines that it conforms to the restriction rule and that the restricted site is named “360影视”. From the correspondence, the search engine knows that the search should be performed under the domain name “www.360kan.com”, and it returns the “葫芦娃” program on the “360影视” site to the user as the search result.
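Steps 340 and 350 amount to parsing the host out of each logged URL and storing it against the site name. A minimal sketch, assuming the log has already been reduced to (URL, site name) pairs; `extract_domain` and `build_name_to_domain` are hypothetical helper names, and the alias “360kan” in the usage note is invented for illustration.

```python
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit


def extract_domain(url: str) -> str:
    """Step 340: return the host part of a URL, with or without a scheme."""
    # urlsplit only recognizes the host when the URL contains "//".
    return urlsplit(url if "//" in url else "//" + url).netloc


def build_name_to_domain(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Step 350: map each site name to the domain of its logged URL.

    Several site names may map to the same domain, which matches the
    statement that one domain name can correspond to multiple site names.
    """
    mapping: Dict[str, str] = {}
    for url, site_name in pairs:
        # Keep the first domain seen for a name; later pairs do not override it.
        mapping.setdefault(site_name, extract_domain(url))
    return mapping
```

For instance, the pairs `("www.360kan.com/jqm", "360影视")` and `("http://www.360kan.com/", "360kan")` both resolve to `www.360kan.com`, so step 370 becomes a single dictionary lookup.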
As shown in FIG. 4, an embodiment of the present invention provides a search query method, comprising:
Step 410: obtain, from a preset search log, the second query keyword corresponding to a URL.
Step 420: extract a site name from the second query keyword corresponding to the URL.
Step 430: for each extracted site name, decide whether to retain it according to how many times the home page of the corresponding domain name was clicked when that site name appeared in the second query keyword. In this embodiment, several site names may be extracted from the same query keyword, so they need to be filtered. The more often the home page of the corresponding domain name is clicked, the more closely the domain name and the site name are related and the more likely the site name is correct, so such a name should be retained.
Step 440: train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule.
Step 450: identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
Step 460: if the first query keyword conforms to the restriction rule, determine the domain name of the restricted site from its name, and search under that domain name for search results corresponding to the first query keyword.
For example, with reference to FIG. 4, consider the site name “360影视” extracted for the URL “www.360kan.com/jqm”. If “www.360kan.com” is clicked a large number of times when “360影视” appears in the query keyword, “360影视” is taken as the site name of “www.360kan.com”.
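The filtering in step 430 can be sketched as a count of home-page clicks per candidate name. The sketch below makes several assumptions that are not in the text: the click log is a list of (query, clicked URL) pairs, a home page is a URL with an empty or “/” path, and a candidate is kept once it reaches a hypothetical threshold `min_clicks`.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def home_page_domain(url: str) -> Optional[str]:
    """Return the domain if the URL looks like a site home page, else None."""
    parts = urlsplit(url if "//" in url else "//" + url)
    if parts.path in ("", "/") and not parts.query:
        return parts.netloc
    return None


def filter_site_names(click_log: Iterable[Tuple[str, str]],
                      candidates: List[str],
                      domain: str,
                      min_clicks: int = 2) -> List[str]:
    """Step 430: keep candidates whose queries often led to the domain's home page."""
    home_clicks: Counter = Counter()
    for query, clicked_url in click_log:
        if home_page_domain(clicked_url) != domain:
            continue  # only clicks on this domain's home page count
        for name in candidates:
            if name in query:
                home_clicks[name] += 1
    return [name for name in candidates if home_clicks[name] >= min_clicks]
```

A relative measure (for example, the share of home-page clicks among all clicks for queries containing the name) would be an equally plausible criterion; the absolute count is used here only because it is the simplest reading of "number of clicks".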
As shown in FIG. 5, an embodiment of the present invention provides a search query method, comprising:
Step 510: obtain, from a preset search log, the second query keyword corresponding to a URL.
Step 520: extract a site name from the second query keyword corresponding to the URL.
Step 530: train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule.
Step 540: identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
Step 550: if the first query keyword conforms to the restriction rule, identify in the first query keyword the content that corresponds to the name of the restricted site. In this embodiment, the corresponding content may be identical to the name of the restricted site, a synonym of it, or its pinyin or English equivalent.
Step 560: under the domain name of the restricted site, search using the part of the first query keyword other than the corresponding content. In a user's query keyword, one part typically specifies the site while the other part describes the resource the user wants; this embodiment separates the two parts so that the search can be performed accurately.
For example, with reference to FIG. 5, a user enters the query “葫芦娃360yingshi”. Based on the restriction rule, it is determined that the user wants to search on the “360影视” site, and “360yingshi” is identified as corresponding to “360影视”, where “yingshi” is the pinyin of “影视”. The search engine then searches under the domain name of “360影视”, “www.360kan.com”, using “葫芦娃” as the new query keyword.
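Steps 550 and 560 can be illustrated with a small alias table. The sketch assumes that the equivalent forms of each site name (the name itself, synonyms, pinyin, English) have been listed in advance in a hypothetical `SITE_ALIASES` table; converting Chinese to pinyin automatically would need an external library and is not attempted here.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table: each site name with its equivalent spellings.
SITE_ALIASES = {"360影视": ["360影视", "360yingshi"]}


def split_query(query: str, site_name: str) -> Tuple[Optional[str], str]:
    """Return (matched alias, remaining query) for a restricted site.

    Step 550 finds the part of the query that names the site; step 560
    searches with what is left. Longer aliases are tried first so that a
    short alias never shadows a longer one.
    """
    for alias in sorted(SITE_ALIASES[site_name], key=len, reverse=True):
        match = re.search(re.escape(alias), query, flags=re.IGNORECASE)
        if match:
            rest = query[:match.start()] + " " + query[match.end():]
            # Collapse the whitespace left behind by the removed alias.
            return match.group(0), " ".join(rest.split())
    return None, query
```

For the query in the example, `split_query("葫芦娃360yingshi", "360影视")` yields the alias `360yingshi` and the remaining keyword `葫芦娃`, which is then searched under `www.360kan.com`.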
As shown in FIG. 6, an embodiment of the present invention provides a search query method, comprising:
Step 610: obtain, from a preset search log, the second query keyword corresponding to a URL.
Step 620: extract a site name from the second query keyword corresponding to the URL.
Step 630: train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule.
Step 640: identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
Step 650: if the first query keyword conforms to the restriction rule, segment the first query keyword into multiple words and determine, for each word, whether it is content corresponding to the name of the restricted site. In this embodiment, existing word segmentation techniques may be used. Before segmentation, a blacklist and a whitelist may also be set: the blacklist holds words to be blocked, and the whitelist holds words for which fixed results are returned. During segmentation, protected words, such as “钢铁是怎样炼成的” (“How the Steel Was Tempered”), must not be split; some words, such as certain predicates, may also be filtered out.
Step 660: under the domain name of the restricted site, search using the part of the first query keyword other than the corresponding content.
For example, with reference to FIG. 6, a user enters the query “葫芦娃360影视”. Based on the restriction rule, it is determined that the user wants to search on the “360影视” site. Segmenting “葫芦娃360影视” yields “葫芦娃” and “360影视”, of which “360影视” corresponds to the site. The search engine then searches under the domain name of “360影视”, “www.360kan.com”, using “葫芦娃” as the new query keyword.
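The segmentation in step 650 can be sketched with a toy segmenter. The text only says that existing word segmentation techniques may be used; the greedy forward maximum matching below is one simple such technique, chosen here purely for illustration. The vocabulary, the protected-word set, and the function names are assumptions, and the blacklist and whitelist handling is omitted.

```python
from typing import Iterable, List, Set, Tuple


def segment(text: str, vocabulary: Set[str]) -> List[str]:
    """Greedy forward maximum matching; unknown characters become single tokens."""
    max_len = max(map(len, vocabulary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible piece first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def split_site_and_content(query: str,
                           site_aliases: Set[str],
                           vocabulary: Iterable[str],
                           protected: Iterable[str] = (),
                           filtered: Iterable[str] = ()) -> Tuple[List[str], List[str]]:
    """Step 650: separate tokens that name the site from the remaining tokens.

    Protected phrases are added to the vocabulary as whole entries so that
    they are matched as a single token; words in `filtered` are dropped.
    """
    vocab = set(vocabulary) | set(site_aliases) | set(protected)
    drop = set(filtered)
    tokens = [t for t in segment(query, vocab) if t.strip() and t not in drop]
    site_tokens = [t for t in tokens if t in site_aliases]
    content_tokens = [t for t in tokens if t not in site_aliases]
    return site_tokens, content_tokens
```

With the vocabulary `{"葫芦娃"}` and the alias set `{"360影视"}`, the query “葫芦娃360影视” splits into the site token “360影视” and the content token “葫芦娃”, which becomes the new query keyword in step 660.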
As shown in FIG. 7, an embodiment of the present invention provides a search query apparatus, comprising:
A first query keyword identification module 710, configured to identify whether the first query keyword input by the user conforms to a preset restriction rule for searching within a restricted site. This embodiment places no limit on the type of restriction rule. For example, the rule may be: record the names of a number of sites, and if the first query keyword contains one of the recorded site names, determine that the search should be performed on that site.
A search module 720, configured to search, if the first query keyword conforms to the restriction rule, under the domain name of the restricted site for search results corresponding to the first query keyword. With the technical solution of this embodiment, the search results contain neither results from sites other than the restricted site nor the home pages of certain sites. Because the search is performed only on the restricted site, the search requires less computation, and interference from spam pages on other sites is more easily avoided.
For example, with reference to FIG. 7, suppose the restriction rule is: if the query keyword input by the user contains “360影视”, the search is performed on the “360影视” site. A user enters the query “葫芦娃360影视”, which contains “360影视”. The search engine therefore searches under the domain name of “360影视”, “www.360kan.com”, and returns the “葫芦娃” program on the “360影视” site to the user as the search result.
As shown in FIG. 8, an embodiment of the present invention provides a search query apparatus, comprising:
A second query keyword obtaining module 810, configured to obtain, from a preset search log, the second query keyword corresponding to a URL. In this embodiment, the search log may be a log that records search engine activity, and the second query keyword is a historical query keyword.
A site name extraction module 820, configured to extract a site name from the second query keyword corresponding to the URL. In this embodiment, one or more site names may be extracted.
A training module 830, configured to train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule. In this embodiment, the restriction rule is mined from historical data; because historical data reflects users' past search behavior, a rule derived from it suits users better. Specifically, data on whether the URL was clicked may also be used in the training.
A first query keyword identification module 840, configured to identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
A search module 850, configured to determine, if the first query keyword conforms to the restriction rule, the domain name of the restricted site from its name, and to search under that domain name for search results corresponding to the first query keyword. In this embodiment, once the name of the site is determined, its domain name can be determined as well.
For example, with reference to FIG. 8, suppose the search log of the search engine contains the URL “www.360kan.com/jqm”. Its corresponding query keyword, “机器猫360影视” (“机器猫” being Doraemon), is obtained, and the site name “360影视” is extracted from it. The restriction rule can then be trained on “机器猫360影视” and “360影视”, for example using a decision tree. When a user enters the query “葫芦娃360影视”, the search engine determines that it conforms to the restriction rule and that the restricted site is named “360影视”. The search engine therefore searches under the domain name of “360影视”, “www.360kan.com”, and returns the “葫芦娃” program on the “360影视” site to the user as the search result.
As shown in FIG. 9, an embodiment of the present invention provides a search query apparatus, comprising:
A second query keyword obtaining module 910, configured to obtain, from a preset search log, the second query keyword corresponding to a URL.
A site name extraction module 920, configured to extract a site name from the second query keyword corresponding to the URL.
A training module 930, configured to train on the second query keyword and the site name to obtain the restriction rule, such that the name of the restricted site is obtained when the first query keyword conforms to the restriction rule.
A domain name extraction module 940, configured to extract the domain name from the URL.
A correspondence establishing module 950, configured to establish a correspondence between the extracted domain name and the site name. In this embodiment, one domain name may correspond to multiple site names.
A first query keyword identification module 960, configured to identify whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
A search module 970, configured to look up, if the first query keyword conforms to the restriction rule, the domain name of the restricted site from its name and the correspondence, and to search under that domain name for search results corresponding to the first query keyword. With the technical solution of this embodiment, the established correspondence allows the domain name of the restricted site to be found quickly.
For example, with reference to FIG. 9, suppose the search log of the search engine contains the URL “www.360kan.com/jqm”. The domain name “www.360kan.com” can be extracted from it, and a correspondence between “360影视” and “www.360kan.com” is established. When a user enters the query “葫芦娃360影视”, the search engine determines that it conforms to the restriction rule and that the restricted site is named “360影视”. From the correspondence, the search engine knows that the search should be performed under the domain name “www.360kan.com”, and it returns the “葫芦娃” program on the “360影视” site to the user as the search result.
As shown in FIG. 8, an embodiment of the present invention provides a search query apparatus, which includes:
A second query keyword acquisition module 810 obtains, from a preset search log, a second query keyword corresponding to a URL.
A site name extraction module 820 extracts a site name from the second query keyword corresponding to the URL.
For each extracted site name, the site name extraction module 820 decides whether to retain it according to the number of clicks received by the home page of the corresponding domain name when that site name appears in the second query keyword. According to the technical solution of this embodiment, several site names may be extracted from the same query keyword, so they need to be filtered. The more clicks the home page of the corresponding domain name receives, the more strongly the domain name is related to the site name and the more likely the site name is correct, so the site name should be retained.
A training module 830 trains on the second query keyword and the site name to obtain the restriction rule, and obtains the name of the restricted site when the first query keyword conforms to the restriction rule.
A first query keyword identification module 840 identifies whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
A search module 850: if the first query keyword conforms to the restriction rule for searching within a restricted site, the module determines the domain name of the restricted site from the name of the restricted site, and searches under that domain name for the search results corresponding to the first query keyword.
Referring to FIG. 8, consider the site name "360影视" (360 Film & TV) extracted for the URL "www.360kan.com/jqm". If "www.360kan.com" receives a high number of clicks when "360影视" appears in the query keyword, "360影视" is taken as the site name of "www.360kan.com".
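As a non-limiting illustration, the retention decision based on home page clicks might be sketched in Python as follows. The embodiment states only that the decision depends on how high the click count is; the fixed threshold `min_clicks`, the click figures, and the data layout are assumptions made for this sketch.

```python
def retain_site_names(candidates, home_clicks, min_clicks=100):
    """Keep candidate site names whose domain home page was clicked often enough.

    candidates:  list of (site_name, domain) pairs extracted from query keywords.
    home_clicks: dict mapping (site_name, domain) to the number of clicks on the
                 domain's home page when site_name appeared in the query keyword.
    min_clicks:  hypothetical cut-off; the description does not fix a value.
    """
    kept = []
    for site_name, domain in candidates:
        clicks = home_clicks.get((site_name, domain), 0)
        if clicks >= min_clicks:
            kept.append(site_name)
    return kept


# Illustrative, made-up click counts for two names extracted from one query.
candidates = [("360影视", "www.360kan.com"), ("葫芦娃", "www.360kan.com")]
home_clicks = {
    ("360影视", "www.360kan.com"): 5200,
    ("葫芦娃", "www.360kan.com"): 3,
}
kept = retain_site_names(candidates, home_clicks)
```

With these made-up counts, only "360影视" passes the cut-off, which matches the outcome described in the example above.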
As shown in FIG. 8, an embodiment of the present invention provides a search query apparatus, which includes:
A second query keyword acquisition module 810 obtains, from a preset search log, a second query keyword corresponding to a URL.
A site name extraction module 820 extracts a site name from the second query keyword corresponding to the URL.
A training module 830 trains on the second query keyword and the site name to obtain the restriction rule, and obtains the name of the restricted site when the first query keyword conforms to the restriction rule.
A first query keyword identification module 840 identifies whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
A search module 850: if the first query keyword conforms to the restriction rule for searching within a restricted site, the module identifies, in the first query keyword, the content corresponding to the name of the restricted site. In this embodiment, the corresponding content may be identical to the name of the restricted site, a synonym of it, or its pinyin or English equivalent.
Under the domain name of the restricted site, the search module 850 searches using the part of the first query keyword other than the corresponding content. According to the technical solution of this embodiment, one part of a user's query keyword often serves to specify the site, while the other part expresses the resource the user actually wants. This embodiment separates the two parts so that the search can be performed accurately.
Referring to FIG. 8, a user enters "葫芦娃360yingshi" as a search query. Based on the restriction rule, it is determined that the user wants to search on the "360影视" (360 Film & TV) site, and "360yingshi" is identified as corresponding to "360影视", "yingshi" being the pinyin of "影视". The search engine then searches under "www.360kan.com", the domain name of "360影视", using "葫芦娃" as the new query keyword.
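As a non-limiting illustration, identifying the content that corresponds to the restricted site name and removing it from the query might be sketched in Python as follows. The alias list (synonyms, pinyin, or English forms) is assumed to be available in advance; how it is produced is outside this sketch, and the function name is hypothetical.

```python
def split_query(query, site_name, aliases=()):
    """Separate the site-designating part of a query from the rest.

    Returns (matched_form, remainder). matched_form is the site name or one
    of its aliases as found in the query, or None if nothing matched.
    """
    # Try longer forms first so that a short form cannot shadow a longer one.
    forms = sorted([site_name, *aliases], key=len, reverse=True)
    for form in forms:
        if form in query:
            remainder = query.replace(form, "", 1).strip()
            return form, remainder
    return None, query


# Example from the description: the pinyin form designates the site.
matched, new_keyword = split_query(
    "葫芦娃360yingshi", "360影视", aliases=["360yingshi"]
)
```

Here `new_keyword` is the part of the query that would then be searched under the domain name of the restricted site.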
As shown in FIG. 8, an embodiment of the present invention provides a search query apparatus, which includes:
A second query keyword acquisition module 810 obtains, from a preset search log, a second query keyword corresponding to a URL.
A site name extraction module 820 extracts a site name from the second query keyword corresponding to the URL.
A training module 830 trains on the second query keyword and the site name to obtain the restriction rule, and obtains the name of the restricted site when the first query keyword conforms to the restriction rule.
A first query keyword identification module 840 identifies whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site.
A search module 850: if the first query keyword conforms to the restriction rule for searching within a restricted site, the module segments the first query keyword into a plurality of words and determines, for each word, whether it is content corresponding to the name of the restricted site. In this embodiment, existing word segmentation techniques may be used. Before segmentation, a blacklist and a whitelist may also be set: the blacklist holds words that need to be blocked, and the whitelist holds words that return fixed results. During segmentation, certain protected terms must not be split, for example "钢铁是怎样炼成的" ("How the Steel Was Tempered"). Some words, such as certain predicates, may also be filtered out.
Under the domain name of the restricted site, the search module 850 searches using the part of the first query keyword other than the corresponding content.
Referring to FIG. 6, a user enters "葫芦娃360影视" as a search query. Based on the restriction rule, it is determined that the user wants to search on the "360影视" (360 Film & TV) site. Segmenting "葫芦娃360影视" yields "葫芦娃" and "360影视", of which "360影视" corresponds to the site. The search engine then searches under "www.360kan.com", the domain name of "360影视", using "葫芦娃" as the new query keyword.
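As a non-limiting illustration, segmenting the query and checking each word against the restricted site name might be sketched in Python as follows. The embodiment leaves the segmentation technique open; forward maximum matching over a small lexicon is substituted here purely for illustration, and the lexicon, the protected-term list, and the function names are assumptions.

```python
def segment(text, lexicon, protected=()):
    """Forward maximum matching: at each position take the longest known word.

    Protected terms are treated as single words so that they are never split.
    Unknown characters fall back to one-character words.
    """
    vocab = set(lexicon) | set(protected)
    max_len = max(map(len, vocab), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def remove_site_words(words, site_forms):
    """Drop the words that correspond to the restricted site name."""
    return [w for w in words if w not in site_forms]


words = segment("葫芦娃360影视", lexicon=["葫芦娃", "360影视"])
new_keyword = "".join(remove_site_words(words, {"360影视"}))
```

A production system would more likely rely on an established segmenter; the point of the sketch is only the order of operations: segment first, then test each word against the site name, then search with what remains.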
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is made to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline this disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into a single module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the search query apparatus according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 10 schematically shows a block diagram of a computing device for performing the method according to the invention. The computing device conventionally includes a processor 1010 and a computer program product or computer-readable medium in the form of a memory 1020. The memory 1020 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 1020 has a storage space 1030 for program code 1031 for performing any of the method steps described above. For example, the storage space 1030 for program code may include individual program codes 1031 for implementing the various steps of the above method. The program codes can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 11. The storage unit may have storage segments, storage space, and the like arranged similarly to the memory 1020 in the computing device of FIG. 10. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 1031', that is, code that can be read by a processor such as the processor 1010 and that, when run by a computing device, causes the computing device to perform the steps of the methods described above.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Furthermore, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the subject matter of the invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.
The present invention is applicable to computer systems/servers, which are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with a computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
A computer system/server may be described in the general context of computer-system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media, including storage devices.
Reference herein to "one embodiment", "an embodiment", or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Note also that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.

Claims (14)

  1. A search query method, comprising:
    identifying whether a first query keyword input by a user conforms to a preset restriction rule for searching within a restricted site; and
    if the first query keyword conforms to the restriction rule for searching within a restricted site, searching under the domain name of the restricted site for search results corresponding to the first query keyword.
  2. The method according to claim 1, wherein, before identifying whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site, the method further comprises:
    obtaining, from a preset search log, a second query keyword corresponding to a URL;
    extracting a site name from the second query keyword corresponding to the URL; and
    training on the second query keyword and the site name to obtain the restriction rule, and obtaining the name of the restricted site when the first query keyword conforms to the restriction rule;
    and wherein, before searching under the domain name of the restricted site for the search results corresponding to the first query keyword, the method further comprises:
    determining the domain name of the restricted site according to the name of the restricted site.
  3. The method according to claim 2, wherein, before identifying whether the first query keyword input by the user conforms to the preset restriction rule for searching within a restricted site, the method further comprises:
    extracting a domain name from the URL; and
    establishing a correspondence between the extracted domain name and the site name;
    and wherein determining the domain name of the restricted site according to the name of the restricted site specifically comprises:
    looking up the domain name of the restricted site according to the name of the restricted site and the correspondence.
  4. The method according to claim 2, wherein extracting the site name from the second query keyword corresponding to the URL further comprises:
    for each extracted site name, deciding whether to retain it according to the number of clicks received by the home page of the corresponding domain name when that site name appears in the second query keyword.
  5. The method according to claim 2, wherein searching under the domain name of the restricted site for the search results corresponding to the first query keyword specifically comprises:
    identifying, in the first query keyword, the content corresponding to the name of the restricted site; and
    searching, under the domain name of the restricted site, using the part of the first query keyword other than the corresponding content.
  6. The method according to claim 5, wherein identifying, in the first query keyword, the content corresponding to the name of the restricted site specifically comprises:
    segmenting the first query keyword into a plurality of words; and
    determining, for each word, whether it is content corresponding to the name of the restricted site.
  7. A search query apparatus, comprising:
    a first query keyword identification module, configured to identify whether a first query keyword input by a user conforms to a preset restriction rule for searching within a restricted site; and
    a search module, configured to search, if the first query keyword conforms to the restriction rule for searching within a restricted site, under the domain name of the restricted site for search results corresponding to the first query keyword.
  8. The apparatus according to claim 7, further comprising:
    a second query keyword acquisition module, configured to obtain, from a preset search log, a second query keyword corresponding to a URL;
    a site name extraction module, configured to extract a site name from the second query keyword corresponding to the URL;
    a training module, configured to train on the second query keyword and the site name to obtain the restriction rule, and to obtain the name of the restricted site when the first query keyword conforms to the restriction rule; and
    a domain name determination module, configured to determine the domain name of the restricted site according to the name of the restricted site.
  9. The apparatus according to claim 8, further comprising:
    a domain name extraction module, configured to extract a domain name from the URL; and
    a correspondence establishment module, configured to establish a correspondence between the extracted domain name and the site name;
    wherein the domain name determination module looks up the domain name of the restricted site according to the name of the restricted site and the correspondence.
  10. The apparatus according to claim 8, wherein
    the site name extraction module decides, for each extracted site name, whether to retain it according to the number of clicks received by the home page of the corresponding domain name when that site name appears in the second query keyword.
  11. The apparatus according to claim 8, wherein
    the search module identifies, in the first query keyword, the content corresponding to the name of the restricted site and, under the domain name of the restricted site, searches using the part of the first query keyword other than the corresponding content.
  12. The apparatus according to claim 11, wherein
    the search module segments the first query keyword into a plurality of words and determines, for each word, whether it is content corresponding to the name of the restricted site.
  13. A computer program, comprising computer-readable code which, when run on a computing device, causes the computing device to perform the search query method according to any one of claims 1-6.
  14. A computer-readable medium, in which the computer program according to claim 13 is stored.
PCT/CN2015/095018 2014-12-22 2015-11-19 Search query method and apparatus WO2016101737A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410806927.4A CN104462519A (en) 2014-12-22 2014-12-22 Search query method and device
CN201410806927.4 2014-12-22

Publications (1)

Publication Number Publication Date
WO2016101737A1 true WO2016101737A1 (en) 2016-06-30

Family

ID=52908554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095018 WO2016101737A1 (en) 2014-12-22 2015-11-19 Search query method and apparatus

Country Status (2)

Country Link
CN (1) CN104462519A (en)
WO (1) WO2016101737A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147472A (en) * 2017-07-14 2019-08-20 北京搜狗科技发展有限公司 Detection method, device and the detection device for website of practising fraud of cheating website
CN111797205A (en) * 2020-06-30 2020-10-20 百度在线网络技术(北京)有限公司 Word list retrieval method and device, electronic equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462519A (en) * 2014-12-22 2015-03-25 北京奇虎科技有限公司 Search query method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050060296A1 (en) * 2003-09-15 2005-03-17 Whitby David Scott Search system and method for simultaneous querying and notification of multiple web sales sites
CN102947824A (en) * 2010-06-11 2013-02-27 迪内希·阿南德·尼丁 System and method of addressing and accessing information using a keyword identifier
CN102982150A (en) * 2012-11-27 2013-03-20 潘燕辉 Client rapid input-based searching method
CN104123366A (en) * 2014-07-23 2014-10-29 谢建平 Search method and server
CN104462519A (en) * 2014-12-22 2015-03-25 北京奇虎科技有限公司 Search query method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101505328A (en) * 2008-02-04 2009-08-12 台达电子工业股份有限公司 Network data retrieval method applying speech recognition and system thereof
CN102591932A (en) * 2011-12-23 2012-07-18 优视科技有限公司 Voice search method, voice search system, mobile terminal and transfer server
CN102651022B (en) * 2012-03-31 2017-05-10 北京奇虎科技有限公司 Searching method and device
KR20140037751A (en) * 2012-09-19 2014-03-27 베리사인 인코포레이티드 Methods and systems for providing content provider-specified url keyword navigation
CN103873601B (en) * 2012-12-11 2019-03-08 百度在线网络技术(北京)有限公司 A kind of method for digging and system addressing class query word

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147472A (en) * 2017-07-14 2019-08-20 北京搜狗科技发展有限公司 Detection method, device and the detection device for website of practising fraud of cheating website
CN110147472B (en) * 2017-07-14 2021-10-15 北京搜狗科技发展有限公司 Detection method and device for cheating sites and detection device for cheating sites
CN111797205A (en) * 2020-06-30 2020-10-20 百度在线网络技术(北京)有限公司 Word list retrieval method and device, electronic equipment and storage medium
CN111797205B (en) * 2020-06-30 2024-03-12 百度在线网络技术(北京)有限公司 Vocabulary retrieval method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN104462519A (en) 2015-03-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15871806

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15871806

Country of ref document: EP

Kind code of ref document: A1