US20160026696A1 - Identifying query aspects - Google Patents

Identifying query aspects

Info

Publication number
US20160026696A1
Authority
US
United States
Prior art keywords
aspects
entity
search results
query
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/875,177
Inventor
Jayant Madhavan
Fei Wu
Alon Yitzchak Halevy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Priority to US14/875,177
Assigned to GOOGLE INC. reassignment GOOGLE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HALEVY, ALON, MADHAVAN, JAYANT, WU, FEI
Publication of US20160026696A1
Assigned to GOOGLE LLC reassignment GOOGLE LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: GOOGLE INC.
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • G06F17/30554
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F17/30528
    • G06F17/3053
    • G06F17/30867

Definitions

  • This specification relates to providing, in response to search queries, information identifying aspects of entities identified in the search queries, and using the aspects in presenting information in response to the search queries.
  • Internet search engines provide information about Internet accessible resources (e.g., Web pages, images, text documents, multimedia content) that are responsive to a user's search query and present information about the resources in a manner that is useful to the user.
  • Internet search engines return a set of search results (e.g., as a ranked list of results) in response to a user submitted query.
  • a search result includes, for example, a URL and a snippet of information from a corresponding resource.
  • Conventional search engines are implemented under an assumption that the user's search query can be satisfied by a single result, and work to help the user find that result. Unfortunately, users are not always looking for a single result, but are instead using the query as a starting point for exploration of an unknown space of information about something that they may initially refer to in a generic way.
  • a user may submit a query that names or refers to an entity as a starting point for exploring various aspects associated with that entity.
  • The term “entity” refers to text that names or identifies something. This something can be any object that can have associated properties (e.g., an object in the physical, conceptual or mythical world).
  • An entity can refer to a location, a person, a fictional character, a state, a thing, an idea, and so on.
  • The term “entity” may also be used to refer to the thing itself.
  • aspects are different axes of information along which additional information about an entity can be obtained. For example, for an entity “Hawaii”, possible aspects can include “beaches,” “hotels,” and “weather.”
  • When used in reference to operations of an information retrieval system, e.g., a search engine, the term “aspect” refers to text that names the aspect in question; otherwise, when the meaning is clear from context, the term may also be used to refer to the aspect itself.
  • A single ranked list of results provided by a conventional search engine typically fails to provide users an overview of different aspects of the entity. Rather, the single ranked list often provides many results directed to a single aspect or a small number of aspects. Additionally, the presented results typically do not identify the aspects represented.
  • This specification describes technologies relating to identifying aspects associated with entities.
  • one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a query in a computer system, the computer system comprising one or more computers, the query including an entity; generating in the computer system a group of candidate aspects for the entity; modifying in the computer system the group of candidate aspects to generate a group of modified candidate aspects comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking in the computer system one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; associating in the computer system one or more highest ranked modified candidate aspects with the entity; receiving in the computer system one or more sets of search results; and providing a presentation of the search results in response to the query, the presentation presenting the search results organized according to the aspects associated with the entity.
  • Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
  • the method can further include presenting a summary of information about an entity in accordance with an aspect.
  • the one or more sets of search results can include a set of search results responsive to the query.
  • Each of the one or more sets of search results can correspond to a respective aspect associated with the entity.
  • Another aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving data identifying an entity; generating in a computer system a group of candidate aspects for the entity, the computer system comprising one or more computers; modifying in the computer system the group of candidate aspects to generate a group of modified candidate aspects, comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking in the computer system one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; and storing an association of one or more of the highest ranked modified candidate aspects with the entity in a data storage device of the computer system.
  • Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
  • the method can further include receiving a query including the entity; identifying one or more aspects associated with the entity; receiving search results responsive to the query; and presenting the search results based on the identified aspects.
  • the method can further include receiving a query including the entity; identifying one or more aspects associated with the entity; receiving one or more sets of search results, each set corresponding to one of the identified aspects; and presenting the search results based on the identified aspects.
  • The method can further include receiving data identifying one or more entity properties, where generating the group of candidate aspects includes using the one or more entity properties; and the one or more highest ranked candidate aspects are associated with both the entity and the entity properties.
  • the method can further include associating the entity with a class, the class having one or more class members including the entity; and where generating the group of candidate aspects includes generating candidate aspects corresponding to the entity and the class.
  • Generating the group of candidate aspects can include analyzing one or more first user search histories to identify queries associated with the entity; and analyzing one or more second user search histories to identify queries associated with a class member other than the entity.
  • Combining candidate aspects can include calculating similarity scores, where each similarity score is an estimate of similarity between two candidate aspects; and combining candidate aspects into a single modified candidate aspect based on the similarity scores.
  • Each candidate aspect can be expressed as text and the similarity score between two candidate aspects can be based on a comparison of the strings of text associated with each candidate aspect.
  • Calculating a similarity score between two candidate aspects can include receiving a respective set of search results for each aspect; and calculating the similarity score based on a comparison of the sets of search results.
  • the comparison of the sets of search results can include a comparison of paths of the search results in one of the sets of search results to paths of the search results in the other one of the sets of search results.
  • the comparison of the sets of search results can include a comparison of titles and snippets of the search results in one of the sets of search results to titles and snippets of the search results in the other one of the sets of search results.
  • Combining candidate aspects based on the similarity scores can further include using a graph partition algorithm to determine which aspects to combine.
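  • By way of illustration only, the combining step can be sketched in Python as follows. The sketch treats candidate aspects as graph nodes, links two aspects when their similarity score meets a threshold, and merges each connected component. Connected components are used here as a simple stand-in for a graph partition algorithm; the function name, the threshold value of 0.8, and the example scores are assumptions made for this sketch and are not taken from this specification.

```python
from collections import defaultdict


def merge_similar_aspects(aspects, similarity, threshold=0.8):
    """Group aspects that are linked by a similarity score >= threshold.

    aspects:    list of aspect names.
    similarity: dict mapping (aspect_a, aspect_b) tuples to a score in [0, 1].
    threshold:  hypothetical cut-off; the right value would be tuned empirically.
    """
    parent = {a: a for a in aspects}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # link the two components

    groups = defaultdict(list)
    for a in aspects:  # preserve the input order inside each group
        groups[find(a)].append(a)
    return sorted(groups.values())


if __name__ == "__main__":
    aspects = ["Aspect 1", "Aspect 1'", "Aspect 2", "Aspect 3"]
    scores = {
        ("Aspect 1", "Aspect 1'"): 0.9,
        ("Aspect 1", "Aspect 2"): 0.5,
        ("Aspect 1'", "Aspect 2"): 0.3,
    }
    print(merge_similar_aspects(aspects, scores))
```

  • With these illustrative scores, only the first pair of aspects is merged; a true graph partition algorithm could additionally weigh the strength of every edge rather than applying a single cut-off.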
  • Grouping candidate aspects using one or more aspect classes can include associating two or more candidate aspects with a respective aspect class; and grouping two or more candidate aspects into a single modified candidate aspect based on their aspect classes.
  • the single modified candidate aspect can be an aspect class.
  • Ranking one or more modified candidate aspects based on a diversity score and a popularity score can include calculating a popularity score for each aspect; ranking the aspect with the highest popularity score the highest; and ranking the remaining aspects by repeating the following steps one or more times: calculating a similarity score for each un-ranked aspect, where the similarity score compares the similarity of the un-ranked aspect to the ranked aspects; and assigning the next highest ranking to the aspect whose popularity score divided by its similarity score is the highest.
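  • As a non-limiting illustration, the ranking procedure just described can be sketched in Python. The most popular aspect is ranked first, and each later position goes to the un-ranked aspect with the highest ratio of popularity score to similarity score. The way the similarity of an un-ranked aspect to the set of ranked aspects is aggregated (the maximum pairwise similarity) and the small floor that avoids division by zero are assumptions of this sketch, as are all names and example values.

```python
def rank_aspects(popularity, similarity, floor=1e-9):
    """Greedy ranking that balances popularity against redundancy.

    popularity: dict mapping aspect -> popularity score.
    similarity: function (aspect_a, aspect_b) -> similarity in [0, 1].
    floor:      hypothetical lower bound on similarity to avoid dividing by zero.
    """
    remaining = dict(popularity)
    first = max(remaining, key=remaining.get)  # highest popularity ranks first
    ranked = [first]
    del remaining[first]

    while remaining:
        def ratio(aspect):
            # Assumed aggregation: similarity to the closest ranked aspect.
            sim = max(similarity(aspect, r) for r in ranked)
            return remaining[aspect] / max(sim, floor)

        best = max(remaining, key=ratio)
        ranked.append(best)
        del remaining[best]
    return ranked


if __name__ == "__main__":
    pop = {"hotels": 10, "resorts": 9, "weather": 5}
    close = {frozenset(("hotels", "resorts")): 0.9}

    def sim(a, b):
        return close.get(frozenset((a, b)), 0.1)

    print(rank_aspects(pop, sim))
```

  • In this example “weather” is ranked ahead of the more popular “resorts”, because “resorts” is nearly redundant with the already ranked “hotels”.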
  • aspects of an entity in a search query can be identified. Aspects can be presented to make it easy for users to explore the search space along multiple axes. The use of aspects allows a user to explore the search space beyond the scope of his or her original query. The presentation of aspects also allows a user to quickly gain an overview of what the possible axes of search are. The presentation of aspects can allow a user to browse a search space efficiently, for example, by using faceted browsing. Information related to the aspects can be identified and presented to the user. This information can allow a user to quickly gain information he or she needs about multiple aspects of the entity. Mashups can be presented to a user as a way of visualizing information about the aspects of the entity. The mashups present information associated with several aspects in a single integrated interface.
  • FIG. 1 illustrates an example search system for providing search results relevant to submitted queries.
  • FIG. 2 illustrates an example method for associating aspects with an entity.
  • FIG. 3 illustrates an example of combining similar candidate aspects.
  • FIG. 4 illustrates an example of grouping aspects based on their aspect classes.
  • FIG. 5 illustrates an example of ranking an unranked aspect, given a pre-existing group of one or more ranked aspects.
  • FIG. 6 illustrates an example method for receiving a query including one or more terms corresponding to an entity and presenting search results based on the identified aspects of the entity.
  • FIG. 7 illustrates an example mashup displayed after a user submits a search query.
  • FIG. 8 illustrates an example architecture of a system.
  • FIG. 1 illustrates an example search system 114 for providing search results relevant to submitted queries as can be implemented in an internet, an intranet, or another client and server environment.
  • the search system 114 is an example of an information retrieval system in which the systems, components, and techniques described below can be implemented.
  • a user 102 can interact with the search system 114 through a client device 104 .
  • the client 104 can be a computer coupled to the search system 114 through a local area network (LAN) or wide area network (WAN), e.g., the Internet.
  • the search system 114 and the client device 104 can be one machine.
  • a user can install a desktop search application on the client device 104 .
  • the client device 104 will generally include a random access memory (RAM) 106 and a processor 108 .
  • a user 102 can submit a query 110 to a search engine 130 within a search system 114 .
  • the query 110 is transmitted through a network to the search system 114 .
  • the search system 114 can be implemented as, for example, computer programs running on one or more computers in one or more locations that are coupled to each other through a network.
  • the search system 114 includes an index database 122 and a search engine 130 .
  • the search system 114 responds to the query 110 by generating search results 128 , which are transmitted through the network to the client device 104 in a form that can be presented to the user 102 (e.g., a search results web page to be displayed in a web browser running on the client device 104 ).
  • the search engine 130 identifies resources that match the query 110 .
  • the search engine 130 may also identify a particular “snippet” or section of each resource that is relevant to the query.
  • the search engine 130 will generally include an indexing engine 120 that indexes resources (e.g., web pages, images, or news articles on the Internet) found in a corpus (e.g., a collection or repository of content), an index database 122 that stores the index information, and a ranking engine 152 (or other software) to rank the resources that match the query 110 .
  • the indexing and ranking of the resources can be performed using conventional techniques.
  • the search engine 130 can transmit the search results 128 through the network to the client device 104 , for example, for presentation to the user 102 .
  • the search system 114 may also maintain one or more user search histories based on the queries it receives from a user.
  • a user search history stores a sequence of queries received from a user.
  • User search histories may also include additional information such as which results were selected after a search was performed and how long each selected result was viewed.
  • the search system 114 includes an aspector 140 .
  • the aspector 140 can be implemented in one or more distinct systems coupled to the search system 114 .
  • the aspector 140 associates aspects with particular entities.
  • the aspector 140 can receive the query 110 and, in conjunction with the search engine 130 , provide aspect based search results to the user 102 . Identifying and using aspects will be described in greater detail below.
  • FIG. 2 illustrates an example method 200 for associating aspects with an entity.
  • the example method 200 will be described in reference to a system that performs the method 200 .
  • the system can be, for example, the search system 114 , or a separate system.
  • the system receives an entity (step 202 ).
  • An entity can be any object that can have associated properties (e.g., an object in the physical or conceptual world). For example, an entity can be a location, a person, a thing, an idea, etc.
  • the system can receive the entities from a variety of sources. For example, the system can receive an entity directly from a user or in response to actions performed by the system (e.g., the action of executing a process).
  • An entity can be extracted from a search query received from a user or the search system 114 , for example, by parsing the query and comparing the terms of the query to a database of possible entities. Other sources of an entity are also possible, for example, an entity can be extracted from query data, such as user search histories.
  • the system also receives data identifying one or more properties of the entity.
  • Properties of entities are additional elements associated with an entity that can be used to further refine the entity. For example, “travel” can be a property of the entity “Vietnam” because people travel to Vietnam.
  • the system generates a group of candidate aspects for the entity (step 204 ).
  • the candidate aspects can be generated based on the entity, or alternatively, based on a class associated with the entity.
  • the class is an abstraction of the entity. For example, “chocolate cake” could be associated with the class “food,” because chocolate cake is a type of food. “Daffodil” could be associated with the class “flower,” because a daffodil is a type of flower.
  • the class can have multiple members. Each member is also an entity. For example, the class “flowers” could include many types of flowers, including “tulips,” “alstroemeria,” “roses,” and so on.
  • generating a group of candidate aspects for the entity includes analyzing query data for queries including the entity.
  • the query data can be analyzed, for example, to identify query refinements and query super-strings.
  • a query refinement occurs when a user first issues a query for the entity, and then follows that query with another related query. For example, if a user issues a query for “popcorn” followed by a query for “microwave popcorn,” microwave popcorn can be identified as a query refinement for popcorn.
  • Query refinements do not have to include the original query. For example, if a user issues a query for “computer” followed by a query for “laptop,” laptop can be identified as a query refinement for computer.
  • Query refinements can provide valuable information about an entity, because they indicate how a given user chose to explore the search space for the entity.
  • Query refinements can be generated as follows.
  • One or more user search histories including queries for the entity can be identified.
  • Each user search history is then divided into sessions, where each session represents a group of queries issued by a given user for a given information finding task.
  • a session can be measured in a number of ways including, for example, by a specified period of time (for example, thirty minutes), by a specified number of queries (for example, 15 queries), until a specified period of inactivity (for example, ten minutes without performing a search), or while a user is logged-in to a search system.
  • the sessions that do not include a query for the entity can be filtered out.
  • the queries that follow a query for the entity in the remaining sessions are query refinements.
  • Each of the query refinements indicates a potential candidate aspect.
  • a candidate aspect can be the query refinement itself, or the part of the query refinement that does not include the entity.
  • Candidate aspects can also be identified by analyzing the query refinement using linguistic analysis techniques, for example, using dictionaries or statistical analysis to identify the terms in the query refinement that are most likely to be aspects, or by looking the query refinement up in a database that associates query refinements with aspects.
  • Potential candidate aspects can be aggregated across users, and candidate aspects that do not appear more than a threshold number of times can be filtered out.
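  • The refinement steps above can be sketched in Python as follows, purely as an illustration. The sketch splits each user search history into sessions on a period of inactivity, discards sessions with no query for the entity, treats later queries in the remaining sessions as refinements, strips the entity text, and keeps aspects seen more than a threshold number of times. The data layout (timestamped query tuples), the exact-match test for “a query for the entity”, the naive substring removal, and all names and default values are assumptions of this sketch.

```python
from collections import Counter


def split_sessions(history, max_gap_seconds=600):
    """Split a time-sorted list of (timestamp_seconds, query) into sessions."""
    sessions, current, last_ts = [], [], None
    for ts, query in history:
        if last_ts is not None and ts - last_ts > max_gap_seconds:
            sessions.append(current)  # inactivity gap closes the session
            current = []
        current.append(query)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


def refinement_aspects(histories, entity, threshold=1):
    """Return {aspect: count} for aspects seen more than `threshold` times."""
    entity = entity.lower()
    counts = Counter()
    for history in histories:  # one history per user
        for session in split_sessions(history):
            queries = [q.lower().strip() for q in session]
            if entity not in queries:
                continue  # filter out sessions without a query for the entity
            for refinement in queries[queries.index(entity) + 1:]:
                # Naive substring removal of the entity text (an assumption).
                aspect = " ".join(refinement.replace(entity, " ").split())
                if aspect:
                    counts[aspect] += 1
    return {a: c for a, c in counts.items() if c > threshold}


if __name__ == "__main__":
    histories = [
        [(0, "popcorn"), (60, "microwave popcorn"), (5000, "weather")],
        [(0, "popcorn"), (30, "microwave popcorn"), (90, "popcorn recipes")],
    ]
    print(refinement_aspects(histories, "popcorn"))
```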
  • query refinements are generated for a query based on both the entity in the query and the entity's associated properties, instead of just the entity.
  • a query is a super-string of another query when it includes the other query.
  • “Vietnam travel package” is a super-string of “Vietnam travel,” because it includes the text “Vietnam travel.”
  • a query super-string does not have to be sent during the same session as the query for which it is a super-string.
  • Query super-strings can be generated by considering one or more user search histories and identifying queries that include the entity. Each query super-string indicates a potential candidate aspect. For example, a candidate aspect can be the part of the query super-string that does not include the entity. In some implementations, the query super-string is filtered to remove common words such as “a” and “the” before the candidate aspect is identified. Candidate aspects can also be identified from the query super-string using linguistic techniques or a database, as described above. Potential candidate aspects can be aggregated across users, and candidate aspects that do not appear more than a threshold number of times can be filtered out.
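  • For illustration only, super-string based candidate aspects might be extracted as in the following Python sketch. Queries that contain the entity text are located, the entity and common words are removed, and the remaining text is counted across queries. The stop-word list, the substring containment test, and the function and parameter names are hypothetical choices made for this sketch.

```python
from collections import Counter

# Hypothetical stop-word list; a production system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "for"}


def superstring_aspects(queries, entity, threshold=1):
    """Return {aspect: count} for aspects seen more than `threshold` times."""
    entity = entity.lower()
    counts = Counter()
    for query in queries:
        query = query.lower().strip()
        if entity not in query or query == entity:
            continue  # keep only proper super-strings of the entity
        remainder = query.replace(entity, " ")
        words = [w for w in remainder.split() if w not in STOP_WORDS]
        if words:
            counts[" ".join(words)] += 1
    return {a: c for a, c in counts.items() if c > threshold}


if __name__ == "__main__":
    queries = [
        "Vietnam travel package",
        "the Vietnam travel package",
        "Vietnam travel visa",
        "Hawaii beaches",
    ]
    print(superstring_aspects(queries, "Vietnam travel"))
```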
  • Query super-strings are identified for queries that include text naming the entity and its properties, rather than just the entity.
  • the system associates the entity with a class and generates class-based candidate aspects for the entity.
  • the system associates the entity with a class based on a pre-defined database that associates entities with classes.
  • This pre-defined database can be generated, for example, by analyzing knowledge base information (e.g., information from Wikipedia™, run by the Wikimedia Foundation, or Freebase™, run by Metaweb Technologies).
  • a knowledge base is a collection of information for one or more entities.
  • Knowledge bases can specify relationships between entities, such as class relationships, and can also specify features of entities. For example, a knowledge base could specify that “Canada” is in a class called “country” and that one of its features is its “GDP.” Entity-class relationships can be identified from the knowledge base information, and associations based on the relationships can be stored in the database for future use.
  • the pre-defined database can also be generated by querying the search system 114 for Hearst patterns, e.g., if the entity is “Boston,” a query for “X such as Boston” can be issued to the search system. The results can then be analyzed for sentences including “such as Boston” and the resulting class can be identified. For example, if several of the search results included the phrase “cities such as Boston,” then Boston could be associated with a class of “city.”
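  • As an illustrative sketch only, the analysis of results for a Hearst pattern can be expressed in Python with a regular expression that captures the word preceding “such as” followed by the entity. The text snippets are assumed to have been retrieved already; the minimum-count parameter and function name are hypothetical. The sketch returns the class word as it appears in the text (e.g., a plural form), and mapping it to a singular class name would require a normalization step that is not shown.

```python
import re
from collections import Counter


def class_from_hearst(entity, texts, min_count=2):
    """Return the most frequent word X in "X such as <entity>", or None.

    texts:     iterable of strings, e.g., titles and snippets of search results.
    min_count: hypothetical minimum number of matches required for a class.
    """
    pattern = re.compile(
        r"\b([A-Za-z]+)\s+such\s+as\s+" + re.escape(entity) + r"\b",
        re.IGNORECASE,
    )
    counts = Counter(
        match.group(1).lower()
        for text in texts
        for match in pattern.finditer(text)
    )
    if not counts:
        return None
    label, count = counts.most_common(1)[0]
    return label if count >= min_count else None


if __name__ == "__main__":
    snippets = [
        "Large cities such as Boston have extensive transit systems.",
        "We visited cities such as Boston and Chicago.",
        "Fans of teams such as Boston were disappointed.",
    ]
    print(class_from_hearst("Boston", snippets))
```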
  • the entity does not have to be a perfect match with an entity in the database in order for an association to be identified. For example, small differences such as whether the entity is singular or plural may be overlooked. For example, if the singular “rose” was stored in the database, but the entity was “roses,” the class information for rose could be used. Other small differences, such as spelling variations may also be overlooked.
  • The system associates the entity with a class on the fly, for example, by accessing knowledge base information (e.g., crawling a website such as Wikipedia™) and identifying a class associated with the received entity, or issuing a query with a Hearst pattern including the entity.
  • Other techniques for associating an entity with a class are also possible.
  • the entity can be classified based on machine learning techniques, such as support vector machines.
  • a user can specify the class that is associated with an entity.
  • Class-based aspects can be generated, for example, by analyzing query data for queries including a class member other than the entity. For example, if the entity was “daffodils” and its class was “flowers,” then query data could be analyzed for queries including “roses,” because “roses” is one of the members of the flowers class.
  • the query data for the class member can be analyzed to identify aspects much as the query data for the entity is analyzed to identify aspects, as described above. When the entity is associated with one or more properties, these properties can be included with each class member for purposes of identifying aspects.
  • class-based aspects are generated only from class members that are sufficiently close to the entity, e.g., within a threshold of time or space or another measure of distance between entities.
  • the threshold can be a number of miles, or a number of days, or other measures of distance. The threshold can be determined empirically.
  • candidate aspects can be generated by analyzing knowledge base information associated with the entity or its class members.
  • Knowledge bases can provide binary relationships between a given entity and its features.
  • Wikipedia™ provides an “Infobox” for some entities.
  • the Infobox for Cambodia lists features such as capital, flag, population, area, and GDP. These can provide additional aspects for the entity Cambodia.
  • Candidate aspects can also be retrieved from a database associating entities or class members with potential candidate aspects.
  • the candidate aspects are filtered based on user feedback on aspects that had been previously associated with entities and presented to users.
  • the user feedback can indicate which aspects are useful aspects of an entity, and which aspects are not useful aspects of an entity.
  • the user feedback can be used to directly filter out aspects that users have indicated are not useful.
  • the user feedback can be used as training inputs to train a machine to filter candidate aspects using machine learning techniques.
  • the system modifies the group of candidate aspects (step 206 ).
  • Modifying the group of candidate aspects can include combining similar candidate aspects and grouping candidate aspects based on a class of one or more candidate aspects. This combining and grouping reduces redundant aspects and helps focus the aspects on various axes of search.
  • FIG. 3 illustrates an example of combining similar candidate aspects.
  • An initial group of candidate aspects 302 contains four aspects: Aspect 1, Aspect 1′, Aspect 2, and Aspect 3.
  • A similarity score can be calculated for each pair of aspects in the group of candidate aspects 302.
  • Aspect 1 and Aspect 1′ have a similarity score 304 of 0.9.
  • Aspect 1 and Aspect 2 have a similarity score 306 of 0.5, and Aspect 1′ and Aspect 2 have a similarity score 308 of 0.3.
  • calculating the similarity score for two aspects includes identifying a respective set of search results corresponding to a query for each aspect, and then comparing the search results.
  • the search results can be generated by issuing a query to a search engine (e.g., search engine 130 in FIG. 1 ) for each aspect.
  • The top n search results for each query are then chosen as the set of search results for the respective aspect, where n can be any integer chosen to give a sufficient amount of information for comparison (e.g., 8 or 10).
  • Let $D_i$ be the set of search results $d_i \in D_i$ that correspond to a first aspect, and
  • let $D_j$ be the set of search results $d_j \in D_j$ that correspond to a second aspect being compared to the first aspect.
  • The similarity score for the two sets of search results, and therefore the two aspects, can be calculated as follows.
  • A feature vector is generated for each search result in $D_i$ and $D_j$.
  • A feature vector can include one or more features (e.g., terms) and a corresponding statistical measure of the importance of the feature to the user (e.g., a term frequency (tf) weight or a term frequency-inverse document frequency (tf-idf) weight for each feature).
  • the terms can be all words in the search result, or a subset of the words of the search result (for example, the title of the result and the snippet identified by a search engine).
  • tf weights are used as statistical measures of the importance of a feature to the user.
  • the tf weights can be used because the importance of a feature to the user can increase proportionally according to the frequency with which the feature occurs (e.g., a term frequency) in a collection of documents, for example, all documents indexed by the search system (e.g., search system 114 in FIG. 1 ), or all documents indexed by the search system that are in the same language as the term.
  • The term frequency in a search result is the relative frequency with which a particular term occurs in the search result, and can be represented as:
  • $$tf_{q,p} = \frac{n_{q,p}}{\sum_k n_{k,p}},$$
  • where $n_{q,p}$ is the number of occurrences of the particular term $t_q$ in a search result $d_p$, and the denominator is the number of occurrences of all terms $t_k$ in $d_p$.
  • tf-idf weights are used as the statistical measures of the importance of the features to the user.
  • a tf-idf weight can be calculated by multiplying a term frequency with an inverse document frequency (idf).
  • the idf is an estimate of how frequently a term appears in a collection of documents, for example, all documents indexed by the search system, or all documents indexed by the search system that are in the same language as the term.
  • the inverse document frequency can be represented as:
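  • $idf_q = \log \frac{|D|}{|\{d_p : t_q \in d_p\}|},$
  • where $|D|$ is the number of documents in the collection and $|\{d_p : t_q \in d_p\}|$ is the number of documents $d_p$ that contain the term $t_q$.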
  • the Napierian logarithm is used instead of the logarithm of base 10 .
  • a tf-idf weight can be represented as:
  • $\mathrm{tfidf}_{q,p} = tf_{q,p} \times idf_q.$
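  • The feature-vector construction described above can be sketched as follows. This is a minimal illustration rather than the described system's implementation: the function names and the sample terms are hypothetical, the terms are assumed to be already tokenized from each result's title and snippet, and the small sample collection stands in for the full indexed collection.

```python
import math
from collections import Counter


def term_frequencies(terms):
    """Relative frequency of each term within one search result."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequencies(collection):
    """idf(t) = ln(|D| / number of documents that contain t)."""
    doc_count = Counter()
    for terms in collection:
        doc_count.update(set(terms))
    return {t: math.log(len(collection) / n) for t, n in doc_count.items()}


def tf_idf_vector(terms, idf):
    """Feature vector mapping each term to tf * idf."""
    return {t: tf * idf.get(t, 0.0) for t, tf in term_frequencies(terms).items()}


# Hypothetical title-plus-snippet terms for three search results.
collection = [
    ["hawaii", "beach", "beach"],
    ["hawaii", "hotel"],
    ["hawaii", "weather", "beach"],
]
idf = inverse_document_frequencies(collection)
vector = tf_idf_vector(collection[0], idf)
```

  • In this sketch a term that occurs in every document, such as "hawaii", receives an idf of zero and therefore contributes nothing to the tf-idf vector.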
  • a similarity score is calculated for each pair of search results $\{d_i, d_j\}$.
  • the similarity score for the two sets of search results, $D_i$ and $D_j$, as a whole can be calculated based on the similarity scores between their individual search documents.
  • the similarities for each pair of search results are averaged.
  • the average of the highest similarity scores for each search result is used as follows:
  • $sim(D_i, D_j) = \frac{\sum_i sim(d_i, D_j)}{2|D_i|} + \frac{\sum_j sim(d_j, D_i)}{2|D_j|},$
  • where $sim(d_i, D_j) = \max_k sim(d_i, d_k)$ is the maximum of the similarity scores between the search result $d_i$ and all search results $d_k$ in $D_j$, and
  • $sim(d_j, D_i) = \max_k sim(d_k, d_j)$ is the maximum of the similarity scores between the search result $d_j$ and all search results $d_k$ in $D_i$.
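  • The set-level similarity above can be sketched as follows. The text does not fix the measure used for an individual pair of search results, so cosine similarity over sparse feature vectors is assumed here; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse feature vectors (dicts of term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(d, others):
    """sim(d, D): highest similarity between d and any result in D."""
    return max(cosine(d, other) for other in others)


def set_similarity(D_i, D_j):
    """Average of best-match scores, taken in both directions."""
    forward = sum(best_match(d, D_j) for d in D_i) / (2 * len(D_i))
    backward = sum(best_match(d, D_i) for d in D_j) / (2 * len(D_j))
    return forward + backward
```

  • Both result sets are assumed to be non-empty. Taking the best match in both directions keeps the score symmetric in the two sets.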
  • Other similarity measures can also be used, for example, determining a single feature vector for all search results for each aspect and calculating the similarity scores based on the similarity of the two feature vectors, e.g., based on the cosine distance.
  • the similarity score for two aspects can be calculated by comparing the paths (e.g., web addresses, file paths) of the search results for each aspect, for example, by parsing the text of the paths and extracting features, such as a domain name or directory in a file system, and then comparing the extracted features.
  • the similarity score for two aspects can also be calculated by comparing the text of the aspects themselves, for example, by comparing the characters in the text of the two aspects.
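  • A path-based comparison of the kind described above might look like the following sketch. The choice of features (the domain name plus each directory prefix) and the use of Jaccard overlap as the comparison are assumptions made for illustration; the text does not prescribe either, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def path_features(url):
    """Extract the domain name and each directory prefix from a web address."""
    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    features = {domain}
    segments = [s for s in parsed.path.split("/") if s]
    # Treat every path segment except the last as a directory.
    for depth in range(1, len(segments)):
        features.add(domain + "/" + "/".join(segments[:depth]))
    return features


def path_similarity(urls_i, urls_j):
    """Jaccard overlap of the path features of two sets of search results."""
    feats_i = set().union(*(path_features(u) for u in urls_i))
    feats_j = set().union(*(path_features(u) for u in urls_j))
    if not feats_i or not feats_j:
        return 0.0
    return len(feats_i & feats_j) / len(feats_i | feats_j)
```

  • Two aspects whose results come from the same directories of the same site score close to 1 under this sketch, while aspects whose results share no domain score 0.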
  • the similarity scores can be used to identify candidate aspects that should be combined into a single aspect.
  • Various clustering techniques can be used to determine when two candidate aspects should be combined.
  • a graph partition algorithm can be used. The graph partition algorithm creates a graph where the nodes of the graph are the aspects and an edge connects two nodes if they are sufficiently similar (e.g., if their similarity score exceeds a threshold). For example, in FIG. 3 , there is an edge (indicated by a solid line) between Aspect 1 and Aspect 1 ′, because the similarity score between Aspect 1 and Aspect 1 ′ is greater than the threshold value. However, there are no other connected edges in the graph.
  • the threshold value can be determined empirically, for example, based on a set of test aspects.
  • the graph partition algorithm then combines aspects that are connected into a single aspect. For example, in FIG. 3 , the resulting set of aspects 316 lists only Aspect 1 , Aspect 2 , and Aspect 3 . Aspect 1 ′ has been combined with Aspect 1 .
  • Combining two aspects can include keeping one aspect in the group of aspects and removing the other one from the group of aspects.
  • the decision of which aspect to keep can be made, for example, by selecting the aspect with the highest popularity score.
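  • The graph partition step can be sketched as a connected-components search over the similarity graph. This is an illustrative sketch under stated assumptions: `similarity` and `popularity` are hypothetical callables supplied by the caller, and each connected group is reduced to its most popular member, which is one of the options described above.

```python
from itertools import combinations


def combine_aspects(aspects, similarity, popularity, threshold):
    """Merge aspects joined by edges whose similarity exceeds the threshold.

    Returns one representative per connected component: the member
    with the highest popularity score.
    """
    neighbors = {a: set() for a in aspects}
    for a, b in combinations(aspects, 2):
        if similarity(a, b) > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    kept, seen = [], set()
    for start in aspects:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        kept.append(max(component, key=popularity))
    return kept


# Hypothetical scores mirroring FIG. 3: only Aspect 1 and Aspect 1' are similar.
aspects = ["Aspect 1", "Aspect 1'", "Aspect 2", "Aspect 3"]
pop = {"Aspect 1": 0.6, "Aspect 1'": 0.3, "Aspect 2": 0.5, "Aspect 3": 0.4}


def sim(a, b):
    return 0.9 if {a, b} == {"Aspect 1", "Aspect 1'"} else 0.1


combined = combine_aspects(aspects, sim, pop.get, threshold=0.5)
```

  • With these sample scores, Aspect 1′ is folded into Aspect 1 and the other two aspects are left unchanged.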
  • Aspect popularity scores are discussed in more detail below.
  • Other clustering techniques can also be used, for example, k-means clustering (where aspects are divided into a pre-defined number of clusters based on the similarity scores), spectral clustering, hierarchical clustering, and star-clustering.
  • the candidate aspects can be grouped based on their classes. Aspect classes can be determined much as entity classes are determined, for example, as described above. In some implementations, determining an aspect class includes determining a synonym for the aspect, and then determining the synonym's class. For example, “New York University” is frequently abbreviated as “NYU.” However, it may be difficult to determine an aspect class for “NYU,” for example, because many knowledge bases only classify one of the possible names for a given entity. Therefore, there may be no data on which to base a classification of “NYU.” However, the more formal “New York University” is more likely to be included in knowledge bases.
  • a class for “NYU” can be determined by associating “NYU” with its synonym “New York University” and then identifying a class for the synonym.
  • Synonyms can be determined, for example, by looking the aspect up in a thesaurus or a dictionary. Synonyms can also be determined, for example, by using redirect web pages of a knowledge base such as Wikipedia™. The redirect pages indicate the mapping of various terms to a synonym that is classified by Wikipedia™.
  • aspects can be different from a similarity score perspective but still related in the sense that they belong to the same class.
  • the aspects can be grouped into the same class.
  • the aspects “New York,” “San Francisco,” and “Washington DC” are different because they point to different cities with different food, culture, streets, etc., yet can all be associated with the class “U.S. cities.”
  • the aspects can be grouped into the class “U.S. cities.”
  • aspects are grouped into a sub-class of their class.
  • “New York” and “Washington DC” are members of the class “U.S. cities” and the sub-class “East coast cities.” Therefore, they could alternatively be grouped together into “East coast cities.”
  • FIG. 4 illustrates an example of grouping aspects based on their aspect classes.
  • each aspect in a group of aspects 402 is associated with a respective class.
  • Aspect 1 and Aspect 3 are both in Class 1
  • Aspect 2 is in Class 2 .
  • the new group of aspects 404 includes Aspect 2 and Class 1 .
  • Aspect 2 remains unchanged in the new group of aspects 404 , because its class did not match the class of any other aspects.
  • Aspect 1 and Aspect 3 were combined into a new aspect equal to their class, Class 1 , because they had the same class.
  • some aspects are associated with multiple classes. Determining a class for these ambiguous aspects can be problematic. For example, imagine an entity “Vietnam” and two aspects “food” and “history.” Both of these aspects are ambiguous. In addition to referring to something you can eat, “food” could refer to the “F.O.O.D.” music album. In addition to referring to something in the past, “history” could refer to the “HIStory: Past, Present and Future, Book 1 ” music album. Thus, the two ambiguous aspects could be classified as “album,” and then grouped together into an “album” aspect. Food and history are two distinct aspects for exploring Vietnam, and there is value in keeping them separate. Therefore, they should not be grouped together. In some implementations, ambiguous aspects are not grouped, in order to avoid this potential problem.
  • Ambiguous aspects can be identified, for example, by using a disambiguation database that identifies aspects with multiple meanings.
  • Ambiguous aspects can also be identified, for example, by using disambiguation web pages of a website such as Wikipedia™. These disambiguation pages identify multiple meanings for a given aspect.
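  • Class-based grouping with the ambiguity safeguard can be sketched as follows. The function name, the aspect-to-class mapping, and the set of ambiguous aspects are hypothetical inputs; in practice they would come from a knowledge base and a disambiguation source such as those described above.

```python
from collections import defaultdict


def group_by_class(aspect_classes, ambiguous=frozenset()):
    """Replace aspects that share a class with the class itself.

    aspect_classes maps each aspect to its class (or None if unknown).
    Aspects listed in `ambiguous` are never grouped.
    """
    members = defaultdict(list)
    for aspect, cls in aspect_classes.items():
        if cls is not None and aspect not in ambiguous:
            members[cls].append(aspect)

    result = []
    for aspect, cls in aspect_classes.items():
        shared = (
            cls is not None
            and aspect not in ambiguous
            and len(members[cls]) > 1
        )
        label = cls if shared else aspect
        if label not in result:
            result.append(label)
    return result


# Mirrors FIG. 4: Aspect 1 and Aspect 3 share Class 1.
fig4 = group_by_class(
    {"Aspect 1": "Class 1", "Aspect 2": "Class 2", "Aspect 3": "Class 1"}
)
```

  • Passing "food" and "history" in the `ambiguous` set keeps them as separate aspects even if both map to the class "album".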
  • the group is filtered, for example, to remove potentially offensive aspects (e.g., porn filtering).
  • This filtering can be done by comparing the aspects to a list of potentially offensive aspects, and removing any aspects that are on the list.
  • the system ranks one or more of the candidate aspects for the entity (step 280 ).
  • the candidate aspects are ranked based on a diversity score and a popularity score of each aspect.
  • the goal of the ranking is to identify aspects that are both interesting to the user and diverse enough to give a user choices on where to next direct his or her search. Ranking can be performed as follows.
  • the highest ranked aspect is the aspect with a highest popularity score.
  • the popularity score is a measure of how common the aspect is. Popularity scores can be calculated in various ways depending on how the aspect was generated.
  • the popularity score can be based on the frequency with which the query refinement appears, for example, by taking the total number of sessions that the query refinement appears in and dividing by the total number of sessions.
  • the popularity score $p(q_j \mid q)$ of a refinement $q_j$ of a query q can be calculated from $fq(q_j)$, the frequency with which the query refinement $q_j$ appears in the user search histories, normalized as described above.
  • the popularity score can be based on the frequency with which the query super-string appears in the user search histories, for example, by taking the total number of times the super-string appears in the search histories and dividing that by the total number of query super-strings in the user search history plus the total number of times that a query for the entity appears in the user search histories.
  • the popularity score $p(q_j \mid q)$ for a given query super-string $q_j$ can be calculated as follows:
  • $p(q_j \mid q) = \frac{fq(q_j)}{fq(q) + \sum_i fq(q_i)},$
  • where $fq(q_j)$ is the frequency with which the super-string query $q_j$ appears in the search histories, and
  • $fq(q)$ is the frequency with which the query for the entity appears in the search histories.
  • the popularity score can also be calculated by dividing the total number of times the super-string appears in the search histories by the total number of query super-strings in the search histories, for example:
  • $p(q_j \mid q) = \frac{fq(q_j)}{\sum_i fq(q_i)}.$
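  • Both super-string variants described above can be sketched with a single helper. The function name and the sample counts are hypothetical; the counts are assumed to have been gathered from the user search histories beforehand.

```python
def superstring_popularity(frequencies, entity_query_frequency=0):
    """Popularity score of each query super-string.

    frequencies maps each super-string to the number of times it appears.
    entity_query_frequency is the number of times the bare entity query
    appears; when nonzero it is added to the denominator, as in the
    first variant described above.
    """
    denominator = entity_query_frequency + sum(frequencies.values())
    if denominator == 0:
        return {q: 0.0 for q in frequencies}
    return {q: f / denominator for q, f in frequencies.items()}


# Hypothetical counts for the entity "hawaii".
counts = {"hawaii beaches": 30, "hawaii hotels": 10}
with_entity = superstring_popularity(counts, entity_query_frequency=60)
without_entity = superstring_popularity(counts)
```

  • Including the entity query in the denominator lowers every score by the same factor, so it changes the scale of the scores but not the relative order of the super-strings.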
  • the popularity score for that aspect can be determined in a number of ways, including, for example, taking the higher of the two scores, taking the average of the two scores, or taking the lower of the two scores.
  • the popularity score for class-based aspects can be adjusted so that aspects associated with the class do not overwhelm aspects associated with the specific entity. Rarer entities require the class-based aspects in order to have a sufficient number and variety of aspects. However, more popular entities may have entity-based aspects that are more important than the class-based aspects. A balance can be struck by weighting the scores of the aspects.
  • a candidate aspect $a_i$ of a query q which contains an entity of class C can be assigned a weighted score that combines the individual score of the aspect with its class score, where
  • K is a design parameter controlling the relative importance of the individual score of the aspect and the class score, and can be determined empirically,
  • count(a) is the number of queries in the query log that included the aspect a, and
  • $|C|$ is the number of entities in the class C.
  • the popularity score for a class-based candidate aspect can also reflect how close the entity is to the class member the aspect is based on, e.g., from a time or space or other perspective. For example, if the entity is “November,” an aspect based on the class member “December” might have a better score than an aspect based on the class member “May,” because November is closer to December than May in the order of months. As another example, if the entity is “San Francisco,” an aspect based on “Los Angeles” might have a better score than an aspect based on “New York,” because San Francisco is closer to Los Angeles than New York from a distance perspective.
  • the popularity score can be based on the click through rate for a given aspect, e.g., the number of times users selected a search result after issuing a query for the aspect (or the entity and the aspect), divided by the total number of times users issued queries for the aspect.
  • the popularity score can also be based on the dwell time associated with one or more of the search results corresponding to a query for the aspect or the aspect and the entity. Dwell time is the amount of time a user spends viewing a search result.
  • Dwell time can be a continuous number, such as the number of seconds a user spends viewing a search result, or it can be a discrete interval, for example “short clicks” corresponding to clicks of less than thirty seconds, “medium clicks” corresponding to clicks of more than thirty seconds but less than one minute, and “long clicks” corresponding to clicks of more than one minute.
  • a longer dwell time of one or more results is associated with a higher popularity score. The score is higher because users found the results with a longer dwell time useful enough to view for a longer period of time.
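  • The click-through and dwell-time signals can be sketched as follows. The function names are hypothetical, and the handling of dwell times of exactly thirty or sixty seconds is an assumption, since the intervals described above leave those boundaries open.

```python
def click_through_rate(clicks, queries_issued):
    """Clicks on results divided by the number of times the query was issued."""
    if queries_issued == 0:
        return 0.0
    return clicks / queries_issued


def dwell_bucket(seconds):
    """Map a dwell time in seconds to a discrete click interval.

    Assumption: exactly 30 s counts as "medium" and exactly 60 s as "long".
    """
    if seconds < 30:
        return "short"
    if seconds < 60:
        return "medium"
    return "long"


def long_click_fraction(dwell_times):
    """Share of clicks that were long clicks; higher suggests more useful results."""
    if not dwell_times:
        return 0.0
    long_clicks = sum(1 for s in dwell_times if dwell_bucket(s) == "long")
    return long_clicks / len(dwell_times)
```

  • Either the raw dwell time or a bucketed summary such as the long-click fraction could feed into a popularity score; how the signals are weighted is left open here.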
  • the diversity score for an un-ranked aspect can be generated, for example, by calculating a similarity score between the un-ranked aspect and each ranked aspect, and then taking the minimum, maximum, or average of the scores.
  • FIG. 5 illustrates an example of ranking an unranked aspect 502 , given a pre-existing group of one or more ranked aspects 508 .
  • a popularity score 506 is generated for the unranked aspect 502 using a popularity score generator 504 .
  • the popularity score generator generates a popularity score for the aspect, for example, as described above.
  • a diversity score 512 is then generated for the unranked aspect 502 by the diversity score generator 510 .
  • the diversity score 512 is an estimate of how similar the unranked aspect 502 is to the ranked aspects 508 .
  • the diversity score between the unranked aspect 502 and the set of ranked aspects 508 can be determined by calculating the similarity score between the unranked aspect 502 and each ranked aspect in the set 508 , for example as described above, and then using the minimum, maximum, average, or sum of the scores as the diversity score.
  • the overall score generator 514 generates an overall score 516 based on the popularity score 506 and the diversity score 512, for example, by dividing the popularity score 506 by the diversity score 512.
  • the highest ranked candidate aspect can be chosen based on the popularity score, and all subsequent aspects can be chosen based on the diversity score (for example, by choosing the aspect with the lowest diversity score).
  • the candidate aspects can also be ranked based just on their popularity scores or just on their diversity scores.
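  • The ranking procedure can be sketched as a greedy loop. This sketch makes several assumptions: the diversity score is taken as the maximum similarity to the aspects ranked so far (one of the options described above), the overall score is popularity divided by diversity, and a small `epsilon` is added to the divisor to avoid division by zero. The `popularity` and `similarity` callables and the sample scores are hypothetical.

```python
def rank_aspects(aspects, popularity, similarity, k, epsilon=1e-6):
    """Greedily pick up to k aspects that are popular and mutually diverse."""
    remaining = list(aspects)
    ranked = []
    while remaining and len(ranked) < k:
        if not ranked:
            # The highest ranked aspect is simply the most popular one.
            best = max(remaining, key=popularity)
        else:
            def overall(aspect):
                # Diversity score: similarity to the closest ranked aspect.
                diversity = max(similarity(aspect, r) for r in ranked)
                return popularity(aspect) / (diversity + epsilon)

            best = max(remaining, key=overall)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical scores: "beach" nearly duplicates "beaches".
pop = {"beaches": 0.5, "beach": 0.4, "hotels": 0.3}


def sim(a, b):
    return 0.9 if {a, b} == {"beaches", "beach"} else 0.1


top = rank_aspects(["beaches", "beach", "hotels"], pop.get, sim, k=2)
```

  • With these sample scores, "hotels" is ranked ahead of the more popular "beach" because "beach" is too similar to the already ranked "beaches".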
  • the association is stored in a location accessible to the system, for example, in a database that associates a given entity with its aspects.
  • FIG. 6 illustrates an example method 600 for receiving a query including one or more terms corresponding to an entity and presenting search results based on the identified aspects of the entity.
  • the example method 600 will be described with reference to a system (e.g., the search system 114 of FIG. 1 or another system) that performs the method 600 .
  • the method can be performed in conjunction with the method described above in reference to FIG. 2 .
  • the system receives a query including one or more terms corresponding to an entity (step 602 ).
  • the query can be received, for example, from a user or from the search system 114 .
  • the system and the search system 114 are the same system.
  • the system identifies aspects associated with the entity (step 604 ).
  • the query includes an entity and its properties, and the system can identify aspects associated with the entity and its properties. For example, if the query is “Hawaii vacation” then “Hawaii” could be identified as the entity, and “vacation” could be identified as a property of the entity “Hawaii.”
  • the aspects can be identified as described above in reference to FIG. 2 , or can be retrieved, for example, from a database including ranked aspects generated using the method described above in reference to FIG. 2 .
  • the system can identify all aspects associated with the entity. When the aspects are ranked, the system can alternatively identify a top k number of the ranked aspects, where k is the number of aspects that are going to be presented to the user.
  • the system receives one or more sets of search results (step 606 ).
  • Each set of search results corresponds to an entity and one of the identified aspects. For example, if the entity was “Hawaii” and the identified aspects were “beaches,” “hotels,” “weather,” and “food,” separate sets of search results could be received for “Hawaii beaches,” “Hawaii hotels,” “Hawaii weather,” and “Hawaii food.”
  • the search results can be received in response to a query issued to the search engine 130 for the entity and an aspect.
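  • Retrieving one set of search results per aspect can be sketched as follows. The `search` callable is a hypothetical stand-in for a call to a search engine, and `fake_search` exists only to make the sketch runnable.

```python
def results_by_aspect(entity, aspects, search, n=10):
    """Issue one query per aspect and keep the top n results of each."""
    return {aspect: search(f"{entity} {aspect}")[:n] for aspect in aspects}


def fake_search(query):
    """Hypothetical stand-in for a search engine: returns labeled placeholders."""
    return [f"{query} result {rank}" for rank in range(1, 4)]


sets = results_by_aspect("Hawaii", ["beaches", "hotels"], fake_search, n=2)
```

  • Each key of the returned mapping is an aspect, so the presentation step can label each set of results with the aspect it corresponds to.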
  • FIG. 7 illustrates an example mashup displayed after a user submits a search query 702 for “mount bachelor” by clicking on the search button 704 .
  • Search results and other information corresponding to aspects for Mount Bachelor (e.g., “weather,” “hotels,” “community college,” and “mountains”) are labeled in accordance with the aspect and presented to the user in the boxes 706, 708, 710, and 712.
  • the presentation of information can be tailored to the aspect. For example, a ski and snow report is presented in box 706 for users interested in the “weather” aspect.
  • Search results corresponding to “hotels” are presented in box 708
  • search results corresponding to “community college” are presented in box 710
  • search results corresponding to “mountains” are presented in box 712 .
  • not all search results received for a given aspect are necessarily presented for that aspect. For example, more search results can be received for the “hotels” aspect than the two search results that are presented.
  • the search results that are presented are chosen from the search results that are received, for example, by taking a top number of search results based on a ranking of the search results (e.g., a ranking provided by the search system 114 ). The number can be determined, for example, based on the number of aspects for the entity and/or the space available for presentation of the search results. Search results do not have to be presented for all identified aspects.
  • a summary of the entity in accordance with one of the aspects is presented.
  • a summary of an entity in accordance with an aspect is a direct presentation of information that is available through search results corresponding to the entity and the aspect.
  • the ski and snow report presented in box 706 is a summary of information for the entity “mount bachelor” and the aspect “weather.”
  • a user interested in the “weather” aspect is likely interested in knowing the current weather, so rather than requiring the user to click on a search result to see weather information, the system can instead directly present information on the weather.
  • the entity is “University of Southern California football team” and the aspect is “season record,” a summary of the team's season record can be presented.
  • the summary is associated with an aspect and an entity in advance and stored, for example, in a database. The system can then retrieve the summary when needed.
  • the system can create a separate web page for the search results corresponding to each aspect. Links to the web pages corresponding to the identified aspects can be presented along with search results for the original query. Alternatively, the links to the web pages can be presented as a separate web page.
  • the system can present the aspects as “related search” options for the user, and then present the search results corresponding to a given aspect once a user selects the aspect.
  • the query includes terms corresponding to multiple entities.
  • the system can identify the aspects associated with each entity, and then combine the identified aspects based on their rank (e.g., based on a popularity score and a diversity score of each aspect). Search results for the top ranked aspects can then be received and presented to the user. Alternatively, the system can present search results for the aspects corresponding to each of the entities separately.
  • the system receives search results corresponding to the entity, rather than the entity and an aspect (for example, from the search system 114 ).
  • the search results can be grouped based on the aspects, for example, by sorting the search results based on the aspects, or using clustering techniques to cluster search results around the aspects.
  • the search results can be presented based on the aspects as described above.
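  • Grouping entity-level search results by aspect can be sketched with simple keyword matching. This is only one possible sorting rule, chosen for illustration; a clustering technique could be used in its place. The function name and the sample results are hypothetical.

```python
def group_results_by_aspect(results, aspects):
    """Assign each result to every aspect whose text it mentions.

    results is a list of (title, snippet) pairs. Results that mention
    no aspect are collected under the key None.
    """
    groups = {aspect: [] for aspect in aspects}
    groups[None] = []
    for title, snippet in results:
        text = f"{title} {snippet}".lower()
        matched = [a for a in aspects if a.lower() in text]
        for aspect in matched or [None]:
            groups[aspect].append(title)
    return groups


# Hypothetical results for the entity "mount bachelor".
results = [
    ("Mt. Bachelor snow report", "current weather and snow depth"),
    ("Lodging near Bend", "hotels close to the lifts"),
    ("Trail map", "runs and lifts"),
]
groups = group_results_by_aspect(results, ["weather", "hotels"])
```

  • Results that match no aspect are kept under a separate key so that they can still be presented alongside the aspect-labeled groups.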
  • FIG. 8 illustrates an example architecture of a system 800 .
  • the system generally includes a data processing apparatus 802 and a user device 828 .
  • the data processing apparatus 802 and user device 828 are connected through a network 826 .
  • the user device 828 and the data processing apparatus 802 are the same device.
  • the data processing apparatus 802 runs a number of modules, for example, processes such as executable software programs. In various implementations, these processes include an entity-class associator 804, aspect generator 806, aspect combiner 808, aspect grouper 810, aspect ranker 812, and aspect associator 814.
  • the entity-class associator 804 associates a given entity with a class, for example, based on a pre-defined database that associates entities with classes or by accessing knowledge base information for the entity.
  • the aspect generator 806 generates aspects for a given entity, for example, as described above in reference to FIG. 2 , by analyzing user search histories to identify query refinements and query superstrings for the entity, its class members, or the entity and its class members.
  • the aspect combiner 808 combines aspects, for example, as described above in reference to FIGS. 2 and 3 , based on their similarity scores.
  • the aspect combiner 808 may also calculate similarity scores for pairs of aspects as described above in reference to FIGS. 2 and 3 .
  • the aspect grouper 810 groups aspects based on their class, for example, as described above in reference to FIGS. 2 and 4 .
  • the aspect combiner 808 and the aspect grouper 810 are the same process.
  • the aspect ranker 812 ranks aspects based on a popularity score and a diversity score of each aspect, for example, as described above in reference to FIGS. 2 and 5 .
  • the aspect associator 814 associates one or more aspects with a given entity or a given entity and its properties, for example, as described above in reference to FIG. 2 .
  • the data processing apparatus 802 stores one or more of an entity-class database associating a given entity with its class, an aspect-class database associating a given aspect with its class, user search histories, and an entity-aspect database associating a given entity with one or more aspects.
  • the entity-class database and the aspect-class database are the same database.
  • the data is stored on a computer readable medium 820 . In some implementations, the data is stored on the additional device(s) 818 .
  • the data processing apparatus 802 may also have hardware or firmware devices including one or more processors 816 , one or more additional devices 818 , computer readable medium 820 , a communication interface 822 , and one or more user interface devices 824 .
  • the processor(s) 816 are capable of processing instructions for execution. In one implementation, at least one of the processor(s) 816 is a single-threaded processor. In another implementation, at least one of the processor(s) 816 is a multi-threaded processor.
  • the processor(s) 816 are capable of processing instructions stored in memory or on a storage device to display graphical information for a user interface on the user interface device(s) 824 .
  • User interface device(s) 824 can include, for example, a display, a camera, a speaker, a microphone, or a tactile feedback device.
  • the data processing apparatus 802 communicates with user device 828 using its communication interface 822 .
  • the user device 828 can be any data processing apparatus, for example, a user's computer.
  • a user uses the user device 828 to submit search queries through the network 826 to the data processing apparatus 802 and receive search results from the data processing apparatus 802, for example, through a web browser run on the user device, for example, Firefox™, available from the Mozilla Project in Mountain View, Calif.
  • the user device 828 may present the search results to the user, for example, by displaying the results on a display device, transmitting sound corresponding to the results, or providing tactile feedback corresponding to the results.
  • the search results may be organized according to aspects associated with the entity. When a user uses his or her computer to select a search result to view, information regarding the user selection can be sent to the data processing apparatus 802 and used to generate user search history data.
  • the user device 828 runs one or more of the modules 804 , 806 , 808 , 810 , 812 , and 814 instead of or in addition to the data processing apparatus 802 running the modules.
  • the user search histories can be received from a population of users, and not necessarily from the same user device 828 used to receive search results organized based on aspects of an entity in the search query.
  • Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them.
  • the operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
  • the term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or combinations of them.
  • the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, e.g., a virtual machine, or a combination of one or more of them.
  • the apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • The computing system can include clients and servers.
  • A client and server are generally remote from each other and typically interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Abstract

Methods, systems, and apparatus, including computer program products, for generating aspects associated with entities. In some implementations, a method includes receiving data identifying an entity; generating a group of candidate aspects for the entity; modifying the group of candidate aspects to generate a group of modified candidate aspects comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; and storing an association between one or more highest ranked modified candidate aspects and the entity. The aspects can be used to organize and present search results in response to queries for the entity.

Description

    BACKGROUND
  • This specification relates to providing, in response to search queries, information identifying aspects of entities identified in the search queries, and using the aspects in presenting information in response to the search queries.
  • Internet search engines provide information about Internet accessible resources (e.g., Web pages, images, text documents, multimedia content) that are responsive to a user's search query and present information about the resources in a manner that is useful to the user. Internet search engines return a set of search results (e.g., as a ranked list of results) in response to a user submitted query. A search result includes, for example, a URL and a snippet of information from a corresponding resource. Conventional search engines are implemented under an assumption that the user's search query can be satisfied by a single result, and work to help the user find that result. Unfortunately, users are not always looking for a single result, but are instead using the query as a starting point for exploration of an unknown space of information about something that they may initially refer to in a generic way.
  • For example, a user may submit a query that names or refers to an entity as a starting point for exploring various aspects associated with that entity. When used in reference to operations of an information retrieval system, e.g., a search engine, the term “entity” refers to text that names or identifies something. This something can be any object that can have associated properties (e.g., an object in the physical, conceptual or mythical world). For example, an entity can refer to a location, a person, a fictional character, a state, a thing, an idea, and so on. When the meaning is clear from context, and to avoid unnecessary verbiage, the term “entity” may also be used to refer to the thing itself.
  • Aspects are different axes of information along which additional information about an entity can be obtained. For example, for an entity “Hawaii”, possible aspects can include “beaches,” “hotels,” and “weather.” As with the term “entity”, when used in reference to operations of an information retrieval system, e.g., a search engine, the term “aspect” refers to text that names the aspect in question, and otherwise, when the meaning is clear from context, the term may also be used to refer to the aspect itself.
  • A single ranked list of results provided by conventional search engines typically fails to provide users an overview of different aspects of the entity. Rather, the single ranked list often provides many results directed to a single aspect or a small number of aspects. Additionally, the presented results typically do not identify the aspects represented.
  • SUMMARY
  • This specification describes technologies relating to identifying aspects associated with entities.
  • In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a query in a computer system, the computer system comprising one or more computers, the query including an entity; generating in the computer system a group of candidate aspects for the entity; modifying in the computer system the group of candidate aspects to generate a group of modified candidate aspects comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking in the computer system one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; associating in the computer system one or more highest ranked modified candidate aspects with the entity; receiving in the computer system one or more sets of search results; and providing a presentation of the search results in response to the query, the presentation presenting the search results organized according to the aspects associated with the entity. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
  • These and other embodiments can each optionally include one or more of the following features. The method can further include presenting a summary of information about an entity in accordance with an aspect. The one or more sets of search results can include a set of search results responsive to the query. Each of the one or more sets of search results can correspond to a respective aspect associated with the entity.
  • In general, another aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving data identifying an entity; generating in a computer system a group of candidate aspects for the entity, the computer system comprising one or more computers; modifying in the computer system the group of candidate aspects to generate a group of modified candidate aspects, comprising combining similar candidate aspects and grouping candidate aspects using one or more aspect classes each associated with one or more candidate aspects; ranking in the computer system one or more modified candidate aspects in the group of modified candidate aspects based on a diversity score and a popularity score; and storing an association of one or more of the highest ranked modified candidate aspects with the entity in a data storage device of the computer system. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
  • These and other embodiments can each optionally include one or more of the following features. The method can further include receiving a query including the entity; identifying one or more aspects associated with the entity; receiving search results responsive to the query; and presenting the search results based on the identified aspects. The method can further include receiving a query including the entity; identifying one or more aspects associated with the entity; receiving one or more sets of search results, each set corresponding to one of the identified aspects; and presenting the search results based on the identified aspects.
  • The method can further include receiving data identifying one or more entity properties, where generating the group of candidate aspects includes using the one or more entity properties; and the one or more highest ranked candidate aspects are associated with both the entity and the entity properties. The method can further include associating the entity with a class, the class having one or more class members including the entity; and where generating the group of candidate aspects includes generating candidate aspects corresponding to the entity and the class. Generating the group of candidate aspects can include analyzing one or more first user search histories to identify queries associated with the entity; and analyzing one or more second user search histories to identify queries associated with a class member other than the entity.
  • Combining candidate aspects can include calculating similarity scores, where each similarity score is an estimate of similarity between two candidate aspects; and combining candidate aspects into a single modified candidate aspect based on the similarity scores. Each candidate aspect can be expressed as text and the similarity score between two candidate aspects can be based on a comparison of the strings of text associated with each candidate aspect. Calculating a similarity score between two candidate aspects can include receiving a respective set of search results for each aspect; and calculating the similarity score based on a comparison of the sets of search results. The comparison of the sets of search results can include a comparison of paths of the search results in one of the sets of search results to paths of the search results in the other one of the sets of search results. The comparison of the sets of search results can include a comparison of titles and snippets of the search results in one of the sets of search results to titles and snippets of the search results in the other one of the sets of search results. Combining candidate aspects based on the similarity scores can further include using a graph partition algorithm to determine which aspects to combine.
  • Grouping candidate aspects using one or more aspect classes can include associating two or more candidate aspects with a respective aspect class; and grouping two or more candidate aspects into a single modified candidate aspect based on their aspect classes. The single modified candidate aspect can be an aspect class.
  • Ranking one or more modified candidate aspects based on a diversity score and a popularity score can include calculating a popularity score for each aspect; ranking the aspect with the highest popularity score the highest; and ranking the remaining aspects by repeating the following steps one or more times: calculating a similarity score for each un-ranked aspect, where the similarity score compares the similarity of the un-ranked aspect to the ranked aspects; and assigning the next highest ranking to the aspect whose popularity score divided by its similarity score is the highest.
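  • The greedy ranking described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `rank_aspects`, the `floor` parameter that guards against division by zero, and the choice to measure an un-ranked aspect's similarity to the ranked aspects as the maximum pairwise similarity are all assumptions that the text does not specify.

```python
def rank_aspects(popularity, similarity, floor=1e-6):
    """Rank aspects so that popular but mutually dissimilar aspects come first.

    popularity: dict mapping aspect -> popularity score.
    similarity: function (aspect, aspect) -> similarity score.
    floor: hypothetical lower bound on similarity to avoid dividing by zero.
    """
    unranked = sorted(popularity)  # sorted only to make tie-breaking deterministic
    # The aspect with the highest popularity score is ranked first.
    first = max(unranked, key=lambda a: popularity[a])
    ranked = [first]
    unranked.remove(first)

    while unranked:
        def score(aspect):
            # Assumption: similarity to the ranked group is the maximum
            # pairwise similarity to any already-ranked aspect.
            sim_to_ranked = max(similarity(aspect, r) for r in ranked)
            return popularity[aspect] / max(sim_to_ranked, floor)

        # Next rank goes to the aspect with the highest popularity / similarity.
        best = max(unranked, key=score)
        ranked.append(best)
        unranked.remove(best)
    return ranked
```

  • Dividing popularity by similarity means that a popular aspect that nearly duplicates an already-ranked aspect is pushed below a less popular aspect that covers a different axis of search.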
  • Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Aspects of an entity in a search query can be identified. Aspects can be presented to make it easy for users to explore the search space along multiple axes. The use of aspects allows a user to explore the search space beyond the scope of his or her original query. The presentation of aspects also allows a user to quickly gain an overview of what the possible axes of search are. The presentation of aspects can allow a user to browse a search space efficiently, for example, by using faceted browsing. Information related to the aspects can be identified and presented to the user. This information can allow a user to quickly gain information he or she needs about multiple aspects of the entity. Mashups can be presented to a user as a way of visualizing information about the aspects of the entity. The mashups present information associated with several aspects in a single integrated interface.
  • The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the invention will become apparent from the description, the drawings, and the claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates an example search system for providing search results relevant to submitted queries.
  • FIG. 2 illustrates an example method for associating aspects with an entity.
  • FIG. 3 illustrates an example of combining similar candidate aspects.
  • FIG. 4 illustrates an example of grouping aspects based on their aspect classes.
  • FIG. 5 illustrates an example of ranking an unranked aspect, given a pre-existing group of one or more ranked aspects.
  • FIG. 6 illustrates an example method for receiving a query including one or more terms corresponding to an entity and presenting search results based on the identified aspects of the entity.
  • FIG. 7 illustrates an example mashup displayed after a user submits a search query.
  • FIG. 8 illustrates an example architecture of a system.
  • Like reference numbers and designations in the various drawings indicate like elements.
  • DETAILED DESCRIPTION
  • FIG. 1 illustrates an example search system 114 for providing search results relevant to submitted queries as can be implemented in an internet, an intranet, or another client and server environment. The search system 114 is an example of an information retrieval system in which the systems, components, and techniques described below can be implemented.
  • A user 102 can interact with the search system 114 through a client device 104. For example, the client 104 can be a computer coupled to the search system 114 through a local area network (LAN) or wide area network (WAN), e.g., the Internet. In some implementations, the search system 114 and the client device 104 can be one machine. For example, a user can install a desktop search application on the client device 104. The client device 104 will generally include a random access memory (RAM) 106 and a processor 108.
  • A user 102 can submit a query 110 to a search engine 130 within a search system 114. When the user 102 submits a query 110, the query 110 is transmitted through a network to the search system 114. The search system 114 can be implemented as, for example, computer programs running on one or more computers in one or more locations that are coupled to each other through a network. The search system 114 includes an index database 122 and a search engine 130. The search system 114 responds to the query 110 by generating search results 128, which are transmitted through the network to the client device 104 in a form that can be presented to the user 102 (e.g., a search results web page to be displayed in a web browser running on the client device 104).
  • When the query 110 is received by the search engine 130, the search engine 130 identifies resources that match the query 110. The search engine 130 may also identify a particular “snippet” or section of each resource that is relevant to the query. The search engine 130 will generally include an indexing engine 120 that indexes resources (e.g., web pages, images, or news articles on the Internet) found in a corpus (e.g., a collection or repository of content), an index database 122 that stores the index information, and a ranking engine 152 (or other software) to rank the resources that match the query 110. The indexing and ranking of the resources can be performed using conventional techniques. The search engine 130 can transmit the search results 128 through the network to the client device 104, for example, for presentation to the user 102.
  • The search system 114 may also maintain one or more user search histories based on the queries it receives from a user. Generally speaking, a user search history stores a sequence of queries received from a user. User search histories may also include additional information such as which results were selected after a search was performed and how long each selected result was viewed.
  • In some implementations, the search system 114 includes an aspector 140. Alternatively, the aspector 140 can be implemented in one or more distinct systems coupled to the search system 114. The aspector 140 associates aspects with particular entities. Additionally, the aspector 140 can receive the query 110 and, in conjunction with the search engine 130, provide aspect based search results to the user 102. Identifying and using aspects will be described in greater detail below.
  • FIG. 2 illustrates an example method 200 for associating aspects with an entity. For convenience, the example method 200 will be described in reference to a system that performs the method 200. The system can be, for example, the search system 114, or a separate system.
  • The system receives an entity (step 202). An entity can be any object that can have associated properties (e.g., an object in the physical or conceptual world). For example, an entity can be a location, a person, a thing, an idea, etc. The system can receive the entities from a variety of sources. For example, the system can receive an entity directly from a user or in response to actions performed by the system (e.g., the action of executing a process). An entity can be extracted from a search query received from a user or the search system 114, for example, by parsing the query and comparing the terms of the query to a database of possible entities. Other sources of an entity are also possible, for example, an entity can be extracted from query data, such as user search histories.
  • In some implementations, the system also receives data identifying one or more properties of the entity. Properties of entities are additional elements associated with an entity that can be used to further refine the entity. For example, “travel” can be a property of the entity “Vietnam” because people travel to Vietnam.
  • The system generates a group of candidate aspects for the entity (step 204). The candidate aspects can be generated based on the entity, or alternatively, based on a class associated with the entity. The class is an abstraction of the entity. For example, “chocolate cake” could be associated with the class “food,” because chocolate cake is a type of food. “Daffodil” could be associated with the class “flower,” because a daffodil is a type of flower. The class can have multiple members. Each member is also an entity. For example, the class “flowers” could include many types of flowers, including “tulips,” “alstroemeria,” “roses,” and so on.
  • In some implementations, both entity-based aspects and class-based aspects are used. Reliance on both entity-based aspects and class-based aspects can result in a more robust set of aspects. For example, some entities are so rare that there will be a small amount of data to base the aspects on. For these entities, relying on class based aspects can increase the number of candidate aspects. However, some entities are very popular and may have entity-specific aspects that can be identified, for example, from user search histories. Therefore, also including entity-based aspects can be useful for these more popular entities.
  • In some implementations, generating a group of candidate aspects for the entity includes analyzing query data for queries including the entity. The query data can be analyzed, for example, to identify query refinements and query super-strings.
  • A query refinement occurs when a user first issues a query for the entity, and then follows that query with another related query. For example, if a user issues a query for “popcorn” followed by a query for “microwave popcorn,” microwave popcorn can be identified as a query refinement for popcorn. Query refinements do not have to include the original query. For example, if a user issues a query for “computer” followed by a query for “laptop,” laptop can be identified as a query refinement for computer. Query refinements can provide valuable information about an entity, because they indicate how a given user chose to explore the search space for the entity.
  • Query refinements can be generated as follows. One or more user search histories including queries for the entity can be identified. Each user search history is then divided into sessions, where each session represents a group of queries issued by a given user for a given information finding task. A session can be measured in a number of ways including, for example, by a specified period of time (for example, thirty minutes), by a specified number of queries (for example, 15 queries), until a specified period of inactivity (for example, ten minutes without performing a search), or while a user is logged-in to a search system.
  • The sessions that do not include a query for the entity can be filtered out. The queries that follow a query for the entity in the remaining sessions are query refinements. Each of the query refinements indicates a potential candidate aspect. For example, a candidate aspect can be the query refinement itself, or the part of the query refinement that does not include the entity. Candidate aspects can also be identified by analyzing the query refinement using linguistic analysis techniques, for example, using dictionaries or statistical analysis to identify the terms in the query refinement that are most likely to be aspects, or by looking the query refinement up in a database that associates query refinements with aspects. Potential candidate aspects can be aggregated across users, and candidate aspects that do not appear more than a threshold number of times can be filtered out.
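  • The refinement steps above can be sketched as follows. This is a minimal illustration under stated assumptions: sessions are delimited by a ten-minute inactivity gap (one of the options mentioned), a history is a time-ordered list of `(timestamp_seconds, query)` pairs, a query "for the entity" is an exact string match, and the candidate aspect is the refinement with the entity text removed. The names `split_sessions`, `refinement_aspects`, `max_gap`, and `threshold` are hypothetical.

```python
from collections import Counter


def split_sessions(history, max_gap=600):
    """Split a time-ordered list of (timestamp_seconds, query) into sessions.

    A new session starts after more than max_gap seconds of inactivity.
    """
    sessions, current, last_ts = [], [], None
    for ts, query in history:
        if last_ts is not None and ts - last_ts > max_gap:
            sessions.append(current)
            current = []
        current.append(query)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


def refinement_aspects(histories, entity, threshold=1):
    """Count candidate aspects taken from queries that follow a query for the entity."""
    counts = Counter()
    for history in histories:
        for session in split_sessions(history):
            if entity not in session:
                continue  # filter out sessions without a query for the entity
            start = session.index(entity)
            for refinement in session[start + 1:]:
                # Candidate aspect: the refinement without the entity text.
                aspect = " ".join(refinement.replace(entity, " ").split())
                if aspect:
                    counts[aspect] += 1
    # Keep only aspects that appear more than the threshold number of times.
    return {a: c for a, c in counts.items() if c > threshold}
```

  • Counting is done per occurrence across all histories; counting each aspect at most once per user would be an equally plausible reading of aggregating across users.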
  • In some implementations, query refinements are generated for a query based on both the entity in the query and the entity's associated properties, instead of just the entity.
  • Generally speaking, a query is a super-string of another query when it includes the other query. For example, “Vietnam travel package” is a super-string of “Vietnam travel,” because it includes the text “Vietnam travel.” Unlike query refinements, a query super-string does not have to be sent during the same session as the query for which it is a super-string.
  • Query super-strings can be generated by considering one or more user search histories and identifying queries that include the entity. Each query super-string indicates a potential candidate aspect. For example, a candidate aspect can be the part of the query super-string that does not include the entity. In some implementations, the query super-string is filtered to remove common words such as “a” and “the” before the candidate aspect is identified. Candidate aspects can also be identified from the query super-string using linguistic techniques or a database, as described above. Potential candidate aspects can be aggregated across users, and candidate aspects that do not appear more than a threshold number of times can be filtered out.
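  • The super-string approach can be sketched as follows. This is a minimal illustration: the stop-word list, the case-insensitive whole-term matching, and the names `superstring_aspects` and `threshold` are assumptions made for the example.

```python
from collections import Counter

# Illustrative stop-word list; a real system would likely use a larger one.
STOP_WORDS = {"a", "an", "the", "of", "in", "for"}


def superstring_aspects(queries, entity, threshold=1):
    """Extract candidate aspects from queries that contain the entity text."""
    entity_terms = entity.lower().split()
    n = len(entity_terms)
    counts = Counter()
    for query in queries:
        terms = query.lower().split()
        # Look for the entity as a contiguous run of terms in the query.
        for i in range(len(terms) - n + 1):
            if terms[i:i + n] == entity_terms:
                rest = terms[:i] + terms[i + n:]
                # Drop common words before identifying the candidate aspect.
                aspect = " ".join(t for t in rest if t not in STOP_WORDS)
                if aspect:
                    counts[aspect] += 1
                break
    # Keep only aspects that appear more than the threshold number of times.
    return {a: c for a, c in counts.items() if c > threshold}
```

  • Unlike the refinement case, no session information is needed here: every query in the histories that contains the entity contributes, regardless of when it was issued.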
  • In some implementations, query super-strings are identified for queries that include text naming the entity and its properties, rather than just the entity.
  • In some implementations, the system associates the entity with a class and generates class-based candidate aspects for the entity.
  • In some implementations, the system associates the entity with a class based on a pre-defined database that associates entities with classes. This pre-defined database can be generated, for example, by analyzing knowledge base information (e.g., information from Wikipedia™, run by the Wikimedia Foundation, or Freebase™, run by Metaweb Technologies). Generally speaking, a knowledge base is a collection of information for one or more entities. Knowledge bases can specify relationships between entities, such as class relationships, and can also specify features of entities. For example, a knowledge base could specify that “Canada” is in a class called “country” and that one of its features is its “GDP.” Entity-class relationships can be identified from the knowledge base information, and associations based on the relationships can be stored in the database for future use. The pre-defined database can also be generated by querying the search system 114 for Hearst patterns, e.g., if the entity is “Boston,” a query for “X such as Boston” can be issued to the search system. The results can then be analyzed for sentences including “such as Boston” and the resulting class can be identified. For example, if several of the search results included the phrase “cities such as Boston,” then Boston could be associated with a class of “city.” In some implementations, the entity does not have to be a perfect match with an entity in the database in order for an association to be identified. For example, small differences such as whether the entity is singular or plural may be overlooked. For example, if the singular “rose” was stored in the database, but the entity was “roses,” the class information for rose could be used. Other small differences, such as spelling variations may also be overlooked.
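  • The Hearst-pattern step can be sketched as follows. This is a minimal illustration that assumes the text of the search results is already available as a list of strings; the function name `class_from_hearst` is hypothetical, and the sketch returns the class term as it appears in the text (e.g., “cities”) without normalizing it to a singular form such as “city.”

```python
import re
from collections import Counter


def class_from_hearst(snippets, entity):
    """Return the most frequent term X in phrases of the form 'X such as <entity>'."""
    pattern = re.compile(
        r"(\w+)\s+such\s+as\s+" + re.escape(entity) + r"\b",
        re.IGNORECASE,
    )
    counts = Counter(
        match.group(1).lower()
        for snippet in snippets
        for match in pattern.finditer(snippet)
    )
    # most_common(1) gives [(term, count)] for the most frequent class term.
    return counts.most_common(1)[0][0] if counts else None
```

  • Choosing the most frequent class term reflects the requirement that several search results agree on the class before the association is made; a minimum count could be added for the same purpose.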
  • In some implementations, the system associates the entity with a class on the fly, for example, by accessing knowledge base information (e.g., crawling a website such as Wikipedia™) and identifying a class associated with the received entity, or issuing a query with a Hearst pattern including the entity. Other techniques for associating an entity with a class are also possible. For example, the entity can be classified based on machine learning techniques, such as support vector machines. Alternatively, a user can specify the class that is associated with an entity.
  • Class-based aspects can be generated, for example, by analyzing query data for queries including a class member other than the entity. For example, if the entity was “daffodils” and its class was “flowers,” then query data could be analyzed for queries including “roses,” because “roses” is one of the members of the flowers class. The query data for the class member can be analyzed to identify aspects much as the query data for the entity is analyzed to identify aspects, as described above. When the entity is associated with one or more properties, these properties can be included with each class member for purposes of identifying aspects. In some implementations, class-based aspects are generated only from class members that are sufficiently close to the entity, e.g., within a threshold of time or space or another measure of distance between entities. For example, “Canada,” “Belgium,” and “France” are all in the class “country.” However, only Belgium and France are neighboring countries. Therefore, if the entity is “Belgium” the system can identify class-based aspects based on the class member “France” but not the class member “Canada,” because Canada is too far away from Belgium. The threshold can be a number of miles, or a number of days, or other measures of distance. The threshold can be determined empirically.
  • Other methods of generating candidate aspects are also possible, for example, candidate aspects can be generated by analyzing knowledge base information associated with the entity or its class members. Knowledge bases can provide binary relationships between a given entity and its features. For example, Wikipedia™ provides an “Infobox” for some entities. The Infobox for Cambodia lists features such as capital, flag, population, area, and GDP. These can provide additional aspects for the entity Cambodia. Candidate aspects can also be retrieved from a database associating entities or class members with potential candidate aspects.
  • In some implementations, the candidate aspects are filtered based on user feedback on aspects that had been previously associated with entities and presented to users. The user feedback can indicate which aspects are useful aspects of an entity, and which aspects are not useful aspects of an entity. The user feedback can be used to directly filter out aspects that users have indicated are not useful. Alternatively, the user feedback can be used as training inputs to train a machine to filter candidate aspects using machine learning techniques.
  • The system modifies the group of candidate aspects (step 206). Modifying the group of candidate aspects can include combining similar candidate aspects and grouping candidate aspects based on a class of one or more candidate aspects. This combining and grouping reduces redundant aspects and helps focus the aspects on various axes of search.
  • Often similar aspects are generated. For example, for the query “Vietnam travel” the aspects “package,” “packages,” and “deal” could all be generated. All of these aspects refer to the same basic concept: a product bundling various aspects of a trip into one package. Consequently, these aspects can be combined into a single aspect.
  • FIG. 3 illustrates an example of combining similar candidate aspects. An initial group of candidate aspects 302 contains four aspects: Aspect 1, Aspect 1′, Aspect 2, and Aspect 3.
  • A similarity score can be calculated for each pair of aspects in the group of candidate aspects 302. For example, Aspect 1 and Aspect 1′ have a similarity score 304 of 0.9. Aspect 1 and Aspect 2 have a similarity score 306 of 0.5, and Aspect 1′ and Aspect 2 have a similarity score 308 of 0.3.
  • In some implementations, calculating the similarity score for two aspects includes identifying a respective set of search results corresponding to a query for each aspect, and then comparing the search results. The search results can be generated by issuing a query to a search engine (e.g., search engine 130 in FIG. 1) for each aspect. The top n search results for each query are then chosen as the set of search results for the respective aspect, where n can be any integer chosen to give a sufficient amount of information for comparison (e.g., 8 or 10). For purposes of illustration, let $D_i$ be the set of search results $d_i \in D_i$ that correspond to a first aspect, and let $D_j$ be the set of search results $d_j \in D_j$ that correspond to a second aspect being compared to the first aspect. The similarity score for the two sets of search results, and therefore the two aspects, can be calculated as follows.
  • A feature vector is generated for each search result in $D_i$ and $D_j$. For example, a feature vector can include one or more features (e.g., terms) and a corresponding statistical measure of the importance of the feature to the user (e.g., a term frequency (tf) weight or a term frequency-inverse document frequency (tf-idf) weight for each feature). The terms can be all words in the search result, or a subset of the words of the search result (for example, the title of the result and the snippet identified by a search engine).
  • In some implementations, tf weights are used as statistical measures of the importance of a feature to the user. The tf weights can be used because the importance of a feature to the user can increase proportionally according to the frequency with which the feature occurs (e.g., a term frequency) in a collection of documents, for example, all documents indexed by the search system (e.g., search system 114 in FIG. 1), or all documents indexed by the search system that are in the same language as the term.
  • The term frequency in a search result is the relative frequency that a particular term occurs in the search result, and can be represented as:
  • $$tf_{q,p} = \frac{n_{q,p}}{\sum_k n_{k,p}},$$
  • where the term frequency is the number $n_{q,p}$ of occurrences of the particular term $t_q$ in a search result $d_p$ divided by the number of occurrences of all terms $t_k$ in $d_p$.
  • In some implementations, tf-idf weights are used as the statistical measures of the importance of the features to the user. A tf-idf weight can be calculated by multiplying a term frequency with an inverse document frequency (idf).
  • The idf is an estimate of how frequently a term appears in a collection of documents, for example, all documents indexed by the search system, or all documents indexed by the search system that are in the same language as the term. The inverse document frequency can be represented as:
  • $$\mathrm{idf}_q = \log \frac{|D|}{|\{d_p : t_q \in d_p\}|},$$
  • where the number $|D|$ of all documents in the corpus of documents is divided by the number $|\{d_p : t_q \in d_p\}|$ of documents $d_p$ containing the term $t_q$. In some implementations, the Napierian logarithm is used instead of the logarithm of base 10.
  • A tf-idf weight can be represented as:
  • $$\text{tf-idf}_{q,p} = \mathrm{tf}_{q,p} \cdot \mathrm{idf}_q.$$
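  • The tf and tf-idf weights defined above can be illustrated with a minimal sketch. The function names, the toy corpus, and the use of the natural logarithm are assumptions made for illustration only; a real system would compute document frequencies over a much larger indexed collection.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative frequency of each term within a single search result."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequencies(corpus):
    """log(|D| / number of documents containing the term), natural log."""
    num_docs = len(corpus)
    # Count each term at most once per document.
    doc_counts = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(num_docs / df) for term, df in doc_counts.items()}


def tf_idf_vector(tokens, idf):
    """Sparse feature vector mapping each term to tf * idf."""
    tf = term_frequencies(tokens)
    return {term: weight * idf.get(term, 0.0) for term, weight in tf.items()}


# Hypothetical toy corpus: each document is the tokenized title and snippet
# of one search result.
corpus = [
    ["hawaii", "beaches", "surf"],
    ["hawaii", "hotels"],
    ["hawaii", "weather", "surf"],
]
idf = inverse_document_frequencies(corpus)
vector = tf_idf_vector(corpus[0], idf)
```

  • In this toy corpus the term "hawaii" occurs in every document, so its idf, and therefore its tf-idf weight, is zero, while rarer terms such as "beaches" receive a positive weight.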
  • A similarity score is calculated for each pair of search results $\{d_i, d_j\}$. The similarity score for each pair can be calculated by determining the distance between the feature vectors for the two results. For example, if a search result $d_i$ has a feature vector $X=(x_1, x_2, x_3)$ and a search result $d_j$ has a feature vector $Y=(y_1, y_2, y_3)$, $\mathrm{sim}(d_i, d_j)$ can be represented as a cosine distance:
  • $$\mathrm{sim}(d_i, d_j) = \text{cosine distance} = \frac{X \cdot Y}{\lVert X \rVert \cdot \lVert Y \rVert} = \frac{x_1 y_1 + x_2 y_2 + x_3 y_3}{\sqrt{x_1^2 + x_2^2 + x_3^2} \cdot \sqrt{y_1^2 + y_2^2 + y_3^2}}.$$
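  • As a minimal sketch of the cosine measure above, the following computes the similarity of two sparse feature vectors stored as dictionaries from term to weight. The dictionary representation and the choice to return zero for an all-zero vector are assumptions for illustration.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two sparse vectors (dict: term -> weight)."""
    dot = sum(weight * y.get(term, 0.0) for term, weight in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption: a vector with no weight is treated as dissimilar.
        return 0.0
    return dot / (norm_x * norm_y)


same = cosine_similarity({"a": 1.0, "b": 2.0}, {"a": 1.0, "b": 2.0})
disjoint = cosine_similarity({"a": 1.0}, {"b": 1.0})
partial = cosine_similarity({"a": 1.0}, {"a": 1.0, "b": 1.0})
```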
  • The similarity score for the two sets of search results, $D_i$ and $D_j$, as a whole can be calculated based on the similarity scores between their individual search results. In some implementations, the similarities for each pair of search results are averaged. In some implementations, the average of the highest similarity scores for each search result is used as follows:
  • $$\mathrm{sim}(D_i, D_j) = \frac{\sum_i \mathrm{sim}(d_i, D_j)}{2\,|D_i|} + \frac{\sum_j \mathrm{sim}(d_j, D_i)}{2\,|D_j|},$$
  • where
  • $$\mathrm{sim}(d_i, D_j) = \max_k \mathrm{sim}(d_i, d_k) \quad \text{and} \quad \mathrm{sim}(d_j, D_i) = \max_k \mathrm{sim}(d_k, d_j),$$
  • and where $\max_k \mathrm{sim}(d_i, d_k)$ is the maximum similarity score of the similarity scores between the search result $d_i$ and all search results in $D_j$, and $\max_k \mathrm{sim}(d_k, d_j)$ is the maximum similarity score of the similarity scores between the search result $d_j$ and all search results in $D_i$.
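  • The set-level score described above, the average of each result's best match in the other set, can be sketched as follows. The pairwise similarity is passed in as a function; the token-overlap (Jaccard) measure used in the example is only a hypothetical stand-in for the cosine measure over feature vectors.

```python
def set_similarity(results_i, results_j, pair_sim):
    """Average of best-match similarities in both directions."""
    if not results_i or not results_j:
        return 0.0  # Assumption: an empty result set matches nothing.
    best_i = sum(max(pair_sim(di, dj) for dj in results_j) for di in results_i)
    best_j = sum(max(pair_sim(di, dj) for di in results_i) for dj in results_j)
    return best_i / (2 * len(results_i)) + best_j / (2 * len(results_j))


def jaccard(a, b):
    """Hypothetical stand-in similarity: overlap of two token lists."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


results_i = [["a", "b"], ["c"]]
results_j = [["a", "b"], ["d"]]
score = set_similarity(results_i, results_j, jaccard)
```

  • Here one result in each set has a perfect match and the other has none, so each direction contributes 1/4 and the score is 0.5.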
  • Other similarity measures can also be used, for example, determining a single feature vector for all search results for each aspect and calculating the similarity scores based on the similarity of the two feature vectors, e.g., based on the cosine distance.
  • Alternatively, the similarity score for two aspects can be calculated by comparing the paths (e.g., web addresses, file paths) of the search results for each aspect, for example, by parsing the text of the paths and extracting features, such as a domain name or directory in a file system, and then comparing the extracted features. The similarity score for two aspects can also be calculated by comparing the text of the aspects themselves, for example, by comparing the characters in the text of the two aspects.
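  • A minimal sketch of the path-based comparison, assuming the paths are web addresses: the domain name and path segments are extracted as features and the two feature sets are compared by their overlap. The feature naming and the overlap (Jaccard) measure are assumptions; the text above does not prescribe a particular comparison.

```python
from urllib.parse import urlparse


def path_features(url):
    """Extract the domain name and each path segment as string features."""
    parsed = urlparse(url)
    features = set()
    if parsed.netloc:
        features.add("domain:" + parsed.netloc.lower())
    for segment in parsed.path.split("/"):
        if segment:
            features.add("path:" + segment.lower())
    return features


def path_similarity(urls_a, urls_b):
    """Overlap of the features extracted from two lists of result URLs."""
    features_a = set().union(*(path_features(u) for u in urls_a))
    features_b = set().union(*(path_features(u) for u in urls_b))
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 0.0


similarity = path_similarity(
    ["https://example.com/hotels/a"],
    ["https://example.com/hotels/b"],
)
```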
  • Once the similarity scores for each pair of aspects are identified, the similarity scores can be used to identify candidate aspects that should be combined into a single aspect. Various clustering techniques can be used to determine when two candidate aspects should be combined. For example, a graph partition algorithm can be used. The graph partition algorithm creates a graph where the nodes of the graph are the aspects and an edge connects two nodes if they are sufficiently similar (e.g., if their similarity score exceeds a threshold). For example, in FIG. 3, there is an edge (indicated by a solid line) between Aspect 1 and Aspect 1′, because the similarity score between Aspect 1 and Aspect 1′ is greater than the threshold value. However, there are no other connected edges in the graph. The threshold value can be determined empirically, for example, based on a set of test aspects. The graph partition algorithm then combines aspects that are connected into a single aspect. For example, in FIG. 3, the resulting set of aspects 316 lists only Aspect 1, Aspect 2, and Aspect 3. Aspect 1′ has been combined with Aspect 1.
  • Combining two aspects can include keeping one aspect in the group of aspects and removing the other one from the group of aspects. The decision of which aspect to keep can be made, for example, by selecting the aspect with the highest popularity score. Aspect popularity scores are discussed in more detail below.
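  • The graph partition approach and the choice of which aspect to keep can be sketched together. Aspects whose similarity exceeds the threshold are joined by an edge, connected aspects are merged, and the most popular aspect of each connected group is kept. The union-find bookkeeping, the function names, and the example scores are assumptions for illustration.

```python
def combine_aspects(aspects, similarity, popularity, threshold):
    """Merge aspects connected by above-threshold edges; keep the most popular."""
    parent = {a: a for a in aspects}

    def find(a):
        # Follow parent links to the representative of a's group.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, a in enumerate(aspects):
        for b in aspects[i + 1:]:
            if similarity(a, b) > threshold:
                parent[find(a)] = find(b)  # Add an edge between a and b.

    groups = {}
    for a in aspects:
        groups.setdefault(find(a), []).append(a)
    return [max(group, key=lambda a: popularity[a]) for group in groups.values()]


# Hypothetical scores mirroring FIG. 3: only Aspect 1 and Aspect 1' are similar.
pair_scores = {frozenset(("Aspect 1", "Aspect 1'")): 0.9}


def similarity(a, b):
    return pair_scores.get(frozenset((a, b)), 0.0)


popularity = {"Aspect 1": 0.5, "Aspect 1'": 0.2, "Aspect 2": 0.2, "Aspect 3": 0.1}
combined = combine_aspects(list(popularity), similarity, popularity, threshold=0.5)
```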
  • Other clustering techniques can be used, for example, k-means clustering (where aspects are divided into a pre-defined number of clusters based on the similarity scores), spectral clustering, hierarchical clustering, and star-clustering.
  • The candidate aspects can be grouped based on their classes. Aspect classes can be determined much as entity classes are determined, for example, as described above. In some implementations, determining an aspect class includes determining a synonym for the aspect, and then determining the synonym's class. For example, “New York University” is frequently abbreviated as “NYU.” However, it may be difficult to determine an aspect class for “NYU,” for example, because many knowledge bases only classify one of the possible names for a given entity. Therefore, there may be no data on which to base a classification of “NYU.” However, the more formal “New York University” is more likely to be included in knowledge bases. Therefore, a class for “NYU” can be determined by associating “NYU” with its synonym “New York University” and then identifying a class for the synonym. Synonyms can be determined, for example, by looking the aspect up in a thesaurus or a dictionary. Synonyms can also be determined, for example, by using redirect web pages of a knowledge base such as Wikipedia™. The redirect pages indicate the mapping of various terms to a synonym that is classified by Wikipedia™.
  • Aspects can be different from a similarity score perspective but still related in the sense that they belong to the same class. When this occurs, the aspects can be grouped into the same class. For example, the aspects “New York,” “San Francisco,” and “Washington DC” are different because they point to different cities with different food, culture, streets, etc., yet can all be associated with the class “U.S. cities.” Thus, the aspects can be grouped into the class “U.S. cities.” In some implementations, aspects are grouped into a sub-class of their class. For example, “New York” and “Washington DC” are members of the class “U.S. cities” and the sub-class “East coast cities.” Therefore, they could alternatively be grouped together into “East coast cities.”
  • FIG. 4 illustrates an example of grouping aspects based on their aspect classes. A group of aspects 402 is each associated with a respective class. Aspect 1 and Aspect 3 are both in Class 1, while Aspect 2 is in Class 2. When the aspects are grouped based on their class, the new group of aspects 404 includes Aspect 2 and Class 1. Aspect 2 remains unchanged in the new group of aspects 404, because its class did not match the class of any other aspects. Aspect 1 and Aspect 3 were combined into a new aspect equal to their class, Class 1, because they had the same class.
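  • The grouping of FIG. 4 can be sketched as follows, assuming each aspect has already been assigned exactly one class. Aspects that share a class are replaced by that class, and an aspect whose class matches no other aspect is left unchanged. The function name and input representation are hypothetical.

```python
from collections import defaultdict


def group_by_class(aspect_classes):
    """Replace aspects that share a class with the class itself."""
    members = defaultdict(list)
    for aspect, cls in aspect_classes.items():
        members[cls].append(aspect)

    grouped = []
    emitted_classes = set()
    for aspect, cls in aspect_classes.items():
        if len(members[cls]) == 1:
            grouped.append(aspect)  # No other aspect shares this class.
        elif cls not in emitted_classes:
            grouped.append(cls)  # Emit the shared class once.
            emitted_classes.add(cls)
    return grouped


new_group = group_by_class(
    {"Aspect 1": "Class 1", "Aspect 2": "Class 2", "Aspect 3": "Class 1"}
)
```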
  • In some implementations, some aspects are associated with multiple classes. Determining a class for these ambiguous aspects can be problematic. For example, imagine an entity “Vietnam” and two aspects “food” and “history.” Both of these aspects are ambiguous. In addition to referring to something you can eat, “food” could refer to the “F.O.O.D.” music album. In addition to referring to something in the past, “history” could refer to the “HIStory: Past, Present and Future, Book 1” music album. Thus, the two ambiguous aspects could be classified as “album,” and then grouped together into an “album” aspect. Food and history are two distinct aspects for exploring Vietnam, and there is value in keeping them separate. Therefore, they should not be grouped together. In some implementations, ambiguous aspects are not grouped, in order to avoid this potential problem.
  • Ambiguous aspects can be identified, for example, by using a disambiguation database that identifies aspects with multiple meanings. Ambiguous aspects can also be identified, for example, by using disambiguation web pages of a website such as Wikipedia™. These disambiguation pages identify multiple meanings for a given aspect.
  • In some implementations, once the modified group of candidate aspects is determined, the group is filtered, for example, to remove potentially offensive aspects (e.g., porn filtering). This filtering can be done by comparing the aspects to a list of potentially offensive aspects, and removing any aspects that are on the list.
  • As shown in FIG. 2, the system ranks one or more of the candidate aspects for the entity (step 280). The candidate aspects are ranked based on a diversity score and a popularity score of each aspect. The goal of the ranking is to identify aspects that are both interesting to the user and diverse enough to give a user choices on where to next direct his or her search. Ranking can be performed as follows.
  • The highest ranked aspect is the aspect with a highest popularity score. The popularity score is a measure of how common the aspect is. Popularity scores can be calculated in various ways depending on how the aspect was generated.
  • When the aspect was generated as a query refinement, the popularity score can be based on the frequency with which the query refinement appears, for example, by taking the total number of sessions that the query refinement appears in and dividing by the total number of sessions.
  • For example, a popularity score $p_r(q_j \mid q)$ of a refinement $q_j$ of a query $q$ can be calculated as follows:
  • $$p_r(q_j \mid q) = \frac{fq(q_j)}{\sum_j fq(q_j)},$$
  • where $fq(q_j)$ is the frequency with which the query refinement $q_j$ appears in the user search histories.
  • When the aspect was generated as a query super-string, the popularity score can be based on the frequency with which the query super-string appears in the user search histories, for example, by taking the total number of times the super-string appears in the search histories and dividing that by the total number of query super-strings in the user search history plus the total number of times that a query for the entity appears in the user search histories.
  • For example, the popularity score $p_{ss}(q_j \mid q)$ for a given query super-string $q_j$ can be calculated as follows:
  • $$p_{ss}(q_j \mid q) = \frac{fq(q_j)}{fq(q) + \sum_j fq(q_j)},$$
  • where $fq(q_j)$ is the frequency with which the super-string query $q_j$ appears in the search histories, and $fq(q)$ is the frequency with which the query for the entity appears in the search histories.
  • The popularity score can also be calculated by dividing the total number of times the super-string appears in the search histories by the total number of query super-strings in the search histories, for example:
  • $$p_{ss}(q_j \mid q) = \frac{fq(q_j)}{\sum_j fq(q_j)}.$$
  • When a query refinement and a query super-string are both identified as candidate aspects, the two can be combined into a single aspect. The popularity score for that aspect can be determined in a number of ways, including, for example, taking the higher of the two scores, taking the average of the two scores, or taking the lower of the two scores.
  • For example, the score $p_{inst}(q_j \mid q)$ for an aspect associated with a given query $q_j$ which is both a query refinement and a query super-string can be calculated as follows:
  • $$p_{inst}(q_j \mid q) = \max\bigl(p_r(q_j \mid q),\ p_{ss}(q_j \mid q)\bigr).$$
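  • The refinement, super-string, and combined popularity scores can be sketched as below. The frequency tables are hypothetical counts that would in practice come from user search histories, and the function names are illustrative only. The combined score takes the higher of the two scores, which is one of the options described above.

```python
def refinement_popularity(freq):
    """Each refinement's frequency divided by the total over all refinements."""
    total = sum(freq.values())
    return {q: n / total for q, n in freq.items()} if total else {}


def superstring_popularity(freq, entity_query_freq):
    """Super-string frequency divided by entity-query plus super-string totals."""
    denom = entity_query_freq + sum(freq.values())
    return {q: n / denom for q, n in freq.items()} if denom else {}


def combined_popularity(p_r, p_ss):
    """Higher of the refinement score and the super-string score."""
    return {q: max(p_r.get(q, 0.0), p_ss.get(q, 0.0)) for q in set(p_r) | set(p_ss)}


# Hypothetical counts for the entity query "hawaii".
refinements = {"hawaii hotels": 30, "hawaii weather": 10}
superstrings = {"hawaii hotels": 20, "hawaii beaches": 20}

p_r = refinement_popularity(refinements)
p_ss = superstring_popularity(superstrings, entity_query_freq=60)
p_inst = combined_popularity(p_r, p_ss)
```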
  • When an aspect is identified by analyzing query log data for other class member entities in the same class as the entity, the popularity score can be generated for the aspect as described above, e.g.,
  • $$p_{inst}(a_i \mid q) = \max\bigl(p_r(a_i \mid q),\ p_{ss}(a_i \mid q)\bigr).$$
  • The popularity score for class-based aspects can be adjusted so that aspects associated with the class do not overwhelm aspects associated with the specific entity. Rarer entities require the class-based aspects in order to have a sufficient number and variety of aspects. However, more popular entities may have entity-based aspects that are more important than the class-based aspects. A balance can be struck by weighting the scores of the aspects.
  • For example, a candidate aspect $a_i$ of a query $q$ which contains an entity of class $C$ can be assigned a weighted score $p(a_i \mid q)$ as follows:
  • $$p(a_i \mid q) = \frac{p_{inst}(a_i \mid q) + K \times p_{class}(a_i \mid C)}{\sum_j p_{inst}(a_j \mid q) + K \times \sum_j p_{class}(a_j \mid C)},$$
  • where $K$ is a design parameter controlling the relative importance of the individual score of the aspect and the class score and can be determined empirically, and
  • $$p_{class}(a \mid C) = \frac{\mathrm{count}(a)}{|C|},$$
  • where $\mathrm{count}(a)$ is the number of queries in the query log that included the aspect $a$, and $|C|$ is the number of entities in the class $C$.
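  • The weighting of entity-based and class-based scores can be sketched as follows. The input scores, the class size, and the value of K are hypothetical; K would be tuned empirically as noted above.

```python
def class_popularity(counts, class_size):
    """count(a) / |C| for each class-based aspect a."""
    return {aspect: n / class_size for aspect, n in counts.items()}


def weighted_scores(p_inst, p_class, k):
    """Blend entity-based and class-based scores, normalized over all aspects."""
    aspects = set(p_inst) | set(p_class)
    denom = sum(p_inst.values()) + k * sum(p_class.values())
    if denom == 0:
        return {a: 0.0 for a in aspects}
    return {
        a: (p_inst.get(a, 0.0) + k * p_class.get(a, 0.0)) / denom
        for a in aspects
    }


# Hypothetical inputs: two entity-based aspects and two class-based aspects.
p_inst = {"hotels": 0.6, "weather": 0.4}
p_class = class_popularity({"weather": 5, "flights": 5}, class_size=10)
scores = weighted_scores(p_inst, p_class, k=1.0)
```

  • With K = 1.0 the aspect "weather", which is supported by both the entity and its class, receives the highest weighted score, and the scores of all aspects sum to one.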
  • The popularity score for a class-based candidate aspect can also reflect how close the entity is to the class member the aspect is based on, e.g., from a time or space or other perspective. For example, if the entity is “November,” an aspect based on the class member “December” might have a better score than an aspect based on the class member “May,” because November is closer to December than May in the order of months. As another example, if the entity is “San Francisco,” an aspect based on “Los Angeles” might have a better score than an aspect based on “New York,” because San Francisco is closer to Los Angeles than New York from a distance perspective.
  • Other popularity scores are also envisioned. For example, the popularity score can be based on the click through rate for a given aspect, e.g., the number of times users selected a search result after issuing a query for the aspect (or the entity and the aspect), divided by the total number of times users issued queries for the aspect. The popularity score can also be based on the dwell time associated with one or more of the search results corresponding to a query for the aspect or the aspect and the entity. Dwell time is the amount of time a user spends viewing a search result. Dwell time can be a continuous number, such as the number of seconds a user spends viewing a search result, or it can be a discrete interval, for example “short clicks” corresponding to clicks of less than thirty seconds, “medium clicks” corresponding to clicks of more than thirty seconds but less than one minute, and “long clicks” corresponding to clicks of more than one minute. In some implementations, a longer dwell time of one or more results is associated with a higher popularity score. The score is higher because users found the results with a longer dwell time useful enough to view for a longer period of time.
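  • The discrete dwell-time intervals can be sketched as a simple bucketing function. The text above does not say which bucket a dwell time of exactly thirty or exactly sixty seconds belongs to; assigning those boundary values to the longer bucket is an assumption made here.

```python
def dwell_bucket(seconds):
    """Map a dwell time in seconds to a short, medium, or long click."""
    if seconds < 30:
        return "short"
    if seconds < 60:  # Assumption: exactly 30 seconds counts as medium.
        return "medium"
    return "long"  # Assumption: exactly 60 seconds counts as long.


buckets = [dwell_bucket(s) for s in (10, 45, 120)]
```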
  • Once the first aspect is ranked, subsequent aspects are ranked based on their popularity scores and a diversity score, e.g., a measure of how similar they are to the already ranked aspects. The diversity score for an un-ranked aspect can be generated, for example, by calculating a similarity score between the un-ranked aspect and each ranked aspect, and then taking the minimum, maximum, or average of the scores.
  • FIG. 5 illustrates an example of ranking an unranked aspect 502, given a pre-existing group of one or more ranked aspects 508.
  • A popularity score 506 is generated for the unranked aspect 502 using a popularity score generator 504. The popularity score generator generates a popularity score for the aspect, for example, as described above. A diversity score 512 is then generated for the unranked aspect 502 by the diversity score generator 510. The diversity score 512 is an estimate of how similar the unranked aspect 502 is to the ranked aspects 508. The diversity score between the unranked aspect 502 and the set of ranked aspects 508 can be determined by calculating the similarity score between the unranked aspect 502 and each ranked aspect in the set 508, for example as described above, and then using the minimum, maximum, average, or sum of the scores as the diversity score.
  • Once the popularity score 506 and the diversity score 512 are generated, they are passed to an overall score generator 514. The overall score generator 514 generates an overall score 516 based on the popularity score 506 and the diversity score 512, for example, by dividing the popularity score 506 by the diversity score 512.
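  • The ranking procedure can be sketched as a greedy loop: the most popular aspect is ranked first, and each subsequent aspect is the one with the highest popularity score divided by its diversity score. In this sketch the diversity score is the maximum similarity to the already ranked aspects, which is one of the options described above. The small `epsilon` added to the divisor to avoid division by zero, the function names, and the example scores are assumptions.

```python
def rank_aspects(popularity, similarity, epsilon=1e-9):
    """Greedy ranking by popularity divided by similarity to ranked aspects."""
    remaining = sorted(popularity)  # Sorted for deterministic tie-breaking.
    if not remaining:
        return []
    first = max(remaining, key=lambda a: popularity[a])
    ranked = [first]
    remaining.remove(first)

    while remaining:
        def overall(aspect):
            # Diversity score: highest similarity to any ranked aspect.
            diversity = max(similarity(aspect, r) for r in ranked)
            return popularity[aspect] / (diversity + epsilon)

        best = max(remaining, key=overall)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical similarity and popularity scores.
pair_scores = {
    frozenset(("hotels", "resorts")): 0.9,
    frozenset(("hotels", "weather")): 0.1,
    frozenset(("resorts", "weather")): 0.1,
}


def similarity(a, b):
    return pair_scores.get(frozenset((a, b)), 0.0)


popularity = {"hotels": 0.5, "resorts": 0.4, "weather": 0.2}
ranking = rank_aspects(popularity, similarity)
```

  • Although "resorts" is more popular than "weather," it is so similar to the already ranked "hotels" that "weather" is ranked second.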
  • Other methods of ranking the candidate aspects are also envisioned. For example, the highest ranked candidate aspect can be chosen based on the popularity score, and all subsequent aspects can be chosen based on the diversity score (for example, by choosing the aspect with the lowest diversity score). The candidate aspects can also be ranked based just on their popularity scores or just on their diversity scores.
  • Returning to FIG. 2, the system then associates a number of the highest ranked candidate aspects with the entity, or the entity and its properties (step 210). Any number of candidate aspects can be associated with the entity (and its properties), based on the needs of the system and the storage capabilities of the system. For example, if the system will present the aspects to the user in a graphical environment where only a few aspects can be displayed at a time, the number of aspects may be small. In contrast, if the system might provide a large number of aspects to a user or process, the number of candidate aspects may be larger.
  • Once the number of highest ranked candidate aspects are associated with the entity (and its properties), the association is stored in a location accessible to the system, for example, in a database that associates a given entity with its aspects.
  • FIG. 6 illustrates an example method 600 for receiving a query including one or more terms corresponding to an entity and presenting search results based on the identified aspects of the entity. For convenience, the example method 600 will be described with reference to a system (e.g., the search system 114 of FIG. 1 or another system) that performs the method 600. The method can be performed in conjunction with the method described above in reference to FIG. 2.
  • The system receives a query including one or more terms corresponding to an entity (step 602). The query can be received, for example, from a user or from the search system 114. In some implementations, the system and the search system 114 are the same system.
  • The system identifies aspects associated with the entity (step 604). In some implementations, the query includes an entity and its properties, and the system can identify aspects associated with the entity and its properties. For example, if the query is “Hawaii vacation” then “Hawaii” could be identified as the entity, and “vacation” could be identified as a property of the entity “Hawaii.” The aspects can be identified as described above in reference to FIG. 2, or can be retrieved, for example, from a database including ranked aspects generated using the method described above in reference to FIG. 2. The system can identify all aspects associated with the entity. When the aspects are ranked, the system can alternatively identify a top k number of the ranked aspects, where k is the number of aspects that are going to be presented to the user.
  • The system receives one or more sets of search results (step 606). Each set of search results corresponds to an entity and one of the identified aspects. For example, if the entity was “Hawaii” and the identified aspects were “beaches,” “hotels,” “weather,” and “food,” separate sets of search results could be received for “Hawaii beaches,” “Hawaii hotels,” “Hawaii weather,” and “Hawaii food.” The search results can be received in response to a query issued to the search engine 130 for the entity and an aspect.
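  • Steps 604 and 606 can be sketched as follows: the top k ranked aspects are selected and one query combining the entity and the aspect is issued for each. The `search` callable and the `fake_search` stand-in are hypothetical placeholders for a request to a search engine.

```python
def fetch_results_by_aspect(entity, ranked_aspects, search, k=4):
    """Issue one query per top-k aspect and collect the result sets."""
    results = {}
    for aspect in ranked_aspects[:k]:
        results[aspect] = search(f"{entity} {aspect}")
    return results


def fake_search(query):
    """Hypothetical stand-in for a call to a search engine."""
    return [f"result for {query}"]


results = fetch_results_by_aspect(
    "Hawaii", ["beaches", "hotels", "weather", "food", "history"], fake_search, k=4
)
```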
  • The system presents the search results based on the identified aspects (step 608). In some implementations, the search results are presented in a “mashup,” where relevant results and other information for one or more of the aspects are presented in one display, organized according to aspect.
  • FIG. 7 illustrates an example mashup displayed after a user submits a search query 702 for “mount bachelor” by clicking on the search button 704. Search results and other information corresponding to aspects for Mount Bachelor (e.g., “weather,” “hotels,” “community college,” and “mountains”) are labeled in accordance with the aspect and presented to the user in the boxes 706, 708, 710, and 712. The presentation of information can be tailored to the aspect. For example, a ski and snow report is presented in box 706 for users interested in the “weather” aspect. Search results corresponding to “hotels” are presented in box 708, search results corresponding to “community college” are presented in box 710, and search results corresponding to “mountains” are presented in box 712.
  • As FIG. 7 illustrates, not all search results for a given aspect are necessarily presented for that aspect. For example, more search results can be received for the “hotels” aspect than the two search results that are presented. The search results that are presented are chosen from the search results that are received, for example, by taking a top number of search results based on a ranking of the search results (e.g., a ranking provided by the search system 114). The number can be determined, for example, based on the number of aspects for the entity and/or the space available for presentation of the search results. Search results do not have to be presented for all identified aspects.
  • In some implementations, a summary of the entity in accordance with one of the aspects is presented. A summary of an entity in accordance with an aspect is a direct presentation of information that is available through search results corresponding to the entity and the aspect. For example, the ski and snow report presented in box 706 is a summary of information for the entity “mount bachelor” and the aspect “weather.” A user interested in the “weather” aspect is likely interested in knowing the current weather, so rather than requiring the user to click on a search result to see weather information, the system can instead directly present information on the weather. As another example, if the entity is “University of Southern California football team” and the aspect is “season record,” a summary of the team's season record can be presented. As yet another example, if the entity is a particular movie, and the aspect is movie reviews, then multiple reviews can be presented side by side. In some implementations, the summary is associated with an aspect and an entity in advance and stored, for example, in a database. The system can then retrieve the summary when needed.
  • Other methods of presenting the search results based on the aspects are also envisioned. For example, the system can create a separate web page for the search results corresponding to each aspect. Links to the web pages corresponding to the identified aspects can be presented along with search results for the original query. Alternatively, the links to the web pages can be presented as a separate web page. The system can present the aspects as “related search” options for the user, and then present the search results corresponding to a given aspect once a user selects the aspect.
  • In some implementations, the query includes terms corresponding to multiple entities. When the query includes multiple entities, the system can identify the aspects associated with each entity, and then combine the identified aspects based on their rank (e.g., based on a popularity score and a diversity score of each aspect). Search results for the top ranked aspects can then be received and presented to the user. Alternatively, the system can present search results for the aspects corresponding to each of the entities separately.
  • In some implementations, the system receives search results corresponding to the entity, rather than the entity and an aspect (for example, from the search system 114). In these implementations, the search results can be grouped based on the aspects, for example, by sorting the search results based on the aspects, or using clustering techniques to cluster search results around the aspects. In these implementations, the search results can be presented based on the aspects as described above.
  • FIG. 8 illustrates an example architecture of a system 800. The system generally includes a data processing apparatus 802 and a user device 828. The data processing apparatus 802 and user device 828 are connected through a network 826. In some implementations, the user device 828 and the data processing apparatus 802 are the same device.
  • While the data processing apparatus 802 is shown as a single data processing apparatus, a plurality of data processing apparatus may be used. The data processing apparatus 802 runs a number of modules, for example, processes, e.g., executable software programs. In various implementations, these processes include an entity-class associator 804, aspect generator 806, aspect combiner 808, aspect grouper 810, aspect ranker 812, and aspect associator 814.
  • The entity-class associator 804 associates a given entity with a class, for example, based on a pre-defined database that associates entities with classes or by accessing knowledgebase information for the entity.
  • The aspect generator 806 generates aspects for a given entity, for example, as described above in reference to FIG. 2, by analyzing user search histories to identify query refinements and query superstrings for the entity, its class members, or the entity and its class members.
  • The aspect combiner 808 combines aspects, for example, as described above in reference to FIGS. 2 and 3, based on their similarity scores. The aspect combiner 808 may also calculate similarity scores for pairs of aspects as described above in reference to FIGS. 2 and 3.
  • The aspect grouper 810 groups aspects based on their class, for example, as described above in reference to FIGS. 2 and 4. In some implementations, the aspect combiner 808 and the aspect grouper 810 are the same process.
  • The aspect ranker 812 ranks aspects based on a popularity score and a diversity score of each aspect, for example, as described above in reference to FIGS. 2 and 5.
  • The aspect associator 814 associates one or more aspects with a given entity or a given entity and its properties, for example, as described above in reference to FIG. 2.
  • In some implementations, the data processing apparatus 802 stores one or more of an entity-class database associating a given entity with its class, an aspect-class database associating a given aspect with its class, user search histories, and an entity-aspect database associating a given entity with one or more aspects. In some implementations, the entity-class database and the aspect-class database are the same database. In some implementations, the data is stored on a computer readable medium 820. In some implementations, the data is stored on the additional device(s) 818.
  • The data processing apparatus 802 may also have hardware or firmware devices including one or more processors 816, one or more additional devices 818, computer readable medium 820, a communication interface 822, and one or more user interface devices 824. The processor(s) 816 are capable of processing instructions for execution. In one implementation, at least one of the processor(s) 816 is a single-threaded processor. In another implementation, at least one of the processor(s) 816 is a multi-threaded processor. The processor(s) 816 are capable of processing instructions stored in memory or on a storage device to display graphical information for a user interface on the user interface device(s) 824. User interface device(s) 824 can include, for example, a display, a camera, a speaker, a microphone, or a tactile feedback device.
  • The data processing apparatus 802 communicates with user device 828 using its communication interface 822.
  • The user device 828 can be any data processing apparatus, for example, a user's computer. A user uses the user device 828 to submit search queries through the network 826 to the data processing apparatus 802 and receive search results from the data processing apparatus 802, for example, through a web-browser run on the user device, for example, Firefox™, available from the Mozilla Project in Mountain View, Calif. The user device 828 may present the search results to the user, for example, by displaying the results on a display device, transmitting sound corresponding to the results, or providing tactile feedback corresponding to the results. The search results may be organized according to aspects associated with the entity. When a user uses his or her computer to select a search result to view, information regarding the user selection can be sent to the data processing apparatus 802 and used to generate user search history data.
  • In some implementations, the user device 828 runs one or more of the modules 804, 806, 808, 810, 812, and 814 instead of or in addition to the data processing apparatus 802 running the modules.
  • While the system 800 of FIG. 8 envisions a user who submits a search query through his or her computer, the search query does not have to be received from a user or a user's computer, but can be received from any data processing apparatus, process, or person, for example a computer or a process run on a computer, with or without direct user input. Similarly, the results and aspects do not have to be presented to the user's computer but can be presented to any data processing apparatus, process, or person. The user search histories can be received from a population of users, and not necessarily from the same user device 828 used to receive search results organized based on aspects of an entity in the search query.
  • Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.
  • Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them.
  • The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
  • The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or combinations of them. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, e.g., a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
  • A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • While this specification contains many specifics, these should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the invention. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
  • Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
  • Thus, particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results.

Claims (19)

What is claimed is:
1. A method comprising:
receiving a query in a computer system, the computer system comprising one or more computers and the query including one or more terms corresponding to an entity;
identifying, by the computer system, a plurality of aspects associated with the entity in at least one database;
identifying, by the computer system, a plurality of search results, the search results including a first set of the search results based on the entity and a first aspect of the aspects and a second set of the search results based on the entity and a second aspect of the aspects; and
providing, in response to the query, a presentation of the search results in one display, the presentation including a plurality of visually distinct aspect areas with each of the aspect areas being for a corresponding one of the aspects and including a corresponding label, wherein providing the presentation of the search results comprises:
presenting at least one of the search results of the first set in a first aspect area of the aspect areas, the first aspect area corresponding to the first aspect; and
presenting at least one of the search results of the second set in a second aspect area of the aspect areas, the second aspect area corresponding to the second aspect.
2. The method of claim 1, wherein a first search result of the first search results is provided in the first aspect area and wherein the first search result is a summary of information about the entity in accordance with the first aspect.
3. The method of claim 1, wherein the search results further include a third set of search results responsive to the query and providing the presentation of the search results further comprises presenting at least some of the third set of search results in the one display.
4. The method of claim 3, wherein identifying the first set of search results comprises receiving the first set of search results in response to issuing a first aspect query that is based on the entity and the first aspect and wherein identifying the second set of search results comprises receiving the second set of search results in response to issuing a second aspect query that is based on the entity and the second aspect.
5. The method of claim 4, wherein the first aspect query includes the one or more terms corresponding to the entity and at least one first aspect term corresponding to the first aspect and wherein the second aspect query includes the one or more terms corresponding to the entity and at least one second aspect term corresponding to the second aspect.
6. The method of claim 1, wherein the search results of the first set that are presented in the first aspect area include a first search result that includes a first link to a first web page.
7. The method of claim 1, further comprising:
generating the association of the plurality of aspects to the entity in the database.
8. The method of claim 7, wherein generating the association of the plurality of aspects to the entity in the database comprises:
generating a group of candidate aspects for the entity;
for each pair of one or more pairs of candidate aspects, calculating a similarity score for the pair based on identifying respective aspect sets of search results corresponding to respective queries of candidate aspects in the pair of candidate aspects and comparing the aspect sets of search results;
modifying the group of candidate aspects to generate a group of modified candidate aspects based on the similarity score for the candidate aspects; and
selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database.
9. The method of claim 8, wherein selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database comprises:
ranking the modified candidate aspects based on a diversity score and a popularity score; and
selecting the one or more of the modified candidate aspects as the aspects to associate with the entity in the database based on the ranking.
10. A system comprising:
one or more processors; and
a computer storage medium including instructions, which, when executed by the processors, cause the processors to perform operations comprising:
receiving a query in a computer system, the computer system comprising one or more computers and the query including one or more terms corresponding to an entity;
identifying, by the computer system, a plurality of aspects associated with the entity in at least one database;
identifying, by the computer system, a plurality of search results, the search results including a first set of the search results based on the entity and a first aspect of the aspects and a second set of the search results based on the entity and a second aspect of the aspects; and
providing, in response to the query, a presentation of the search results in one display, the presentation including a plurality of visually distinct aspect areas with each of the aspect areas being for a corresponding one of the aspects and including a corresponding label, wherein providing the presentation of the search results comprises:
presenting at least one of the search results of the first set in a first aspect area of the aspect areas, the first aspect area corresponding to the first aspect; and
presenting at least one of the search results of the second set in a second aspect area of the aspect areas, the second aspect area corresponding to the second aspect.
11. The system of claim 10, wherein a first search result of the first search results is provided in the first aspect area and wherein the first search result is a summary of information about the entity in accordance with the first aspect.
12. The system of claim 10, wherein the search results further include a third set of search results responsive to the query and providing the presentation of the search results further comprises presenting at least some of the third set of search results in the one display.
13. The system of claim 12, wherein identifying the first set of search results comprises receiving the first set of search results in response to issuing a first aspect query that is based on the entity and the first aspect and wherein identifying the second set of search results comprises receiving the second set of search results in response to issuing a second aspect query that is based on the entity and the second aspect.
14. The system of claim 13, wherein the first aspect query includes the one or more terms corresponding to the entity and at least one first aspect term corresponding to the first aspect and wherein the second aspect query includes the one or more terms corresponding to the entity and at least one second aspect term corresponding to the second aspect.
15. The system of claim 10, wherein the search results of the first set that are presented in the first aspect area include a first search result that includes a first link to a first web page.
16. The system of claim 10, wherein the instructions further include instructions that, when executed by the processors, cause the processors to perform an operation comprising:
generating the association of the plurality of aspects to the entity in the database.
17. The system of claim 16, wherein generating the association of the plurality of aspects to the entity in the database comprises:
generating a group of candidate aspects for the entity;
for each pair of one or more pairs of candidate aspects, calculating a similarity score for the pair based on identifying respective aspect sets of search results corresponding to respective queries of candidate aspects in the pair of candidate aspects and comparing the aspect sets of search results;
modifying the group of candidate aspects to generate a group of modified candidate aspects based on the similarity score for the candidate aspects; and
selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database.
18. The system of claim 17, wherein selecting one or more of the modified candidate aspects as the aspects to associate with the entity in the database comprises:
ranking the modified candidate aspects based on a diversity score and a popularity score; and
selecting the one or more of the modified candidate aspects as the aspects to associate with the entity in the database based on the ranking.
19. A non-transitory computer storage device comprising instructions that when executed by an apparatus cause the apparatus to perform operations comprising:
receiving a query in a computer system, the computer system comprising one or more computers and the query including one or more terms corresponding to an entity;
identifying, by the computer system, a plurality of aspects associated with the entity in at least one database;
identifying, by the computer system, a plurality of search results, the search results including a first set of the search results based on the entity and a first aspect of the aspects and a second set of the search results based on the entity and a second aspect of the aspects; and
providing, in response to the query, a presentation of the search results in one display, the presentation including a plurality of visually distinct aspect areas with each of the aspect areas being for a corresponding one of the aspects and including a corresponding label, wherein providing the presentation of the search results comprises:
presenting at least one of the search results of the first set in a first aspect area of the aspect areas, the first aspect area corresponding to the first aspect; and
presenting at least one of the search results of the second set in a second aspect area of the aspect areas, the second aspect area corresponding to the second aspect.
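The steps recited in claims 1, 8 and 9 can be illustrated with a minimal sketch. It is not the claimed implementation: the Jaccard overlap measure, the merge threshold, the greedy popularity/diversity weighting, and every name and data value below (jaccard, merge_similar, rank_aspects, build_presentation, results_for, popularity) are assumptions introduced for illustration only. The claims require only that a similarity score be based on comparing the aspect sets of search results, and that ranking be based on a diversity score and a popularity score.

```python
def jaccard(a, b):
    """Overlap of two result sets: 0.0 means disjoint, 1.0 means identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_similar(candidates, results_for, threshold=0.5):
    """Group candidate aspects whose result sets overlap heavily.

    Each candidate is compared with the first member (the label) of every
    existing group and joins the first group that meets the threshold.
    The threshold value is an assumption, not taken from the source.
    """
    groups = []
    for cand in candidates:
        for group in groups:
            if jaccard(results_for[cand], results_for[group[0]]) >= threshold:
                group.append(cand)
                break
        else:
            groups.append([cand])
    return groups


def rank_aspects(groups, results_for, popularity, k=2, weight=0.5):
    """Greedily select up to k group labels by popularity and diversity."""
    labels = [group[0] for group in groups]
    max_pop = max((popularity[name] for name in labels), default=0) or 1
    selected = []
    while labels and len(selected) < k:
        def score(name):
            pop = popularity[name] / max_pop
            # Diversity: 1 minus the largest overlap with any selected aspect.
            div = 1.0 - max(
                (jaccard(results_for[name], results_for[s]) for s in selected),
                default=0.0,
            )
            return weight * pop + (1 - weight) * div
        best = max(labels, key=score)
        selected.append(best)
        labels.remove(best)
    return selected


def build_presentation(aspects, results_for, per_area=2):
    """One labeled aspect area per selected aspect, each with its own results."""
    return [
        {"label": aspect, "results": results_for[aspect][:per_area]}
        for aspect in aspects
    ]


# Hypothetical data: results that might come back for "<entity> <aspect>" queries.
results_for = {
    "hotels": ["h1", "h2", "h3", "h4"],
    "lodging": ["h1", "h2", "h3", "x1"],
    "weather": ["w1", "w2", "w3"],
    "restaurants": ["r1", "r2", "h4"],
}
popularity = {"hotels": 90, "lodging": 40, "weather": 70, "restaurants": 60}
candidates = ["hotels", "lodging", "weather", "restaurants"]

groups = merge_similar(candidates, results_for)
selected = rank_aspects(groups, results_for, popularity, k=2)
areas = build_presentation(selected, results_for)
for area in areas:
    print(area["label"], area["results"])
```

In this sketch, "hotels" and "lodging" share most of their results and collapse into one group, so only one of them can occupy an aspect area. The diversity term then favors aspects whose results do not repeat those already selected.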
US14/875,177 2009-01-30 2015-10-05 Identifying query aspects Abandoned US20160026696A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/875,177 US20160026696A1 (en) 2009-01-30 2015-10-05 Identifying query aspects

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US14889709P 2009-01-30 2009-01-30
US12/512,908 US8458171B2 (en) 2009-01-30 2009-07-30 Identifying query aspects
US13/908,456 US9152676B2 (en) 2009-01-30 2013-06-03 Identifying query aspects
US14/875,177 US20160026696A1 (en) 2009-01-30 2015-10-05 Identifying query aspects

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/908,456 Continuation US9152676B2 (en) 2009-01-30 2013-06-03 Identifying query aspects

Publications (1)

Publication Number Publication Date
US20160026696A1 (en) 2016-01-28

Family

ID=42132655

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/512,908 Active 2030-08-06 US8458171B2 (en) 2009-01-30 2009-07-30 Identifying query aspects
US13/908,456 Active 2029-08-18 US9152676B2 (en) 2009-01-30 2013-06-03 Identifying query aspects
US14/875,177 Abandoned US20160026696A1 (en) 2009-01-30 2015-10-05 Identifying query aspects

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US12/512,908 Active 2030-08-06 US8458171B2 (en) 2009-01-30 2009-07-30 Identifying query aspects
US13/908,456 Active 2029-08-18 US9152676B2 (en) 2009-01-30 2013-06-03 Identifying query aspects

Country Status (9)

Country Link
US (3) US8458171B2 (en)
EP (1) EP2391959A1 (en)
JP (1) JP5623431B2 (en)
KR (2) KR101775061B1 (en)
CN (1) CN102349072B (en)
AU (1) AU2010208318B2 (en)
BR (1) BRPI1007939B1 (en)
CA (1) CA2751172C (en)
WO (1) WO2010088299A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180074891A1 (en) * 2016-09-13 2018-03-15 Sandisk Technologies Llc Storage System and Method for Reducing XOR Recovery Time
CN107832439A (en) * 2017-11-16 2018-03-23 百度在线网络技术(北京)有限公司 Method, system and the terminal device of more wheel state trackings
US11003731B2 (en) * 2018-01-17 2021-05-11 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating information
US11036746B2 (en) 2018-03-01 2021-06-15 Ebay Inc. Enhanced search system for automatic detection of dominant object of search query

Families Citing this family (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8577909B1 (en) 2009-05-15 2013-11-05 Google Inc. Query translation using bilingual search refinements
US8572109B1 (en) 2009-05-15 2013-10-29 Google Inc. Query translation quality confidence
US8577910B1 (en) 2009-05-15 2013-11-05 Google Inc. Selecting relevant languages for query translation
US8538957B1 (en) * 2009-06-03 2013-09-17 Google Inc. Validating translations using visual similarity between visual media search results
US9454606B2 (en) * 2009-09-11 2016-09-27 Lexisnexis Risk & Information Analytics Group Inc. Technique for providing supplemental internet search criteria
US20110270819A1 (en) * 2010-04-30 2011-11-03 Microsoft Corporation Context-aware query classification
US7933859B1 (en) 2010-05-25 2011-04-26 Recommind, Inc. Systems and methods for predictive coding
US20110307482A1 (en) * 2010-06-10 2011-12-15 Microsoft Corporation Search result driven query intent identification
US9158846B2 (en) 2010-06-10 2015-10-13 Microsoft Technology Licensing, Llc Entity detection and extraction for entity cards
US9043296B2 (en) 2010-07-30 2015-05-26 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information
US8799260B2 (en) * 2010-12-17 2014-08-05 Yahoo! Inc. Method and system for generating web pages for topics unassociated with a dominant URL
US9684690B2 (en) 2011-01-12 2017-06-20 Google Inc. Flights search
US9781091B2 (en) 2011-03-14 2017-10-03 Verisign, Inc. Provisioning for smart navigation services
US10185741B2 (en) * 2011-03-14 2019-01-22 Verisign, Inc. Smart navigation services
US9811599B2 (en) 2011-03-14 2017-11-07 Verisign, Inc. Methods and systems for providing content provider-specified URL keyword navigation
US9646100B2 (en) 2011-03-14 2017-05-09 Verisign, Inc. Methods and systems for providing content provider-specified URL keyword navigation
US9298776B2 (en) 2011-06-08 2016-03-29 Ebay Inc. System and method for mining category aspect information
US9298816B2 (en) 2011-07-22 2016-03-29 Open Text S.A. Methods, systems, and computer-readable media for semantically enriching content and for semantic navigation
CA2844065C (en) 2011-08-04 2018-04-03 Google Inc. Providing knowledge panels with search results
US8756218B1 (en) * 2011-08-16 2014-06-17 Google Inc. Query classification based on search engine results
EP2568396A1 (en) 2011-09-08 2013-03-13 Axel Springer Digital TV Guide GmbH Method and apparatus for generating a sorted list of items
US9053087B2 (en) * 2011-09-23 2015-06-09 Microsoft Technology Licensing, Llc Automatic semantic evaluation of speech recognition results
US20130110830A1 (en) * 2011-10-31 2013-05-02 Microsoft Corporation Ranking of entity properties and relationships
US9665643B2 (en) 2011-12-30 2017-05-30 Microsoft Technology Licensing, Llc Knowledge-based entity detection and disambiguation
US9864817B2 (en) 2012-01-28 2018-01-09 Microsoft Technology Licensing, Llc Determination of relationships between collections of disparate media types
WO2013126808A1 (en) 2012-02-22 2013-08-29 Google Inc. Related entities
US9424353B2 (en) 2012-02-22 2016-08-23 Google Inc. Related entities
US20140047089A1 (en) * 2012-08-10 2014-02-13 International Business Machines Corporation System and method for supervised network clustering
US8533148B1 (en) * 2012-10-01 2013-09-10 Recommind, Inc. Document relevancy analysis within machine learning systems including determining closest cosine distances of training examples
US9430571B1 (en) 2012-10-24 2016-08-30 Google Inc. Generating travel queries in response to free text queries
US9047278B1 (en) 2012-11-09 2015-06-02 Google Inc. Identifying and ranking attributes of entities
US10095692B2 (en) * 2012-11-29 2018-10-09 Thomson Reuters Global Resources Unlimited Company Template bootstrapping for domain-adaptable natural language generation
US9558275B2 (en) * 2012-12-13 2017-01-31 Microsoft Technology Licensing, Llc Action broker
US20140201203A1 (en) * 2013-01-15 2014-07-17 Prafulla Krishna System, method and device for providing an automated electronic researcher
GB2510346A (en) * 2013-01-30 2014-08-06 Imagini Holdings Ltd Network method and apparatus redirects a request for content based on a user profile.
US9183062B2 (en) * 2013-02-25 2015-11-10 International Business Machines Corporation Automated application reconfiguration
US10061851B1 (en) * 2013-03-12 2018-08-28 Google Llc Encouraging inline person-to-person interaction
JP6056610B2 (en) * 2013-03-29 2017-01-11 株式会社Jvcケンウッド Text information processing apparatus, text information processing method, and text information processing program
US10057207B2 (en) 2013-04-07 2018-08-21 Verisign, Inc. Smart navigation for shortened URLs
CN103279504B (en) * 2013-05-10 2019-11-05 百度在线网络技术(北京)有限公司 A kind of searching method and device based on ambiguity resolution
US9646062B2 (en) * 2013-06-10 2017-05-09 Microsoft Technology Licensing, Llc News results through query expansion
US9305307B2 (en) 2013-07-15 2016-04-05 Google Inc. Selecting content associated with a collection of entities
US9336332B2 (en) 2013-08-28 2016-05-10 Clipcard Inc. Programmatic data discovery platforms for computing applications
US9569525B2 (en) * 2013-09-17 2017-02-14 International Business Machines Corporation Techniques for entity-level technology recommendation
US20150088648A1 (en) * 2013-09-24 2015-03-26 Google Inc. Determining commercial intent
CN105706078B (en) * 2013-10-09 2021-08-03 谷歌有限责任公司 Automatic definition of entity collections
US10134053B2 (en) * 2013-11-19 2018-11-20 Excalibur Ip, Llc User engagement-based contextually-dependent automated pricing for non-guaranteed delivery
US9489461B2 (en) * 2014-03-03 2016-11-08 Ebay Inc. Search ranking diversity based on aspect affinity
US20150309987A1 (en) 2014-04-29 2015-10-29 Google Inc. Classification of Offensive Words
US20150317314A1 (en) * 2014-04-30 2015-11-05 LinkedIn Corporation Content search vertical
US10838995B2 (en) * 2014-05-16 2020-11-17 Microsoft Technology Licensing, Llc Generating distinct entity names to facilitate entity disambiguation
US9740985B2 (en) * 2014-06-04 2017-08-22 International Business Machines Corporation Rating difficulty of questions
RU2014125471A (en) 2014-06-24 2015-12-27 Общество С Ограниченной Ответственностью "Яндекс" SEARCH QUERY PROCESSING METHOD AND SERVER
US10290125B2 (en) * 2014-07-02 2019-05-14 Microsoft Technology Licensing, Llc Constructing a graph that facilitates provision of exploratory suggestions
US10353964B2 (en) * 2014-09-15 2019-07-16 Google Llc Evaluating semantic interpretations of a search query
CN105786936A (en) * 2014-12-23 2016-07-20 阿里巴巴集团控股有限公司 Search data processing method and device
GB2549240A (en) * 2015-01-06 2017-10-18 What3Words Ltd A method for suggesting one or more multi-word candidates based on an input string received at an electronic device
CN104615680B (en) 2015-01-21 2016-11-02 广州神马移动信息科技有限公司 The method for building up of web page quality model and device
US20160314205A1 (en) * 2015-04-24 2016-10-27 Ebay Inc. Generating a discovery page depicting item aspects
US10140880B2 (en) * 2015-07-10 2018-11-27 Fujitsu Limited Ranking of segments of learning materials
US10242112B2 (en) 2015-07-15 2019-03-26 Google Llc Search result filters from resource content
US20170097967A1 (en) * 2015-10-05 2017-04-06 Quixey, Inc. Automated Customization of Display Component Data for Search Results
US10437868B2 (en) * 2016-03-04 2019-10-08 Microsoft Technology Licensing, Llc Providing images for search queries
US11314791B2 (en) * 2016-03-23 2022-04-26 Ebay Inc. Smart match autocomplete system
KR102017853B1 (en) * 2016-09-06 2019-09-03 주식회사 카카오 Method and apparatus for searching
US10268688B2 (en) * 2017-05-03 2019-04-23 International Business Machines Corporation Corpus-scoped annotation and analysis
CN108009215B (en) * 2017-11-17 2018-11-06 山东师范大学 A kind of search results pages user behavior pattern assessment method, apparatus and system
CN108614897B (en) * 2018-05-10 2021-04-27 四川长虹电器股份有限公司 Content diversification searching method for natural language
JP7003020B2 (en) * 2018-09-18 2022-01-20 ヤフー株式会社 Information processing equipment, information processing methods, and programs
CN109871428B (en) * 2019-01-30 2022-02-18 北京百度网讯科技有限公司 Method, apparatus, device and medium for determining text relevance
US11288320B2 (en) * 2019-06-05 2022-03-29 International Business Machines Corporation Methods and systems for providing suggestions to complete query sessions
CN111538894B (en) * 2020-06-19 2020-10-23 腾讯科技(深圳)有限公司 Query feedback method and device, computer equipment and storage medium
IL308086A (en) * 2021-05-24 2023-12-01 Liveperson Inc Data-driven taxonomy for annotation resolution

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050246328A1 (en) * 2004-04-30 2005-11-03 Microsoft Corporation Method and system for ranking documents of a search result to improve diversity and information richness
US20060031214A1 (en) * 2004-07-14 2006-02-09 Microsoft Corporation Method and system for adaptive categorial presentation of search results
US20060059143A1 (en) * 2004-09-10 2006-03-16 Eran Palmon User interface for conducting a search directed by a hierarchy-free set of topics
US20080091672A1 (en) * 2006-10-17 2008-04-17 Gloor Peter A Process for analyzing interrelationships between internet web sited based on an analysis of their relative centrality
US20080235187A1 (en) * 2007-03-23 2008-09-25 Microsoft Corporation Related search queries for a webpage and their applications
US20090070325A1 (en) * 2007-09-12 2009-03-12 Raefer Christopher Gabriel Identifying Information Related to a Particular Entity from Electronic Sources
US20090177644A1 (en) * 2008-01-04 2009-07-09 Ronald Martinez Systems and methods of mapping attention
US20090240672A1 (en) * 2008-03-18 2009-09-24 Cuill, Inc. Apparatus and method for displaying search results with a variety of display paradigms
US20090313220A1 (en) * 2008-06-13 2009-12-17 International Business Machines Corporation Expansion of Search Result Information
US20090327267A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Basing search results on metadata of prior results
US20090327223A1 (en) * 2008-06-26 2009-12-31 Microsoft Corporation Query-driven web portals
US8103659B1 (en) * 2005-06-06 2012-01-24 A9.Com, Inc. Perspective-based item navigation
US8341143B1 (en) * 2004-09-02 2012-12-25 A9.Com, Inc. Multi-category searching
US8386453B2 (en) * 2004-09-30 2013-02-26 Google Inc. Providing search information relating to a document
US8417569B2 (en) * 2005-11-30 2013-04-09 John Nicholas and Kristin Gross Trust System and method of evaluating content based advertising
US8554768B2 (en) * 2008-11-25 2013-10-08 Microsoft Corporation Automatically showing additional relevant search results based on user feedback
US9092523B2 (en) * 2005-02-28 2015-07-28 Search Engine Technologies, Llc Methods of and systems for searching by incorporating user-entered information

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5278980A (en) * 1991-08-16 1994-01-11 Xerox Corporation Iterative technique for phrase query formation and an information retrieval system employing same
US6625595B1 (en) * 2000-07-05 2003-09-23 Bellsouth Intellectual Property Corporation Method and system for selectively presenting database results in an information retrieval system
US7185001B1 (en) * 2000-10-04 2007-02-27 Torch Concepts Systems and methods for document searching and organizing
SE520533C2 (en) * 2001-03-13 2003-07-22 Picsearch Ab Method, computer programs and systems for indexing digitized devices
US7676452B2 (en) * 2002-07-23 2010-03-09 International Business Machines Corporation Method and apparatus for search optimization based on generation of context focused queries
US6947930B2 (en) * 2003-03-21 2005-09-20 Overture Services, Inc. Systems and methods for interactive search query refinement
US7577655B2 (en) * 2003-09-16 2009-08-18 Google Inc. Systems and methods for improving the ranking of news articles
US7219105B2 (en) * 2003-09-17 2007-05-15 International Business Machines Corporation Method, system and computer program product for profiling entities
WO2005029362A1 (en) * 2003-09-22 2005-03-31 Eurekster, Inc. Enhanced search engine
US7617176B2 (en) * 2004-07-13 2009-11-10 Microsoft Corporation Query-based snippet clustering for search result grouping
WO2006014683A2 (en) 2004-07-21 2006-02-09 Glycofi, Inc. Immunoglobulins comprising predominantly a gal2glcnac2man3glcnac2 glycoform
CN1609859A (en) * 2004-11-26 2005-04-27 孙斌 Search result clustering method
US7739270B2 (en) 2004-12-07 2010-06-15 Microsoft Corporation Entity-specific tuned searching
US20060149710A1 (en) * 2004-12-30 2006-07-06 Ross Koningstein Associating features with entities, such as categories of web page documents, and/or weighting such features
US7870147B2 (en) * 2005-03-29 2011-01-11 Google Inc. Query revision using known highly-ranked queries
US7415461B1 (en) * 2005-08-03 2008-08-19 At&T Corp Apparatus and method for merging results of approximate matching operations
US7996396B2 (en) * 2006-03-28 2011-08-09 A9.Com, Inc. Identifying the items most relevant to a current query based on user activity with respect to the results of similar queries
JP4810609B2 (en) 2006-06-13 2011-11-09 マイクロソフト コーポレーション Search engine dashboard
US9396269B2 (en) 2006-06-28 2016-07-19 Microsoft Technology Licensing, Llc Search engine that identifies and uses social networks in communications, retrieval, and electronic commerce
US7624103B2 (en) * 2006-07-21 2009-11-24 Aol Llc Culturally relevant search results
US8510298B2 (en) * 2006-08-04 2013-08-13 Thefind, Inc. Method for relevancy ranking of products in online shopping
US20080215416A1 (en) * 2007-01-31 2008-09-04 Collarity, Inc. Searchable interactive internet advertisements
US20080243830A1 (en) * 2007-03-30 2008-10-02 Fatdoor, Inc. User suggested ordering to influence search result ranking
WO2009003050A2 (en) * 2007-06-26 2008-12-31 Endeca Technologies, Inc. System and method for measuring the quality of document sets
KR20090012467A (en) * 2007-07-30 2009-02-04 한국과학기술정보연구원 System and method for providing integrated search using uniform resource identifier database
US20090125502A1 (en) * 2007-11-13 2009-05-14 Yahoo! Inc. System and methods for generating diversified vertical search listings
US7769740B2 (en) * 2007-12-21 2010-08-03 Yahoo! Inc. Systems and methods of ranking attention
US20090254512A1 (en) * 2008-04-03 2009-10-08 Yahoo! Inc. Ad matching by augmenting a search query with knowledge obtained through search engine results
US7970808B2 (en) * 2008-05-05 2011-06-28 Microsoft Corporation Leveraging cross-document context to label entity
US8126908B2 (en) * 2008-05-07 2012-02-28 Yahoo! Inc. Creation and enrichment of search based taxonomy for finding information from semistructured data
US8024324B2 (en) * 2008-06-30 2011-09-20 International Business Machines Corporation Information retrieval with unified search using multiple facets
US9460212B2 (en) * 2008-12-03 2016-10-04 Paypal, Inc. System and method for personalized search
US8150813B2 (en) * 2008-12-18 2012-04-03 International Business Machines Corporation Using relationships in candidate discovery
US8315849B1 (en) * 2010-04-09 2012-11-20 Wal-Mart Stores, Inc. Selecting terms in a document

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050246328A1 (en) * 2004-04-30 2005-11-03 Microsoft Corporation Method and system for ranking documents of a search result to improve diversity and information richness
US20060031214A1 (en) * 2004-07-14 2006-02-09 Microsoft Corporation Method and system for adaptive categorial presentation of search results
US8341143B1 (en) * 2004-09-02 2012-12-25 A9.Com, Inc. Multi-category searching
US20060059143A1 (en) * 2004-09-10 2006-03-16 Eran Palmon User interface for conducting a search directed by a hierarchy-free set of topics
US8386453B2 (en) * 2004-09-30 2013-02-26 Google Inc. Providing search information relating to a document
US9092523B2 (en) * 2005-02-28 2015-07-28 Search Engine Technologies, Llc Methods of and systems for searching by incorporating user-entered information
US8103659B1 (en) * 2005-06-06 2012-01-24 A9.Com, Inc. Perspective-based item navigation
US8417569B2 (en) * 2005-11-30 2013-04-09 John Nicholas and Kristin Gross Trust System and method of evaluating content based advertising
US20080091672A1 (en) * 2006-10-17 2008-04-17 Gloor Peter A Process for analyzing interrelationships between internet web sited based on an analysis of their relative centrality
US20080235187A1 (en) * 2007-03-23 2008-09-25 Microsoft Corporation Related search queries for a webpage and their applications
US20090070325A1 (en) * 2007-09-12 2009-03-12 Raefer Christopher Gabriel Identifying Information Related to a Particular Entity from Electronic Sources
US20090177644A1 (en) * 2008-01-04 2009-07-09 Ronald Martinez Systems and methods of mapping attention
US20090240672A1 (en) * 2008-03-18 2009-09-24 Cuill, Inc. Apparatus and method for displaying search results with a variety of display paradigms
US20090313220A1 (en) * 2008-06-13 2009-12-17 International Business Machines Corporation Expansion of Search Result Information
US20090327223A1 (en) * 2008-06-26 2009-12-31 Microsoft Corporation Query-driven web portals
US20090327267A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Basing search results on metadata of prior results
US8554768B2 (en) * 2008-11-25 2013-10-08 Microsoft Corporation Automatically showing additional relevant search results based on user feedback

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180074891A1 (en) * 2016-09-13 2018-03-15 Sandisk Technologies Llc Storage System and Method for Reducing XOR Recovery Time
CN107832439A (en) * 2017-11-16 2018-03-23 Baidu Online Network Technology (Beijing) Co., Ltd. Method, system and the terminal device of more wheel state trackings
US10664755B2 (en) 2017-11-16 2020-05-26 Baidu Online Network Technology (Beijing) Co., Ltd. Searching method and system based on multi-round inputs, and terminal
US11003731B2 (en) * 2018-01-17 2021-05-11 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating information
US11036746B2 (en) 2018-03-01 2021-06-15 Ebay Inc. Enhanced search system for automatic detection of dominant object of search query
US11829375B2 (en) 2018-03-01 2023-11-28 Ebay Inc. Enhanced search system for automatic detection of dominant object of search query

Also Published As

Publication number Publication date
US9152676B2 (en) 2015-10-06
KR20160123398A (en) 2016-10-25
BRPI1007939B1 (en) 2020-08-04
US8458171B2 (en) 2013-06-04
BRPI1007939A2 (en) 2016-02-23
CN102349072B (en) 2014-12-24
CA2751172A1 (en) 2010-08-05
KR101669191B1 (en) 2016-10-25
CA2751172C (en) 2020-07-07
KR101775061B1 (en) 2017-09-05
JP2012516512A (en) 2012-07-19
WO2010088299A1 (en) 2010-08-05
US20130268517A1 (en) 2013-10-10
CN102349072A (en) 2012-02-08
US20100198837A1 (en) 2010-08-05
AU2010208318B2 (en) 2015-03-05
AU2010208318A1 (en) 2011-08-18
KR20110139681A (en) 2011-12-29
JP5623431B2 (en) 2014-11-12
EP2391959A1 (en) 2011-12-07

Similar Documents

Publication Publication Date Title
US9152676B2 (en) Identifying query aspects
US20230205828A1 (en) Related entities
US8185526B2 (en) Dynamic keyword suggestion and image-search re-ranking
US8156120B2 (en) Information retrieval using user-generated metadata
US20100235311A1 (en) Question and answer search
US20150310099A1 (en) System And Method For Generating Labels To Characterize Message Content
US9864795B1 (en) Identifying entity attributes
Ojha et al. Metadata driven semantically aware medical query expansion
Nikas et al. Open domain question answering over knowledge graphs using keyword search, answer type prediction, SPARQL and pre-trained neural models
Kato et al. Query by analogical example: relational search using web search engine indices
Zhang et al. Semantic table retrieval using keyword and table queries
Chen et al. Learning to evaluate and recommend query in restaurant search systems
Zhang Smart Image Search System Using Personalized Semantic Search Method
Aletras Interpreting document collections with topic models
Mittal et al. ARAGOG semantic search engine: working, implementation and comparison with keyword-based search engines
Kumar Design of novelty detection techniques for optimized search engine results
Šimko et al. State-of-the-art: Semantics acquisition and crowdsourcing
Oliveirinha et al. Acquiring semantic context for events from online resources
Trani Improving the Efficiency and Effectiveness of Document Understanding in Web Search.
Vandamme et al. CROEQS: Contemporaneous Role Ontology-based Expanded Query Search-Implementation and Evaluation
Garg et al. ARAGOG SEMANTIC SEARCH ENGINE
Agarwal et al. Data Collection and Evaluation

Legal Events

Date Code Title Description
AS Assignment
Owner name: GOOGLE INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MADHAVAN, JAYANT;WU, FEI;HALEVY, ALON;REEL/FRAME:036738/0656
Effective date: 20091006

AS Assignment
Owner name: GOOGLE LLC, CALIFORNIA
Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044129/0001
Effective date: 20170929

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION