WO2009095746A1 - Method to search for a user generated content web page - Google Patents

Publication number
WO2009095746A1
Authority
WO
WIPO (PCT)
Prior art keywords
lexical units
expressions
subset
web page
lexical
Prior art date
Application number
PCT/IB2008/051310
Other languages
French (fr)
Inventor
Eric De Barry
Bertrand Wolf
Original Assignee
Alterbuzz
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alterbuzz
Priority to PCT/IB2008/051310 (WO2009095746A1)
Priority to EP08737749A (EP2245553A1)
Publication of WO2009095746A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374: Thesaurus
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation


Abstract

A method to search for a user generated content web page comprises
• preparing (22) a set of lexical units,
• inputting (24) said set of lexical units into an internet search engine,
• consolidating (28) results of said internet search engine.
The method further comprises a preliminary step of creating (20) a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the preparation of the set of lexical units comprises the assembly of at least one expression of each subset.

Description

METHOD TO SEARCH FOR A USER GENERATED CONTENT WEB PAGE
Field of the invention
The present invention concerns a method to search for a user generated content web page and software to practice the same.
Background of the invention
Nowadays, to search for a web page, the usual method consists of choosing some words suited to the object of the search. These words are inputted into the query page of an internet search engine such as those proposed by Google Inc. or Yahoo Inc.
As a result, the search engine lists a set of web pages by their title and an automatically generated short abstract. A link gives access to each web page.
Search engines contain some internal, and often secret, algorithms to sort the list of web pages and to show the user what are hopefully the most pertinent web pages at the beginning of the list.
However, this method is not very efficient when the search concerns users' opinions about a product or a service.
Indeed, when a user would like to buy a product or a service it is nowadays a common practice to search for the opinions of the prior buyers or users. This information may be found on blogs, wikis, forums and any other web site where a "standard" user may post a message. These opinions around a product or service generate a buzz which has a positive, or negative, impact on the success of the product/service.
It is therefore important for marketing departments as well as for users to have an efficient method to find the web pages containing opinions on a defined product or service while leaving aside "classical" web pages concerning the product/service, such as pages of merchant web sites, of price comparators, etc.
Summary of the invention
To better address one or more concerns, in a first aspect of the invention, a method to search for a user generated content web page comprises
• preparing a set of lexical units,
• inputting said set of lexical units into an internet search engine,
• consolidating results of said internet search engine.
The method further comprises a preliminary step of creating a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
Therefore, the method has the advantage of preferentially selecting web pages containing opinions about the selected matter.
In particular embodiments:
- the database comprises phonetic transcriptions and misspelled versions of said expressions;
- subsets of feeling expressions and action expressions are arranged as a thesaurus in the database of lexical units;
- based on a first set of selected lexical units, a list of lexical unit sets to be inputted is prepared, combining various phonetic transcriptions, misspelled versions and synonyms of each selected expression;
- each lexical unit set of the list is inputted into the internet search engine, and the results for all sets are consolidated into a weighted list of the web pages;
- the weight of each web page is a combination of the order of appearance of the web page in each result and the number of occurrences of the web page in all results.
Aspects of these embodiments may be combined or modified as appropriate or desired, however.
In a second aspect of the invention a device to search for a web page comprises
• means for storing lexical units,
• means for inputting a set of lexical units into an internet search engine,
• means for consolidating results of said internet search engine, wherein means for storing lexical units comprises a database of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions and means for inputting said set of lexical units is adapted to input set of lexical units comprising at least one expression of each subset.
In a particular embodiment, means for storing lexical units comprises means for storing phonetic transcriptions and misspelled versions of said expressions.
In a third aspect of the invention, a computer program product to search for a web page comprises program instructions to execute the steps of the above method when the computer program product is executed on a computer.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment described hereafter where:
- Fig. 1 is a schematic view of a terminal connected to the internet to practice an embodiment of the invention;
- Fig. 2 is a flowchart of a method according to an embodiment of the invention; and
- Fig. 3 is a functional view of a terminal practicing an embodiment of the invention.
Detailed description
In reference to Fig. 1, a computer 1 is connected to an internet network 3. Through the network 3, the computer 1 is connected to a server 5 on which a search engine is running. The person skilled in the art understands that the server 5 symbolizes the infrastructure of search companies such as Google Inc. or Yahoo Inc. In fact, these companies use server farms containing hundreds of computers distributed around the world.
A server 7 is also connected to the internet network 3 and contains a web page which is of interest to the user of the computer 1 but whose address is not known by the computer 1. The web page contains opinions on a product/service of interest to the user of the computer 1 and is a user generated content web page such as a blog, wiki or forum page.
The computer 1 is a classical personal computer. It comprises interface means such as a display 9, a keyboard 11 and a mouse 13 or the like. It also comprises storage means 15 and processing means 17 such as, for instance, hard disk drives and a motherboard.
The storage means 15 contains a computer software product which, when executed by the processing means 17, makes the computer 1 execute the steps of a method to search for a web page according to an embodiment of the invention.
In reference to Fig. 2, the method starts with the creation, step 20, of a database of lexical units to search for.
The database is stored in the storage means 15.
The database comprises at least three subsets of lexical units:
- a subset of feeling expressions;
- a subset of action expressions; and
- a subset of context expressions.
Feeling expressions mean lexical units which are related to the mood or feeling of a human being. For instance, words such as "trouble", "happy/unhappy", "unpleasant/pleasant", etc. define a certain state of mind. Generally, they are the reasons for which a user has posted a message.
Action expressions mean lexical units which define an action of a user such as "online reservation", "booking", "sales", "offers", etc.
Context expressions mean lexical units which are used to define the context or the specificity of the search. For instance, if the search concerns the comments of travellers having crossed the Channel, the context expressions include terms like "ferry(ies)", "Channel", etc.
Advantageously, the subsets of feeling expressions and action expressions are structured in the form of a thesaurus to allow a user to easily pick a set of words with similar meanings.
Each lexical unit is preferably stored in the database with all its lexical variations, such as singular/plural, and with some misspelled forms. Specifically, the most usual misspelled forms are stored in the database because the searched web pages are written by ordinary users who have different levels of education or who are writing in a foreign language. Therefore, it may be useful to be able to select pages even if the words are misspelled. A particular misspelled form, often used by teenagers accustomed to mobile phone short messages, is the phonetic form.
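As an illustration only, the database described above could be sketched as follows. The class name LexicalUnit, its fields and the sample entries (including the misspellings) are assumptions made for this sketch; the description does not specify any storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexicalUnit:
    """One expression with its stored forms (hypothetical layout)."""
    base: str
    variants: List[str] = field(default_factory=list)      # e.g. singular/plural
    misspellings: List[str] = field(default_factory=list)  # usual or phonetic forms
    synonyms: List[str] = field(default_factory=list)      # thesaurus neighbours

    def all_forms(self) -> List[str]:
        """Return the base form and every stored variant, without duplicates."""
        forms: List[str] = []
        for form in [self.base, *self.variants, *self.misspellings]:
            if form not in forms:
                forms.append(form)
        return forms


# The three subsets named in the description; entries are illustrative only.
database: Dict[str, List[LexicalUnit]] = {
    "feeling": [LexicalUnit("unhappy", misspellings=["unhapy"], synonyms=["unpleasant"])],
    "action": [LexicalUnit("booking", variants=["bookings"])],
    "context": [LexicalUnit("ferry", variants=["ferries"], misspellings=["fery"])],
}

ferry_forms = database["context"][0].all_forms()
```

Keeping synonyms separate from spelling variants mirrors the distinction the description draws between the thesaurus structure and the stored misspelled forms.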
After the creation of the database, at least one set of lexical units is prepared at step 22. The set includes at least one lexical unit of each subset.
In the example of the Channel crossing, a set of lexical units is, for instance, "unhappy experience in channel crossing" where "unhappy" belongs to the feeling subset, "experience" belongs to the action subset and "channel crossing" belongs to the context subset.
The preparation is, advantageously, a guided automatic process in which the user selects some groups of lexical units in each subset and all the sets combining elements of these groups as well as their different forms are automatically generated, forming a list.
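The automatic generation of the list can be pictured as a Cartesian product over the groups selected in each subset. The following is a minimal sketch under that assumption; the function name generate_query_sets and the sample groups are hypothetical.

```python
from itertools import product


def generate_query_sets(feelings, actions, contexts):
    """Return one query string per (feeling, action, context) combination."""
    return [" ".join(combo) for combo in product(feelings, actions, contexts)]


# Each group mixes a selected expression with some of its stored forms.
queries = generate_query_sets(
    ["unhappy", "unhapy"],          # feeling group, including a misspelled form
    ["experience"],                 # action group
    ["channel crossing", "ferry"],  # context group
)
```

With two feeling forms, one action form and two context forms, the list contains four sets, each holding one expression from every subset.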
At step 24, each set of lexical units is inputted into an internet search engine, such as the Google engine (www.google.com) or the Yahoo engine (www.yahoo.com). This step is done automatically, either by building an HTTP request with the appropriate syntax or by using an Application Programming Interface (API) provided by these companies to automate internet searches with their engine.
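Building the HTTP request for one set of lexical units might look like the sketch below. The endpoint search.example.com and the parameter name q are placeholders chosen for illustration; a real engine or API defines its own URL syntax, authentication and usage terms.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(lexical_units, endpoint=SEARCH_ENDPOINT, param="q"):
    """Join the lexical units into one query and encode it as a GET URL."""
    query = " ".join(lexical_units)
    return endpoint + "?" + urlencode({param: query})


url = build_search_url(["unhappy", "experience", "channel crossing"])
```

The sketch only constructs the URL; sending the request and parsing the returned list of links are left out because they depend on the chosen engine.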
At step 26, the computer 1 receives the results of the requests in the classical form of a list of links to web pages. For each request, i.e. for each set of lexical units, the search engine returns at least one web page containing a list of links. Therefore, the computer 1 may receive a huge number of links.
At step 28, these lists of links are consolidated. Identical links found on different lists are merged, advantageously with an increased weight associated with the link, since the fact that a page is selected through different search requests may be an indication of relevance.
As known by the person skilled in the art, internet search engines sort the found pages so as to present the "most relevant" pages to the user at the top of the list. Each internet search engine has its own algorithm to sort the pages, and this algorithm is often kept secret.
During the consolidation step 28, the sorted list is also used to give a higher weight to the pages that are "most relevant" in the sense of the internet search engine.
A combination of weights coming from the internet search engine sort and from the number of occurrences is used to sort the consolidated list by decreasing weight.
The sorted list contains page addresses as hyperlink fields. Therefore, each page can be addressed by the user to be read.
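The consolidation step can be sketched as follows. The description only states that the weight combines the engine's ordering with the number of occurrences; the specific formula used here (sum of reciprocal ranks multiplied by the occurrence count) is an assumption chosen for illustration, as are the function name and the sample links.

```python
from collections import defaultdict


def consolidate(result_lists):
    """Merge ranked link lists into (link, weight) pairs sorted by decreasing weight."""
    rank_score = defaultdict(float)  # accumulated reciprocal rank per link
    occurrences = defaultdict(int)   # number of result lists containing the link
    for links in result_lists:
        seen_in_this_list = set()
        for position, link in enumerate(links, start=1):
            if link in seen_in_this_list:
                continue  # count a link once per result list
            seen_in_this_list.add(link)
            rank_score[link] += 1.0 / position
            occurrences[link] += 1
    # Assumed combination: rank-based score scaled by the occurrence count.
    weights = {link: rank_score[link] * occurrences[link] for link in rank_score}
    # Ties are broken alphabetically so the output order is deterministic.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


results = consolidate([
    ["http://a.example/post", "http://b.example/thread"],
    ["http://b.example/thread", "http://c.example/blog"],
])
```

In this example the link found by both requests ends up first, ahead of a link that was ranked first by only one request, which reflects the idea that selection through different search requests may indicate relevance.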
By using the sorted consolidated list, the user has a good chance of reading web pages which are relevant to his/her search for users' opinions on a product/service.
In reference to Fig. 3, computer 1 comprises, from a functional point of view, means 30 for storing lexical units. Typically, means 30 for storing lexical units comprises a database 32 of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions.
Computer 1 also comprises means 34 for inputting a set of lexical units into an internet search engine, the set of lexical units comprising at least one expression of each subset.
Computer 1 further comprises means 36 for consolidating results sent by the internet search engine.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and exemplary and not restrictive; the invention is not limited to the disclosed embodiment.
Other variations to the disclosed embodiment can be understood and effected by those skilled in the art in practising the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements and the indefinite article "a" or "an" does not exclude a plurality.

Claims

1. Method to search for a user generated content web page comprising
• preparing (22) a set of lexical units,
• inputting (24) said set of lexical units into an internet search engine,
• consolidating (28) results of said internet search engine,
wherein the method comprises a preliminary step of creating (20) a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
2. Method according to claim 1, wherein the database comprises phonetic transcriptions and misspelled versions of said expressions.
3. Method according to claim 1 or 2, wherein subsets of feeling expressions and action expressions are arranged as a thesaurus in the database of lexical units.
4. Method according to claim 2 or 3, wherein, based on a first set of selected lexical units, a list of lexical unit sets to be inputted is prepared, combining various phonetic transcriptions, misspelled versions and synonyms of each selected expression.
5. Method according to claim 4, wherein each lexical unit set of the list is inputted into the internet search engine, and the results for all sets are consolidated into a weighted list of the web pages.
6. Method according to claim 5, wherein the weight of each web page is a combination of the order of appearance of the web page in each result and the number of occurrences of the web page in all results.
7. Device to search for a user generated content web page comprising
• means (30) for storing lexical units,
• means (34) for inputting a set of lexical units into an internet search engine,
• means (36) for consolidating results of said internet search engine, wherein means for storing lexical units comprises a database (32) of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions and means for inputting said set of lexical units is adapted to input set of lexical units comprising at least one expression of each subset.
8. Device according to claim 7, wherein means for storing lexical units comprises means for storing phonetic transcriptions and misspelled versions of said expressions.
9. A computer program product to search for a user generated content web page comprising program instructions to execute the steps of the method according to any one of claims 1 to 6 when said computer program product is executed on a computer.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/IB2008/051310 WO2009095746A1 (en) 2008-01-29 2008-01-29 Method to search for a user generated content web page
EP08737749A EP2245553A1 (en) 2008-01-29 2008-01-29 Method to search for a user generated content web page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2008/051310 WO2009095746A1 (en) 2008-01-29 2008-01-29 Method to search for a user generated content web page

Publications (1)

Publication Number Publication Date
WO2009095746A1 (en) 2009-08-06

Family

ID=39710949

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/051310 WO2009095746A1 (en) 2008-01-29 2008-01-29 Method to search for a user generated content web page

Country Status (2)

Country Link
EP (1) EP2245553A1 (en)
WO (1) WO2009095746A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0751471A1 (en) * 1995-06-30 1997-01-02 Massachusetts Institute Of Technology Method and apparatus for item recommendation using automated collaborative filtering
WO2001059625A1 (en) * 2000-02-10 2001-08-16 Involve Technology, Llc System for creating and maintaining a database of information utilizing user opinions
WO2005059772A1 (en) * 2003-12-09 2005-06-30 Swiss Reinsurance Company System and method for the aggregation and monitoring of multimedia data that are stored in a decentralized manner
EP1736902A1 (en) * 2005-06-24 2006-12-27 Agilent Technologies, Inc. Systems methods and computer readable media for performing a domain-specific metasearch and visualizing search results therefrom
WO2007142998A2 (en) * 2006-05-31 2007-12-13 Kaava Corp. Dynamic content analysis of collected online discussions

Also Published As

Publication number Publication date
EP2245553A1 (en) 2010-11-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08737749

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2008737749

Country of ref document: EP